| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741 |
- .. SPDX-License-Identifier: GPL-2.0
- ====================
- Utilization Clamping
- ====================
- 1. Introduction
- ===============
- Utilization clamping, also known as util clamp or uclamp, is a scheduler
- feature that allows user space to help in managing the performance requirement
- of tasks. It was introduced in v5.3 release. The CGroup support was merged in
- v5.4.
- Uclamp is a hinting mechanism that allows the scheduler to understand the
- performance requirements and restrictions of the tasks, thus it helps the
- scheduler to make a better decision. And when schedutil cpufreq governor is
- used, util clamp will influence the CPU frequency selection as well.
- Since the scheduler and schedutil are both driven by PELT (util_avg) signals,
- util clamp acts on that to achieve its goal by clamping the signal to a certain
- point; hence the name. That is, by clamping utilization we are making the
- system run at a certain performance point.
- The right way to view util clamp is as a mechanism to make request or hint on
- performance constraints. It consists of two tunables:
- * UCLAMP_MIN, which sets the lower bound.
- * UCLAMP_MAX, which sets the upper bound.
- These two bounds will ensure a task will operate within this performance range
- of the system. UCLAMP_MIN implies boosting a task, while UCLAMP_MAX implies
- capping a task.
- One can tell the system (scheduler) that some tasks require a minimum
- performance point to operate at to deliver the desired user experience. Or one
- can tell the system that some tasks should be restricted from consuming too
- much resources and should not go above a specific performance point. Viewing
- the uclamp values as performance points rather than utilization is a better
- abstraction from user space point of view.
- As an example, a game can use util clamp to form a feedback loop with its
- perceived Frames Per Second (FPS). It can dynamically increase the minimum
- performance point required by its display pipeline to ensure no frame is
- dropped. It can also dynamically 'prime' up these tasks if it knows in the
- coming few hundred milliseconds a computationally intensive scene is about to
- happen.
- On mobile hardware where the capability of the devices varies a lot, this
- dynamic feedback loop offers a great flexibility to ensure best user experience
- given the capabilities of any system.
- Of course a static configuration is possible too. The exact usage will depend
- on the system, application and the desired outcome.
- Another example is in Android where tasks are classified as background,
- foreground, top-app, etc. Util clamp can be used to constrain how much
- resources background tasks are consuming by capping the performance point they
- can run at. This constraint helps reserve resources for important tasks, like
- the ones belonging to the currently active app (top-app group). Beside this
- helps in limiting how much power they consume. This can be more obvious in
- heterogeneous systems (e.g. Arm big.LITTLE); the constraint will help bias the
- background tasks to stay on the little cores which will ensure that:
- 1. The big cores are free to run top-app tasks immediately. top-app
- tasks are the tasks the user is currently interacting with, hence
- the most important tasks in the system.
- 2. They don't run on a power hungry core and drain battery even if they
- are CPU intensive tasks.
- .. note::
- **little cores**:
- CPUs with capacity < 1024
- **big cores**:
- CPUs with capacity = 1024
- By making these uclamp performance requests, or rather hints, user space can
- ensure system resources are used optimally to deliver the best possible user
- experience.
- Another use case is to help with **overcoming the ramp up latency inherit in
- how scheduler utilization signal is calculated**.
- On the other hand, a busy task for instance that requires to run at maximum
- performance point will suffer a delay of ~200ms (PELT HALFIFE = 32ms) for the
- scheduler to realize that. This is known to affect workloads like gaming on
- mobile devices where frames will drop due to slow response time to select the
- higher frequency required for the tasks to finish their work in time. Setting
- UCLAMP_MIN=1024 will ensure such tasks will always see the highest performance
- level when they start running.
- The overall visible effect goes beyond better perceived user
- experience/performance and stretches to help achieve a better overall
- performance/watt if used effectively.
- User space can form a feedback loop with the thermal subsystem too to ensure
- the device doesn't heat up to the point where it will throttle.
- Both SCHED_NORMAL/OTHER and SCHED_FIFO/RR honour uclamp requests/hints.
- In the SCHED_FIFO/RR case, uclamp gives the option to run RT tasks at any
- performance point rather than being tied to MAX frequency all the time. Which
- can be useful on general purpose systems that run on battery powered devices.
- Note that by design RT tasks don't have per-task PELT signal and must always
- run at a constant frequency to combat undeterministic DVFS rampup delays.
- Note that using schedutil always implies a single delay to modify the frequency
- when an RT task wakes up. This cost is unchanged by using uclamp. Uclamp only
- helps picking what frequency to request instead of schedutil always requesting
- MAX for all RT tasks.
- See :ref:`section 3.4 <uclamp-default-values>` for default values and
- :ref:`3.4.1 <sched-util-clamp-min-rt-default>` on how to change RT tasks
- default value.
- 2. Design
- =========
- Util clamp is a property of every task in the system. It sets the boundaries of
- its utilization signal; acting as a bias mechanism that influences certain
- decisions within the scheduler.
- The actual utilization signal of a task is never clamped in reality. If you
- inspect PELT signals at any point of time you should continue to see them as
- they are intact. Clamping happens only when needed, e.g: when a task wakes up
- and the scheduler needs to select a suitable CPU for it to run on.
- Since the goal of util clamp is to allow requesting a minimum and maximum
- performance point for a task to run on, it must be able to influence the
- frequency selection as well as task placement to be most effective. Both of
- which have implications on the utilization value at CPU runqueue (rq for short)
- level, which brings us to the main design challenge.
- When a task wakes up on an rq, the utilization signal of the rq will be
- affected by the uclamp settings of all the tasks enqueued on it. For example if
- a task requests to run at UTIL_MIN = 512, then the util signal of the rq needs
- to respect to this request as well as all other requests from all of the
- enqueued tasks.
- To be able to aggregate the util clamp value of all the tasks attached to the
- rq, uclamp must do some housekeeping at every enqueue/dequeue, which is the
- scheduler hot path. Hence care must be taken since any slow down will have
- significant impact on a lot of use cases and could hinder its usability in
- practice.
- The way this is handled is by dividing the utilization range into buckets
- (struct uclamp_bucket) which allows us to reduce the search space from every
- task on the rq to only a subset of tasks on the top-most bucket.
- When a task is enqueued, the counter in the matching bucket is incremented,
- and on dequeue it is decremented. This makes keeping track of the effective
- uclamp value at rq level a lot easier.
- As tasks are enqueued and dequeued, we keep track of the current effective
- uclamp value of the rq. See :ref:`section 2.1 <uclamp-buckets>` for details on
- how this works.
- Later at any path that wants to identify the effective uclamp value of the rq,
- it will simply need to read this effective uclamp value of the rq at that exact
- moment of time it needs to take a decision.
- For task placement case, only Energy Aware and Capacity Aware Scheduling
- (EAS/CAS) make use of uclamp for now, which implies that it is applied on
- heterogeneous systems only.
- When a task wakes up, the scheduler will look at the current effective uclamp
- value of every rq and compare it with the potential new value if the task were
- to be enqueued there. Favoring the rq that will end up with the most energy
- efficient combination.
- Similarly in schedutil, when it needs to make a frequency update it will look
- at the current effective uclamp value of the rq which is influenced by the set
- of tasks currently enqueued there and select the appropriate frequency that
- will satisfy constraints from requests.
- Other paths like setting overutilization state (which effectively disables EAS)
- make use of uclamp as well. Such cases are considered necessary housekeeping to
- allow the 2 main use cases above and will not be covered in detail here as they
- could change with implementation details.
- .. _uclamp-buckets:
- 2.1. Buckets
- ------------
- ::
- [struct rq]
- (bottom) (top)
- 0 1024
- | |
- +-----------+-----------+-----------+---- ----+-----------+
- | Bucket 0 | Bucket 1 | Bucket 2 | ... | Bucket N |
- +-----------+-----------+-----------+---- ----+-----------+
- : : :
- +- p0 +- p3 +- p4
- : :
- +- p1 +- p5
- :
- +- p2
- .. note::
- The diagram above is an illustration rather than a true depiction of the
- internal data structure.
- To reduce the search space when trying to decide the effective uclamp value of
- an rq as tasks are enqueued/dequeued, the whole utilization range is divided
- into N buckets where N is configured at compile time by setting
- CONFIG_UCLAMP_BUCKETS_COUNT. By default it is set to 5.
- The rq has a bucket for each uclamp_id tunables: [UCLAMP_MIN, UCLAMP_MAX].
- The range of each bucket is 1024/N. For example, for the default value of
- 5 there will be 5 buckets, each of which will cover the following range:
- ::
- DELTA = round_closest(1024/5) = 204.8 = 205
- Bucket 0: [0:204]
- Bucket 1: [205:409]
- Bucket 2: [410:614]
- Bucket 3: [615:819]
- Bucket 4: [820:1024]
- When a task p with following tunable parameters
- ::
- p->uclamp[UCLAMP_MIN] = 300
- p->uclamp[UCLAMP_MAX] = 1024
- is enqueued into the rq, bucket 1 will be incremented for UCLAMP_MIN and bucket
- 4 will be incremented for UCLAMP_MAX to reflect the fact the rq has a task in
- this range.
- The rq then keeps track of its current effective uclamp value for each
- uclamp_id.
- When a task p is enqueued, the rq value changes to:
- ::
- // update bucket logic goes here
- rq->uclamp[UCLAMP_MIN] = max(rq->uclamp[UCLAMP_MIN], p->uclamp[UCLAMP_MIN])
- // repeat for UCLAMP_MAX
- Similarly, when p is dequeued the rq value changes to:
- ::
- // update bucket logic goes here
- rq->uclamp[UCLAMP_MIN] = search_top_bucket_for_highest_value()
- // repeat for UCLAMP_MAX
- When all buckets are empty, the rq uclamp values are reset to system defaults.
- See :ref:`section 3.4 <uclamp-default-values>` for details on default values.
- 2.2. Max aggregation
- --------------------
- Util clamp is tuned to honour the request for the task that requires the
- highest performance point.
- When multiple tasks are attached to the same rq, then util clamp must make sure
- the task that needs the highest performance point gets it even if there's
- another task that doesn't need it or is disallowed from reaching this point.
- For example, if there are multiple tasks attached to an rq with the following
- values:
- ::
- p0->uclamp[UCLAMP_MIN] = 300
- p0->uclamp[UCLAMP_MAX] = 900
- p1->uclamp[UCLAMP_MIN] = 500
- p1->uclamp[UCLAMP_MAX] = 500
- then assuming both p0 and p1 are enqueued to the same rq, both UCLAMP_MIN
- and UCLAMP_MAX become:
- ::
- rq->uclamp[UCLAMP_MIN] = max(300, 500) = 500
- rq->uclamp[UCLAMP_MAX] = max(900, 500) = 900
- As we shall see in :ref:`section 5.1 <uclamp-capping-fail>`, this max
- aggregation is the cause of one of limitations when using util clamp, in
- particular for UCLAMP_MAX hint when user space would like to save power.
- 2.3. Hierarchical aggregation
- -----------------------------
- As stated earlier, util clamp is a property of every task in the system. But
- the actual applied (effective) value can be influenced by more than just the
- request made by the task or another actor on its behalf (middleware library).
- The effective util clamp value of any task is restricted as follows:
- 1. By the uclamp settings defined by the cgroup CPU controller it is attached
- to, if any.
- 2. The restricted value in (1) is then further restricted by the system wide
- uclamp settings.
- :ref:`Section 3 <uclamp-interfaces>` discusses the interfaces and will expand
- further on that.
- For now suffice to say that if a task makes a request, its actual effective
- value will have to adhere to some restrictions imposed by cgroup and system
- wide settings.
- The system will still accept the request even if effectively will be beyond the
- constraints, but as soon as the task moves to a different cgroup or a sysadmin
- modifies the system settings, the request will be satisfied only if it is
- within new constraints.
- In other words, this aggregation will not cause an error when a task changes
- its uclamp values, but rather the system may not be able to satisfy requests
- based on those factors.
- 2.4. Range
- ----------
- Uclamp performance request has the range of 0 to 1024 inclusive.
- For cgroup interface percentage is used (that is 0 to 100 inclusive).
- Just like other cgroup interfaces, you can use 'max' instead of 100.
- .. _uclamp-interfaces:
- 3. Interfaces
- =============
- 3.1. Per task interface
- -----------------------
- sched_setattr() syscall was extended to accept two new fields:
- * sched_util_min: requests the minimum performance point the system should run
- at when this task is running. Or lower performance bound.
- * sched_util_max: requests the maximum performance point the system should run
- at when this task is running. Or upper performance bound.
- For example, the following scenario have 40% to 80% utilization constraints:
- ::
- attr->sched_util_min = 40% * 1024;
- attr->sched_util_max = 80% * 1024;
- When task @p is running, **the scheduler should try its best to ensure it
- starts at 40% performance level**. If the task runs for a long enough time so
- that its actual utilization goes above 80%, the utilization, or performance
- level, will be capped.
- The special value -1 is used to reset the uclamp settings to the system
- default.
- Note that resetting the uclamp value to system default using -1 is not the same
- as manually setting uclamp value to system default. This distinction is
- important because as we shall see in system interfaces, the default value for
- RT could be changed. SCHED_NORMAL/OTHER might gain similar knobs too in the
- future.
- 3.2. cgroup interface
- ---------------------
- There are two uclamp related values in the CPU cgroup controller:
- * cpu.uclamp.min
- * cpu.uclamp.max
- When a task is attached to a CPU controller, its uclamp values will be impacted
- as follows:
- * cpu.uclamp.min is a protection as described in :ref:`section 3-3 of cgroup
- v2 documentation <cgroupv2-protections-distributor>`.
- If a task uclamp_min value is lower than cpu.uclamp.min, then the task will
- inherit the cgroup cpu.uclamp.min value.
- In a cgroup hierarchy, effective cpu.uclamp.min is the max of (child,
- parent).
- * cpu.uclamp.max is a limit as described in :ref:`section 3-2 of cgroup v2
- documentation <cgroupv2-limits-distributor>`.
- If a task uclamp_max value is higher than cpu.uclamp.max, then the task will
- inherit the cgroup cpu.uclamp.max value.
- In a cgroup hierarchy, effective cpu.uclamp.max is the min of (child,
- parent).
- For example, given following parameters:
- ::
- p0->uclamp[UCLAMP_MIN] = // system default;
- p0->uclamp[UCLAMP_MAX] = // system default;
- p1->uclamp[UCLAMP_MIN] = 40% * 1024;
- p1->uclamp[UCLAMP_MAX] = 50% * 1024;
- cgroup0->cpu.uclamp.min = 20% * 1024;
- cgroup0->cpu.uclamp.max = 60% * 1024;
- cgroup1->cpu.uclamp.min = 60% * 1024;
- cgroup1->cpu.uclamp.max = 100% * 1024;
- when p0 and p1 are attached to cgroup0, the values become:
- ::
- p0->uclamp[UCLAMP_MIN] = cgroup0->cpu.uclamp.min = 20% * 1024;
- p0->uclamp[UCLAMP_MAX] = cgroup0->cpu.uclamp.max = 60% * 1024;
- p1->uclamp[UCLAMP_MIN] = 40% * 1024; // intact
- p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact
- when p0 and p1 are attached to cgroup1, these instead become:
- ::
- p0->uclamp[UCLAMP_MIN] = cgroup1->cpu.uclamp.min = 60% * 1024;
- p0->uclamp[UCLAMP_MAX] = cgroup1->cpu.uclamp.max = 100% * 1024;
- p1->uclamp[UCLAMP_MIN] = cgroup1->cpu.uclamp.min = 60% * 1024;
- p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact
- Note that cgroup interfaces allows cpu.uclamp.max value to be lower than
- cpu.uclamp.min. Other interfaces don't allow that.
- 3.3. System interface
- ---------------------
- 3.3.1 sched_util_clamp_min
- --------------------------
- System wide limit of allowed UCLAMP_MIN range. By default it is set to 1024,
- which means that permitted effective UCLAMP_MIN range for tasks is [0:1024].
- By changing it to 512 for example the range reduces to [0:512]. This is useful
- to restrict how much boosting tasks are allowed to acquire.
- Requests from tasks to go above this knob value will still succeed, but
- they won't be satisfied until it is more than p->uclamp[UCLAMP_MIN].
- The value must be smaller than or equal to sched_util_clamp_max.
- 3.3.2 sched_util_clamp_max
- --------------------------
- System wide limit of allowed UCLAMP_MAX range. By default it is set to 1024,
- which means that permitted effective UCLAMP_MAX range for tasks is [0:1024].
- By changing it to 512 for example the effective allowed range reduces to
- [0:512]. This means is that no task can run above 512, which implies that all
- rqs are restricted too. IOW, the whole system is capped to half its performance
- capacity.
- This is useful to restrict the overall maximum performance point of the system.
- For example, it can be handy to limit performance when running low on battery
- or when the system wants to limit access to more energy hungry performance
- levels when it's in idle state or screen is off.
- Requests from tasks to go above this knob value will still succeed, but they
- won't be satisfied until it is more than p->uclamp[UCLAMP_MAX].
- The value must be greater than or equal to sched_util_clamp_min.
- .. _uclamp-default-values:
- 3.4. Default values
- -------------------
- By default all SCHED_NORMAL/SCHED_OTHER tasks are initialized to:
- ::
- p_fair->uclamp[UCLAMP_MIN] = 0
- p_fair->uclamp[UCLAMP_MAX] = 1024
- That is, by default they're boosted to run at the maximum performance point of
- changed at boot or runtime. No argument was made yet as to why we should
- provide this, but can be added in the future.
- For SCHED_FIFO/SCHED_RR tasks:
- ::
- p_rt->uclamp[UCLAMP_MIN] = 1024
- p_rt->uclamp[UCLAMP_MAX] = 1024
- That is by default they're boosted to run at the maximum performance point of
- the system which retains the historical behavior of the RT tasks.
- RT tasks default uclamp_min value can be modified at boot or runtime via
- sysctl. See below section.
- .. _sched-util-clamp-min-rt-default:
- 3.4.1 sched_util_clamp_min_rt_default
- -------------------------------------
- Running RT tasks at maximum performance point is expensive on battery powered
- devices and not necessary. To allow system developer to offer good performance
- guarantees for these tasks without pushing it all the way to maximum
- performance point, this sysctl knob allows tuning the best boost value to
- address the system requirement without burning power running at maximum
- performance point all the time.
- Application developer are encouraged to use the per task util clamp interface
- to ensure they are performance and power aware. Ideally this knob should be set
- to 0 by system designers and leave the task of managing performance
- requirements to the apps.
- 4. How to use util clamp
- ========================
- Util clamp promotes the concept of user space assisted power and performance
- management. At the scheduler level there is no info required to make the best
- decision. However, with util clamp user space can hint to the scheduler to make
- better decision about task placement and frequency selection.
- Best results are achieved by not making any assumptions about the system the
- application is running on and to use it in conjunction with a feedback loop to
- dynamically monitor and adjust. Ultimately this will allow for a better user
- experience at a better perf/watt.
- For some systems and use cases, static setup will help to achieve good results.
- Portability will be a problem in this case. How much work one can do at 100,
- 200 or 1024 is different for each system. Unless there's a specific target
- system, static setup should be avoided.
- There are enough possibilities to create a whole framework based on util clamp
- or self contained app that makes use of it directly.
- 4.1. Boost important and DVFS-latency-sensitive tasks
- -----------------------------------------------------
- A GUI task might not be busy to warrant driving the frequency high when it
- wakes up. However, it requires to finish its work within a specific time window
- to deliver the desired user experience. The right frequency it requires at
- wakeup will be system dependent. On some underpowered systems it will be high,
- on other overpowered ones it will be low or 0.
- This task can increase its UCLAMP_MIN value every time it misses the deadline
- to ensure on next wake up it runs at a higher performance point. It should try
- to approach the lowest UCLAMP_MIN value that allows to meet its deadline on any
- particular system to achieve the best possible perf/watt for that system.
- On heterogeneous systems, it might be important for this task to run on
- a faster CPU.
- **Generally it is advised to perceive the input as performance level or point
- which will imply both task placement and frequency selection**.
- 4.2. Cap background tasks
- -------------------------
- Like explained for Android case in the introduction. Any app can lower
- UCLAMP_MAX for some background tasks that don't care about performance but
- could end up being busy and consume unnecessary system resources on the system.
- 4.3. Powersave mode
- -------------------
- sched_util_clamp_max system wide interface can be used to limit all tasks from
- operating at the higher performance points which are usually energy
- inefficient.
- This is not unique to uclamp as one can achieve the same by reducing max
- frequency of the cpufreq governor. It can be considered a more convenient
- alternative interface.
- 4.4. Per-app performance restriction
- ------------------------------------
- Middleware/Utility can provide the user an option to set UCLAMP_MIN/MAX for an
- app every time it is executed to guarantee a minimum performance point and/or
- limit it from draining system power at the cost of reduced performance for
- these apps.
- If you want to prevent your laptop from heating up while on the go from
- compiling the kernel and happy to sacrifice performance to save power, but
- still would like to keep your browser performance intact, uclamp makes it
- possible.
- 5. Limitations
- ==============
- .. _uclamp-capping-fail:
- 5.1. Capping frequency with uclamp_max fails under certain conditions
- ---------------------------------------------------------------------
- If task p0 is capped to run at 512:
- ::
- p0->uclamp[UCLAMP_MAX] = 512
- and it shares the rq with p1 which is free to run at any performance point:
- ::
- p1->uclamp[UCLAMP_MAX] = 1024
- then due to max aggregation the rq will be allowed to reach max performance
- point:
- ::
- rq->uclamp[UCLAMP_MAX] = max(512, 1024) = 1024
- Assuming both p0 and p1 have UCLAMP_MIN = 0, then the frequency selection for
- the rq will depend on the actual utilization value of the tasks.
- If p1 is a small task but p0 is a CPU intensive task, then due to the fact that
- both are running at the same rq, p1 will cause the frequency capping to be left
- from the rq although p1, which is allowed to run at any performance point,
- doesn't actually need to run at that frequency.
- 5.2. UCLAMP_MAX can break PELT (util_avg) signal
- ------------------------------------------------
- PELT assumes that frequency will always increase as the signals grow to ensure
- there's always some idle time on the CPU. But with UCLAMP_MAX, this frequency
- increase will be prevented which can lead to no idle time in some
- circumstances. When there's no idle time, a task will stuck in a busy loop,
- which would result in util_avg being 1024.
- Combing with issue described below, this can lead to unwanted frequency spikes
- when severely capped tasks share the rq with a small non capped task.
- As an example if task p, which have:
- ::
- p0->util_avg = 300
- p0->uclamp[UCLAMP_MAX] = 0
- wakes up on an idle CPU, then it will run at min frequency (Fmin) this
- CPU is capable of. The max CPU frequency (Fmax) matters here as well,
- since it designates the shortest computational time to finish the task's
- work on this CPU.
- ::
- rq->uclamp[UCLAMP_MAX] = 0
- If the ratio of Fmax/Fmin is 3, then maximum value will be:
- ::
- 300 * (Fmax/Fmin) = 900
- which indicates the CPU will still see idle time since 900 is < 1024. The
- _actual_ util_avg will not be 900 though, but somewhere between 300 and 900. As
- long as there's idle time, p->util_avg updates will be off by a some margin,
- but not proportional to Fmax/Fmin.
- ::
- p0->util_avg = 300 + small_error
- Now if the ratio of Fmax/Fmin is 4, the maximum value becomes:
- ::
- 300 * (Fmax/Fmin) = 1200
- which is higher than 1024 and indicates that the CPU has no idle time. When
- this happens, then the _actual_ util_avg will become:
- ::
- p0->util_avg = 1024
- If task p1 wakes up on this CPU, which have:
- ::
- p1->util_avg = 200
- p1->uclamp[UCLAMP_MAX] = 1024
- then the effective UCLAMP_MAX for the CPU will be 1024 according to max
- aggregation rule. But since the capped p0 task was running and throttled
- severely, then the rq->util_avg will be:
- ::
- p0->util_avg = 1024
- p1->util_avg = 200
- rq->util_avg = 1024
- rq->uclamp[UCLAMP_MAX] = 1024
- Hence lead to a frequency spike since if p0 wasn't throttled we should get:
- ::
- p0->util_avg = 300
- p1->util_avg = 200
- rq->util_avg = 500
- and run somewhere near mid performance point of that CPU, not the Fmax we get.
- 5.3. Schedutil response time issues
- -----------------------------------
- schedutil has three limitations:
- 1. Hardware takes non-zero time to respond to any frequency change
- request. On some platforms can be in the order of few ms.
- 2. Non fast-switch systems require a worker deadline thread to wake up
- and perform the frequency change, which adds measurable overhead.
- 3. schedutil rate_limit_us drops any requests during this rate_limit_us
- window.
- If a relatively small task is doing critical job and requires a certain
- performance point when it wakes up and starts running, then all these
- limitations will prevent it from getting what it wants in the time scale it
- expects.
- This limitation is not only impactful when using uclamp, but will be more
- prevalent as we no longer gradually ramp up or down. We could easily be
- jumping between frequencies depending on the order tasks wake up, and their
- respective uclamp values.
- We regard that as a limitation of the capabilities of the underlying system
- itself.
- There is room to improve the behavior of schedutil rate_limit_us, but not much
- to be done for 1 or 2. They are considered hard limitations of the system.
|