CPU Governors and the cpupower tool

System tuning tools are either about better understanding the system’s performance, or applying such knowledge to improve it. See our common system tuning thoughts for the general reasons for that.

CPU governors

The kernel provides several CPU governors which can be configured, per core, to optimise for different needs.

Governor

Design philosophy

ondemand

This sets the CPU frequency depending on the current system load. This behavior is usually a good balance between the more extreme options.

conservative

Similar to ondemand, but adapting CPU speed more gracefully rather than jumping to max speed the moment there is any load on the CPU. This behaviour is more suitable in a battery-powered environment.

performance

This sets the CPU statically to the highest frequency. This behaviour is best to optimise for speed and latency, but might waste power if being under-used.

powersave

Sets the CPU statically to the lowest frequency, essentially locking it to P2. This behavior is suitable to save power without compromises.

userspace

Allows a user-space program to control the CPU frequency.

See the Linux CPUFreq Governors Documentation for a more extensive discussion and explanation of the available Linux CPU governors.

While these governors can be checked and changed directly in sysfs at /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, the command cpupower which comes with the package linux-tools-common makes this easier by providing a commandline interface and providing access to several related values.

Monitor CPU frequency

Before changing anything, look at the current frequencies via cpupower monitor. Many systems have various potential monitors, and by default one sees all of them which can be quite confusing. Therefore start with looking at the available power monitors.

Command (list all available cpupower monitors available on the system):

sudo cpupower monitor -l

Output (An example from a common consumer laptop):

Monitor "Nehalem" (4 states) - Might overflow after 922000000 s
  C3	[C] -> Processor Core C3
  C6	[C] -> Processor Core C6
  PC3	[P] -> Processor Package C3
  PC6	[P] -> Processor Package C6
Monitor "Mperf" (3 states) - Might overflow after 922000000 s
  C0	[T] -> Processor Core not idle
  Cx	[T] -> Processor Core in an idle state
  Freq	[T] -> Average Frequency (including boost) in MHz
Monitor "RAPL" (4 states) - Might overflow after 8640000 s
  pack	[M] -> 
  dram	[M] -> 
  core	[M] -> 
  unco	[M] -> 
Monitor "Idle_Stats" (9 states) - Might overflow after 4294967295 s
  POLL	[T] -> CPUIDLE CORE POLL IDLE
  C1	[T] -> MWAIT 0x00
  C1E	[T] -> MWAIT 0x01
  C3	[T] -> MWAIT 0x10
  C6	[T] -> MWAIT 0x20
  C7s	[T] -> MWAIT 0x33
  C8	[T] -> MWAIT 0x40
  C9	[T] -> MWAIT 0x50
  C10	[T] -> MWAIT 0x60

Here we can see that the machine has four available monitors shown in ".

  • Nehalem - Hardware specific C states.

  • Mperf - Average of frequencies and time in active (C0) or sleep (Cx) states.

  • RAPL - Running Average Power Limit covering different system elements.

  • Idle_Stats - Statistics of the cpuidle kernel subsystem (software based).

Those counters can represent different system units:

  • [T] -> Thread

  • [C] -> Core

  • [P] -> Processor Package (Socket)

  • [M] -> Machine/Platform wide counter

So if we want to know what frequency the CPU threads were in (Mperf) and what was consumed at the different system levels of package, dram, core and uncore (RAPL) averages over a minute (-i <seconds>) we would run:

Command:

sudo cpupower monitor -i 60 -m Mperf,RAPL

Output:

    | Mperf              || RAPL
 CPU| C0   | Cx   | Freq || pack    | dram    | core    | unco
   0| 61,83| 38,17|  1850||616950936|145911797|375373063|71556823
   1| 62,03| 37,97|  1848||616950936|145911797|375373063|71556823
   2| 65,51| 34,49|  1852||616950936|145911797|375373063|71556823
   3| 62,04| 37,96|  1852||616950936|145911797|375373063|71556823

Get details about the boundaries for the CPU frequency

There are more details influencing the CPU frequency, such as the driver used to control the hardware, the min and max frequencies, and potential boost states. These can be collected with cpupower frequency-info

Command:

cpupower frequency-info

Output:

analyzing CPU 3:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 3
  CPUs which need to have their frequency coordinated by software: 3
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 4.00 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 1.80 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

By default this checks the CPU it is executed on. The argument -c can be set to either a number representing a core or all to get the info for all available CPUs.

Get details about the idle states

Idle states represent situations when a CPU enters a state of suspension to save power. The tool cpupower idle-info reports about the available idle states, their description and attributes. These can be useful when debugging CPU performance if one is curious about the details of a given state after running cpupower monitor above.

Command:

cpupower idle-info

Output:

CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 9
Available idle states: POLL C1 C1E C3 C6 C7s C8 C9 C10
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 26053898
Duration: 695768311
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 263751626
Duration: 21296361635
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 1071864698
Duration: 122465703132
C3:
Flags/Description: MWAIT 0x10
Latency: 70
Usage: 941753727
Duration: 117177626397
C6:
Flags/Description: MWAIT 0x20
Latency: 85
Usage: 2580936435
Duration: 1258804567087
C7s:
Flags/Description: MWAIT 0x33
Latency: 124
Usage: 2946723
Duration: 1783856599
C8:
Flags/Description: MWAIT 0x40
Latency: 200
Usage: 1580297534
Duration: 1234136981613
C9:
Flags/Description: MWAIT 0x50
Latency: 480
Usage: 2015405
Duration: 3198208930
C10:
Flags/Description: MWAIT 0x60
Latency: 890
Usage: 511786893
Duration: 1546264384800

After reading a bit (much more in the Further reading section) into C-states, P-states and Idle states we can also re-run cpupower monitor without filtering as now the further columns can be related to the above output.

Command:

sudo cpupower monitor

Output:

    | Nehalem                   || Mperf              || RAPL                           || Idle_Stats
 CPU| C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || pack | dram | core | unco      || POLL | C1   | C1E  | C3   | C6   | C7s  | C8   | C9   | C10
   0|  2,99| 11,92|  0,00|  0,00|| 70,98| 29,02|  1991||13733058|2706597|7438396|3080986||  0,05|  1,84|  5,01|  3,87| 14,05|  0,06|  3,81|  0,00|  0,04
   1|  3,58| 14,84|  0,00|  0,00|| 67,65| 32,35|  1991||13733058|2706597|7438396|3080986||  0,07|  1,87|  5,42|  4,46| 17,21|  0,36|  2,73|  0,00|  0,00
   2|  3,99|  7,15|  0,00|  0,00|| 73,25| 26,75|  1990||13733058|2706597|7438396|3080986||  0,09|  1,95|  8,76|  5,20|  9,44|  0,01|  1,12|  0,04|  0,00
   3|  3,86| 13,68|  0,00|  0,00|| 68,40| 31,60|  1990||13733058|2706597|7438396|3080986||  0,03|  2,52|  6,35|  4,92| 15,97|  0,00|  1,52|  0,00|  0,00

What should I do with all of this?

All this information is usually only data without any insight until you either:

  • compare them with historical data (it is generally recommended to gather performance and power metrics regularly to be able to compare them to the healthy state in case of any debugging scenario), or

  • compare them with your expectations and act on any mismatch

Does it match what you expect?

One might have expectations about the behaviour of a system. Examples are:

  • I’m not doing much – it should be idling most of the time

  • I have a very busy workload, I expect it to run at highest frequency

  • I do not expect my workload to allow the system to go into low power states

You can hold any of these assumptions against the output of cpupower monitor and verify that they are true. If they are not, use cpupower frequency-info to check if the current constraints match what you think. And use cpupower frequency-set (below) to set a different governor if needed.

Control the CPU governors and CPU frequency

An administrator can execute the cpupower command to set the CPU governor.

Command (set the CPU governor to Performance mode on all CPUs):

cpupower frequency-set -g performance

Since all commands of cpupower can be for a sub-set of CPUs, one can use -c here as well if that matches what is needed for more complex scenarios.

Command (Set conservative on the first 8 cores in a system):

cpupower -c 0-7 frequency-set -g conservative

Powertop

powertop supports the user in identifying reasons for unexpected high power consumption by listing reasons to wake up from low power states. The look and feel aligns with the well known top. powertop is not installed by default, before trying run sudo apt install powertop. This command needs elevated permissions, so run it with sudo.

sudo powertop

It has six tabs for the various areas of interest:

  • Overview - frequency and reason for activity

  • Idle stats - time spent in the different idle states

  • Frequency stats - current frequency per core

  • Device stats - activity of devices

  • Tunables - a list of system tunables related to power (ratings are to save power, you might have some Bad for that being considered better for performance)

  • WakeUp - device wake-up status

Further reading