.. SPDX-License-Identifier: GPL-2.0

============================================================
Intel(R) Speed Select Technology User Guide
============================================================

The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new
collection of features that give more granular control over CPU performance.
With Intel(R) SST, one server can be configured for power and performance for a
variety of diverse workload requirements.

Refer to the links below for an overview of the technology:

- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html
- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf

These capabilities are further enhanced in some of the newer generations of
server platforms where these features can be enumerated and controlled
dynamically without pre-configuring via BIOS setup options. This dynamic
configuration is done via mailbox commands to the hardware. One way to enumerate
and configure these features is by using the Intel Speed Select utility.

This document explains how to use the Intel Speed Select tool to enumerate and
control Intel(R) SST features. This document gives example commands and explains
how these commands change the power and performance profile of the system under
test. Using this tool as an example, customers can replicate the messaging
implemented in the tool in their production software.

intel-speed-select configuration tool
======================================

Most Linux distributions may include the "intel-speed-select" tool as a package.
If not, it can be built by downloading the Linux kernel tree from kernel.org.
Once downloaded, the tool can be built without building the full kernel.

From the kernel tree, run the following commands::

  # cd tools/power/x86/intel-speed-select/
  # make
  # make install

Getting Help
------------

To get help with the tool, execute the command below::

  # intel-speed-select --help

The top-level help describes arguments and features. Notice that there is a
multi-level help structure in the tool. For example, to get help for the
feature "perf-profile"::

  # intel-speed-select perf-profile --help

To get help on a command, another level of help is provided. For example, for
the command "info"::

  # intel-speed-select perf-profile info --help

Summary of platform capability
------------------------------

To check the current platform and driver capabilities, execute::

  # intel-speed-select --info

For example on a test system::

  # intel-speed-select --info
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  Platform: API version : 1
  Platform: Driver version : 1
  Platform: mbox supported : 1
  Platform: mmio supported : 1
  Intel(R) SST-PP (feature perf-profile) is supported
  TDP level change control is unlocked, max level: 4
  Intel(R) SST-TF (feature turbo-freq) is supported
  Intel(R) SST-BF (feature base-freq) is not supported
  Intel(R) SST-CP (feature core-power) is supported
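
When the summary above needs to be checked from a script, the feature lines can
be filtered with standard text tools. The following is a minimal sketch, not
part of the tool: the function name ``sst_supported_features`` is hypothetical,
and it assumes the output keeps the exact "(feature NAME) is supported" wording
shown above.

```shell
# List the feature names reported as supported in "intel-speed-select --info"
# output read from standard input. Lines ending in "is not supported" do not
# match the pattern and are skipped.
sst_supported_features() {
    sed -n 's/^ *Intel(R) SST-[A-Z]* (feature \([a-z-]*\)) is supported$/\1/p'
}

# Demonstration on the sample lines from this document.
printf '%s\n' \
    'Intel(R) SST-PP (feature perf-profile) is supported' \
    'Intel(R) SST-BF (feature base-freq) is not supported' |
    sst_supported_features
```

On a live system the same function could be fed from the tool itself, for
example ``intel-speed-select --info 2>&1 | sst_supported_features``. The
``2>&1`` is a precaution in case part of the summary is written to standard
error.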

Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
------------------------------------------------------------------------

This feature allows configuration of a server dynamically based on workload
performance requirements. This helps users during deployment as they do not have
to choose a specific server configuration statically. This Intel(R) Speed Select
Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
that allows multiple optimized performance profiles per system. Each profile
defines the set of CPUs that can be online, with the rest offline, to sustain a
guaranteed base frequency. Once the user issues a command to use a specific
performance profile and meets the CPU online/offline requirement, the user can
expect a change in the base frequency dynamically. This feature is called
"perf-profile" when using the Intel Speed Select tool.

Number of performance levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There can be multiple performance profiles on a system. To get the number of
profiles, execute the command below::

  # intel-speed-select perf-profile get-config-levels
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        get-config-levels:4
  package-1
    die-0
      cpu-14
        get-config-levels:4

On this system under test, there are 4 performance profiles in addition to the
base performance profile (which is performance level 0).

Lock/Unlock status
~~~~~~~~~~~~~~~~~~

Even if there are multiple performance profiles, it is possible that they
are locked. If they are locked, users cannot issue a command to change the
performance state. There may be a BIOS setup option to unlock them; otherwise,
check with your system vendor.

To check if the system is locked, execute the following command::

  # intel-speed-select perf-profile get-lock-status
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        get-lock-status:0
  package-1
    die-0
      cpu-14
        get-lock-status:0

In this case, lock status is 0, which means that the system is unlocked.
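
A script that is about to change the performance level may want to confirm
first that every package reports an unlocked state. The sketch below is
illustrative only: the function name ``sst_pp_all_unlocked`` is hypothetical,
and it assumes the ``get-lock-status:N`` line format shown above.

```shell
# Succeed (exit status 0) only if at least one "get-lock-status" line is read
# from standard input and every such line reports 0 (unlocked).
sst_pp_all_unlocked() {
    awk -F: '
        /get-lock-status:/ { seen = 1; if ($2 + 0 != 0) locked = 1 }
        END { exit ((seen && !locked) ? 0 : 1) }
    '
}

# Demonstration on sample output with two unlocked packages.
if printf 'get-lock-status:0\nget-lock-status:0\n' | sst_pp_all_unlocked; then
    echo "all packages unlocked"
fi
```

On a live system the input would come from
``intel-speed-select perf-profile get-lock-status 2>&1``.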

Properties of a performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the properties of a specific performance level (level 0 in the example
below), execute the command below::

  # intel-speed-select perf-profile info -l 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        perf-profile-level-0
          cpu-count:28
          enable-cpu-mask:000003ff,f0003fff
          enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41
          thermal-design-power-ratio:26
          base-frequency(MHz):2600
          speed-select-turbo-freq:disabled
          speed-select-base-freq:disabled
          ...
          ...

Here the -l option is used to specify a performance level.

If the -l option is omitted, this command prints information about all the
performance levels. The above command prints the properties of performance
level 0.

For this performance profile, at most the CPUs listed in
"enable-cpu-mask/enable-cpu-list" can be online. When that condition is met,
the base frequency of 2600 MHz can be maintained. To understand more, execute
"intel-speed-select perf-profile info" for performance level 4::

  # intel-speed-select perf-profile info -l 4
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        perf-profile-level-4
          cpu-count:28
          enable-cpu-mask:000000fa,f0000faf
          enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39
          thermal-design-power-ratio:28
          base-frequency(MHz):2800
          speed-select-turbo-freq:disabled
          speed-select-base-freq:unsupported
          ...
          ...

There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if
the user only keeps these CPUs online and the rest offline, then the base
frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0.
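
The "enable-cpu-mask" and "enable-cpu-list" fields carry the same information.
As an illustration of how the two relate, the sketch below decodes a mask into
a list. It assumes that the mask is printed as comma-separated 32-bit
hexadecimal words with the most significant word first, which is consistent
with both sample outputs above. The function name ``mask_to_cpu_list`` is
hypothetical.

```shell
# Convert a CPU mask given as comma-separated 32-bit hex words (most
# significant word first) into a comma-separated, ascending CPU list.
mask_to_cpu_list() {
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the mask into its 32-bit words
    IFS=$old_ifs
    word_index=$#              # words are numbered from the least significant end
    list=""
    for word in "$@"; do
        word_index=$((word_index - 1))
        value=$((0x$word))
        word_list=""
        bit=0
        while [ "$bit" -lt 32 ]; do
            if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                word_list="${word_list:+$word_list,}$((word_index * 32 + bit))"
            fi
            bit=$((bit + 1))
        done
        # Words seen earlier are more significant, so their CPUs are appended
        # after the CPUs of the words that follow.
        if [ -n "$word_list" ]; then
            list="$word_list${list:+,$list}"
        fi
    done
    printf '%s\n' "$list"
}

# Mask of performance level 4 from the output above.
mask_to_cpu_list 000000fa,f0000faf
```

For the level 4 mask this prints the same CPUs as the "enable-cpu-list" field
of that level.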

Get current performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the current performance level, execute::

  # intel-speed-select perf-profile get-config-current-level
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        get-config-current_level:0

First verify that the base_frequency displayed by the cpufreq sysfs is correct::

  # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
  2600000

This matches the base-frequency (MHz) field value displayed by the
"perf-profile info" command for performance level 0 (the cpufreq frequency is
in kHz).
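
Because the two values use different units, a scripted comparison has to
convert one of them. A minimal sketch, with a hypothetical helper name
``base_freq_matches``, could look like this:

```shell
# Compare a cpufreq value in kHz with a perf-profile value in MHz.
# Prints the conversion and returns 0 when both describe the same frequency.
base_freq_matches() {
    khz=$1
    mhz=$2
    converted=$((khz / 1000))
    printf '%s kHz = %s MHz\n' "$khz" "$converted"
    [ "$converted" -eq "$mhz" ]
}

# Values taken from the outputs above.
base_freq_matches 2600000 2600
```

On a live system the first argument could be read with
``cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency``.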

To check if the average frequency is equal to the base frequency for a 100% busy
workload, disable turbo::

  # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Then run a busy workload on all CPUs, for example::

  # stress -c 64

To verify the base frequency, run turbostat::

  # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
  Package  Core  CPU  Bzy_MHz
  -        -     -    2600
  0        0     0    2600
  0        1     1    2600
  0        2     2    2600
  0        3     3    2600
  0        4     4    2600
  .        .     .    .
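
Reading the turbostat table by eye works for a few CPUs. For larger systems,
the comparison can be scripted. The sketch below assumes the four-column layout
requested with ``--show Package,Core,CPU,Bzy_MHz`` above. The function name
``check_bzy_mhz`` and the default tolerance of 25 MHz are assumptions made for
illustration.

```shell
# Read turbostat output (columns: Package Core CPU Bzy_MHz) from standard
# input and print every CPU whose Bzy_MHz differs from the expected frequency
# by more than the tolerance. Header, summary ("-") and filler rows are
# skipped because their CPU or Bzy_MHz column is not a plain number.
# Returns 1 if any CPU is out of range.
check_bzy_mhz() {
    awk -v expected="$1" -v tol="${2:-25}" '
        $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
            diff = $4 - expected
            if (diff < 0) diff = -diff
            if (diff > tol) { printf "cpu %s: %s MHz\n", $3, $4; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Demonstration on rows in the format shown above.
printf 'Package Core CPU Bzy_MHz\n0 0 0 2600\n0 1 1 2600\n' |
    check_bzy_mhz 2600 && echo "all CPUs at base frequency"
```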

Changing performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To change the performance level to 4, execute::

  # intel-speed-select -d perf-profile set-config-level -l 4 -o
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        perf-profile
          set_tdp_level:success

In the command above, "-o" is optional. If it is specified, then it will also
offline CPUs which are not present in the enable-cpu-mask for this performance
level.

Now if the base_frequency is checked::

  # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
  2800000

This shows that the base frequency has now increased from 2600 MHz at
performance level 0 to 2800 MHz at performance level 4. As a result, any
workload that can use fewer CPUs can see a boost of 200 MHz compared to
performance level 0.
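
When "-o" is not used, the CPUs outside the enable list have to be taken
offline by other means, for example through the CPU hotplug files in sysfs. The
sketch below only prints the writes that would be needed so that they can be
reviewed first. The function name ``print_offline_commands`` is hypothetical.
The enable list passed in must be the union of the "enable-cpu-list" values of
all packages, since each list shown by "perf-profile info" covers one package
only.

```shell
# Print the sysfs writes that would offline every CPU in 0..(total - 1) that
# is not part of the comma-separated enable list. Nothing is written.
print_offline_commands() {
    total=$1
    enabled=",$2,"
    cpu=0
    while [ "$cpu" -lt "$total" ]; do
        case $enabled in
            *",$cpu,"*) ;;     # CPU belongs to the profile, leave it online
            *) printf 'echo 0 > /sys/devices/system/cpu/cpu%s/online\n' "$cpu" ;;
        esac
        cpu=$((cpu + 1))
    done
}

# Small illustrative case: 6 CPUs, of which 0, 1 and 3 stay online.
print_offline_commands 6 0,1,3
```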

Changing performance level via BMC Interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is possible to change the SST-PP level using an out-of-band (OOB) agent (via
a remote management console, through the BMC "Baseboard Management Controller"
interface). This mode is supported from the Sapphire Rapids processor
generation. The kernel and tool changes to support this mode were added in
Linux kernel version 5.18. To enable this feature, the kernel config
"CONFIG_INTEL_HFI_THERMAL" is required. The minimum version of the tool that
supports this feature is "v1.12", which is part of Linux kernel version 5.18.

To support such a configuration, this tool can be used as a daemon. Add the
command line option --oob::

  # intel-speed-select --oob
  Intel(R) Speed Select Technology
  Executing on CPU model:143[0x8f]
  OOB mode is enabled and will run as daemon

In this mode the tool will online/offline CPUs based on the new performance
level.

Check presence of other Intel(R) SST features
---------------------------------------------

Each of the performance profiles also specifies whether there is support for
the other two Intel(R) SST features (Intel(R) Speed Select Technology - Base
Frequency (Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo
Frequency (Intel(R) SST-TF)).

For example, from the output of "perf-profile info" above, for level 0 and level
4:

For level 0::

  speed-select-turbo-freq:disabled
  speed-select-base-freq:disabled

For level 4::

  speed-select-turbo-freq:disabled
  speed-select-base-freq:unsupported

Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4
changed from "disabled" to "unsupported" compared to performance level 0.

This means that at performance level 4, the "speed-select-base-freq" feature is
not supported. However, at performance level 0, this feature is "supported", but
currently "disabled", meaning the user has not activated this feature. In
contrast, "speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both
performance levels, but currently not activated by the user.

The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation
technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP).
The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF
is supported on a platform.

Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP)
---------------------------------------------------------------

Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that
allows users to define per core priority. This defines a mechanism to distribute
power among cores when there is a power-constrained scenario. This defines a
class of service (CLOS) configuration.

The user can configure up to 4 class of service configurations. Each CLOS group
configuration allows definitions of parameters that affect how the frequency
can be limited and how power is distributed. Each CPU core can be tied to a
class of service and hence an associated priority. The granularity is at the
core level, not at the per-CPU level.

Enable CLOS based prioritization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To use the CLOS based prioritization feature, firmware must be informed to
enable and use a priority type. There is a default per platform priority type,
which can be changed with an optional command line parameter.

To enable and check the options, execute::

  # intel-speed-select core-power enable --help
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  Enable core-power for a package/die
    Clos Enable: Specify priority type with [--priority|-p]
      0: Proportional, 1: Ordered

There are two priority types:

- Ordered

Priority for ordered throttling is defined based on the index of the assigned
CLOS group, where CLOS0 gets the highest priority (throttled last).

Priority order is:
CLOS0 > CLOS1 > CLOS2 > CLOS3.

- Proportional

When proportional priority is used, there is an additional parameter called
frequency_weight, which can be specified per CLOS group. The goal of
proportional priority is to provide each core with the requested minimum, then
distribute all remaining (excess/deficit) budgets in proportion to a defined
weight. This proportional priority can be configured using the "core-power
config" command.

To enable with the platform default priority type, execute::

  # intel-speed-select core-power enable
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        core-power
          enable:success
  package-1
    die-0
      cpu-6
        core-power
          enable:success

The scope of this enable is per package, or per die when a package contains
multiple dies. To check if CLOS is enabled and get the priority type, the
"core-power info" command can be used. For example, to check the status of the
core-power feature on CPU 0, execute::

  # intel-speed-select -c 0 core-power info
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        core-power
          support-status:supported
          enable-status:enabled
          clos-enable-status:enabled
          priority-type:proportional
  package-1
    die-0
      cpu-24
        core-power
          support-status:supported
          enable-status:enabled
          clos-enable-status:enabled
          priority-type:proportional

Configuring CLOS groups
~~~~~~~~~~~~~~~~~~~~~~~

Each CLOS group has its own attributes including min, max, freq_weight and
desired. These parameters can be configured with the "core-power config"
command. Defaults will be used if the user skips setting a parameter, except
for the clos id, which is mandatory. To check the core-power config options,
execute::

  # intel-speed-select core-power config --help
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  Set core-power configuration for one of the four clos ids
    Specify targeted clos id with [--clos|-c]
    Specify clos Proportional Priority [--weight|-w]
    Specify clos min in MHz with [--min|-n]
    Specify clos max in MHz with [--max|-m]

For example::

  # intel-speed-select core-power config -c 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  clos epp is not specified, default: 0
  clos frequency weight is not specified, default: 0
  clos min is not specified, default: 0 MHz
  clos max is not specified, default: 25500 MHz
  clos desired is not specified, default: 0
  package-0
    die-0
      cpu-0
        core-power
          config:success
  package-1
    die-0
      cpu-6
        core-power
          config:success

The user has the option to change the defaults. For example, the user can set
"min" to the base frequency so that the guaranteed base frequency is always
available.

Get the current CLOS configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To check the current configuration, "core-power get-config" can be used. For
example, to get the configuration of CLOS 0::

  # intel-speed-select core-power get-config -c 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        core-power
          clos:0
          epp:0
          clos-proportional-priority:0
          clos-min:0 MHz
          clos-max:Max Turbo frequency
          clos-desired:0 MHz
  package-1
    die-0
      cpu-24
        core-power
          clos:0
          epp:0
          clos-proportional-priority:0
          clos-min:0 MHz
          clos-max:Max Turbo frequency
          clos-desired:0 MHz

Associating a CPU with a CLOS group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To associate a CPU with a CLOS group, the "core-power assoc" command can be
used::

  # intel-speed-select core-power assoc --help
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  Associate a clos id to a CPU
    Specify targeted clos id with [--clos|-c]

For example, to associate CPU 10 with CLOS group 3, execute::

  # intel-speed-select -c 10 core-power assoc -c 3
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-10
        core-power
          assoc:success

Once a CPU is associated, its sibling CPUs are also associated with the same
CLOS group. Once associated, avoid changing the Linux "cpufreq" subsystem
scaling frequency limits.

To check the existing association for a CPU, the "core-power get-assoc" command
can be used. For example, to get the association of CPU 10, execute::

  # intel-speed-select -c 10 core-power get-assoc
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-1
    die-0
      cpu-10
        get-assoc
          clos:3

This shows that CPU 10 is part of CLOS group 3.
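
Since an association also covers the sibling CPUs of the same core, it can be
useful to know which CPUs are siblings before assigning CLOS groups. The kernel
exposes this in the CPU topology files of sysfs. The sketch below reads that
file. The function name ``cpu_siblings`` is hypothetical, and the optional
second argument exists only so that the function can be pointed at a different
directory.

```shell
# Print the hardware threads that share a core with the given CPU, as
# reported by the kernel's CPU topology files. The second argument is the
# sysfs CPU directory and defaults to the standard location.
cpu_siblings() {
    cpu_dir=${2:-/sys/devices/system/cpu}
    cat "$cpu_dir/cpu$1/topology/thread_siblings_list"
}
```

For example, ``cpu_siblings 10`` prints the CPUs that would be associated
together with CPU 10.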

Disable CLOS based prioritization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To disable, execute::

  # intel-speed-select core-power disable

Some features like Intel(R) SST-TF can only be enabled when CLOS based
prioritization is enabled. For this reason, disabling while Intel(R) SST-TF is
enabled can cause Intel(R) SST-TF to fail. The "disable" command therefore
displays an error if Intel(R) SST-TF is already enabled. To disable CLOS based
prioritization, the Intel(R) SST-TF feature must be disabled first.

Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF)
-------------------------------------------------------------------

The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets
the user control base frequency. If some critical workload threads demand
constant high guaranteed performance, then this feature can be used to execute
the thread at a higher base frequency on specific sets of CPUs (high priority
CPUs) at the cost of a lower base frequency on the other CPUs (low priority
CPUs). This feature does not require offlining the low priority CPUs.

The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology -
Performance Profile (Intel(R) SST-PP) performance level configuration. It is
possible that only certain performance levels support Intel(R) SST-BF. It is also
possible that only the base performance level (level = 0) has support for
Intel(R) SST-BF. Consequently, first select the desired performance level to
enable this feature.

In the system under test here, Intel(R) SST-BF is supported at the base
performance level 0, but currently disabled. For example, for level 0::

  # intel-speed-select -c 0 perf-profile info -l 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        perf-profile-level-0
          ...
          speed-select-base-freq:disabled
          ...

Before enabling Intel(R) SST-BF and measuring its impact on workload
performance, execute some workload and measure its performance to get a
baseline to compare against.

Here the user wants more guaranteed performance. For this reason, it is likely
that turbo is disabled. To disable turbo, execute::

  # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Based on the output of "intel-speed-select perf-profile info -l 0", the
guaranteed base frequency is 2600 MHz.

Measure baseline performance for comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To compare, pick a multi-threaded workload where each thread can be scheduled on
separate CPUs. The "Hackbench pipe" test is a good example of how to improve
performance using Intel(R) SST-BF.

Below, the workload is measuring average scheduler wakeup latency, so a lower
number means better performance::

  # taskset -c 3,4 perf bench -r 100 sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes
       Total time: 6.102 [sec]
         6.102445 usecs/op
           163868 ops/sec

While running the above test, the turbostat output shows that 2 of the CPUs are
busy and reaching the maximum frequency (which is the base frequency, as turbo
is disabled). The turbostat output::

  # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
  Package  Core  CPU  Bzy_MHz
  0        0     0    1000
  0        1     1    1005
  0        2     2    1000
  0        3     3    2600
  0        4     4    2600
  0        5     5    1000
  0        6     6    1000
  0        7     7    1005
  0        8     8    1005
  0        9     9    1000
  0        10    10   1000
  0        11    11   995
  0        12    12   1000
  0        13    13   1000

From the above turbostat output, both CPU 3 and 4 are very busy and reaching
the full guaranteed frequency of 2600 MHz.

Intel(R) SST-BF Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the capabilities of Intel(R) SST-BF for the current performance level 0,
execute::

  # intel-speed-select base-freq info -l 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        speed-select-base-freq
          high-priority-base-frequency(MHz):3000
          high-priority-cpu-mask:00000216,00002160
          high-priority-cpu-list:5,6,8,13,33,34,36,41
          low-priority-base-frequency(MHz):2400
          tjunction-temperature(C):125
          thermal-design-power(W):205

The above capabilities show that there are some CPUs on this system that can
offer a base frequency of 3000 MHz compared to the standard base frequency at
this performance level. Nevertheless, these CPUs are fixed, and they are
presented via high-priority-cpu-list/high-priority-cpu-mask. If the Intel(R)
SST-BF feature is selected, the low priority CPUs (which are not in
high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this
clipping of low priority CPUs is acceptable, then the user can enable the
Intel(R) SST-BF feature, particularly for the above "sched pipe" workload:
since only two CPUs are used, they can be scheduled on high priority CPUs and
can get a boost of 400 MHz.

Enable Intel(R) SST-BF
~~~~~~~~~~~~~~~~~~~~~~

To enable the Intel(R) SST-BF feature, execute::

  # intel-speed-select base-freq enable -a
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        base-freq
          enable:success
  package-1
    die-0
      cpu-14
        base-freq
          enable:success

In this case, the -a option is optional. It not only enables Intel(R) SST-BF,
but also adjusts the priority of cores using Intel(R) Speed Select Technology
Core Power (Intel(R) SST-CP) features. This option sets the minimum performance
of each Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
class to maximum performance so that the hardware will give the maximum
performance possible for each CPU.

If the -a option is not used, then the following steps are required before
enabling Intel(R) SST-BF:

- Discover Intel(R) SST-BF and note low and high priority base frequency
- Note the high priority CPU list
- Enable CLOS using core-power feature set
- Configure CLOS parameters. Use CLOS.min to set to minimum performance
- Subscribe desired CPUs to CLOS groups
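
The manual steps above can be sketched as a command sequence built only from
the sub-commands and options shown elsewhere in this document. This is an
illustration under stated assumptions, not the sequence the tool runs for "-a":
the function name ``print_manual_bf_steps`` is hypothetical, and the choice of
CLOS 0 for high priority CPUs and CLOS 3 for low priority CPUs is an
assumption. The function only prints the commands so that they can be reviewed
before anything is executed.

```shell
# Print (do not run) one possible command sequence for enabling SST-BF
# without "-a". Arguments: high priority CPU list, low priority CPU list,
# high priority base frequency in MHz, low priority base frequency in MHz.
# Assumption: CLOS 0 is used for high priority and CLOS 3 for low priority.
print_manual_bf_steps() {
    hp_cpus=$1
    lp_cpus=$2
    hp_mhz=$3
    lp_mhz=$4
    echo "intel-speed-select base-freq info -l 0"                  # discover
    echo "intel-speed-select core-power enable"                    # enable CLOS
    echo "intel-speed-select core-power config -c 0 --min $hp_mhz" # CLOS.min
    echo "intel-speed-select core-power config -c 3 --min $lp_mhz"
    echo "intel-speed-select -c $hp_cpus core-power assoc -c 0"    # subscribe
    echo "intel-speed-select -c $lp_cpus core-power assoc -c 3"
    echo "intel-speed-select base-freq enable"
}

# Illustrative call with a shortened CPU list; the frequencies are the high
# and low priority base frequencies reported by "base-freq info" above.
print_manual_bf_steps 5,6 0,1,2,3,4 3000 2400
```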

With this configuration, the same workload is executed by pinning it to high
priority CPUs (CPU 5 and 6 in this case)::

  # taskset -c 5,6 perf bench -r 100 sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes
       Total time: 5.627 [sec]
         5.627922 usecs/op
           177685 ops/sec

This way, by enabling Intel(R) SST-BF, the performance of this benchmark is
improved (latency reduced) by about 7.8%. From the turbostat output, it can be
observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz.
The turbostat output::

  # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
  Package  Core  CPU  Bzy_MHz
  0        0     0    2151
  0        1     1    2166
  0        2     2    2175
  0        3     3    2175
  0        4     4    2175
  0        5     5    3000
  0        6     6    3000
  0        7     7    2180
  0        8     8    2662
  0        9     9    2176
  0        10    10   2175
  0        11    11   2176
  0        12    12   2176
  0        13    13   2661
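
The latency improvement quoted above can be recomputed from the two "usecs/op"
values reported by perf bench. A minimal sketch using awk for the
floating-point arithmetic follows. The function name
``latency_improvement_pct`` is hypothetical.

```shell
# Percentage reduction from a baseline latency to a new latency (both in
# usecs/op), printed with two decimals.
latency_improvement_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}

# Baseline run on CPUs 3,4 versus the run on high priority CPUs 5,6.
latency_improvement_pct 6.102445 5.627922    # prints 7.78
```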

Disable Intel(R) SST-BF
~~~~~~~~~~~~~~~~~~~~~~~

To disable the Intel(R) SST-BF feature, execute::

  # intel-speed-select base-freq disable -a

Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
--------------------------------------------------------------------

This feature makes it possible to set different "All core turbo ratio limits"
for cores based on their priority. By using this feature, some cores can be
configured to get a higher turbo frequency by designating them as high priority
at the cost of lower or no turbo frequency on the low priority cores.

For this reason, this feature is only useful when the system is busy utilizing
all CPUs, but the user wants some configurable option to get high performance
on some CPUs.

The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
depends on the Intel(R) Speed Select Technology - Performance Profile (Intel(R)
SST-PP) performance level configuration. It is possible that only a certain
performance level supports Intel(R) SST-TF. It is also possible that only the base
performance level (level = 0) has support for Intel(R) SST-TF. Hence, first
select the desired performance level to enable this feature.

In the system under test here, Intel(R) SST-TF is supported at the base
performance level 0, but currently disabled::

  # intel-speed-select -c 0 perf-profile info -l 0
  Intel(R) Speed Select Technology
  package-0
    die-0
      cpu-0
        perf-profile-level-0
          ...
          ...
          speed-select-turbo-freq:disabled
          ...
          ...

To check if performance can be improved using the Intel(R) SST-TF feature, get
the turbo frequency properties with Intel(R) SST-TF enabled and compare them to
the base turbo capability of this system.

Get Base turbo capability
~~~~~~~~~~~~~~~~~~~~~~~~~

To get the base turbo capability of performance level 0, execute::

  # intel-speed-select perf-profile info -l 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        perf-profile-level-0
          ...
          ...
          turbo-ratio-limits-sse
            bucket-0
              core-count:2
              max-turbo-frequency(MHz):3200
            bucket-1
              core-count:4
              max-turbo-frequency(MHz):3100
            bucket-2
              core-count:6
              max-turbo-frequency(MHz):3100
            bucket-3
              core-count:8
              max-turbo-frequency(MHz):3100
            bucket-4
              core-count:10
              max-turbo-frequency(MHz):3100
            bucket-5
              core-count:12
              max-turbo-frequency(MHz):3100
            bucket-6
              core-count:14
              max-turbo-frequency(MHz):3100
            bucket-7
              core-count:16
              max-turbo-frequency(MHz):3100

Based on the data above, when all the CPUs are busy, a maximum frequency of
3100 MHz can be achieved. If there is some busy workload on CPUs 0-11 (e.g.
stress), execute the "hackbench pipe" workload on CPU 12 and 13::

  # taskset -c 12,13 perf bench -r 100 sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes
       Total time: 5.705 [sec]
         5.705488 usecs/op
           175269 ops/sec

The turbostat output::

  # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
  Package  Core  CPU  Bzy_MHz
  0        0     0    3000
  0        1     1    3000
  0        2     2    3000
  0        3     3    3000
  0        4     4    3000
  0        5     5    3100
  0        6     6    3100
  0        7     7    3000
  0        8     8    3100
  0        9     9    3000
  0        10    10   3000
  0        11    11   3000
  0        12    12   3100
  0        13    13   3100

Based on the turbostat output, the performance is limited by the frequency cap
of 3100 MHz. To check if the hackbench performance can be improved for CPU 12
and CPU 13, first check the capability of the Intel(R) SST-TF feature for this
performance level.

Get Intel(R) SST-TF Capability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get the capability, the "turbo-freq info" command can be used::

  # intel-speed-select turbo-freq info -l 0
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-0
        speed-select-turbo-freq
          bucket-0
            high-priority-cores-count:2
            high-priority-max-frequency(MHz):3200
            high-priority-max-avx2-frequency(MHz):3200
            high-priority-max-avx512-frequency(MHz):3100
          bucket-1
            high-priority-cores-count:4
            high-priority-max-frequency(MHz):3100
            high-priority-max-avx2-frequency(MHz):3000
            high-priority-max-avx512-frequency(MHz):2900
          bucket-2
            high-priority-cores-count:6
            high-priority-max-frequency(MHz):3100
            high-priority-max-avx2-frequency(MHz):3000
            high-priority-max-avx512-frequency(MHz):2900
          speed-select-turbo-freq-clip-frequencies
            low-priority-max-frequency(MHz):2600
            low-priority-max-avx2-frequency(MHz):2400
            low-priority-max-avx512-frequency(MHz):2100

Based on the output above, there is an Intel(R) SST-TF bucket for which there are
two high priority cores. If only two high priority cores are set, then the
maximum turbo frequency on those cores can be increased to 3200 MHz. This is
100 MHz more than the base turbo capability for all cores.

In turn, for the hackbench workload, two CPUs can be set as high priority and
the rest as low priority. One side effect is that once enabled, the low
priority cores will be clipped to a lower frequency of 2600 MHz.
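
The bucket lookup described above can also be scripted. The sketch below reads
"turbo-freq info" output and reports the non-AVX maximum frequency for a given
number of high priority cores. It assumes that buckets are listed in ascending
order of "high-priority-cores-count", as in the sample output, and that a
bucket applies to any core count up to its own. The function name
``tf_max_freq_for_cores`` is hypothetical.

```shell
# Read "turbo-freq info" output from standard input and print the
# high-priority-max-frequency (MHz) of the first bucket whose
# high-priority-cores-count is at least the requested number of cores.
# Returns 1 if no bucket is large enough.
tf_max_freq_for_cores() {
    awk -F: -v want="$1" '
        /high-priority-cores-count:/ { count = $2 + 0 }
        /high-priority-max-frequency\(MHz\):/ {
            if (!found && count >= want) { print $2 + 0; found = 1 }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Demonstration on the first two buckets from the output above.
printf '%s\n' \
    'high-priority-cores-count:2' \
    'high-priority-max-frequency(MHz):3200' \
    'high-priority-cores-count:4' \
    'high-priority-max-frequency(MHz):3100' |
    tf_max_freq_for_cores 2
```

With the sample buckets, two high priority cores map to 3200 MHz, while four
or six map to 3100 MHz.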

Enable Intel(R) SST-TF
~~~~~~~~~~~~~~~~~~~~~~

To enable Intel(R) SST-TF, execute::

  # intel-speed-select -c 12,13 turbo-freq enable -a
  Intel(R) Speed Select Technology
  Executing on CPU model: X
  package-0
    die-0
      cpu-12
        turbo-freq
          enable:success
  package-0
    die-0
      cpu-13
        turbo-freq
          enable:success
  package--1
    die-0
      cpu-63
        turbo-freq --auto
          enable:success

In this case, the option "-a" is optional. If set, it enables the Intel(R)
SST-TF feature and also sets the CPUs to high and low priority using Intel(R)
Speed Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers
passed with the "-c" argument are marked as high priority, including their
siblings.

If the -a option is not used, then the following steps are required before
enabling Intel(R) SST-TF:

- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency
- Enable CLOS using core-power feature set
- Configure CLOS parameters
- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency

If the same hackbench workload is executed, schedule hackbench threads on high
priority CPUs::

  # taskset -c 12,13 perf bench -r 100 sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes
       Total time: 5.510 [sec]
         5.510165 usecs/op
           180826 ops/sec

This reduces latency by around 3.4% on a busy system (5.705 to 5.510 usecs/op).
Here the turbostat output will show that CPU 12 and CPU 13 are getting a 100
MHz boost. The turbostat output::

  # turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
  Package  Core  CPU  Bzy_MHz
  ...
  0        12    12   3200
  0        13    13   3200
  753. ...
  754. 0 12 12 3200
  755. 0 13 13 3200