===================
Block IO Controller
===================

Overview
========
The cgroup subsystem "blkio" implements the block IO controller. There is a
need for various kinds of IO control policies (like proportional BW, max BW)
both at leaf nodes and at intermediate nodes in a storage hierarchy.
The plan is to use the same cgroup-based management interface for the blkio
controller and to switch IO policies in the background based on user options.

One IO control policy is the throttling policy, which can be used to
specify upper IO rate limits on devices. This policy is implemented in the
generic block layer and can be used on leaf nodes as well as on higher
level logical devices like device mapper.

HOWTO
=====

Throttling/Upper Limit policy
-----------------------------
Enable Block IO controller::

        CONFIG_BLK_CGROUP=y

Enable throttling in block layer::

        CONFIG_BLK_DEV_THROTTLING=y

Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::

        mount -t cgroup -o blkio none /sys/fs/cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format
for the policy is "<major>:<minor> <bytes_per_second>"::

        echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device

This puts a limit of 1MB/second on reads happening for the root group
on the device having major/minor number 8:16.

Run dd to read a file and see if the rate is throttled to 1MB/s or not::

        # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
        1024+0 records in
        1024+0 records out
        4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s

Limits for writes can be put using the blkio.throttle.write_bps_device file.

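The roughly 4 second runtime in the dd output follows from the limit itself: 4194304 bytes at 1048576 bytes per second. A minimal sketch of that arithmetic is shown here. The function name ``expected_seconds`` is hypothetical, and the result is rounded up to whole seconds, so real runs will differ slightly:

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel interface): estimate how long
# a transfer takes under a bytes-per-second throttle limit.
# usage: expected_seconds <total_bytes> <bytes_per_second>
expected_seconds() {
    total_bytes=$1
    bps_limit=$2
    # Integer division rounded up: a partial second counts as a full one.
    echo $(( (total_bytes + bps_limit - 1) / bps_limit ))
}

# The dd example: bs=4K count=1024 is 4096 * 1024 bytes, limited to 1048576 B/s.
expected_seconds $((4096 * 1024)) 1048576
```

If a throttled dd finishes much faster than this estimate, the rule was probably written for a different major:minor pair or a different cgroup.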

Hierarchical Cgroups
====================

Throttling implements hierarchy support; however,
throttling's hierarchy support is enabled if and only if "sane_behavior" is
enabled from the cgroup side, which currently is a development option and
not publicly available.

If somebody created a hierarchy as follows::

                   root
                   /  \
                test1 test2
                   |
                test3

Throttling with "sane_behavior" will handle the
hierarchy correctly. For throttling, all limits apply
to the whole subtree while all statistics are local to the IOs
directly generated by tasks in that cgroup.

Throttling without "sane_behavior" enabled from the cgroup side will
practically treat all groups as being at the same level, as if the
hierarchy looked like the following::

                     pivot
                  /  /   \  \
              root  test1 test2  test3

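Because limits apply to the whole subtree when hierarchy support is active, IO from a cgroup cannot exceed the tightest limit configured on that cgroup or on any of its ancestors. The sketch below illustrates that reasoning only. The function name ``effective_limit`` is hypothetical, and using ``0`` to mean "no limit configured" is a convention of this example:

```shell
#!/bin/sh
# Hypothetical helper: given the bps limits configured on a cgroup and on each
# of its ancestors (in any order), print the tightest one.
# In this sketch only, "0" stands for "no limit configured".
effective_limit() {
    tightest=0
    for limit in "$@"; do
        # Skip levels that have no limit configured.
        if [ "$limit" -eq 0 ]; then
            continue
        fi
        if [ "$tightest" -eq 0 ] || [ "$limit" -lt "$tightest" ]; then
            tightest=$limit
        fi
    done
    echo "$tightest"
}

# Root unlimited, parent limited to 2097152 B/s, child limited to 5242880 B/s.
effective_limit 0 2097152 5242880
```

Here the parent's limit is the binding one, even though the child has a looser rule of its own.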

Various user visible config options
===================================

CONFIG_BLK_CGROUP
        Block IO controller.

CONFIG_BFQ_CGROUP_DEBUG
        Debug help. Right now some additional stats files show up in the
        cgroup if this option is enabled.

CONFIG_BLK_DEV_THROTTLING
        Enable block device throttling support in the block layer.

Details of cgroup files
=======================

Proportional weight policy files
--------------------------------

blkio.bfq.weight
        Specifies per cgroup weight. This is the default weight of the group
        on all the devices until and unless overridden by a per device rule
        (see `blkio.bfq.weight_device` below).

        Currently allowed range of weights is from 1 to 1000. For more details,
        see Documentation/block/bfq-iosched.rst.

blkio.bfq.weight_device
        Specifies per cgroup per device weights, overriding the default group
        weight. For more details, see Documentation/block/bfq-iosched.rst.

        Following is the format::

          # echo dev_maj:dev_minor weight > blkio.bfq.weight_device

        Configure weight=300 on /dev/sdb (8:16) in this cgroup::

          # echo 8:16 300 > blkio.bfq.weight_device
          # cat blkio.bfq.weight_device
          dev     weight
          8:16    300

        Configure weight=500 on /dev/sda (8:0) in this cgroup::

          # echo 8:0 500 > blkio.bfq.weight_device
          # cat blkio.bfq.weight_device
          dev     weight
          8:0     500
          8:16    300

        Remove specific weight for /dev/sda in this cgroup::

          # echo 8:0 0 > blkio.bfq.weight_device
          # cat blkio.bfq.weight_device
          dev     weight
          8:16    300

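The override rule can be sketched as a lookup: use the per device weight if one is listed, otherwise fall back to the group's default weight. The function name ``device_weight`` is hypothetical, the input is a saved copy of the ``cat blkio.bfq.weight_device`` output in the layout shown above, and the default weight of 100 is just an example value:

```shell
#!/bin/sh
# Hypothetical helper: print the weight that applies to one device.
# usage: device_weight <weight_device_file> <default_weight> <major:minor>
device_weight() {
    awk -v dev="$3" -v fallback="$2" '
        # A per device rule exists: it overrides the default group weight.
        $1 == dev { print $2; found = 1; exit }
        # No rule for this device: the default group weight applies.
        END { if (!found) print fallback }
    ' "$1"
}

# Sample contents copied from the example output above.
sample=$(mktemp)
printf 'dev weight\n8:0 500\n8:16 300\n' > "$sample"
device_weight "$sample" 100 8:16    # a rule exists for 8:16
device_weight "$sample" 100 8:32    # no rule, falls back to the default
rm -f "$sample"
```

The header line is skipped naturally because its first field never matches a major:minor pair.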

blkio.time
        Disk time allocated to cgroup per device in milliseconds. First
        two fields specify the major and minor number of the device and
        third field specifies the disk time allocated to group in
        milliseconds.

blkio.sectors
        Number of sectors transferred to/from disk by the group. First
        two fields specify the major and minor number of the device and
        third field specifies the number of sectors transferred by the
        group to/from the device.

blkio.io_service_bytes
        Number of bytes transferred to/from the disk by the group. These
        are further divided by the type of operation - read or write, sync
        or async. First two fields specify the major and minor number of the
        device, third field specifies the operation type and the fourth field
        specifies the number of bytes.

blkio.io_serviced
        Number of IOs (bio) issued to the disk by the group. These
        are further divided by the type of operation - read or write, sync
        or async. First two fields specify the major and minor number of the
        device, third field specifies the operation type and the fourth field
        specifies the number of IOs.

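The per operation stat files can be read with a small filter. The sketch below assumes each line has the form ``<major>:<minor> <operation> <value>``, which matches the field description above; the function name ``stat_for_op`` and the sample numbers are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the value recorded for one device and one
# operation type in a saved copy of a stat file such as
# blkio.io_service_bytes or blkio.io_serviced.
# usage: stat_for_op <stat_file> <major:minor> <operation>
stat_for_op() {
    awk -v dev="$2" -v op="$3" '
        $1 == dev && $2 == op { print $3; found = 1; exit }
        # Print 0 when the device/operation pair is not listed.
        END { if (!found) print 0 }
    ' "$1"
}

# Made-up sample in the assumed "<major>:<minor> <operation> <value>" layout.
sample=$(mktemp)
printf '8:16 Read 4194304\n8:16 Write 1048576\n8:16 Sync 4194304\n8:16 Async 1048576\n' > "$sample"
stat_for_op "$sample" 8:16 Read
stat_for_op "$sample" 8:16 Write
rm -f "$sample"
```

Reading from a saved copy keeps the sketch usable on a machine without a mounted blkio hierarchy; on a live system the first argument would be the stat file inside the cgroup directory.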

blkio.io_service_time
        Total amount of time between request dispatch and request completion
        for the IOs done by this cgroup. This is in nanoseconds to make it
        meaningful for flash devices too. For devices with queue depth of 1,
        this time represents the actual service time. When queue_depth > 1,
        that is no longer true as requests may be served out of order. This
        may cause the service time for a given IO to include the service time
        of multiple IOs when served out of order which may result in total
        io_service_time > actual time elapsed. This time is further divided by
        the type of operation - read or write, sync or async. First two fields
        specify the major and minor number of the device, third field
        specifies the operation type and the fourth field specifies the
        io_service_time in ns.

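One common use of this counter is to divide it by the matching ``blkio.io_serviced`` count to get a mean service time per IO. The sketch below shows only that division; the function name ``avg_service_ns`` and the sample numbers are hypothetical. As noted above, with queue depth > 1 the result can overstate the real per IO service time:

```shell
#!/bin/sh
# Hypothetical helper: mean service time per IO in nanoseconds.
# usage: avg_service_ns <io_service_time_ns> <io_serviced_count>
avg_service_ns() {
    # Avoid dividing by zero when the group has not issued any IO yet.
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 / $2 ))
    fi
}

# Made-up sample: 2.5 s of accumulated service time over 1000 IOs.
avg_service_ns 2500000000 1000
```

Both inputs should be taken for the same device and the same operation type, otherwise the ratio is meaningless.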

blkio.io_wait_time
        Total amount of time the IOs for this cgroup spent waiting in the
        scheduler queues for service. This can be greater than the total time
        elapsed since it is cumulative io_wait_time for all IOs. It is not a
        measure of total time the cgroup spent waiting but rather a measure of
        the wait_time for its individual IOs. For devices with queue_depth > 1
        this metric does not include the time an IO spends between being
        dispatched to the device and actually being serviced
        (there might be a time lag here due to re-ordering of requests by the
        device). This is in nanoseconds to make it meaningful for flash
        devices too. This time is further divided by the type of operation -
        read or write, sync or async. First two fields specify the major and
        minor number of the device, third field specifies the operation type
        and the fourth field specifies the io_wait_time in ns.

blkio.io_merged
        Total number of bios/requests merged into requests belonging to this
        cgroup. This is further divided by the type of operation - read or
        write, sync or async.

blkio.io_queued
        Total number of requests queued up at any given instant for this
        cgroup. This is further divided by the type of operation - read or
        write, sync or async.

blkio.avg_queue_size
        Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
        The average queue size for this cgroup over the entire time of this
        cgroup's existence. Queue size samples are taken each time one of the
        queues of this cgroup gets a timeslice.

blkio.group_wait_time
        Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
        This is the amount of time the cgroup had to wait since it became busy
        (i.e., went from 0 to 1 request queued) to get a timeslice for one of
        its queues. This is different from the io_wait_time which is the
        cumulative total of the amount of time spent by each IO in that cgroup
        waiting in the scheduler queue. This is in nanoseconds. If this is
        read when the cgroup is in a waiting (for timeslice) state, the stat
        will only report the group_wait_time accumulated till the last time it
        got a timeslice and will not include the current delta.

blkio.empty_time
        Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
        This is the amount of time a cgroup spends without any pending
        requests when not being served, i.e., it does not include any time
        spent idling for one of the queues of the cgroup. This is in
        nanoseconds. If this is read when the cgroup is in an empty state,
        the stat will only report the empty_time accumulated till the last
        time it had a pending request and will not include the current delta.

blkio.idle_time
        Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
        This is the amount of time spent by the IO scheduler idling for a
        given cgroup in anticipation of a better request than the existing ones
        from other queues/cgroups. This is in nanoseconds. If this is read
        when the cgroup is in an idling state, the stat will only report the
        idle_time accumulated till the last idle period and will not include
        the current delta.

blkio.dequeue
        Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
        gives statistics about how many times a group was dequeued
        from the service tree of the device. First two fields specify the major
        and minor number of the device and third field specifies the number
        of times a group was dequeued from a particular device.

blkio.*_recursive
        Recursive version of various stats. These files show the
        same information as their non-recursive counterparts but
        include stats from all the descendant cgroups.

Throttling/Upper limit policy files
-----------------------------------

blkio.throttle.read_bps_device
        Specifies upper limit on READ rate from the device. IO rate is
        specified in bytes per second. Rules are per device. Following is
        the format::

          echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device

blkio.throttle.write_bps_device
        Specifies upper limit on WRITE rate to the device. IO rate is
        specified in bytes per second. Rules are per device. Following is
        the format::

          echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device

blkio.throttle.read_iops_device
        Specifies upper limit on READ rate from the device. IO rate is
        specified in IOs per second. Rules are per device. Following is
        the format::

          echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device

blkio.throttle.write_iops_device
        Specifies upper limit on WRITE rate to the device. IO rate is
        specified in IOs per second. Rules are per device. Following is
        the format::

          echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device

Note: If both BW and IOPS rules are specified for a device, then IO is
subjected to both the constraints.

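When both kinds of rule are set, the slower of the two constraints determines how long a workload takes. The sketch below estimates that for a run of equally sized IOs. The function name ``seconds_under_limits`` is hypothetical, and the estimate uses whole seconds rounded up and ignores everything except the two limits:

```shell
#!/bin/sh
# Hypothetical helper: estimate the minimum time for a run of equally sized
# IOs when both a bytes-per-second and an IOs-per-second rule apply.
# usage: seconds_under_limits <io_count> <io_size_bytes> <bps_limit> <iops_limit>
seconds_under_limits() {
    # Time imposed by the bandwidth rule (rounded up).
    by_bps=$(( ($1 * $2 + $3 - 1) / $3 ))
    # Time imposed by the IOPS rule (rounded up).
    by_iops=$(( ($1 + $4 - 1) / $4 ))
    # IO is subject to both rules, so the larger time wins.
    if [ "$by_bps" -gt "$by_iops" ]; then
        echo "$by_bps"
    else
        echo "$by_iops"
    fi
}

# 1024 reads of 4096 bytes with a 1048576 B/s rule and a 128 IO/s rule.
seconds_under_limits 1024 4096 1048576 128
```

In this example the bandwidth rule alone would allow the run to finish in 4 seconds, but the IOPS rule stretches it to 8.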

blkio.throttle.io_serviced
        Number of IOs (bio) issued to the disk by the group. These
        are further divided by the type of operation - read or write, sync
        or async. First two fields specify the major and minor number of the
        device, third field specifies the operation type and the fourth field
        specifies the number of IOs.

blkio.throttle.io_service_bytes
        Number of bytes transferred to/from the disk by the group. These
        are further divided by the type of operation - read or write, sync
        or async. First two fields specify the major and minor number of the
        device, third field specifies the operation type and the fourth field
        specifies the number of bytes.

Common files among various policies
-----------------------------------

blkio.reset_stats
        Writing an int to this file resets all the stats for that cgroup.