.. SPDX-License-Identifier: GPL-2.0-only

dm-vdo
======

The dm-vdo (virtual data optimizer) device mapper target provides
block-level deduplication, compression, and thin provisioning. As a device
mapper target, it can add these features to the storage stack, compatible
with any file system. The vdo target does not protect against data
corruption, relying instead on integrity protection of the storage below
it. It is strongly recommended that lvm be used to manage vdo volumes. See
lvmvdo(7).

Userspace component
===================

Formatting a vdo volume requires the use of the 'vdoformat' tool, available
at:

https://github.com/dm-vdo/vdo/

In most cases, a vdo target will recover from a crash automatically the
next time it is started. If it encounters an unrecoverable error (either
during normal operation or during crash recovery), the target will enter or
come up in read-only mode. Because read-only mode is indicative of data
loss, a positive action must be taken to bring vdo out of read-only mode.
The 'vdoforcerebuild' tool, available from the same repo, is used to
prepare a read-only vdo to exit read-only mode. After running this tool,
the vdo target will rebuild its metadata the next time it is started.
Although some data may be lost, the rebuilt vdo's metadata will be
internally consistent and the target will be writable again.

The repo also contains additional userspace tools which can be used to
inspect a vdo target's on-disk metadata. Fortunately, these tools are
rarely needed except by dm-vdo developers.

Metadata requirements
=====================

Each vdo volume reserves 3 GB of space for metadata, or more depending on
its configuration. It is helpful to check that the space saved by
deduplication and compression is not cancelled out by the metadata
requirements. An estimation of the space saved for a specific dataset can
be computed with the vdo estimator tool, which is available at:

https://github.com/dm-vdo/vdoestimator/

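As a rough illustration of that check, the shell sketch below subtracts the metadata reservation from an estimated saving. The function 'vdo_net_savings_gb' is hypothetical and not part of the vdo tools, and the savings percentage is an assumed input (for example, a figure taken from an estimator run). The 3 GB default is only the minimum reservation, so pass a larger value if the volume's configuration reserves more.

```shell
# Hypothetical helper (not part of the vdo tools): estimate the net space
# saved once the metadata reservation is subtracted.
# Usage: vdo_net_savings_gb DATA_GB SAVINGS_PERCENT [METADATA_GB]
#   DATA_GB          size of the dataset, in GB
#   SAVINGS_PERCENT  assumed input, e.g. taken from an estimator run
#   METADATA_GB      metadata reservation; defaults to the 3 GB minimum
vdo_net_savings_gb() {
    awk -v d="$1" -v s="$2" -v m="${3:-3}" 'BEGIN {
        # Space saved by deduplication and compression, minus metadata.
        # A negative result means the metadata outweighs the savings.
        printf "%.1f\n", d * s / 100 - m
    }'
}
```

For instance, 'vdo_net_savings_gb 20 10' prints -1.0: a 10% saving on 20 GB of data does not pay for the minimum metadata reservation.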
Target interface
================

Table line
----------

::

    <offset> <logical device size> vdo V4 <storage device>
    <storage device size> <minimum I/O size> <block map cache size>
    <block map era length> [optional arguments]


Required parameters:

    offset:
        The offset, in sectors, at which the vdo volume's logical
        space begins.

    logical device size:
        The size of the device which the vdo volume will service,
        in sectors. Must match the current logical size of the vdo
        volume.

    storage device:
        The device holding the vdo volume's data and metadata.

    storage device size:
        The size of the device holding the vdo volume, as a number
        of 4096-byte blocks. Must match the current size of the vdo
        volume.

    minimum I/O size:
        The minimum I/O size for this vdo volume to accept, in
        bytes. Valid values are 512 or 4096. The recommended value
        is 4096.

    block map cache size:
        The size of the block map cache, as a number of 4096-byte
        blocks. The minimum and recommended value is 32768 blocks.
        If the logical thread count is non-zero, the cache size
        must be at least 4096 blocks per logical thread.

    block map era length:
        The speed with which the block map cache writes out
        modified block map pages. A smaller era length is likely to
        reduce the amount of time spent rebuilding, at the cost of
        increased block map writes during normal operation. The
        maximum and recommended value is 16380; the minimum value
        is 1.

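A table line can be assembled from byte sizes by converting the logical size to 512-byte sectors and the storage size to 4096-byte blocks. The POSIX shell sketch below does this and checks the documented ranges for the minimum I/O size, block map cache size, and era length. 'vdo_table_line' is a hypothetical helper, not part of the vdo userspace tools, and it always emits an offset of 0.

```shell
# Hypothetical helper: print a "vdo V4" table line built from byte sizes.
# Usage: vdo_table_line LOGICAL_BYTES DEVICE PHYSICAL_BYTES MIN_IO CACHE_BLOCKS ERA
vdo_table_line() {
    logical_bytes=$1 device=$2 physical_bytes=$3
    min_io=$4 cache_blocks=$5 era=$6

    # <logical device size> is given in 512-byte sectors.
    if [ $((logical_bytes % 512)) -ne 0 ]; then
        echo "logical size must be a multiple of 512 bytes" >&2
        return 1
    fi
    # <storage device size> is given in 4096-byte blocks.
    if [ $((physical_bytes % 4096)) -ne 0 ]; then
        echo "storage size must be a multiple of 4096 bytes" >&2
        return 1
    fi
    # <minimum I/O size> must be 512 or 4096.
    if [ "$min_io" -ne 512 ] && [ "$min_io" -ne 4096 ]; then
        echo "minimum I/O size must be 512 or 4096" >&2
        return 1
    fi
    # <block map cache size> must be at least 32768 blocks.
    if [ "$cache_blocks" -lt 32768 ]; then
        echo "block map cache must be at least 32768 blocks" >&2
        return 1
    fi
    # <block map era length> must be between 1 and 16380.
    if [ "$era" -lt 1 ] || [ "$era" -gt 16380 ]; then
        echo "era length must be between 1 and 16380" >&2
        return 1
    fi

    # The offset is fixed at 0 in this sketch.
    echo "0 $((logical_bytes / 512)) vdo V4 $device $((physical_bytes / 4096)) $min_io $cache_blocks $era"
}
```

For example, 'vdo_table_line 1073741824 /dev/dm-1 1073741824 4096 32768 16380' prints the same table as the 1 GB example in this document, and its output can be passed to 'dmsetup create vdo0 --table'.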
Optional parameters:
--------------------

Some or all of these parameters may be specified as <key> <value> pairs.

Thread related parameters:

Different categories of work are assigned to separate thread groups, and
the number of threads in each group can be configured separately.

If <hash>, <logical>, and <physical> are all set to 0, the work handled by
all three thread types will be handled by a single thread. If any of these
values are non-zero, all of them must be non-zero.

    ack:
        The number of threads used to complete bios. Since
        completing a bio calls an arbitrary completion function
        outside the vdo volume, threads of this type allow the vdo
        volume to continue processing requests even when bio
        completion is slow. The default is 1.

    bio:
        The number of threads used to issue bios to the underlying
        storage. Threads of this type allow the vdo volume to
        continue processing requests even when bio submission is
        slow. The default is 4.

    bioRotationInterval:
        The number of bios to enqueue on each bio thread before
        switching to the next thread. The value must be greater
        than 0 and not more than 1024; the default is 64.

    cpu:
        The number of threads used to do CPU-intensive work, such
        as hashing and compression. The default is 1.

    hash:
        The number of threads used to manage data comparisons for
        deduplication based on the hash value of data blocks. The
        default is 0.

    logical:
        The number of threads used to manage caching and locking
        based on the logical address of incoming bios. The default
        is 0; the maximum is 60.

    physical:
        The number of threads used to manage administration of the
        underlying storage device. At format time, a slab size for
        the vdo is chosen; the vdo storage device must be large
        enough to have at least 1 slab per physical thread. The
        default is 0; the maximum is 16.

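The rules for the hash, logical, and physical thread counts can be checked mechanically before a table is loaded. The sketch below is a hypothetical helper, 'vdo_check_threads', that tests the all-zero-or-all-non-zero rule, the maximums of 60 logical and 16 physical threads, and the requirement of 4096 block map cache blocks per logical thread. It cannot check the one-slab-per-physical-thread rule, because the slab size chosen at format time is not an input.

```shell
# Hypothetical helper: check the thread-count rules described above.
# Usage: vdo_check_threads HASH LOGICAL PHYSICAL CACHE_BLOCKS
# Returns 0 if the combination is allowed, 1 otherwise.
vdo_check_threads() {
    hash=$1 logical=$2 physical=$3 cache_blocks=$4

    # All three zero: the work is handled by a single thread.
    if [ "$hash" -eq 0 ] && [ "$logical" -eq 0 ] && [ "$physical" -eq 0 ]; then
        return 0
    fi
    # Otherwise all three must be non-zero.
    if [ "$hash" -eq 0 ] || [ "$logical" -eq 0 ] || [ "$physical" -eq 0 ]; then
        echo "hash, logical and physical must be all zero or all non-zero" >&2
        return 1
    fi
    if [ "$logical" -gt 60 ]; then
        echo "at most 60 logical threads" >&2
        return 1
    fi
    if [ "$physical" -gt 16 ]; then
        echo "at most 16 physical threads" >&2
        return 1
    fi
    # The block map cache needs at least 4096 blocks per logical thread.
    if [ "$cache_blocks" -lt $((logical * 4096)) ]; then
        echo "block map cache too small for $logical logical threads" >&2
        return 1
    fi
    return 0
}
```

With the settings used in the 'vdo1' example ('hash 1 logical 3 physical 2' and a 65550-block cache), 'vdo_check_threads 1 3 2 65550' succeeds.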
Miscellaneous parameters:

    maxDiscard:
        The maximum size of discard bio accepted, in 4096-byte
        blocks. I/O requests to a vdo volume are normally split
        into 4096-byte blocks, and processed up to 2048 at a time.
        However, discard requests to a vdo volume can be
        automatically split to a larger size, up to <maxDiscard>
        4096-byte blocks in a single bio, and are limited to 1500
        at a time. Increasing this value may provide better overall
        performance, at the cost of increased latency for the
        individual discard requests. The default and minimum is 1;
        the maximum is UINT_MAX / 4096.

    deduplication:
        Whether deduplication is enabled. The default is 'on'; the
        acceptable values are 'on' and 'off'.

    compression:
        Whether compression is enabled. The default is 'off'; the
        acceptable values are 'on' and 'off'.

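Because <maxDiscard> is a count of 4096-byte blocks rather than bytes or sectors, a conversion step is easy to get wrong. The sketch below, a hypothetical helper named 'vdo_max_discard', rounds a byte count down to whole blocks and clamps it to the documented range. It assumes a 32-bit unsigned int, so UINT_MAX is taken to be 4294967295 and the maximum value works out to 1048575.

```shell
# Hypothetical helper: convert a desired maximum discard size in bytes into
# a maxDiscard value (a count of 4096-byte blocks, rounded down), clamped to
# the documented range of 1 to UINT_MAX / 4096.
# Usage: vdo_max_discard BYTES
vdo_max_discard() {
    blocks=$(( $1 / 4096 ))
    # Assumes a 32-bit unsigned int, so UINT_MAX is 4294967295.
    max_blocks=$(( 4294967295 / 4096 ))
    if [ "$blocks" -lt 1 ]; then
        blocks=1
    elif [ "$blocks" -gt "$max_blocks" ]; then
        blocks=$max_blocks
    fi
    echo "$blocks"
}
```

For example, 'vdo_max_discard 32768' prints 8, the value passed as 'maxDiscard 8' in this document's examples.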
Device modification
-------------------

A modified table may be loaded into a running, non-suspended vdo volume.
The modifications will take effect when the device is next resumed. The
modifiable parameters are <logical device size>, <storage device size>
(the physical device size), <maxDiscard>, <compression>, and
<deduplication>.

If the logical device size or physical device size is changed, upon
successful resume vdo will store the new values and require them on future
startups. These two parameters may not be decreased. The logical device
size may not exceed 4 PB. The physical device size must increase by at
least 32832 4096-byte blocks if at all, and must not exceed the size of the
underlying storage device. Additionally, when formatting the vdo device, a
slab size is chosen: the physical device size may never increase above the
size which provides 8192 slabs, and each increase must be large enough to
add at least one new slab.

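Several of these resize rules can be checked before running 'dmsetup reload'. The sketch below is a hypothetical helper, 'vdo_check_resize', taking the old and new sizes in table-line units (logical size in sectors, physical size in 4096-byte blocks). It assumes "4 PB" means 2^52 bytes, i.e. 2^43 sectors. The slab-based limits and the size of the underlying storage device are not checked, since neither is known from the table line alone.

```shell
# Hypothetical helper: check a proposed resize against the rules above.
# Sizes use table-line units: logical in 512-byte sectors, physical in
# 4096-byte blocks. Slab limits and the backing device size are NOT checked.
# Usage: vdo_check_resize OLD_LOGICAL NEW_LOGICAL OLD_PHYSICAL NEW_PHYSICAL
vdo_check_resize() {
    old_logical=$1 new_logical=$2 old_physical=$3 new_physical=$4
    # Assumption: 4 PB = 2^52 bytes = 2^43 sectors.
    max_logical=$(( 1 << 43 ))

    if [ "$new_logical" -lt "$old_logical" ] || [ "$new_physical" -lt "$old_physical" ]; then
        echo "sizes may not be decreased" >&2
        return 1
    fi
    if [ "$new_logical" -gt "$max_logical" ]; then
        echo "logical size may not exceed 4 PB" >&2
        return 1
    fi
    # Physical growth, if any, must be at least 32832 blocks.
    growth=$(( new_physical - old_physical ))
    if [ "$growth" -ne 0 ] && [ "$growth" -lt 32832 ]; then
        echo "physical size must grow by at least 32832 blocks" >&2
        return 1
    fi
    return 0
}
```

Each resize step in the examples passes this check; for instance, 'vdo_check_resize 8388608 8388608 262144 524288' succeeds, while growing the physical size by a single block fails.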
Examples:

Start a previously-formatted vdo volume with 1 GB logical space and 1 GB
physical space, storing to /dev/dm-1 which has more than 1 GB of space.

::

    dmsetup create vdo0 --table \
    "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"

Grow the logical size to 4 GB.

::

    dmsetup reload vdo0 --table \
    "0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380"
    dmsetup resume vdo0

Grow the physical size to 2 GB.

::

    dmsetup reload vdo0 --table \
    "0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380"
    dmsetup resume vdo0

Grow the logical size to 5 GB and the physical size by 1 GB more (to
3 GB), and raise the maximum discard size from the default of 1 to 8
4096-byte blocks.

::

    dmsetup reload vdo0 --table \
    "0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8"
    dmsetup resume vdo0

Stop the vdo volume.

::

    dmsetup remove vdo0

Start the vdo volume again. Note that the logical and physical device sizes
must still match, but other parameters can change.

::

    dmsetup create vdo1 --table \
    "0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2"

Messages
--------

All vdo devices accept messages in the form:

::

    dmsetup message <target-name> 0 <message-name> <message-parameters>

The messages are:

    stats:
        Outputs the current view of the vdo statistics. Mostly used
        by the vdostats userspace program to interpret the output
        buffer.

    config:
        Outputs useful vdo configuration information. Mostly used
        by users who want to recreate a similar VDO volume and
        want to know the creation configuration used.

    dump:
        Dumps many internal structures to the system log. This is
        not always safe to run, so it should only be used to debug
        a hung vdo. Optional parameters to specify structures to
        dump are:

        - viopool: The pool of I/O requests for incoming bios
        - pools: A synonym of 'viopool'
        - vdo: Most of the structures managing on-disk data
        - queues: Basic information about each vdo thread
        - threads: A synonym of 'queues'
        - default: Equivalent to 'queues vdo'
        - all: All of the above.

    dump-on-shutdown:
        Perform a default dump the next time vdo shuts down.

Status
------

::

    <device> <operating mode> <in recovery> <index state>
    <compression state> <used physical blocks> <total physical blocks>

device:
    The name of the vdo volume.

operating mode:
    The current operating mode of the vdo volume; values may be
    'normal', 'recovering' (the volume has detected an issue
    with its metadata and is attempting to repair itself), or
    'read-only' (an error has occurred that forces the vdo
    volume to only support read operations and not writes).

in recovery:
    Whether the vdo volume is currently in recovery mode;
    values may be 'recovering' or '-', which indicates not
    recovering.

index state:
    The current state of the deduplication index in the vdo
    volume; values may be 'closed', 'closing', 'error',
    'offline', 'online', 'opening', or 'unknown'.

compression state:
    The current state of compression in the vdo volume; values
    may be 'offline' or 'online'.

used physical blocks:
    The number of physical blocks in use by the vdo volume.

total physical blocks:
    The total number of physical blocks the vdo volume may use;
    the difference between this value and the
    <used physical blocks> is the number of blocks the vdo
    volume has left before being full.

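Since the last two status fields are the used and total physical blocks, the remaining free space can be computed directly from 'dmsetup status' output. The sketch below is a hypothetical helper, 'vdo_free_blocks', that relies only on those last two fields and on dmsetup's usual "<start> <length> <target type>" prefix, which it uses to skip lines for other target types.

```shell
# Hypothetical helper: read "dmsetup status" output for a vdo target on
# stdin and print how many physical blocks remain free, computed as
# <total physical blocks> - <used physical blocks> (the last two fields).
vdo_free_blocks() {
    # dmsetup prefixes each status line with "<start> <length> <target type>";
    # only lines whose target type is "vdo" are considered.
    awk '$3 == "vdo" { print $NF - $(NF - 1) }'
}
```

A typical invocation would be 'dmsetup status vdo0 | vdo_free_blocks'. Keep in mind that, as described under Run-time Usage, overwrites generally need free blocks too, so this number is worth watching even when the volume is not being extended.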
Memory Requirements
===================

A vdo target requires a fixed 38 MB of RAM along with the following amounts
that scale with the target:

- 1.15 MB of RAM for each 1 MB of configured block map cache size. The
  block map cache requires a minimum of 150 MB.
- 1.6 MB of RAM for each 1 TB of logical space.
- 268 MB of RAM for each 1 TB of physical storage managed by the volume.

The deduplication index requires additional memory which scales with the
size of the deduplication window. For dense indexes, the index requires 1
GB of RAM per 1 TB of window. For sparse indexes, the index requires 1 GB
of RAM per 10 TB of window. The index configuration is set when the target
is formatted and may not be modified.

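The figures above combine into a simple estimate. The sketch below uses two hypothetical helpers, 'vdo_ram_mb' for the target itself and 'vdo_index_ram_gb' for the deduplication index. It treats "a minimum of 150 MB" as a floor on the RAM charged for the block map cache, which is one reading of the text, and it does not convert between MB and GB or TB (binary versus decimal units are left to the caller).

```shell
# Hypothetical helper: rough target RAM estimate in MB.
# Usage: vdo_ram_mb CACHE_MB LOGICAL_TB PHYSICAL_TB
vdo_ram_mb() {
    awk -v c="$1" -v l="$2" -v p="$3" 'BEGIN {
        # 1.15 MB per MB of block map cache, treating 150 MB as a floor.
        cache = 1.15 * c
        if (cache < 150) cache = 150
        # Fixed 38 MB, plus 1.6 MB per TB logical and 268 MB per TB physical.
        printf "%.1f\n", 38 + cache + 1.6 * l + 268 * p
    }'
}

# Hypothetical helper: deduplication index RAM estimate in GB.
# Usage: vdo_index_ram_gb WINDOW_TB dense|sparse
vdo_index_ram_gb() {
    case $2 in
        dense)
            # 1 GB of RAM per 1 TB of window.
            awk -v w="$1" 'BEGIN { printf "%.1f\n", w }' ;;
        sparse)
            # 1 GB of RAM per 10 TB of window.
            awk -v w="$1" 'BEGIN { printf "%.1f\n", w / 10 }' ;;
        *)
            echo "index type must be dense or sparse" >&2
            return 1 ;;
    esac
}
```

For a volume with the default 128 MB cache, 1 TB of logical space, and 1 TB of physical storage, 'vdo_ram_mb 128 1 1' prints 457.6; the index memory from 'vdo_index_ram_gb' is in addition to that.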
Module Parameters
=================

The vdo driver has a numeric parameter 'log_level' which controls the
verbosity of logging from the driver. The default setting is 6
(LOGLEVEL_INFO and more severe messages).

Run-time Usage
==============

When using dm-vdo, it is important to be aware of the ways in which its
behavior differs from other storage targets.

- There is no guarantee that over-writes of existing blocks will succeed.
  Because the underlying storage may be multiply referenced, over-writing
  an existing block generally requires a vdo to have a free block
  available.

- When blocks are no longer in use, sending a discard request for those
  blocks lets the vdo release references for those blocks. If the vdo is
  thinly provisioned, discarding unused blocks is essential to prevent the
  target from running out of space. However, due to the sharing of
  duplicate blocks, no discard request for any given logical block is
  guaranteed to reclaim space.

- Assuming the underlying storage properly implements flush requests, vdo
  is resilient against crashes; however, unflushed writes may or may not
  persist after a crash.

- Each write to a vdo target entails a significant amount of processing.
  However, much of the work is parallelizable. Therefore, vdo targets
  achieve better throughput at higher I/O depths, and can support up to
  2048 requests in parallel.

Tuning
======

The vdo device has many options, and it can be difficult to make optimal
choices without perfect knowledge of the workload. Additionally, most
configuration options must be set when a vdo target is started and cannot
be changed while the target is active; changing them requires shutting the
target down completely. Ideally, tuning with simulated workloads should be
performed before deploying vdo in production environments.

The most important value to adjust is the block map cache size. In order to
service a request for any logical address, a vdo must load the portion of
the block map which holds the relevant mapping. These mappings are cached.
Performance will suffer when the working set does not fit in the cache. By
default, a vdo allocates 128 MB (32768 4096-byte blocks) of metadata cache
in RAM to support efficient access to 100 GB of logical space at a time.
It should be scaled up proportionally for larger working sets.

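The proportional rule can be turned into a starting value for <block map cache size>. The sketch below is a hypothetical helper, 'vdo_cache_blocks', that scales 32768 blocks per 100 GB of logical working set (rounding up), never returns less than the 32768-block minimum, and, when a logical thread count is given, keeps at least 4096 blocks per logical thread. The working-set size is an input the reader must estimate; the result is a starting point for tuning, not a measured optimum.

```shell
# Hypothetical helper: suggest a <block map cache size> in 4096-byte blocks
# for a logical working set, scaling the default 32768 blocks (128 MB) per
# 100 GB proportionally.
# Usage: vdo_cache_blocks WORKING_SET_GB [LOGICAL_THREADS]
vdo_cache_blocks() {
    ws_gb=$1 logical=${2:-0}
    # Proportional scaling, rounded up: ceil(ws_gb * 32768 / 100).
    blocks=$(( (ws_gb * 32768 + 99) / 100 ))
    # Never go below the documented minimum of 32768 blocks.
    if [ "$blocks" -lt 32768 ]; then
        blocks=32768
    fi
    # Keep at least 4096 blocks per logical thread.
    if [ "$blocks" -lt $(( logical * 4096 )) ]; then
        blocks=$(( logical * 4096 ))
    fi
    echo "$blocks"
}
```

For a 250 GB working set, 'vdo_cache_blocks 250' prints 81920, i.e. a 320 MB cache; per the Memory Requirements section, that cache in turn costs roughly 1.15 times its size in RAM.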
The logical and physical thread counts should also be adjusted. A logical
thread controls a disjoint section of the block map, so additional logical
threads increase parallelism and can increase throughput. Physical threads
control a disjoint section of the data blocks, so additional physical
threads can also increase throughput. However, excess threads can waste
resources and increase contention.

Bio submission threads control the parallelism involved in sending I/O to
the underlying storage; fewer threads mean there is more opportunity to
reorder I/O requests for performance benefit, but also that each I/O
request has to wait longer before being submitted.

Bio acknowledgment threads are used for finishing I/O requests. This is
done on dedicated threads since the amount of work required to execute a
bio's callback cannot be controlled by the vdo itself. Usually one thread
is sufficient, but additional threads may be beneficial, particularly when
bios have CPU-heavy callbacks.

CPU threads are used for hashing and for compression; in workloads with
compression enabled, more threads may result in higher throughput.

Hash threads are used to sort active requests by hash and determine whether
they should deduplicate; the most CPU-intensive actions done by these
threads are comparisons of 4096-byte data blocks. In most cases, a single
hash thread is sufficient.