- .. SPDX-License-Identifier: GPL-2.0-only
- dm-vdo
- ======
- The dm-vdo (virtual data optimizer) device mapper target provides
- block-level deduplication, compression, and thin provisioning. As a device
- mapper target, it can add these features to the storage stack, compatible
- with any file system. The vdo target does not protect against data
- corruption, relying instead on integrity protection of the storage below
- it. It is strongly recommended that lvm be used to manage vdo volumes. See
- lvmvdo(7).
- Userspace component
- ===================
- Formatting a vdo volume requires the use of the 'vdoformat' tool, available
- at:
- https://github.com/dm-vdo/vdo/
- In most cases, a vdo target will recover from a crash automatically the
- next time it is started. In cases where it encountered an unrecoverable
- error (either during normal operation or crash recovery) the target will
- enter or come up in read-only mode. Because read-only mode is indicative of
- data loss, a positive action must be taken to bring vdo out of read-only
- mode. The 'vdoforcerebuild' tool, available from the same repo, is used to
- prepare a read-only vdo to exit read-only mode. After running this tool,
- the vdo target will rebuild its metadata the next time it is
- started. Although some data may be lost, the rebuilt vdo's metadata will be
- internally consistent and the target will be writable again.
- The repo also contains additional userspace tools which can be used to
- inspect a vdo target's on-disk metadata. Fortunately, these tools are
- rarely needed except by dm-vdo developers.
- Metadata requirements
- =====================
- Each vdo volume reserves 3 GB of space for metadata, or more depending on
- its configuration. It is helpful to check that the space saved by
- deduplication and compression is not cancelled out by the metadata
- requirements. An estimation of the space saved for a specific dataset can
- be computed with the vdo estimator tool, which is available at:
- https://github.com/dm-vdo/vdoestimator/
- Target interface
- ================
- Table line
- ----------
- ::
- <offset> <logical device size> vdo V4 <storage device>
- <storage device size> <minimum I/O size> <block map cache size>
- <block map era length> [optional arguments]
- Required parameters:
- offset:
- The offset, in sectors, at which the vdo volume's logical
- space begins.
- logical device size:
- The size of the device which the vdo volume will service,
- in sectors. Must match the current logical size of the vdo
- volume.
- storage device:
- The device holding the vdo volume's data and metadata.
- storage device size:
- The size of the device holding the vdo volume, as a number
- of 4096-byte blocks. Must match the current size of the vdo
- volume.
- minimum I/O size:
- The minimum I/O size for this vdo volume to accept, in
- bytes. Valid values are 512 or 4096. The recommended value
- is 4096.
- block map cache size:
- The size of the block map cache, as a number of 4096-byte
- blocks. The minimum and recommended value is 32768 blocks.
- If the logical thread count is non-zero, the cache size
- must be at least 4096 blocks per logical thread.
- block map era length:
- The speed with which the block map cache writes out
- modified block map pages. A smaller era length is likely to
- reduce the amount of time spent rebuilding, at the cost of
- increased block map writes during normal operation. The
- maximum and recommended value is 16380; the minimum value
- is 1.
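As an illustration of how the required parameters fit together, the following Python sketch assembles a table line from sizes given in bytes. The function name `vdo_table_line` and its keyword names are hypothetical; the only rules it encodes are the ones stated above (sectors for the logical size, 4096-byte blocks for the storage size, and the documented limits). It is a sketch, not a replacement for lvm or for the checks the target itself performs.

```python
SECTOR_SIZE = 512        # device-mapper sector size, in bytes
VDO_BLOCK_SIZE = 4096    # vdo block size, in bytes


def vdo_table_line(logical_bytes, storage_device, storage_bytes,
                   min_io_size=4096, cache_blocks=32768, era_length=16380,
                   offset=0):
    """Build a table line from the required parameters (hypothetical helper)."""
    if logical_bytes % SECTOR_SIZE:
        raise ValueError("logical size must be a whole number of sectors")
    if storage_bytes % VDO_BLOCK_SIZE:
        raise ValueError("storage size must be a whole number of 4096-byte blocks")
    if min_io_size not in (512, 4096):
        raise ValueError("minimum I/O size must be 512 or 4096")
    if cache_blocks < 32768:
        raise ValueError("block map cache size must be at least 32768 blocks")
    if not 1 <= era_length <= 16380:
        raise ValueError("block map era length must be between 1 and 16380")

    logical_sectors = logical_bytes // SECTOR_SIZE      # logical size is in sectors
    storage_blocks = storage_bytes // VDO_BLOCK_SIZE    # storage size is in blocks
    return (f"{offset} {logical_sectors} vdo V4 {storage_device} "
            f"{storage_blocks} {min_io_size} {cache_blocks} {era_length}")
```

The returned string could be passed to `dmsetup create <name> --table`; with 1 GB (2^30 bytes) for both sizes it reproduces the first table line in the Examples section.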
- Optional parameters:
- --------------------
- Some or all of these parameters may be specified as <key> <value> pairs.
- Thread related parameters:
- Different categories of work are assigned to separate thread groups, and
- the number of threads in each group can be configured separately.
- If <hash>, <logical>, and <physical> are all set to 0, the work handled by
- all three thread types will be handled by a single thread. If any of these
- values are non-zero, all of them must be non-zero.
- ack:
- The number of threads used to complete bios. Since
- completing a bio calls an arbitrary completion function
- outside the vdo volume, threads of this type allow the vdo
- volume to continue processing requests even when bio
- completion is slow. The default is 1.
- bio:
- The number of threads used to issue bios to the underlying
- storage. Threads of this type allow the vdo volume to
- continue processing requests even when bio submission is
- slow. The default is 4.
- bioRotationInterval:
- The number of bios to enqueue on each bio thread before
- switching to the next thread. The value must be greater
- than 0 and not more than 1024; the default is 64.
- cpu:
- The number of threads used to do CPU-intensive work, such
- as hashing and compression. The default is 1.
- hash:
- The number of threads used to manage data comparisons for
- deduplication based on the hash value of data blocks. The
- default is 0.
- logical:
- The number of threads used to manage caching and locking
- based on the logical address of incoming bios. The default
- is 0; the maximum is 60.
- physical:
- The number of threads used to manage administration of the
- underlying storage device. At format time, a slab size for
- the vdo is chosen; the vdo storage device must be large
- enough to have at least 1 slab per physical thread. The
- default is 0; the maximum is 16.
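The thread-count rules above can be checked before a table is loaded. The Python sketch below is one possible way to do so; the helper name `thread_options` and its argument names are hypothetical, and it encodes only the limits stated in this section and the per-logical-thread cache requirement from the required parameters.

```python
def thread_options(ack=1, bio=4, bio_rotation_interval=64, cpu=1,
                   hash_=0, logical=0, physical=0, cache_blocks=32768):
    """Validate thread settings and render them as <key> <value> pairs."""
    counts = (hash_, logical, physical)
    if min(counts) < 0:
        raise ValueError("thread counts may not be negative")
    # hash, logical, and physical must be all zero or all non-zero.
    if any(counts) and not all(counts):
        raise ValueError("hash, logical, and physical must all be zero "
                         "or all be non-zero")
    if logical > 60:
        raise ValueError("at most 60 logical threads")
    if physical > 16:
        raise ValueError("at most 16 physical threads")
    if not 1 <= bio_rotation_interval <= 1024:
        raise ValueError("bioRotationInterval must be between 1 and 1024")
    # The cache must hold at least 4096 blocks per logical thread.
    if cache_blocks < 4096 * logical:
        raise ValueError("block map cache too small for the logical thread count")

    pairs = [("ack", ack), ("bio", bio),
             ("bioRotationInterval", bio_rotation_interval), ("cpu", cpu),
             ("hash", hash_), ("logical", logical), ("physical", physical)]
    return " ".join(f"{key} {value}" for key, value in pairs)
```

The check that the storage device has at least one slab per physical thread is not modeled here, because the slab size is chosen at format time and is not part of the table line.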
- Miscellaneous parameters:
- maxDiscard:
- The maximum size of discard bio accepted, in 4096-byte
- blocks. I/O requests to a vdo volume are normally split
- into 4096-byte blocks, and processed up to 2048 at a time.
- However, discard requests to a vdo volume can be
- automatically split to a larger size, up to <maxDiscard>
- 4096-byte blocks in a single bio, and are limited to 1500
- at a time. Increasing this value may provide better overall
- performance, at the cost of increased latency for the
- individual discard requests. The default and minimum is 1;
- the maximum is UINT_MAX / 4096.
- deduplication:
- Whether deduplication is enabled. The default is 'on'; the
- acceptable values are 'on' and 'off'.
- compression:
- Whether compression is enabled. The default is 'off'; the
- acceptable values are 'on' and 'off'.
- Device modification
- -------------------
- A modified table may be loaded into a running, non-suspended vdo volume.
- The modifications will take effect when the device is next resumed. The
- modifiable parameters are <logical device size>, <storage device size>
- (the physical device size), <maxDiscard>, <compression>, and
- <deduplication>.
- If the logical device size or physical device size is changed, upon
- successful resume vdo will store the new values and require them on future
- startups. These two parameters may not be decreased. The logical device
- size may not exceed 4 PB. The physical device size must increase by at
- least 32832 4096-byte blocks if at all, and must not exceed the size of the
- underlying storage device. Additionally, when formatting the vdo device, a
- slab size is chosen: the physical device size may never increase above the
- size which provides 8192 slabs, and each increase must be large enough to
- add at least one new slab.
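The resize rules above can be expressed as a small pre-flight check. The Python sketch below is illustrative only: the function name `check_resize` and its arguments are hypothetical, "4 PB" is assumed to mean 4 * 2^50 bytes, and the slab-related limits are left out because the slab size is chosen at format time and is not visible in the table line.

```python
SECTOR_SIZE = 512
MAX_LOGICAL_BYTES = 4 * 2**50          # assumption: "4 PB" read as 4 * 2**50 bytes
MIN_PHYSICAL_GROWTH_BLOCKS = 32832     # minimum physical growth, in 4096-byte blocks


def check_resize(old_logical_sectors, new_logical_sectors,
                 old_physical_blocks, new_physical_blocks,
                 underlying_blocks):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    if new_logical_sectors < old_logical_sectors:
        problems.append("logical size may not decrease")
    if new_logical_sectors * SECTOR_SIZE > MAX_LOGICAL_BYTES:
        problems.append("logical size exceeds 4 PB")

    growth = new_physical_blocks - old_physical_blocks
    if growth < 0:
        problems.append("physical size may not decrease")
    elif 0 < growth < MIN_PHYSICAL_GROWTH_BLOCKS:
        problems.append("physical growth is smaller than 32832 blocks")
    if new_physical_blocks > underlying_blocks:
        problems.append("physical size exceeds the underlying device")
    return problems
```

An empty result does not guarantee that the reload will succeed, since the slab constraints are still enforced by the target.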
- Examples:
- Start a previously formatted vdo volume with 1 GB logical space and 1 GB
- physical space, storing to /dev/dm-1 which has more than 1 GB of space.
- ::
- dmsetup create vdo0 --table \
- "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"
- Grow the logical size to 4 GB.
- ::
- dmsetup reload vdo0 --table \
- "0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380"
- dmsetup resume vdo0
- Grow the physical size to 2 GB.
- ::
- dmsetup reload vdo0 --table \
- "0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380"
- dmsetup resume vdo0
- Grow the logical and physical sizes by 1 GB more each (to 5 GB and 3 GB)
- and raise the maximum discard size to 8 blocks.
- ::
- dmsetup reload vdo0 --table \
- "0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8"
- dmsetup resume vdo0
- Stop the vdo volume.
- ::
- dmsetup remove vdo0
- Start the vdo volume again. Note that the logical and physical device sizes
- must still match, but other parameters can change.
- ::
- dmsetup create vdo1 --table \
- "0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2"
- Messages
- --------
- All vdo devices accept messages in the form:
- ::
- dmsetup message <target-name> 0 <message-name> <message-parameters>
- The messages are:
- stats:
- Outputs the current view of the vdo statistics. Mostly used
- by the vdostats userspace program to interpret the output
- buffer.
- config:
- Outputs useful vdo configuration information. Mostly used
- by users who want to recreate a similar vdo volume and
- want to know the creation configuration used.
- dump:
- Dumps many internal structures to the system log. This is
- not always safe to run, so it should only be used to debug
- a hung vdo. Optional parameters to specify structures to
- dump are:
- viopool: The pool of I/O requests for incoming bios
- pools: A synonym of 'viopool'
- vdo: Most of the structures managing on-disk data
- queues: Basic information about each vdo thread
- threads: A synonym of 'queues'
- default: Equivalent to 'queues vdo'
- all: All of the above.
- dump-on-shutdown:
- Perform a default dump next time vdo shuts down.
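For scripts that send these messages, the argument vector can be assembled and checked ahead of time. The Python sketch below only builds the command as a list, following the message form shown above; the helper name `vdo_message_argv` is hypothetical, and the sketch assumes that only `dump` takes parameters.

```python
MESSAGES = {"stats", "config", "dump", "dump-on-shutdown"}
DUMP_TARGETS = {"viopool", "pools", "vdo", "queues", "threads", "default", "all"}


def vdo_message_argv(target_name, message, *params):
    """Build the argv for 'dmsetup message' (does not run the command)."""
    if message not in MESSAGES:
        raise ValueError(f"unknown vdo message: {message}")
    if message == "dump":
        unknown = set(params) - DUMP_TARGETS
        if unknown:
            raise ValueError(f"unknown dump parameters: {sorted(unknown)}")
    elif params:
        # Assumption: the other messages take no parameters.
        raise ValueError(f"{message} takes no parameters")
    return ["dmsetup", "message", target_name, "0", message, *params]
```

The resulting list could be handed to `subprocess.run`; since `dump` is not always safe to run, a caller may want to require explicit confirmation before issuing it.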
- Status
- ------
- ::
- <device> <operating mode> <in recovery> <index state>
- <compression state> <physical blocks used> <total physical blocks>
- device:
- The name of the vdo volume.
- operating mode:
- The current operating mode of the vdo volume; values may be
- 'normal', 'recovering' (the volume has detected an issue
- with its metadata and is attempting to repair itself), and
- 'read-only' (an error has occurred that forces the vdo
- volume to only support read operations and not writes).
- in recovery:
- Whether the vdo volume is currently in recovery mode;
- values may be 'recovering' or '-' which indicates not
- recovering.
- index state:
- The current state of the deduplication index in the vdo
- volume; values may be 'closed', 'closing', 'error',
- 'offline', 'online', 'opening', and 'unknown'.
- compression state:
- The current state of compression in the vdo volume; values
- may be 'offline' and 'online'.
- physical blocks used:
- The number of physical blocks in use by the vdo volume.
- total physical blocks:
- The total number of physical blocks the vdo volume may use;
- the difference between this value and the
- <physical blocks used> is the number of blocks the vdo
- volume has left before being full.
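A monitoring script might split the status fields described above into named values and derive the remaining free blocks. The Python sketch below assumes it is given only the seven vdo-specific fields as one whitespace-separated string; the names `VdoStatus` and `parse_vdo_status` are hypothetical, and the sample line in the usage note is made up for illustration.

```python
from typing import NamedTuple


class VdoStatus(NamedTuple):
    device: str
    operating_mode: str
    in_recovery: bool
    index_state: str
    compression_state: str
    used_blocks: int
    total_blocks: int

    @property
    def free_blocks(self):
        # Blocks the volume has left before being full.
        return self.total_blocks - self.used_blocks


def parse_vdo_status(fields_text):
    """Parse the seven vdo status fields from a whitespace-separated string."""
    parts = fields_text.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 status fields, got {len(parts)}")
    device, mode, recovery, index, compression, used, total = parts
    return VdoStatus(device, mode, recovery == "recovering", index,
                     compression, int(used), int(total))
```

For example, `parse_vdo_status("/dev/dm-1 normal - online offline 1024 262144").free_blocks` evaluates to 261120.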
- Memory Requirements
- ===================
- A vdo target requires a fixed 38 MB of RAM along with the following amounts
- that scale with the target:
- - 1.15 MB of RAM for each 1 MB of configured block map cache size. The
- block map cache requires a minimum of 150 MB.
- - 1.6 MB of RAM for each 1 TB of logical space.
- - 268 MB of RAM for each 1 TB of physical storage managed by the volume.
- The deduplication index requires additional memory which scales with the
- size of the deduplication window. For dense indexes, the index requires 1
- GB of RAM per 1 TB of window. For sparse indexes, the index requires 1 GB
- of RAM per 10 TB of window. The index configuration is set when the target
- is formatted and may not be modified.
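The figures above can be combined into a rough estimate. The Python sketch below is an approximation under stated assumptions: the function names are hypothetical, the 150 MB block map minimum is applied as a floor, and the index is reported separately in GB exactly as the text gives it.

```python
def vdo_ram_mb(cache_mb, logical_tb, physical_tb):
    """Estimate target RAM in MB, excluding the deduplication index."""
    fixed = 38.0
    # 1.15 MB per MB of block map cache; assumed floor of 150 MB.
    block_map = max(1.15 * cache_mb, 150.0)
    logical = 1.6 * logical_tb        # 1.6 MB per TB of logical space
    physical = 268.0 * physical_tb    # 268 MB per TB of physical storage
    return fixed + block_map + logical + physical


def index_ram_gb(window_tb, sparse=False):
    """Estimate index RAM in GB: 1 GB per TB (dense) or per 10 TB (sparse)."""
    return window_tb / 10.0 if sparse else float(window_tb)
```

With the minimum 128 MB cache, 1 TB of logical space, and 1 TB of physical storage, the first function yields about 457.6 MB, to which the index requirement must be added.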
- Module Parameters
- =================
- The vdo driver has a numeric parameter 'log_level' which controls the
- verbosity of logging from the driver. The default setting is 6
- (LOGLEVEL_INFO and more severe messages).
- Run-time Usage
- ==============
- When using dm-vdo, it is important to be aware of the ways in which its
- behavior differs from other storage targets.
- - There is no guarantee that over-writes of existing blocks will succeed.
- Because the underlying storage may be multiply referenced, over-writing
- an existing block generally requires a vdo to have a free block
- available.
- - When blocks are no longer in use, sending a discard request for those
- blocks lets the vdo release references for those blocks. If the vdo is
- thinly provisioned, discarding unused blocks is essential to prevent the
- target from running out of space. However, due to the sharing of
- duplicate blocks, no discard request for any given logical block is
- guaranteed to reclaim space.
- Assuming the underlying storage properly implements flush requests, vdo
- is resilient against crashes. However, unflushed writes may or may not
- persist after a crash.
- Each write to a vdo target entails a significant amount of processing.
- However, much of the work is parallelizable. Therefore, vdo targets
- achieve better throughput at higher I/O depths, and can support up to 2048
- requests in parallel.
- Tuning
- ======
- The vdo device has many options, and it can be difficult to make optimal
- choices without perfect knowledge of the workload. Additionally, most
- configuration options must be set when a vdo target is started and cannot
- be changed while the target is active; changing them requires shutting
- the target down completely. Ideally, tuning with simulated
- workloads should be performed before deploying vdo in production
- environments.
- The most important value to adjust is the block map cache size. In order to
- service a request for any logical address, a vdo must load the portion of
- the block map which holds the relevant mapping. These mappings are cached.
- Performance will suffer when the working set does not fit in the cache. By
- default, a vdo allocates 128 MB of metadata cache in RAM to support
- efficient access to 100 GB of logical space at a time. It should be scaled
- up proportionally for larger working sets.
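The proportional scaling described above can be sketched as follows. The helper name `cache_blocks_for` is hypothetical; it scales the default of 32768 blocks (128 MB) per 100 GB of working set and also honors the documented minimums, including 4096 blocks per logical thread.

```python
import math

DEFAULT_CACHE_BLOCKS = 32768     # 128 MB of 4096-byte blocks
DEFAULT_WORKING_SET_GB = 100     # logical space covered by the default cache


def cache_blocks_for(working_set_gb, logical_threads=0):
    """Suggest a block map cache size, in 4096-byte blocks."""
    scaled = math.ceil(working_set_gb * DEFAULT_CACHE_BLOCKS
                       / DEFAULT_WORKING_SET_GB)
    # Never go below the minimum or the per-logical-thread requirement.
    return max(DEFAULT_CACHE_BLOCKS, scaled, 4096 * logical_threads)
```

The result is only a starting point; each cached MB also costs about 1.15 MB of RAM, so the value should be weighed against available memory.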
- The logical and physical thread counts should also be adjusted. A logical
- thread controls a disjoint section of the block map, so additional logical
- threads increase parallelism and can increase throughput. Physical threads
- control a disjoint section of the data blocks, so additional physical
- threads can also increase throughput. However, excess threads can waste
- resources and increase contention.
- Bio submission threads control the parallelism involved in sending I/O to
- the underlying storage; fewer threads mean there is more opportunity to
- reorder I/O requests for performance benefit, but also that each I/O
- request has to wait longer before being submitted.
- Bio acknowledgment threads are used for finishing I/O requests. This is
- done on dedicated threads since the amount of work required to execute a
- bio's callback cannot be controlled by the vdo itself. Usually one thread
- is sufficient, but additional threads may be beneficial, particularly when
- bios have CPU-heavy callbacks.
- CPU threads are used for hashing and for compression; in workloads with
- compression enabled, more threads may result in higher throughput.
- Hash threads are used to sort active requests by hash and determine whether
- they should deduplicate; the most CPU-intensive work done by these
- threads is the comparison of 4096-byte data blocks. In most cases, a single
- hash thread is sufficient.