|
- ========
- zsmalloc
- ========
- This allocator is designed for use with zram. Thus, the allocator is
- supposed to work well under low memory conditions. In particular, it
- never attempts higher order page allocation which is very likely to
- fail under memory pressure. On the other hand, if we just use single
- (0-order) pages, it would suffer from very high fragmentation --
- any object of size PAGE_SIZE/2 or larger would occupy an entire page.
- This was one of the major issues with its predecessor (xvmalloc).
- To overcome these issues, zsmalloc allocates a bunch of 0-order pages
- and links them together using various 'struct page' fields. These linked
- pages act as a single higher-order page i.e. an object can span 0-order
- page boundaries. The code refers to these linked pages as a single entity
- called zspage.
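To make the fragmentation argument concrete, here is a minimal sketch that compares packing objects into a standalone 0-order page with packing them into a chain of linked pages. It assumes a 4096-byte PAGE_SIZE; the helper name and the example object size are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def wasted_bytes(obj_size: int, pages: int) -> int:
    """Bytes left unused when objects of obj_size are packed back to back
    into `pages` linked 0-order pages (objects may cross page boundaries)."""
    return (pages * PAGE_SIZE) % obj_size


# An object slightly larger than PAGE_SIZE/2: only one fits in a single page.
obj_size = 2560
single_page_waste = wasted_bytes(obj_size, 1)
# Five linked pages hold eight such objects with nothing left over.
chained_waste = wasted_bytes(obj_size, 5)

print(single_page_waste, chained_waste)
```

With a single page, more than a third of the page is lost for this object size; linking pages lets objects straddle page boundaries and recovers that space.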
- For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
- since this satisfies the requirements of all its current users (in the
- worst case, a page is incompressible and is thus stored "as-is" i.e. in
- uncompressed form). For allocation requests larger than this size, failure
- is returned (see zs_malloc).
- Additionally, zs_malloc() does not return a dereferenceable pointer.
- Instead, it returns an opaque handle (unsigned long) which encodes the actual
- location of the allocated object. The reason for this indirection is that
- zsmalloc does not keep zspages permanently mapped since that would cause
- issues on 32-bit systems where the VA region for kernel space mappings
- is very small. So, before using the allocated memory, the object has to
- be mapped using zs_map_object() to get a usable pointer and subsequently
- unmapped using zs_unmap_object().
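The handle indirection can be illustrated with a toy user-space model. This is only a sketch of the idea, not the kernel API: `ToyPool`, its methods, and the `OBJ_INDEX_BITS` split are hypothetical, and the real handle encoding in zsmalloc may differ.

```python
PAGE_SIZE = 4096       # assumed page size in bytes
OBJ_INDEX_BITS = 12    # hypothetical width of the object-index field


class ToyPool:
    """Toy model of handle-based allocation; this is not the kernel API."""

    def __init__(self, obj_size: int):
        if not 0 < obj_size <= PAGE_SIZE:
            raise ValueError("object size must be in 1..PAGE_SIZE")
        self.obj_size = obj_size
        self.objs_per_page = PAGE_SIZE // obj_size
        self.pages = []      # backing storage, one bytearray per page
        self.next_idx = 0    # next free slot in the last page

    def malloc(self) -> int:
        """Return an opaque integer handle, not a usable pointer."""
        if not self.pages or self.next_idx == self.objs_per_page:
            self.pages.append(bytearray(PAGE_SIZE))
            self.next_idx = 0
        handle = ((len(self.pages) - 1) << OBJ_INDEX_BITS) | self.next_idx
        self.next_idx += 1
        return handle

    def map_object(self, handle: int) -> memoryview:
        """Decode the handle and return a temporary view of the object."""
        page_no = handle >> OBJ_INDEX_BITS
        obj_idx = handle & ((1 << OBJ_INDEX_BITS) - 1)
        start = obj_idx * self.obj_size
        return memoryview(self.pages[page_no])[start:start + self.obj_size]

    def unmap_object(self, view: memoryview) -> None:
        """Drop the temporary view; the handle itself stays valid."""
        view.release()


pool = ToyPool(obj_size=512)
handle = pool.malloc()
buf = pool.map_object(handle)   # usable only between map and unmap
buf[:5] = b"hello"
pool.unmap_object(buf)
```

The caller keeps only the integer handle between accesses, which is what lets the allocator avoid holding a permanent mapping for every object.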
- stat
- ====
- With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
- ``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::
- # cat /sys/kernel/debug/zsmalloc/zram0/classes
- class  size  10%  20%  30%  40%  50%  60%  70%  80%  90%  99%  100%  obj_allocated  obj_used  pages_used  pages_per_zspage  freeable
- ...
- ...
-    30   512    0   12    4    1    0    1    0    0    1    0   414           3464      3346         433                 1        14
-    31   528    2    7    2    2    1    0    1    0    0    2   117           4154      3793         536                 4        44
-    32   544    6    3    4    1    2    1    0    0    0    1   260           4170      3965         556                 2        26
- ...
- ...
- class
- index
- size
- size of the objects the zspage stores
- 10%
- the number of zspages with usage ratio less than 10% (see below)
- 20%
- the number of zspages with usage ratio between 10% and 20%
- 30%
- the number of zspages with usage ratio between 20% and 30%
- 40%
- the number of zspages with usage ratio between 30% and 40%
- 50%
- the number of zspages with usage ratio between 40% and 50%
- 60%
- the number of zspages with usage ratio between 50% and 60%
- 70%
- the number of zspages with usage ratio between 60% and 70%
- 80%
- the number of zspages with usage ratio between 70% and 80%
- 90%
- the number of zspages with usage ratio between 80% and 90%
- 99%
- the number of zspages with usage ratio between 90% and 99%
- 100%
- the number of zspages with usage ratio 100%
- obj_allocated
- the number of objects allocated
- obj_used
- the number of objects allocated to the user
- pages_used
- the number of pages allocated for the class
- pages_per_zspage
- the number of 0-order pages that make up a zspage
- freeable
- the approximate number of pages class compaction can free
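The columns are related by simple arithmetic, which the sample rows above satisfy. The sketch below assumes a 4096-byte PAGE_SIZE and that objects are packed back to back within a zspage. The `freeable` estimate (whole zspages' worth of unused object slots) is an assumption that happens to reproduce the sample values, not a quotation of the kernel code; `derive` is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def derive(size, obj_allocated, obj_used, pages_used, pages_per_zspage):
    """Recompute derived quantities for one row of the classes file."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // size
    zspages = pages_used // pages_per_zspage
    unused_slots = obj_allocated - obj_used
    # Whole zspages that could be emptied if objects were packed tightly.
    freeable = (unused_slots // objs_per_zspage) * pages_per_zspage
    return {
        "zspages": zspages,
        "objs_per_zspage": objs_per_zspage,
        "capacity": zspages * objs_per_zspage,
        "freeable": freeable,
    }


# Row for class 31 (size 528) from the sample output.
row = derive(size=528, obj_allocated=4154, obj_used=3793,
             pages_used=536, pages_per_zspage=4)
print(row)
```

For this row the recomputed capacity equals `obj_allocated`, and the number of zspages equals the sum of the usage-ratio columns.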
- Each zspage maintains an inuse counter which keeps track of the number of
- objects stored in the zspage. The inuse counter determines the zspage's
- "fullness group" which is calculated as the ratio of the "inuse" objects to
- the total number of objects the zspage can hold (objs_per_zspage). The
- closer the inuse counter is to objs_per_zspage, the better.
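As a rough illustration of how the inuse counter maps to the usage-ratio columns, consider the sketch below. The exact boundary handling (integer percentage, lower bound inclusive) is an assumption, and `fullness_column` is a hypothetical helper, not a kernel function.

```python
COLUMNS = ["10%", "20%", "30%", "40%", "50%",
           "60%", "70%", "80%", "90%", "99%", "100%"]


def fullness_column(inuse: int, objs_per_zspage: int) -> str:
    """Map a zspage's inuse counter to a usage-ratio column name."""
    if objs_per_zspage <= 0 or not 0 <= inuse <= objs_per_zspage:
        raise ValueError("inuse must be within 0..objs_per_zspage")
    if inuse == objs_per_zspage:
        return "100%"
    ratio = 100 * inuse // objs_per_zspage   # integer percentage, 0..99
    return COLUMNS[ratio // 10]


# A zspage of class 31 holds 31 objects (4 pages of 4096 bytes, 528-byte objects).
for inuse in (1, 15, 30, 31):
    print(inuse, fullness_column(inuse, 31))
```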
- Internals
- =========
- zsmalloc has 255 size classes, each of which can hold a number of zspages.
- Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
- The optimal zspage chain size for each size class is calculated during the
- creation of the zsmalloc pool (see calculate_zspage_chain_size()).
- As an optimization, zsmalloc merges size classes that have similar
- characteristics in terms of the number of pages per zspage and the number
- of objects that each zspage can store.
- For instance, consider the following size classes:::
- class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
- ...
- 94 1536 0 .... 0 0 0 0 3 0
- 100 1632 0 .... 0 0 0 0 2 0
- ...
- Size classes #95-99 are merged with size class #100. This means that when we
- need to store an object of size, say, 1568 bytes, we end up using size class
- #100 instead of size class #96. Size class #100 is meant for objects of size
- 1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.
- Size class #100 consists of zspages with 2 physical pages each, which can
- hold a total of 5 objects. If we need to store 13 objects of size 1568, we
- end up allocating three zspages, or 6 physical pages.
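The merging example above can be reproduced with a small model. It assumes a 4096-byte PAGE_SIZE, class sizes of the form 32 + 16 * index (consistent with the class/size pairs shown in the tables), a chain size limit of 4, and that neighbouring classes merge when they share the same pages per zspage and objects per zspage. All function names are hypothetical.

```python
PAGE_SIZE = 4096     # assumed page size in bytes
MIN_SIZE = 32        # assumed size of class #0
CLASS_DELTA = 16     # assumed step between neighbouring classes
MAX_CHAIN = 4        # assumed zspage chain size limit


def class_size(index):
    return MIN_SIZE + index * CLASS_DELTA


def chain_size(size, max_chain=MAX_CHAIN):
    """Smallest chain length that leaves the fewest wasted bytes."""
    return min(range(1, max_chain + 1), key=lambda n: (n * PAGE_SIZE) % size)


def characteristics(index):
    """(pages per zspage, objects per zspage) for a size class."""
    size = class_size(index)
    pages = chain_size(size)
    return pages, pages * PAGE_SIZE // size


def merged_class(index, last_index=254):
    """Walk upward while the next class has identical characteristics."""
    while (index < last_index
           and characteristics(index + 1) == characteristics(index)):
        index += 1
    return index


target = merged_class(96)                       # class used for 1568-byte objects
pages, objs = characteristics(target)
waste_per_object = class_size(target) - class_size(96)
zspages_needed = (13 + objs - 1) // objs        # round up
print(target, waste_per_object, zspages_needed * pages)
```

Under these assumptions the model selects class #100 for a 1568-byte object, with 64 bytes wasted per object and 6 physical pages needed for 13 objects, matching the numbers in the text.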
- However, if we take a closer look at size class #96 (which is meant for
- objects of size 1568 bytes) and trace `calculate_zspage_chain_size()`, we
- find that the optimal zspage configuration for this class is a chain
- of 5 physical pages:::
- pages per zspage wasted bytes used%
- 1 960 76
- 2 352 95
- 3 1312 89
- 4 704 95
- 5 96 99
- This means that a class #96 configuration with 5 physical pages can store 13
- objects of size 1568 in a single zspage, using a total of 5 physical pages.
- This is more efficient than the class #100 configuration, which would use 6
- physical pages to store the same number of objects.
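The table above follows from a short calculation. The sketch below is a simplified model of the chain size selection as described in the text, not a copy of `calculate_zspage_chain_size()`; it assumes a 4096-byte PAGE_SIZE, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def chain_table(size, max_chain):
    """Rows of (pages per zspage, wasted bytes, used %) for one class size."""
    rows = []
    for pages in range(1, max_chain + 1):
        zspage_bytes = pages * PAGE_SIZE
        wasted = zspage_bytes % size
        used_pct = (zspage_bytes - wasted) * 100 // zspage_bytes
        rows.append((pages, wasted, used_pct))
    return rows


def best_chain(size, max_chain):
    """Pick the smallest chain length with the fewest wasted bytes."""
    return min(chain_table(size, max_chain), key=lambda row: row[1])[0]


for row in chain_table(1568, 5):
    print(row)

pages = best_chain(1568, 5)
print(pages, pages * PAGE_SIZE // 1568)   # chain length and objects per zspage
```

With a limit of 4 pages the same calculation settles on a chain of 2 pages, which is why class #96 ends up merged with class #100 in the default configuration.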
- As the zspage chain size for class #96 increases, its key characteristics
- such as pages per zspage and objects per zspage also change. This leads to
- fewer class mergers, resulting in a more compact grouping of classes, which
- reduces memory wastage.
- Let's take a closer look at the bottom of `/sys/kernel/debug/zsmalloc/zramX/classes`:::
- class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
- ...
- 202 3264 0 .. 0 0 0 0 4 0
- 254 4096 0 .. 0 0 0 0 1 0
- ...
- Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
- per zspage. Any object larger than 3264 bytes is considered huge and belongs
- to size class #254, which stores each object in its own physical page (objects
- in huge classes do not share pages).
- Increasing the zspage chain size also results in a higher watermark
- for the huge size class and fewer huge classes overall. This allows for more
- efficient storage of large objects.
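The dependence of the huge class watermark on the chain size can be explored with the sketch below. It assumes a 4096-byte PAGE_SIZE, class sizes stepping by 16 bytes from 32 to 4096, and treats a class as huge when its zspage is a single page holding a single object. The function names are hypothetical and the model is a simplification.

```python
PAGE_SIZE = 4096     # assumed page size in bytes
MIN_SIZE = 32        # assumed smallest class size
CLASS_DELTA = 16     # assumed step between neighbouring classes


def is_huge(size, max_chain):
    """True if the best zspage for this size is one page with one object."""
    pages = min(range(1, max_chain + 1), key=lambda n: (n * PAGE_SIZE) % size)
    objs = pages * PAGE_SIZE // size
    return pages == 1 and objs == 1


def huge_watermark(max_chain):
    """Largest class size whose objects still share a zspage."""
    sizes = range(MIN_SIZE, PAGE_SIZE + 1, CLASS_DELTA)
    return max(size for size in sizes if not is_huge(size, max_chain))


for chain in (4, 8, 16):
    print(chain, huge_watermark(chain))
```

Under these assumptions a chain size limit of 4 yields a watermark of 3264 bytes, and larger limits push the watermark closer to PAGE_SIZE.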
- For a zspage chain size of 8, the huge class watermark becomes 3632 bytes:::
- class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
- ...
- 202 3264 0 .. 0 0 0 0 4 0
- 211 3408 0 .. 0 0 0 0 5 0
- 217 3504 0 .. 0 0 0 0 6 0
- 222 3584 0 .. 0 0 0 0 7 0
- 225 3632 0 .. 0 0 0 0 8 0
- 254 4096 0 .. 0 0 0 0 1 0
- ...
- For a zspage chain size of 16, the huge class watermark becomes 3840 bytes:::
- class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
- ...
- 202 3264 0 .. 0 0 0 0 4 0
- 206 3328 0 .. 0 0 0 0 13 0
- 207 3344 0 .. 0 0 0 0 9 0
- 208 3360 0 .. 0 0 0 0 14 0
- 211 3408 0 .. 0 0 0 0 5 0
- 212 3424 0 .. 0 0 0 0 16 0
- 214 3456 0 .. 0 0 0 0 11 0
- 217 3504 0 .. 0 0 0 0 6 0
- 219 3536 0 .. 0 0 0 0 13 0
- 222 3584 0 .. 0 0 0 0 7 0
- 223 3600 0 .. 0 0 0 0 15 0
- 225 3632 0 .. 0 0 0 0 8 0
- 228 3680 0 .. 0 0 0 0 9 0
- 230 3712 0 .. 0 0 0 0 10 0
- 232 3744 0 .. 0 0 0 0 11 0
- 234 3776 0 .. 0 0 0 0 12 0
- 235 3792 0 .. 0 0 0 0 13 0
- 236 3808 0 .. 0 0 0 0 14 0
- 238 3840 0 .. 0 0 0 0 15 0
- 254 4096 0 .. 0 0 0 0 1 0
- ...
- Overall, the combined effect of the zspage chain size on zsmalloc pool configuration:::
- pages per zspage number of size classes (clusters) huge size class watermark
- 4 69 3264
- 5 86 3408
- 6 93 3504
- 7 112 3584
- 8 123 3632
- 9 140 3680
- 10 143 3712
- 11 159 3744
- 12 164 3776
- 13 180 3792
- 14 183 3808
- 15 188 3840
- 16 191 3840
- A synthetic test
- ----------------
- zram as storage for build artifacts (Linux kernel compilation).
- * `CONFIG_ZSMALLOC_CHAIN_SIZE=4`
- zsmalloc classes stats:::
- class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
- ...
- Total 13 .. 51 413836 412973 159955 3
- zram mm_stat:::
- 1691783168 628083717 655175680 0 655175680 60 0 34048 34049
- * `CONFIG_ZSMALLOC_CHAIN_SIZE=8`
- zsmalloc classes stats:::
- class size 10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
- ...
- Total 18 .. 87 414852 412978 156666 0
- zram mm_stat:::
- 1691803648 627793930 641703936 0 641703936 60 0 33591 33591
- Using larger zspage chains may result in using fewer physical pages. In the
- example above, the number of physical pages used decreased from 159955 to
- 156666, and the maximum zsmalloc pool memory usage went down from 655175680
- to 641703936 bytes.
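The two mm_stat lines can be compared programmatically. The sketch below assumes a 4096-byte PAGE_SIZE and that the fields appear in the order given by the zram documentation for mm_stat; `parse_mm_stat` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Field order assumed to follow the zram mm_stat documentation.
MM_STAT_FIELDS = (
    "orig_data_size", "compr_data_size", "mem_used_total", "mem_limit",
    "mem_used_max", "same_pages", "pages_compacted", "huge_pages",
    "huge_pages_since",
)


def parse_mm_stat(line):
    """Turn one mm_stat line into a dict keyed by field name."""
    return dict(zip(MM_STAT_FIELDS, map(int, line.split())))


chain4 = parse_mm_stat("1691783168 628083717 655175680 0 655175680 60 0 34048 34049")
chain8 = parse_mm_stat("1691803648 627793930 641703936 0 641703936 60 0 33591 33591")

saved_bytes = chain4["mem_used_total"] - chain8["mem_used_total"]
saved_pages = saved_bytes // PAGE_SIZE
fewer_huge_pages = chain4["huge_pages"] - chain8["huge_pages"]
print(saved_bytes, saved_pages, fewer_huge_pages)
```

The byte difference corresponds to the same number of pages as the drop in `pages_used` between the two classes summaries, roughly a 2% reduction for this workload.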
- However, this advantage may be offset by the potential for increased system
- memory pressure (as some zspages have larger chain sizes) in cases where there
- is heavy internal fragmentation and zspool compaction is unable to relocate
- objects and release zspages. In these cases, it is recommended to decrease
- the limit on the size of the zspage chains (as specified by the
- CONFIG_ZSMALLOC_CHAIN_SIZE option).
- Functions
- =========
- .. kernel-doc:: mm/zsmalloc.c
|