| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696 |
- ==================
- Memory Hot(Un)Plug
- ==================
- This document describes generic Linux support for memory hot(un)plug with
- a focus on System RAM, including ZONE_MOVABLE support.
- .. contents:: :local:
- Introduction
- ============
- Memory hot(un)plug allows for increasing and decreasing the size of physical
- memory available to a machine at runtime. In the simplest case, it consists of
- physically plugging or unplugging a DIMM at runtime, coordinated with the
- operating system.
- Memory hot(un)plug is used for various purposes:
- - The physical memory available to a machine can be adjusted at runtime, up- or
- downgrading the memory capacity. This dynamic memory resizing, sometimes
- referred to as "capacity on demand", is frequently used with virtual machines
- and logical partitions.
- - Replacing hardware, such as DIMMs or whole NUMA nodes, without downtime. One
- example is replacing failing memory modules.
- - Reducing energy consumption either by physically unplugging memory modules or
- by logically unplugging (parts of) memory modules from Linux.
- Further, the basic memory hot(un)plug infrastructure in Linux is nowadays also
- used to expose persistent memory, other performance-differentiated memory and
- reserved memory regions as ordinary system RAM to Linux.
- Linux only supports memory hot(un)plug on selected 64 bit architectures, such as
- x86_64, arm64, ppc64 and s390x.
- Memory Hot(Un)Plug Granularity
- ------------------------------
- Memory hot(un)plug in Linux uses the SPARSEMEM memory model, which divides the
- physical memory address space into chunks of the same size: memory sections. The
- size of a memory section is architecture dependent. For example, x86_64 uses
- 128 MiB and ppc64 uses 16 MiB.
- Memory sections are combined into chunks referred to as "memory blocks". The
- size of a memory block is architecture dependent and corresponds to the smallest
- granularity that can be hot(un)plugged. The default size of a memory block is
- the same as memory section size, unless an architecture specifies otherwise.
- All memory blocks have the same size.
- Phases of Memory Hotplug
- ------------------------
- Memory hotplug consists of two phases:
- (1) Adding the memory to Linux
- (2) Onlining memory blocks
- In the first phase, metadata, such as the memory map ("memmap") and page tables
- for the direct mapping, is allocated and initialized, and memory blocks are
- created; the latter also creates sysfs files for managing newly created memory
- blocks.
- In the second phase, added memory is exposed to the page allocator. After this
- phase, the memory is visible in memory statistics, such as free and total
- memory, of the system.
- Phases of Memory Hotunplug
- --------------------------
- Memory hotunplug consists of two phases:
- (1) Offlining memory blocks
- (2) Removing the memory from Linux
- In the first phase, memory is "hidden" from the page allocator again, for
- example, by migrating busy memory to other memory locations and removing all
- relevant free pages from the page allocator After this phase, the memory is no
- longer visible in memory statistics of the system.
- In the second phase, the memory blocks are removed and metadata is freed.
- Memory Hotplug Notifications
- ============================
- There are various ways how Linux is notified about memory hotplug events such
- that it can start adding hotplugged memory. This description is limited to
- systems that support ACPI; mechanisms specific to other firmware interfaces or
- virtual machines are not described.
- ACPI Notifications
- ------------------
- Platforms that support ACPI, such as x86_64, can support memory hotplug
- notifications via ACPI.
- In general, a firmware supporting memory hotplug defines a memory class object
- HID "PNP0C80". When notified about hotplug of a new memory device, the ACPI
- driver will hotplug the memory to Linux.
- If the firmware supports hotplug of NUMA nodes, it defines an object _HID
- "ACPI0004", "PNP0A05", or "PNP0A06". When notified about an hotplug event, all
- assigned memory devices are added to Linux by the ACPI driver.
- Similarly, Linux can be notified about requests to hotunplug a memory device or
- a NUMA node via ACPI. The ACPI driver will try offlining all relevant memory
- blocks, and, if successful, hotunplug the memory from Linux.
- Manual Probing
- --------------
- On some architectures, the firmware may not be able to notify the operating
- system about a memory hotplug event. Instead, the memory has to be manually
- probed from user space.
- The probe interface is located at::
- /sys/devices/system/memory/probe
- Only complete memory blocks can be probed. Individual memory blocks are probed
- by providing the physical start address of the memory block::
- % echo addr > /sys/devices/system/memory/probe
- Which results in a memory block for the range [addr, addr + memory_block_size)
- being created.
- .. note::
- Using the probe interface is discouraged as it is easy to crash the kernel,
- because Linux cannot validate user input; this interface might be removed in
- the future.
- Onlining and Offlining Memory Blocks
- ====================================
- After a memory block has been created, Linux has to be instructed to actually
- make use of that memory: the memory block has to be "online".
- Before a memory block can be removed, Linux has to stop using any memory part of
- the memory block: the memory block has to be "offlined".
- The Linux kernel can be configured to automatically online added memory blocks
- and drivers automatically trigger offlining of memory blocks when trying
- hotunplug of memory. Memory blocks can only be removed once offlining succeeded
- and drivers may trigger offlining of memory blocks when attempting hotunplug of
- memory.
- Onlining Memory Blocks Manually
- -------------------------------
- If auto-onlining of memory blocks isn't enabled, user-space has to manually
- trigger onlining of memory blocks. Often, udev rules are used to automate this
- task in user space.
- Onlining of a memory block can be triggered via::
- % echo online > /sys/devices/system/memory/memoryXXX/state
- Or alternatively::
- % echo 1 > /sys/devices/system/memory/memoryXXX/online
- The kernel will select the target zone automatically, depending on the
- configured ``online_policy``.
- One can explicitly request to associate an offline memory block with
- ZONE_MOVABLE by::
- % echo online_movable > /sys/devices/system/memory/memoryXXX/state
- Or one can explicitly request a kernel zone (usually ZONE_NORMAL) by::
- % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
- In any case, if onlining succeeds, the state of the memory block is changed to
- be "online". If it fails, the state of the memory block will remain unchanged
- and the above commands will fail.
- Onlining Memory Blocks Automatically
- ------------------------------------
- The kernel can be configured to try auto-onlining of newly added memory blocks.
- If this feature is disabled, the memory blocks will stay offline until
- explicitly onlined from user space.
- The configured auto-online behavior can be observed via::
- % cat /sys/devices/system/memory/auto_online_blocks
- Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or
- ``online_movable`` to that file, like::
- % echo online > /sys/devices/system/memory/auto_online_blocks
- Similarly to manual onlining, with ``online`` the kernel will select the
- target zone automatically, depending on the configured ``online_policy``.
- Modifying the auto-online behavior will only affect all subsequently added
- memory blocks only.
- .. note::
- In corner cases, auto-onlining can fail. The kernel won't retry. Note that
- auto-onlining is not expected to fail in default configurations.
- .. note::
- DLPAR on ppc64 ignores the ``offline`` setting and will still online added
- memory blocks; if onlining fails, memory blocks are removed again.
- Offlining Memory Blocks
- -----------------------
- In the current implementation, Linux's memory offlining will try migrating all
- movable pages off the affected memory block. As most kernel allocations, such as
- page tables, are unmovable, page migration can fail and, therefore, inhibit
- memory offlining from succeeding.
- Having the memory provided by memory block managed by ZONE_MOVABLE significantly
- increases memory offlining reliability; still, memory offlining can fail in
- some corner cases.
- Further, memory offlining might retry for a long time (or even forever), until
- aborted by the user.
- Offlining of a memory block can be triggered via::
- % echo offline > /sys/devices/system/memory/memoryXXX/state
- Or alternatively::
- % echo 0 > /sys/devices/system/memory/memoryXXX/online
- If offlining succeeds, the state of the memory block is changed to be "offline".
- If it fails, the state of the memory block will remain unchanged and the above
- commands will fail, for example, via::
- bash: echo: write error: Device or resource busy
- or via::
- bash: echo: write error: Invalid argument
- Observing the State of Memory Blocks
- ------------------------------------
- The state (online/offline/going-offline) of a memory block can be observed
- either via::
- % cat /sys/devices/system/memory/memoryXXX/state
- Or alternatively (1/0) via::
- % cat /sys/devices/system/memory/memoryXXX/online
- For an online memory block, the managing zone can be observed via::
- % cat /sys/devices/system/memory/memoryXXX/valid_zones
- Configuring Memory Hot(Un)Plug
- ==============================
- There are various ways how system administrators can configure memory
- hot(un)plug and interact with memory blocks, especially, to online them.
- Memory Hot(Un)Plug Configuration via Sysfs
- ------------------------------------------
- Some memory hot(un)plug properties can be configured or inspected via sysfs in::
- /sys/devices/system/memory/
- The following files are currently defined:
- ====================== =========================================================
- ``auto_online_blocks`` read-write: set or get the default state of new memory
- blocks; configure auto-onlining.
- The default value depends on the
- CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel configuration
- option.
- See the ``state`` property of memory blocks for details.
- ``block_size_bytes`` read-only: the size in bytes of a memory block.
- ``probe`` write-only: add (probe) selected memory blocks manually
- from user space by supplying the physical start address.
- Availability depends on the CONFIG_ARCH_MEMORY_PROBE
- kernel configuration option.
- ``uevent`` read-write: generic udev file for device subsystems.
- ``crash_hotplug`` read-only: when changes to the system memory map
- occur due to hot un/plug of memory, this file contains
- '1' if the kernel updates the kdump capture kernel memory
- map itself (via elfcorehdr and other relevant kexec
- segments), or '0' if userspace must update the kdump
- capture kernel memory map.
- Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
- configuration option.
- ====================== =========================================================
- .. note::
- When the CONFIG_MEMORY_FAILURE kernel configuration option is enabled, two
- additional files ``hard_offline_page`` and ``soft_offline_page`` are available
- to trigger hwpoisoning of pages, for example, for testing purposes. Note that
- this functionality is not really related to memory hot(un)plug or actual
- offlining of memory blocks.
- Memory Block Configuration via Sysfs
- ------------------------------------
- Each memory block is represented as a memory block device that can be
- onlined or offlined. All memory blocks have their device information located in
- sysfs. Each present memory block is listed under
- ``/sys/devices/system/memory`` as::
- /sys/devices/system/memory/memoryXXX
- where XXX is the memory block id; the number of digits is variable.
- A present memory block indicates that some memory in the range is present;
- however, a memory block might span memory holes. A memory block spanning memory
- holes cannot be offlined.
- For example, assume 1 GiB memory block size. A device for a memory starting at
- 0x100000000 is ``/sys/devices/system/memory/memory4``::
- (0x100000000 / 1Gib = 4)
- This device covers address range [0x100000000 ... 0x140000000)
- The following files are currently defined:
- =================== ============================================================
- ``online`` read-write: simplified interface to trigger onlining /
- offlining and to observe the state of a memory block.
- When onlining, the zone is selected automatically.
- ``phys_device`` read-only: legacy interface only ever used on s390x to
- expose the covered storage increment.
- ``phys_index`` read-only: the memory block id (XXX).
- ``removable`` read-only: legacy interface that indicated whether a memory
- block was likely to be offlineable or not. Nowadays, the
- kernel return ``1`` if and only if it supports memory
- offlining.
- ``state`` read-write: advanced interface to trigger onlining /
- offlining and to observe the state of a memory block.
- When writing, ``online``, ``offline``, ``online_kernel`` and
- ``online_movable`` are supported.
- ``online_movable`` specifies onlining to ZONE_MOVABLE.
- ``online_kernel`` specifies onlining to the default kernel
- zone for the memory block, such as ZONE_NORMAL.
- ``online`` let's the kernel select the zone automatically.
- When reading, ``online``, ``offline`` and ``going-offline``
- may be returned.
- ``uevent`` read-write: generic uevent file for devices.
- ``valid_zones`` read-only: when a block is online, shows the zone it
- belongs to; when a block is offline, shows what zone will
- manage it when the block will be onlined.
- For online memory blocks, ``DMA``, ``DMA32``, ``Normal``,
- ``Movable`` and ``none`` may be returned. ``none`` indicates
- that memory provided by a memory block is managed by
- multiple zones or spans multiple nodes; such memory blocks
- cannot be offlined. ``Movable`` indicates ZONE_MOVABLE.
- Other values indicate a kernel zone.
- For offline memory blocks, the first column shows the
- zone the kernel would select when onlining the memory block
- right now without further specifying a zone.
- Availability depends on the CONFIG_MEMORY_HOTREMOVE
- kernel configuration option.
- =================== ============================================================
- .. note::
- If the CONFIG_NUMA kernel configuration option is enabled, the memoryXXX/
- directories can also be accessed via symbolic links located in the
- ``/sys/devices/system/node/node*`` directories.
- For example::
- /sys/devices/system/node/node0/memory9 -> ../../memory/memory9
- A backlink will also be created::
- /sys/devices/system/memory/memory9/node0 -> ../../node/node0
- Command Line Parameters
- -----------------------
- Some command line parameters affect memory hot(un)plug handling. The following
- command line parameters are relevant:
- ======================== =======================================================
- ``memhp_default_state`` configure auto-onlining by essentially setting
- ``/sys/devices/system/memory/auto_online_blocks``.
- ``movable_node`` configure automatic zone selection in the kernel when
- using the ``contig-zones`` online policy. When
- set, the kernel will default to ZONE_MOVABLE when
- onlining a memory block, unless other zones can be kept
- contiguous.
- ======================== =======================================================
- See Documentation/admin-guide/kernel-parameters.txt for a more generic
- description of these command line parameters.
- Module Parameters
- ------------------
- Instead of additional command line parameters or sysfs files, the
- ``memory_hotplug`` subsystem now provides a dedicated namespace for module
- parameters. Module parameters can be set via the command line by predicating
- them with ``memory_hotplug.`` such as::
- memory_hotplug.memmap_on_memory=1
- and they can be observed (and some even modified at runtime) via::
- /sys/module/memory_hotplug/parameters/
- The following module parameters are currently defined:
- ================================ ===============================================
- ``memmap_on_memory`` read-write: Allocate memory for the memmap from
- the added memory block itself. Even if enabled,
- actual support depends on various other system
- properties and should only be regarded as a
- hint whether the behavior would be desired.
- While allocating the memmap from the memory
- block itself makes memory hotplug less likely
- to fail and keeps the memmap on the same NUMA
- node in any case, it can fragment physical
- memory in a way that huge pages in bigger
- granularity cannot be formed on hotplugged
- memory.
- With value "force" it could result in memory
- wastage due to memmap size limitations. For
- example, if the memmap for a memory block
- requires 1 MiB, but the pageblock size is 2
- MiB, 1 MiB of hotplugged memory will be wasted.
- Note that there are still cases where the
- feature cannot be enforced: for example, if the
- memmap is smaller than a single page, or if the
- architecture does not support the forced mode
- in all configurations.
- ``online_policy`` read-write: Set the basic policy used for
- automatic zone selection when onlining memory
- blocks without specifying a target zone.
- ``contig-zones`` has been the kernel default
- before this parameter was added. After an
- online policy was configured and memory was
- online, the policy should not be changed
- anymore.
- When set to ``contig-zones``, the kernel will
- try keeping zones contiguous. If a memory block
- intersects multiple zones or no zone, the
- behavior depends on the ``movable_node`` kernel
- command line parameter: default to ZONE_MOVABLE
- if set, default to the applicable kernel zone
- (usually ZONE_NORMAL) if not set.
- When set to ``auto-movable``, the kernel will
- try onlining memory blocks to ZONE_MOVABLE if
- possible according to the configuration and
- memory device details. With this policy, one
- can avoid zone imbalances when eventually
- hotplugging a lot of memory later and still
- wanting to be able to hotunplug as much as
- possible reliably, very desirable in
- virtualized environments. This policy ignores
- the ``movable_node`` kernel command line
- parameter and isn't really applicable in
- environments that require it (e.g., bare metal
- with hotunpluggable nodes) where hotplugged
- memory might be exposed via the
- firmware-provided memory map early during boot
- to the system instead of getting detected,
- added and onlined later during boot (such as
- done by virtio-mem or by some hypervisors
- implementing emulated DIMMs). As one example, a
- hotplugged DIMM will be onlined either
- completely to ZONE_MOVABLE or completely to
- ZONE_NORMAL, not a mixture.
- As another example, as many memory blocks
- belonging to a virtio-mem device will be
- onlined to ZONE_MOVABLE as possible,
- special-casing units of memory blocks that can
- only get hotunplugged together. *This policy
- does not protect from setups that are
- problematic with ZONE_MOVABLE and does not
- change the zone of memory blocks dynamically
- after they were onlined.*
- ``auto_movable_ratio`` read-write: Set the maximum MOVABLE:KERNEL
- memory ratio in % for the ``auto-movable``
- online policy. Whether the ratio applies only
- for the system across all NUMA nodes or also
- per NUMA nodes depends on the
- ``auto_movable_numa_aware`` configuration.
- All accounting is based on present memory pages
- in the zones combined with accounting per
- memory device. Memory dedicated to the CMA
- allocator is accounted as MOVABLE, although
- residing on one of the kernel zones. The
- possible ratio depends on the actual workload.
- The kernel default is "301" %, for example,
- allowing for hotplugging 24 GiB to a 8 GiB VM
- and automatically onlining all hotplugged
- memory to ZONE_MOVABLE in many setups. The
- additional 1% deals with some pages being not
- present, for example, because of some firmware
- allocations.
- Note that ZONE_NORMAL memory provided by one
- memory device does not allow for more
- ZONE_MOVABLE memory for a different memory
- device. As one example, onlining memory of a
- hotplugged DIMM to ZONE_NORMAL will not allow
- for another hotplugged DIMM to get onlined to
- ZONE_MOVABLE automatically. In contrast, memory
- hotplugged by a virtio-mem device that got
- onlined to ZONE_NORMAL will allow for more
- ZONE_MOVABLE memory within *the same*
- virtio-mem device.
- ``auto_movable_numa_aware`` read-write: Configure whether the
- ``auto_movable_ratio`` in the ``auto-movable``
- online policy also applies per NUMA
- node in addition to the whole system across all
- NUMA nodes. The kernel default is "Y".
- Disabling NUMA awareness can be helpful when
- dealing with NUMA nodes that should be
- completely hotunpluggable, onlining the memory
- completely to ZONE_MOVABLE automatically if
- possible.
- Parameter availability depends on CONFIG_NUMA.
- ================================ ===============================================
- ZONE_MOVABLE
- ============
- ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
- Further, having system RAM managed by ZONE_MOVABLE instead of one of the
- kernel zones can increase the number of possible transparent huge pages and
- dynamically allocated huge pages.
- Most kernel allocations are unmovable. Important examples include the memory
- map (usually 1/64ths of memory), page tables, and kmalloc(). Such allocations
- can only be served from the kernel zones.
- Most user space pages, such as anonymous memory, and page cache pages are
- movable. Such allocations can be served from ZONE_MOVABLE and the kernel zones.
- Only movable allocations are served from ZONE_MOVABLE, resulting in unmovable
- allocations being limited to the kernel zones. Without ZONE_MOVABLE, there is
- absolutely no guarantee whether a memory block can be offlined successfully.
- Zone Imbalances
- ---------------
- Having too much system RAM managed by ZONE_MOVABLE is called a zone imbalance,
- which can harm the system or degrade performance. As one example, the kernel
- might crash because it runs out of free memory for unmovable allocations,
- although there is still plenty of free memory left in ZONE_MOVABLE.
- Usually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. Ratios of 63:1
- are definitely impossible due to the overhead for the memory map.
- Actual safe zone ratios depend on the workload. Extreme cases, like excessive
- long-term pinning of pages, might not be able to deal with ZONE_MOVABLE at all.
- .. note::
- CMA memory part of a kernel zone essentially behaves like memory in
- ZONE_MOVABLE and similar considerations apply, especially when combining
- CMA with ZONE_MOVABLE.
- ZONE_MOVABLE Sizing Considerations
- ----------------------------------
- We usually expect that a large portion of available system RAM will actually
- be consumed by user space, either directly or indirectly via the page cache. In
- the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
- With that in mind, it makes sense that we can have a big portion of system RAM
- managed by ZONE_MOVABLE. However, there are some things to consider when using
- ZONE_MOVABLE, especially when fine-tuning zone ratios:
- - Having a lot of offline memory blocks. Even offline memory blocks consume
- memory for metadata and page tables in the direct map; having a lot of offline
- memory blocks is not a typical case, though.
- - Memory ballooning without balloon compaction is incompatible with
- ZONE_MOVABLE. Only some implementations, such as virtio-balloon and
- pseries CMM, fully support balloon compaction.
- Further, the CONFIG_BALLOON_COMPACTION kernel configuration option might be
- disabled. In that case, balloon inflation will only perform unmovable
- allocations and silently create a zone imbalance, usually triggered by
- inflation requests from the hypervisor.
- - Gigantic pages are unmovable, resulting in user space consuming a
- lot of unmovable memory.
- - Huge pages are unmovable when an architectures does not support huge
- page migration, resulting in a similar issue as with gigantic pages.
- - Page tables are unmovable. Excessive swapping, mapping extremely large
- files or ZONE_DEVICE memory can be problematic, although only really relevant
- in corner cases. When we manage a lot of user space memory that has been
- swapped out or is served from a file/persistent memory/... we still need a lot
- of page tables to manage that memory once user space accessed that memory.
- - In certain DAX configurations the memory map for the device memory will be
- allocated from the kernel zones.
- - KASAN can have a significant memory overhead, for example, consuming 1/8th of
- the total system memory size as (unmovable) tracking metadata.
- - Long-term pinning of pages. Techniques that rely on long-term pinnings
- (especially, RDMA and vfio/mdev) are fundamentally problematic with
- ZONE_MOVABLE, and therefore, memory offlining. Pinned pages cannot reside
- on ZONE_MOVABLE as that would turn these pages unmovable. Therefore, they
- have to be migrated off that zone while pinning. Pinning a page can fail
- even if there is plenty of free memory in ZONE_MOVABLE.
- In addition, using ZONE_MOVABLE might make page pinning more expensive,
- because of the page migration overhead.
- By default, all the memory configured at boot time is managed by the kernel
- zones and ZONE_MOVABLE is not used.
- To enable ZONE_MOVABLE to include the memory present at boot and to control the
- ratio between movable and kernel zones there are two command line options:
- ``kernelcore=`` and ``movablecore=``. See
- Documentation/admin-guide/kernel-parameters.rst for their description.
- Memory Offlining and ZONE_MOVABLE
- ---------------------------------
- Even with ZONE_MOVABLE, there are some corner cases where offlining a memory
- block might fail:
- - Memory blocks with memory holes; this applies to memory blocks present during
- boot and can apply to memory blocks hotplugged via the XEN balloon and the
- Hyper-V balloon.
- - Mixed NUMA nodes and mixed zones within a single memory block prevent memory
- offlining; this applies to memory blocks present during boot only.
- - Special memory blocks prevented by the system from getting offlined. Examples
- include any memory available during boot on arm64 or memory blocks spanning
- the crashkernel area on s390x; this usually applies to memory blocks present
- during boot only.
- - Memory blocks overlapping with CMA areas cannot be offlined, this applies to
- memory blocks present during boot only.
- - Concurrent activity that operates on the same physical memory area, such as
- allocating gigantic pages, can result in temporary offlining failures.
- - Out of memory when dissolving huge pages, especially when HugeTLB Vmemmap
- Optimization (HVO) is enabled.
- Offlining code may be able to migrate huge page contents, but may not be able
- to dissolve the source huge page because it fails allocating (unmovable) pages
- for the vmemmap, because the system might not have free memory in the kernel
- zones left.
- Users that depend on memory offlining to succeed for movable zones should
- carefully consider whether the memory savings gained from this feature are
- worth the risk of possibly not being able to offline memory in certain
- situations.
- Further, when running into out of memory situations while migrating pages, or
- when still encountering permanently unmovable pages within ZONE_MOVABLE
- (-> BUG), memory offlining will keep retrying until it eventually succeeds.
- When offlining is triggered from user space, the offlining context can be
- terminated by sending a signal. A timeout based offlining can easily be
- implemented via::
- % timeout $TIMEOUT offline_block | failure_handling
|