.. _pagemap:

=============================
Examining Process Page Tables
=============================

pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
userspace programs to examine the page tables and related information by
reading files in ``/proc``.

There are four components to pagemap:

* ``/proc/pid/pagemap``.  This file lets a userspace process find out which
  physical frame each virtual page is mapped to.  It contains one 64-bit
  value for each virtual page, containing the following data (from
  ``fs/proc/task_mmu.c``, above pagemap_read):

  * Bits 0-54  page frame number (PFN) if present
  * Bits 0-4   swap type if swapped
  * Bits 5-54  swap offset if swapped
  * Bit  55    pte is soft-dirty (see
    :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
  * Bit  56    page exclusively mapped (since 4.2)
  * Bits 57-60 zero
  * Bit  61    page is file-page or shared-anon (since 3.5)
  * Bit  62    page swapped
  * Bit  63    page present

  Since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs.
  In 4.0 and 4.1 opens by unprivileged users fail with -EPERM.  Starting from
  4.2 the PFN field is zeroed if the user does not have CAP_SYS_ADMIN.
  Reason: information about PFNs helps in exploiting the Rowhammer
  vulnerability.

  If the page is not present but in swap, then the PFN field contains an
  encoding of the swap file number and the page's offset into the
  swap.  Unmapped pages return a null PFN.  This allows determining
  precisely which pages are mapped (or in swap) and comparing mapped
  pages between processes.

  Efficient users of this interface will use ``/proc/pid/maps`` to
  determine which areas of memory are actually mapped and llseek to
  skip over unmapped regions.

* ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
  times each page is mapped, indexed by PFN.

  The page-types tool in the tools/vm directory can be used to query the
  number of times a page is mapped.

* ``/proc/kpageflags``.  This file contains a 64-bit set of flags for each
  page, indexed by PFN.

  The flags are (from ``fs/proc/page.c``, above kpageflags_read):

  0. LOCKED
  1. ERROR
  2. REFERENCED
  3. UPTODATE
  4. DIRTY
  5. LRU
  6. ACTIVE
  7. SLAB
  8. WRITEBACK
  9. RECLAIM
  10. BUDDY
  11. MMAP
  12. ANON
  13. SWAPCACHE
  14. SWAPBACKED
  15. COMPOUND_HEAD
  16. COMPOUND_TAIL
  17. HUGE
  18. UNEVICTABLE
  19. HWPOISON
  20. NOPAGE
  21. KSM
  22. THP
  23. BALLOON
  24. ZERO_PAGE
  25. IDLE

* ``/proc/kpagecgroup``.  This file contains a 64-bit inode number of the
  memory cgroup each page is charged to, indexed by PFN.  Only available when
  CONFIG_MEMCG is set.

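
The bit layout of a ``/proc/pid/pagemap`` entry can be illustrated with a
small decoder.  This is only a sketch: the names ``PagemapEntry`` and
``decode_pagemap_entry`` are made up for illustration, and it assumes the
flag layout of Linux 4.2 and later (bits 55 and 56 used as flags).

```python
from typing import NamedTuple, Optional


class PagemapEntry(NamedTuple):
    """Fields of one 64-bit pagemap value (hypothetical helper type)."""
    present: bool
    swapped: bool
    file_or_shared_anon: bool
    exclusive: bool
    soft_dirty: bool
    pfn: Optional[int]          # meaningful only when the page is present
    swap_type: Optional[int]    # meaningful only when the page is swapped
    swap_offset: Optional[int]  # meaningful only when the page is swapped


def decode_pagemap_entry(entry: int) -> PagemapEntry:
    """Split one pagemap value into the fields from the bit list."""
    present = bool((entry >> 63) & 1)
    swapped = bool((entry >> 62) & 1)
    low55 = entry & ((1 << 55) - 1)          # bits 0-54
    return PagemapEntry(
        present=present,
        swapped=swapped,
        file_or_shared_anon=bool((entry >> 61) & 1),
        exclusive=bool((entry >> 56) & 1),
        soft_dirty=bool((entry >> 55) & 1),
        pfn=low55 if present else None,
        swap_type=(low55 & 0x1F) if swapped else None,    # bits 0-4
        swap_offset=(low55 >> 5) if swapped else None,    # bits 5-54
    )
```

Note that without CAP_SYS_ADMIN the decoded ``pfn`` of a present page is
simply 0 on 4.2 and later kernels, so the flag bits remain usable but the
frame number does not.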
Short descriptions of the page flags
====================================

0 - LOCKED
    page is being locked for exclusive access, e.g. by undergoing read/write IO
7 - SLAB
    page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator.
    When a compound page is used, SLUB/SLQB will only set this flag on the
    head page; SLOB will not flag it at all.
10 - BUDDY
    a free memory block managed by the buddy system allocator.
    The buddy system organizes free memory in blocks of various orders.
    An order N block has 2^N physically contiguous pages, with the BUDDY flag
    set for and *only* for the first page.
15 - COMPOUND_HEAD
    A compound page with order N consists of 2^N physically contiguous pages.
    A compound page with order 2 takes the form of "HTTT", where H denotes its
    head page and T denotes its tail page(s).  The major consumers of compound
    pages are hugeTLB pages
    (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
    the SLUB etc. memory allocators and various device drivers.
    However in this interface, only huge/giga pages are made visible
    to end users.
16 - COMPOUND_TAIL
    A compound page tail (see description above).
17 - HUGE
    this is an integral part of a HugeTLB page
19 - HWPOISON
    hardware detected memory corruption on this page: don't touch the data!
20 - NOPAGE
    no page frame exists at the requested address
21 - KSM
    identical memory pages dynamically shared between one or more processes
22 - THP
    contiguous pages which construct transparent hugepages
23 - BALLOON
    balloon compaction page
24 - ZERO_PAGE
    zero page for pfn_zero or huge_zero page
25 - IDLE
    page has not been accessed since it was marked idle (see
    :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
    Note that this flag may be stale in case the page was accessed via
    a PTE.  To make sure the flag is up-to-date one has to read
    ``/sys/kernel/mm/page_idle/bitmap`` first.

IO related page flags
---------------------

1 - ERROR
    IO error occurred
3 - UPTODATE
    page has up-to-date data,
    i.e. for file backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
    page has been written to, hence contains new data,
    i.e. for file backed page: (in-memory data revision > on-disk one)
8 - WRITEBACK
    page is being synced to disk

LRU related page flags
----------------------

5 - LRU
    page is in one of the LRU lists
6 - ACTIVE
    page is in the active LRU list
18 - UNEVICTABLE
    page is in the unevictable (non-)LRU list.  It is somehow pinned and
    not a candidate for LRU page reclaims, e.g. ramfs pages,
    shmctl(SHM_LOCK) and mlock() memory segments
2 - REFERENCED
    page has been referenced since last LRU list enqueue/requeue
9 - RECLAIM
    page will be reclaimed soon after its pageout IO completed
11 - MMAP
    a memory mapped page
12 - ANON
    a memory mapped page that is not part of a file
13 - SWAPCACHE
    page is mapped to swap space, i.e. has an associated swap entry
14 - SWAPBACKED
    page is backed by swap/RAM

The page-types tool in the tools/vm directory can be used to query the
above flags.

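
For scripts that read ``/proc/kpageflags`` directly rather than going through
page-types, the bit numbers documented here can be mapped back to flag names
along the following lines.  This is a minimal sketch; ``KPF_NAMES`` and
``decode_kpageflags`` are illustrative names, not part of any kernel or
tool API, and the table covers only bits 0-25 as listed in this document.

```python
from typing import List

# Index in this list == bit number in a /proc/kpageflags entry.
KPF_NAMES = [
    "LOCKED", "ERROR", "REFERENCED", "UPTODATE", "DIRTY", "LRU", "ACTIVE",
    "SLAB", "WRITEBACK", "RECLAIM", "BUDDY", "MMAP", "ANON", "SWAPCACHE",
    "SWAPBACKED", "COMPOUND_HEAD", "COMPOUND_TAIL", "HUGE", "UNEVICTABLE",
    "HWPOISON", "NOPAGE", "KSM", "THP", "BALLOON", "ZERO_PAGE", "IDLE",
]


def decode_kpageflags(flags: int) -> List[str]:
    """Return the names of all documented flags set in a 64-bit value.

    Bits above 25 are ignored because this document does not describe them.
    """
    return [name for bit, name in enumerate(KPF_NAMES) if (flags >> bit) & 1]
```

An anonymous page on an LRU list would typically decode to a list containing
``LRU``, ``ANON`` and ``SWAPBACKED``, possibly together with other flags.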
Using pagemap to do something useful
====================================

The general procedure for using pagemap to find out about a process' memory
usage goes like this:

1. Read ``/proc/pid/maps`` to determine which parts of the memory space are
   mapped to what.
2. Select the maps you are interested in -- all of them, or a particular
   library, or the stack or the heap, etc.
3. Open ``/proc/pid/pagemap`` and seek to the pages you would like to examine.
4. Read a u64 for each page from pagemap.
5. Open ``/proc/kpagecount`` and/or ``/proc/kpageflags``.  For each PFN you
   just read, seek to that entry in the file, and read the data you want.

For example, to find the "unique set size" (USS), which is the amount of
memory that a process is using that is not shared with any other process,
you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.

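
The USS procedure could be sketched roughly as follows.  The function names
(``parse_maps``, ``read_u64``, ``unique_set_size``) are hypothetical, the
code takes already-opened file objects so that it can be exercised without
touching ``/proc``, and it assumes that a short read (past the range a file
covers) can be treated as an all-zero entry.  It reads one entry per seek,
which is simple but slow.

```python
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

ENTRY_SIZE = 8               # every pagemap-family file holds 64-bit entries
PFN_MASK = (1 << 55) - 1     # bits 0-54 of a pagemap entry
PRESENT = 1 << 63            # bit 63 of a pagemap entry


def parse_maps(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) virtual addresses from /proc/pid/maps lines."""
    for line in lines:
        if not line.strip():
            continue
        start, end = line.split()[0].split("-")
        yield int(start, 16), int(end, 16)


def read_u64(f: BinaryIO, index: int) -> int:
    """Read entry `index`; index * 8 is always an 8-byte aligned offset."""
    f.seek(index * ENTRY_SIZE)
    data = f.read(ENTRY_SIZE)
    if len(data) != ENTRY_SIZE:
        return 0             # assumption: treat a short read as "no data"
    return struct.unpack("=Q", data)[0]   # native byte order, 64 bits


def unique_set_size(maps: Iterable[str], pagemap: BinaryIO,
                    kpagecount: BinaryIO, page_size: int) -> int:
    """Return bytes in present pages whose map count is exactly one."""
    unique_pages = 0
    for start, end in parse_maps(maps):
        for vpn in range(start // page_size, end // page_size):
            entry = read_u64(pagemap, vpn)
            if not entry & PRESENT:
                continue     # swapped-out or unmapped pages are skipped
            pfn = entry & PFN_MASK
            # A zero PFN usually means the caller lacks CAP_SYS_ADMIN.
            if pfn and read_u64(kpagecount, pfn) == 1:
                unique_pages += 1
    return unique_pages * page_size
```

On a real system the three arguments would be ``/proc/pid/maps`` opened in
text mode and ``/proc/pid/pagemap`` and ``/proc/kpagecount`` opened in binary
mode, with the page size taken from ``os.sysconf("SC_PAGE_SIZE")``.  Because
PFNs are hidden from unprivileged users on 4.2 and later, the result is only
meaningful when run with CAP_SYS_ADMIN.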
Other notes
===========

Reading from any of the files will return -EINVAL if you are not starting
the read on an 8-byte boundary (e.g., if you sought an odd number of bytes
into the file), or if the size of the read is not a multiple of 8 bytes.

Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 on most architectures).  Since Linux 3.11 their meaning changes
after the first clear of soft-dirty bits.  Since Linux 4.2 they are used for
flags unconditionally.