.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The sample interval controls a timer that sets up KFENCE allocations. By
default, to keep the real sample interval predictable, the normal timer also
causes CPU wake-ups when the system is completely idle. This may be undesirable
on power-constrained systems. The boot parameter ``kfence.deferrable=1``
instead switches to a "deferrable" timer which does not force CPU wake-ups on
idle systems, at the risk of unpredictable sample intervals. The default is
configurable via the Kconfig option ``CONFIG_KFENCE_DEFERRABLE``.

.. warning::
   The KUnit test suite is very likely to fail when using a deferrable timer
   since it currently causes very unpredictable sample intervals.

By default KFENCE will only sample 1 heap allocation within each sample
interval. *Burst mode* allows sampling successive heap allocations: the
kernel boot parameter ``kfence.burst`` can be set to a non-zero value which
denotes the *additional* successive allocations within a sample interval;
setting ``kfence.burst=N`` means that ``1 + N`` successive allocations are
attempted through KFENCE for each sample interval.

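
As a rough illustration of how the sample interval and burst setting combine,
the following Python sketch computes an upper bound on the number of
allocations attempted through KFENCE per second. The function name and the
example values are illustrative assumptions, not kernel interfaces, and the
bound is only reached if enough allocations occur and the pool is not
exhausted:

.. code-block:: python

    def max_guarded_allocs_per_second(sample_interval_ms: int, burst: int = 0) -> float:
        """Upper bound on allocations attempted through KFENCE per second.

        Hypothetical helper: models "1 + N allocations per sample interval".
        """
        if sample_interval_ms < 0 or burst < 0:
            raise ValueError("sample interval and burst must be non-negative")
        if sample_interval_ms == 0:
            # kfence.sample_interval=0 disables KFENCE.
            return 0.0
        intervals_per_second = 1000 / sample_interval_ms
        return (1 + burst) * intervals_per_second

With an assumed interval of 100 ms, ``kfence.burst=3`` would raise the bound
from 10 to 40 attempted allocations per second.
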
The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
255), the number of available guarded objects can be controlled. Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.

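
The formula above can be checked with a few lines of Python. The function
name is an illustrative assumption; the defaults mirror the values quoted in
this section (255 objects, 4 KiB pages):

.. code-block:: python

    def kfence_pool_bytes(num_objects: int = 255, page_size: int = 4096) -> int:
        """Total KFENCE pool size: (#objects + 1) * 2 * PAGE_SIZE."""
        if num_objects < 1 or page_size < 1:
            raise ValueError("num_objects and page_size must be positive")
        return (num_objects + 1) * 2 * page_size

    # Default config with 4 KiB pages: 256 * 2 * 4096 bytes = 2 MiB.
    default_pool_mib = kfence_pool_bytes() / (1024 * 1024)

The same object count with 64 KiB pages would dedicate 32 MiB to the pool,
since the pool size scales linearly with ``PAGE_SIZE``.
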
Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown when using the
kernel command line option ``no_hash_pointers``.

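
To show how the fields of such a report relate to each other, here is a
minimal Python sketch that parses the object line and recomputes the "1B
left" distance from the example above. The names ``KfenceObject``,
``parse_object_line`` and ``bytes_left_of`` are hypothetical helpers, and the
arithmetic is only meaningful when the report contains real addresses (that
is, with ``no_hash_pointers``):

.. code-block:: python

    import re
    from typing import NamedTuple

    class KfenceObject(NamedTuple):
        number: int   # the N in "kfence-#N"
        start: int    # address of the first object byte
        end: int      # address of the last object byte (inclusive)
        size: int
        cache: str

    _OBJECT_LINE = re.compile(
        r"kfence-#(?P<number>\d+): "
        r"0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+), "
        r"size=(?P<size>\d+), cache=(?P<cache>\S+)"
    )

    def parse_object_line(line: str) -> KfenceObject:
        """Parse a line such as 'kfence-#72: 0x...-0x..., size=32, cache=...'."""
        match = _OBJECT_LINE.search(line)
        if match is None:
            raise ValueError("not a KFENCE object line")
        return KfenceObject(
            number=int(match["number"]),
            start=int(match["start"], 16),
            end=int(match["end"], 16),
            size=int(match["size"]),
            cache=match["cache"],
        )

    def bytes_left_of(address: int, obj: KfenceObject) -> int:
        """Distance from an address below the object to its first byte."""
        if address >= obj.start:
            raise ValueError("address is not left of the object")
        return obj.start - address

    obj = parse_object_line(
        "kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32"
    )
    distance = bytes_left_of(0xffff8c3f2e291fff, obj)

In the example report the printed address range is inclusive, so
``obj.end - obj.start + 1`` equals the reported ``size``.
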
Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched. Note that real values are
only shown if the kernel was booted with ``no_hash_pointers``; to avoid
information disclosure otherwise, '!' is used instead to denote invalidly
written bytes.

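
The byte representation described above can be decoded mechanically. The
following Python sketch maps each touched offset to the written value, using
``None`` where the value is masked with '!'. The function name is a
hypothetical helper, and the input is assumed to be just the bracketed part of
the "Corrupted memory" line:

.. code-block:: python

    from typing import Dict, Optional

    def decode_corruption(field: str) -> Dict[int, Optional[int]]:
        """Map offsets of invalidly written bytes to their values.

        '.' marks an untouched byte, '!' a written byte whose value is hidden,
        and anything else is taken to be a hexadecimal byte value.
        """
        tokens = field.strip().strip("[]").split()
        written: Dict[int, Optional[int]] = {}
        for offset, token in enumerate(tokens):
            if token == ".":
                continue              # untouched redzone byte
            if token == "!":
                written[offset] = None  # touched, value not disclosed
            else:
                written[offset] = int(token, 16)
        return written

    example = decode_corruption("[ 0xac . . . . . . ]")

For the report above, only offset 0 (address ``0xffff8c3f2e33aff9``) was
overwritten, with the value ``0xac``.
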
And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval.

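
The sampling behaviour described above can be modelled with a short Python
simulation. This is a simplified sketch, not the kernel implementation: it
assumes a non-deferrable timer, no burst mode, an unlimited pool, and that
every allocation is eligible; the function name is hypothetical:

.. code-block:: python

    from typing import Iterable, List

    def guarded_allocations(alloc_times_ms: Iterable[int],
                            sample_interval_ms: int) -> List[int]:
        """Return the timestamps of allocations that would be guarded."""
        if sample_interval_ms <= 0:
            return []  # a sample interval of 0 disables KFENCE
        guarded = []
        gate_opens_at = sample_interval_ms  # first interval starts at t = 0
        for t in sorted(alloc_times_ms):
            if t >= gate_opens_at:
                # Interval expired: this allocation is served by KFENCE,
                # and the timer is reset from this point.
                guarded.append(t)
                gate_opens_at = t + sample_interval_ms
        return guarded

    # One allocation every 30 ms for 300 ms, with an assumed 100 ms interval.
    sampled = guarded_allocations(range(0, 301, 30), 100)

In this model only the allocations at 120 ms and 240 ms are guarded; all
others go through the main allocator unchanged.
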
When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. Depending on sample interval, target workloads, and
system architecture, this may perform better than the simple dynamic branch.
Careful benchmarking is recommended.

KFENCE objects each reside on a dedicated page, at either the left or right
page boundaries selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x |  RED- : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

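
As an illustration of the layout, the following Python sketch places an
object at the left or right boundary of its page and derives the redzone
ranges. It is a simplified model under stated assumptions: a 4 KiB page, a
power-of-two alignment, and right-hand placement computed by aligning
``page_end - size`` downwards. The function name is hypothetical, and the
8-byte alignment in the example is an assumption chosen because it reproduces
the addresses in the memory corruption report shown earlier:

.. code-block:: python

    from typing import List, Tuple

    PAGE_SIZE = 4096  # assumed page size

    def place_object(page_start: int, size: int, align: int,
                     right: bool) -> Tuple[int, int, List[Tuple[int, int]]]:
        """Return (first byte, last byte, redzone ranges) for one object page.

        All ranges are inclusive. 'align' must be a power of two and
        'page_start' must be page-aligned.
        """
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("size must be between 1 and PAGE_SIZE")
        if align <= 0 or align & (align - 1):
            raise ValueError("align must be a power of two")
        page_last = page_start + PAGE_SIZE - 1
        if right:
            # Push the object against the right guard page, then align down.
            start = (page_start + PAGE_SIZE - size) & ~(align - 1)
        else:
            start = page_start
        end = start + size - 1
        redzones = []
        if start > page_start:
            redzones.append((page_start, start - 1))
        if end < page_last:
            redzones.append((end + 1, page_last))
        return start, end, redzones

    start, end, redzones = place_object(0xffff8c3f2e33a000, size=73,
                                        align=8, right=True)

For the 73-byte object this yields the range
``0xffff8c3f2e33afb0-0xffff8c3f2e33aff8``, leaving a 7-byte gap before the
right guard page. That gap is redzoned, which is why the write at
``0xffff8c3f2e33aff9`` was detected on free rather than by a page fault.
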
Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.

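
The effect of inserting freed objects at the tail of the freelist can be seen
in a small Python model. This is only a sketch of the FIFO reuse order; the
``Freelist`` class is hypothetical and ignores page protection entirely:

.. code-block:: python

    from collections import deque
    from typing import Optional

    class Freelist:
        """FIFO freelist: allocate from the head, free to the tail."""

        def __init__(self, num_objects: int) -> None:
            self._free = deque(range(num_objects))

        def alloc(self) -> Optional[int]:
            # Returns None when the pool is exhausted.
            return self._free.popleft() if self._free else None

        def free(self, index: int) -> None:
            self._free.append(index)

    pool = Freelist(3)
    first = pool.alloc()      # object 0
    second = pool.alloc()     # object 1
    pool.free(first)          # object 0 goes to the tail
    third = pool.alloc()      # object 2, not the just-freed object 0
    reused = pool.alloc()     # only now is object 0 handed out again

Because object 0 is reused last, it stays in the freed (protected) state for
as long as possible, giving a stale pointer more time to trigger a report.
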
Once pool utilization reaches 75% (default) or above, KFENCE limits currently
covered allocations of the same source from further filling up the pool. This
reduces the risk of the pool eventually being fully occupied by allocated
objects, while still ensuring diverse coverage of allocations. The "source" of
an allocation is based on its partial allocation stack trace. A side-effect is
that this also limits frequent long-lived allocations (e.g. pagecache) of the
same source filling up the pool permanently, which is the most common risk for
the pool becoming full and the sampled allocation rate dropping to zero. The
threshold at which to start limiting currently covered allocations can be
configured via the boot parameter ``kfence.skip_covered_thresh`` (pool usage%).

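
The policy can be summarised in a few lines of Python. This is a conceptual
sketch only: it uses a plain set of source identifiers as a stand-in for
however the kernel tracks covered allocation stack traces, and the function
and parameter names are hypothetical:

.. code-block:: python

    from typing import Set

    def should_skip(source: str, covered_sources: Set[str],
                    allocated: int, num_objects: int,
                    thresh_percent: int = 75) -> bool:
        """Decide whether a sampled allocation should bypass KFENCE.

        An allocation is skipped only if the pool utilization is at or above
        the threshold AND its source is already covered by a live object.
        """
        at_or_above_threshold = allocated * 100 >= thresh_percent * num_objects
        return at_or_above_threshold and source in covered_sources

    covered = {"alloc_path_a"}  # hypothetical source identifier
    skip_hot_source = should_skip("alloc_path_a", covered, 192, 255)
    keep_new_source = should_skip("alloc_path_b", covered, 250, 255)

Below the threshold every source is admitted, so the limit only takes effect
once the pool is close to filling up.
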
Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can
be found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: due to the lower chance to detect the
error, it would require more effort using KFENCE to debug. Deployments at scale
that cannot afford to enable KASAN, however, would benefit from using KFENCE to
discover bugs due to code paths not exercised by test cases or fuzzers.