.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2022 Red Hat, Inc.
.. Copyright (C) 2022-2023 Isovalent, Inc.

===============================================
BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants
===============================================

.. note::
   - ``BPF_MAP_TYPE_HASH`` was introduced in kernel version 3.19
   - ``BPF_MAP_TYPE_PERCPU_HASH`` was introduced in version 4.6
   - Both ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
     were introduced in version 4.10
``BPF_MAP_TYPE_HASH`` and ``BPF_MAP_TYPE_PERCPU_HASH`` provide general
purpose hash map storage. Both the key and the value can be structs,
allowing for composite keys and values.

The kernel is responsible for allocating and freeing key/value pairs, up
to the max_entries limit that you specify. Hash maps use pre-allocation
of hash table elements by default. The ``BPF_F_NO_PREALLOC`` flag can be
used to disable pre-allocation when it is too memory expensive.

``BPF_MAP_TYPE_PERCPU_HASH`` provides a separate value slot per
CPU. The per-cpu values are stored internally in an array.

The ``BPF_MAP_TYPE_LRU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
variants add LRU semantics to their respective hash tables. An LRU hash
will automatically evict the least recently used entries when the hash
table reaches capacity. An LRU hash maintains an internal LRU list that
is used to select elements for eviction. This internal LRU list is
shared across CPUs but it is possible to request a per CPU LRU list with
the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The
following table outlines the properties of LRU maps depending on the
map type and the flags used to create the map.
======================== ========================= ================================
Flag                     ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
======================== ========================= ================================
**BPF_F_NO_COMMON_LRU**  Per-CPU LRU, global map   Per-CPU LRU, per-cpu map
**!BPF_F_NO_COMMON_LRU** Global LRU, global map    Global LRU, per-cpu map
======================== ========================= ================================
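For instance, an LRU hash with a per-CPU LRU list can be created from
userspace roughly as follows. This is a minimal sketch, not kernel-verified
example code; the map name, key/value sizes, and ``max_entries`` are
illustrative assumptions:

.. code-block:: c

   /* Illustrative only: create a BPF_MAP_TYPE_LRU_HASH whose LRU
    * eviction list is maintained per CPU, while the map contents
    * remain globally visible. Sizes and names are assumptions.
    */
   LIBBPF_OPTS(bpf_map_create_opts, opts,
               .map_flags = BPF_F_NO_COMMON_LRU);

   int map_fd = bpf_map_create(BPF_MAP_TYPE_LRU_HASH, "example_lru",
                               sizeof(__u32), sizeof(__u64),
                               1024, &opts);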
Usage
=====

Kernel BPF
----------

bpf_map_update_elem()
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
Hash entries can be added or updated using the ``bpf_map_update_elem()``
helper. This helper replaces existing elements atomically. The ``flags``
parameter can be used to control the update behaviour:

- ``BPF_ANY`` will create a new element or update an existing element
- ``BPF_NOEXIST`` will create a new element only if one did not already
  exist
- ``BPF_EXIST`` will update an existing element

``bpf_map_update_elem()`` returns 0 on success, or negative error in
case of failure.
bpf_map_lookup_elem()
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   void *bpf_map_lookup_elem(struct bpf_map *map, const void *key)

Hash entries can be retrieved using the ``bpf_map_lookup_elem()``
helper. This helper returns a pointer to the value associated with
``key``, or ``NULL`` if no entry was found.
bpf_map_delete_elem()
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   long bpf_map_delete_elem(struct bpf_map *map, const void *key)

Hash entries can be deleted using the ``bpf_map_delete_elem()``
helper. This helper will return 0 on success, or negative error in case
of failure.
Per CPU Hashes
--------------

For ``BPF_MAP_TYPE_PERCPU_HASH`` and ``BPF_MAP_TYPE_LRU_PERCPU_HASH``
the ``bpf_map_update_elem()`` and ``bpf_map_lookup_elem()`` helpers
automatically access the hash slot for the current CPU.

bpf_map_lookup_percpu_elem()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, u32 cpu)

The ``bpf_map_lookup_percpu_elem()`` helper can be used to look up the
value in the hash slot for a specific CPU. Returns the value associated
with ``key`` on ``cpu``, or ``NULL`` if no entry was found or ``cpu`` is
invalid.
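For example, a BPF program could aggregate a per-CPU counter across all
CPUs with this helper. The following is an illustrative sketch only: it
assumes a per-CPU hash map holding a ``struct value`` with a ``packets``
field, and ``MAX_CPUS`` is an assumed compile-time loop bound:

.. code-block:: c

   /* Illustrative only: sum one field of a per-CPU value across
    * all CPUs. MAX_CPUS is an assumption; lookups for CPU numbers
    * beyond the system's CPU count simply return NULL.
    */
   #define MAX_CPUS 128

   static __u64 sum_packets(void *map, struct key *key)
   {
           __u64 sum = 0;
           int cpu;

           for (cpu = 0; cpu < MAX_CPUS; cpu++) {
                   struct value *v;

                   v = bpf_map_lookup_percpu_elem(map, key, cpu);
                   if (!v)
                           continue;
                   sum += v->packets;
           }
           return sum;
   }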
Concurrency
-----------

Values stored in ``BPF_MAP_TYPE_HASH`` can be accessed concurrently by
programs running on different CPUs. Since Kernel version 5.1, the BPF
infrastructure provides ``struct bpf_spin_lock`` to synchronise access.
See ``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
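A value type embedding a spin lock might look like the following sketch;
the map and field names here are illustrative assumptions, not taken from
the selftest:

.. code-block:: c

   /* Illustrative only: a map value protected by bpf_spin_lock.
    * The lock guards the read-modify-write of the counter field.
    */
   struct locked_value {
           struct bpf_spin_lock lock;
           __u64 counter;
   };

   /* In BPF program context, assuming a hash map named lock_map: */
   struct locked_value *val = bpf_map_lookup_elem(&lock_map, &key);
   if (val) {
           bpf_spin_lock(&val->lock);
           val->counter++;
           bpf_spin_unlock(&val->lock);
   }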
Userspace
---------

bpf_map_get_next_key()
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: c

   int bpf_map_get_next_key(int fd, const void *cur_key, void *next_key)

In userspace, it is possible to iterate through the keys of a hash using
libbpf's ``bpf_map_get_next_key()`` function. The first key can be fetched by
calling ``bpf_map_get_next_key()`` with ``cur_key`` set to
``NULL``. Subsequent calls will fetch the next key that follows the
current key. ``bpf_map_get_next_key()`` returns 0 on success, -ENOENT if
``cur_key`` is the last key in the hash, or negative error in case of
failure.

Note that if ``cur_key`` gets deleted then ``bpf_map_get_next_key()``
will instead return the *first* key in the hash table which is
undesirable. It is recommended to use batched lookup if there is going
to be key deletion intermixed with ``bpf_map_get_next_key()``.
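A batched walk might be sketched as follows using libbpf's
``bpf_map_lookup_batch()``. This is illustrative only: the ``__u32`` batch
token type and the buffer size of 64 are assumptions, and real code should
size the key and value buffers to match the map definition:

.. code-block:: c

   /* Illustrative only: walk the map in batches. Entries returned in
    * keys[]/values[] remain valid even if other keys are deleted
    * between calls.
    */
   static void walk_hash_batched(int map_fd)
   {
           __u32 in_batch, out_batch;
           struct key keys[64];
           struct value values[64];
           __u32 count;
           bool first = true;
           int err;

           do {
                   count = 64;
                   err = bpf_map_lookup_batch(map_fd,
                                              first ? NULL : &in_batch,
                                              &out_batch, keys, values,
                                              &count, NULL);
                   /* process 'count' entries from keys[]/values[] here */
                   in_batch = out_batch;
                   first = false;
           } while (!err); /* -ENOENT indicates the end of the map */
   }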
Examples
========

Please see the ``tools/testing/selftests/bpf`` directory for functional
examples. The code snippets below demonstrate API usage.

This example shows how to declare an LRU Hash with a struct key and a
struct value.

.. code-block:: c

   #include <linux/bpf.h>
   #include <bpf/bpf_helpers.h>

   struct key {
           __u32 srcip;
   };

   struct value {
           __u64 packets;
           __u64 bytes;
   };

   struct {
           __uint(type, BPF_MAP_TYPE_LRU_HASH);
           __uint(max_entries, 32);
           __type(key, struct key);
           __type(value, struct value);
   } packet_stats SEC(".maps");
This example shows how to create or update hash values using atomic
instructions:

.. code-block:: c

   static void update_stats(__u32 srcip, int bytes)
   {
           struct key key = {
                   .srcip = srcip,
           };
           struct value *value = bpf_map_lookup_elem(&packet_stats, &key);

           if (value) {
                   __sync_fetch_and_add(&value->packets, 1);
                   __sync_fetch_and_add(&value->bytes, bytes);
           } else {
                   struct value newval = { 1, bytes };

                   bpf_map_update_elem(&packet_stats, &key, &newval, BPF_NOEXIST);
           }
   }
Userspace walking the map elements from the map declared above:

.. code-block:: c

   #include <bpf/libbpf.h>
   #include <bpf/bpf.h>

   static void walk_hash_elements(int map_fd)
   {
           struct key *cur_key = NULL;
           struct key next_key;
           struct value value;
           int err;

           for (;;) {
                   err = bpf_map_get_next_key(map_fd, cur_key, &next_key);
                   if (err)
                           break;

                   bpf_map_lookup_elem(map_fd, &next_key, &value);

                   // Use key and value here

                   cur_key = &next_key;
           }
   }
Internals
=========

This section of the document is targeted at Linux developers and describes
aspects of the map implementations that are not considered stable ABI. The
following details are subject to change in future versions of the kernel.

``BPF_MAP_TYPE_LRU_HASH`` and variants
--------------------------------------

Updating elements in LRU maps may trigger eviction behaviour when the capacity
of the map is reached. There are various steps that the update algorithm
attempts in order to enforce the LRU property which have increasing impacts on
other CPUs involved in the following operation attempts:

- Attempt to use CPU-local state to batch operations
- Attempt to fetch free nodes from global lists
- Attempt to pull any node from a global list and remove it from the hashmap
- Attempt to pull any node from any CPU's list and remove it from the hashmap

This algorithm is described visually in the following diagram. See the
description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of
the corresponding operations:
.. kernel-figure:: map_lru_hash_update.dot
   :alt: Diagram outlining the LRU eviction steps taken during map update.

   LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and
   variants. See the dot file source for kernel function name code references.

Map updates start from the oval in the top right "begin ``bpf_map_update()``"
and progress through the graph towards the bottom where the result may be
either a successful update or a failure with various error codes. The key in
the top right provides indicators for which locks may be involved in specific
operations. This is intended as a visual hint for reasoning about how map
contention may impact update operations, though the map type and flags may
impact the actual contention on those locks, based on the logic described in
the table above. For instance, if the map is created with type
``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all
map properties would be per-cpu.