.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

Flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
and output arguments.

The inputs are:

* ``nhoff`` - initial offset of the networking header
* ``thoff`` - initial offset of the transport header, initialized to nhoff
* ``n_proto`` - L3 protocol type, parsed out of L2 header
* ``flags`` - optional flags

The flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.
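
To illustrate this contract, here is a minimal sketch written as ordinary
userspace C rather than as a BPF program, so that it can be compiled and
tested on its own. Every ``model_*`` name is hypothetical: ``struct
model_flow_keys`` mirrors only a few ``struct bpf_flow_keys`` fields (in host
byte order, whereas the real ``n_proto`` is big-endian), ``MODEL_OK`` and
``MODEL_DROP`` stand in for ``BPF_OK`` and ``BPF_DROP``, and the ``data`` and
``data_end`` pointers play the role of the ``__sk_buff`` fields. The sketch
checks bounds before every read, fills in output fields, and advances
``thoff`` past the IPv4 header.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define MODEL_ETH_P_IP 0x0800      /* IPv4 EtherType, host byte order */

  /* Hypothetical stand-ins for BPF_OK / BPF_DROP. */
  enum model_ret { MODEL_OK, MODEL_DROP };

  /* Hypothetical subset of struct bpf_flow_keys (host byte order). */
  struct model_flow_keys {
      uint16_t nhoff;     /* in: offset of the L3 header */
      uint16_t thoff;     /* in: equals nhoff; out: offset of the L4 header */
      uint16_t n_proto;   /* in: L3 protocol parsed out of the L2 header */
      uint8_t  ip_proto;  /* out: L4 protocol */
      uint32_t ipv4_src;  /* out */
      uint32_t ipv4_dst;  /* out */
  };

  static uint32_t load_be32(const uint8_t *p)
  {
      return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8) | (uint32_t)p[3];
  }

  /* Dissect an IPv4 header at data + nhoff; data_end is one past the end. */
  static enum model_ret model_dissect_ipv4(const uint8_t *data,
                                           const uint8_t *data_end,
                                           struct model_flow_keys *keys)
  {
      size_t len = (size_t)(data_end - data);
      const uint8_t *iph;
      size_t ihl;

      if (keys->n_proto != MODEL_ETH_P_IP)
          return MODEL_DROP;
      /* Check bounds before every read, as the BPF verifier requires. */
      if ((size_t)keys->nhoff + 20 > len)
          return MODEL_DROP;
      iph = data + keys->nhoff;
      ihl = (size_t)(iph[0] & 0x0f) * 4;        /* header length in bytes */
      if ((iph[0] >> 4) != 4 || ihl < 20 || (size_t)keys->nhoff + ihl > len)
          return MODEL_DROP;

      keys->ip_proto = iph[9];
      keys->ipv4_src = load_be32(iph + 12);
      keys->ipv4_dst = load_be32(iph + 16);
      keys->thoff = (uint16_t)(keys->nhoff + ihl);  /* advance to L4 */
      return MODEL_OK;
  }

For example, a 20-byte IPv4 header at offset ``nhoff`` carrying TCP leaves
``ip_proto`` set to 6 and ``thoff`` equal to ``nhoff + 20``, while a packet
truncated inside the header yields ``MODEL_DROP``.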

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                               ^
                               |
                               +-- flow dissector starts here

.. code:: c

  /* skb->data + flow_keys->nhoff points to the first byte of L3_HEADER */
  flow_keys->thoff = flow_keys->nhoff;
  flow_keys->n_proto = ETHER_TYPE;

In the VLAN case, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                         ^
                         |
                         +-- flow dissector starts here

.. code:: c

  /* skb->data + flow_keys->nhoff points to the first byte of TCI */
  flow_keys->thoff = flow_keys->nhoff;
  flow_keys->n_proto = TPID;

Note that TPID can be 802.1AD and, hence, the BPF program would have to
parse VLAN information twice for double-tagged packets.

Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                           ^
                                           |
                                           +-- flow dissector starts here

.. code:: c

  /* skb->data + flow_keys->nhoff points to the first byte of L3_HEADER */
  flow_keys->thoff = flow_keys->nhoff;
  flow_keys->n_proto = ETHER_TYPE;

In this case, VLAN information has been processed before the flow dissector
runs, and the BPF flow dissector is not required to handle it.

The takeaway here is as follows: a BPF flow dissector program can be called
with an optional VLAN header and should gracefully handle both cases: when a
single or double VLAN is present and when it is not present. Because the same
program can be called in either state, it has to be written carefully to
handle both.
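
The VLAN handling described above can be modeled with the following
userspace sketch. It is not taken from the reference implementation:
``model_skip_vlans()``, the trimmed ``struct model_flow_keys`` and the
``MODEL_*`` constants are hypothetical, and EtherTypes are kept in host byte
order. The same function accepts both entry states. In the pre-VLAN state
``n_proto`` is a TPID and ``nhoff`` points at the TCI, so it consumes one or
two 4-byte tags; in the post-VLAN state ``n_proto`` is already the inner
EtherType and nothing changes. For brevity it uses a loop bounded at two
iterations; a real BPF program would also have to satisfy the verifier's
rules on loops.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define MODEL_ETH_P_8021Q  0x8100   /* single VLAN tag (802.1Q) */
  #define MODEL_ETH_P_8021AD 0x88A8   /* outer tag of a double VLAN (802.1AD) */
  #define MODEL_VLAN_HLEN    4        /* TCI (2 bytes) + inner EtherType (2 bytes) */
  #define MODEL_MAX_VLANS    2        /* single or double tagging */

  /* Hypothetical subset of struct bpf_flow_keys (host byte order). */
  struct model_flow_keys {
      uint16_t nhoff;
      uint16_t thoff;
      uint16_t n_proto;
  };

  /*
   * Skip zero, one or two VLAN tags. Pre-VLAN: n_proto is a TPID and
   * data + nhoff points at the TCI. Post-VLAN: n_proto is already the
   * inner EtherType. Returns 0 on success, -1 on a malformed packet.
   */
  static int model_skip_vlans(const uint8_t *data, size_t len,
                              struct model_flow_keys *keys)
  {
      int i;

      for (i = 0; i < MODEL_MAX_VLANS; i++) {
          const uint8_t *tag;

          if (keys->n_proto != MODEL_ETH_P_8021Q &&
              keys->n_proto != MODEL_ETH_P_8021AD)
              return 0;                 /* no (more) VLAN headers */
          if ((size_t)keys->nhoff + MODEL_VLAN_HLEN > len)
              return -1;                /* truncated tag */
          tag = data + keys->nhoff;
          /* Bytes 2-3 after the TCI start hold the encapsulated EtherType. */
          keys->n_proto = (uint16_t)((tag[2] << 8) | tag[3]);
          keys->nhoff += MODEL_VLAN_HLEN;
          keys->thoff = keys->nhoff;
      }
      /* Still a TPID after two tags: deeper nesting than this model allows. */
      if (keys->n_proto == MODEL_ETH_P_8021Q ||
          keys->n_proto == MODEL_ETH_P_8021AD)
          return -1;
      return 0;
  }

For a double-tagged packet entered in the pre-VLAN state with ``n_proto``
set to 0x88A8, the function advances ``nhoff`` by 8 bytes and leaves the
innermost EtherType in ``n_proto``.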

Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to stop
  parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.
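
How a program might consult these flags is sketched below in userspace C.
The helper functions and ``MODEL_F_*`` constants are hypothetical stand-ins
for the ``BPF_FLOW_DISSECTOR_F_*`` flags; their bit values here are
assumptions, and real code should use the constants from ``<linux/bpf.h>``.
The flow-label helper also assumes that only a non-zero label triggers the
early stop, as in the in-kernel C dissector.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical stand-ins; the bit values are assumptions for this sketch. */
  #define MODEL_F_PARSE_1ST_FRAG     (1U << 0)
  #define MODEL_F_STOP_AT_FLOW_LABEL (1U << 1)
  #define MODEL_F_STOP_AT_ENCAP      (1U << 2)

  /* Continue past an IP fragment only for the first one, and only if asked. */
  static bool model_continue_after_frag(uint32_t flags, bool is_first_frag)
  {
      if (!is_first_frag)
          return false;   /* later fragments carry no transport header */
      return (flags & MODEL_F_PARSE_1ST_FRAG) != 0;
  }

  /* Stop once a non-zero IPv6 flow label has been recorded, if asked. */
  static bool model_stop_at_flow_label(uint32_t flags, uint32_t flow_label)
  {
      return (flags & MODEL_F_STOP_AT_FLOW_LABEL) != 0 && flow_label != 0;
  }

  /* Stop instead of descending into encapsulated headers, if asked. */
  static bool model_stop_at_encap(uint32_t flags)
  {
      return (flags & MODEL_F_STOP_AT_ENCAP) != 0;
  }

With no flags set, all three helpers report the default behavior: stop at the
first sign of fragmentation, and keep parsing through flow labels and
encapsulation.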

Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can also be used to load a BPF flow dissector program.

The reference implementation is organized as follows:

* ``jmp_table`` map that contains sub-programs for each supported L3 protocol
* ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
  does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).
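
The dispatch scheme can be modeled in userspace C as follows. This is a
hypothetical sketch, not the reference code: a packet is reduced to its
chain of L3 protocols, ``model_jmp_table`` is an array of function pointers
standing in for the ``jmp_table`` program array, an ordinary function call
stands in for ``bpf_tail_call``, and ``MODEL_MAX_TAIL_CALLS`` is an assumed
stand-in for the kernel's limit on chained tail calls. A handler that finds
another L3 header "tail calls" the next handler instead of jumping back, so
each level of encapsulation costs one more call.

.. code:: c

  #include <stdint.h>

  #define MODEL_ETH_P_IP       0x0800
  #define MODEL_ETH_P_IPV6     0x86DD
  #define MODEL_MAX_TAIL_CALLS 8     /* hypothetical stand-in for the kernel limit */

  enum model_ret { MODEL_OK, MODEL_DROP };

  /* A "packet" reduced to its chain of L3 protocols, outermost first. */
  struct model_pkt {
      const uint16_t *l3_chain;
      int n_layers;
  };

  struct model_flow_keys {
      uint16_t n_proto;     /* protocol of the layer being parsed */
      uint16_t addr_proto;  /* protocol of the innermost layer parsed */
      int layer;            /* index into l3_chain */
      int is_encap;
      int tail_calls;
  };

  typedef enum model_ret (*model_prog)(const struct model_pkt *,
                                       struct model_flow_keys *);

  static enum model_ret model_tail_call(const struct model_pkt *pkt,
                                        struct model_flow_keys *keys);

  /* If another L3 header follows, "tail call" its handler; never jump back. */
  static enum model_ret model_descend(const struct model_pkt *pkt,
                                      struct model_flow_keys *keys)
  {
      if (keys->layer + 1 >= pkt->n_layers)
          return MODEL_OK;
      keys->layer++;
      keys->is_encap = 1;
      keys->n_proto = pkt->l3_chain[keys->layer];
      return model_tail_call(pkt, keys);
  }

  static enum model_ret model_ipv4(const struct model_pkt *pkt,
                                   struct model_flow_keys *keys)
  {
      keys->addr_proto = MODEL_ETH_P_IP;    /* IPv4 parsing would go here */
      return model_descend(pkt, keys);
  }

  static enum model_ret model_ipv6(const struct model_pkt *pkt,
                                   struct model_flow_keys *keys)
  {
      keys->addr_proto = MODEL_ETH_P_IPV6;  /* IPv6 parsing would go here */
      return model_descend(pkt, keys);
  }

  /* Stand-in for the jmp_table program array: one slot per L3 protocol. */
  static const model_prog model_jmp_table[] = { model_ipv4, model_ipv6 };

  static enum model_ret model_tail_call(const struct model_pkt *pkt,
                                        struct model_flow_keys *keys)
  {
      int slot;

      if (keys->n_proto == MODEL_ETH_P_IP)
          slot = 0;
      else if (keys->n_proto == MODEL_ETH_P_IPV6)
          slot = 1;
      else
          return MODEL_DROP;             /* no handler for this protocol */
      if (++keys->tail_calls > MODEL_MAX_TAIL_CALLS)
          return MODEL_DROP;             /* limit exceeded: treat as a drop */
      return model_jmp_table[slot](pkt, keys);
  }

  /* Entry point, in the role of _dissect: dispatch on the input n_proto. */
  static enum model_ret model_dissect(const struct model_pkt *pkt,
                                      struct model_flow_keys *keys)
  {
      if (pkt->n_layers < 1)
          return MODEL_DROP;
      keys->layer = 0;
      keys->is_encap = 0;
      keys->tail_calls = 0;
      keys->n_proto = pkt->l3_chain[0];
      return model_tail_call(pkt, keys);
  }

With this model, a plain IPv4 packet takes one call, IPv6-in-IPv4 takes two
and sets ``is_encap``, and an unsupported inner protocol or an overly deep
chain yields ``MODEL_DROP``.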
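
Current Limitations
===================

The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Refer to ``struct bpf_flow_keys``
for the set of information that can currently be exported from the BPF
context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.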
  104. for a set of information that's currently can be exported from the BPF context.
  105. When BPF flow dissector is attached to the root network namespace (machine-wide
  106. policy), users can't override it in their child network namespaces.