Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to the conntrack semantics (i.e. traffic has been
seen in both directions), you can decide to offload the flow to the flowtable
from the forward chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit() and hence bypass the classic forwarding
path. The visible effect is that you do not see these packets from any of the
netfilter hooks coming after the ingress hook. In case of a flowtable miss, the
packet follows the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful in
case there are several conntrack zones in place).
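
As a rough illustration of the lookup described above, the following Python
sketch models the flowtable as a dictionary keyed by the 7-tuple. This is a toy
model, not kernel code: the names FlowKey, Flowtable, offload and lookup are
hypothetical, the addresses and ports are made up, and a Python dict merely
stands in for the kernel's resizable hashtable.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """The 7-tuple selectors used for flowtable lookups (illustrative names)."""
    src: str        # source address
    dst: str        # destination address
    l3proto: str    # layer 3 protocol, e.g. "ipv4"
    l4proto: str    # layer 4 protocol, e.g. "tcp"
    sport: int      # source port
    dport: int      # destination port
    iif: str        # input interface


class Flowtable:
    """Toy flowtable: a dict stands in for the resizable hashtable."""

    def __init__(self) -> None:
        self._entries: Dict[FlowKey, str] = {}

    def offload(self, key: FlowKey, out_dev: str) -> None:
        # Models the effect of the 'flow offload' action: add an entry.
        self._entries[key] = out_dev

    def lookup(self, key: FlowKey) -> Optional[str]:
        # Hit: return the output netdevice. Miss: return None, meaning the
        # packet follows the classic forwarding path.
        return self._entries.get(key)


ft = Flowtable()
key = FlowKey("192.0.2.1", "198.51.100.7", "ipv4", "tcp", 40000, 443, "eth0")
ft.offload(key, out_dev="eth1")

hit = ft.lookup(key)
# Same addresses and ports seen on another input interface: a different
# key, hence a flowtable miss.
miss = ft.lookup(key._replace(iif="eth2"))
print(hit, miss)
```

Because the input interface is part of the key, identical address and port
combinations arriving on different interfaces are kept apart in this model.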

Flowtables are populated via the 'flow offload' nftables action, so the user can
selectively specify which flows are placed into the flowtable. Hence, packets
follow the classic forwarding path unless the user explicitly instructs packets
to use this new alternative forwarding path via nftables policy.

This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.

                                         userspace process
                                          ^              |
                                          |              |
                                     _____|____     ____\/___
                                    /          \   /         \
                                    |  input   |   |  output |
                                    \__________/   \_________/
                                          ^              |
                                          |              |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^          |               |               ^                |
   flowtable  |          |          ____\/___            |                |
       |      |          |         /         \           |                |
    __\/___   |          --------->| forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                'flow offload' rule                        |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

               Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path: because the transport selectors are missing, a flowtable lookup is not
possible.
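
The per-packet handling described above can be sketched in Python as follows.
This is a simplified model under stated assumptions, not the kernel
implementation: Packet, FlowEntry and fastpath are hypothetical names, NAT is
reduced to address rewriting, and the flowtable lookup result is passed in as
an argument (None stands for a miss).

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    ttl: int
    is_fragment: bool = False


@dataclass(frozen=True)
class FlowEntry:
    """Toy flowtable entry holding the NAT rewrite learned on the classic path."""
    out_dev: str
    snat_to: Optional[str] = None   # new source address, if source NAT applies
    dnat_to: Optional[str] = None   # new destination address, if destination NAT applies


def fastpath(pkt: Packet, entry: Optional[FlowEntry]) -> Tuple[str, Packet]:
    """Return ("classic", pkt) or ("neigh_xmit", mangled_pkt)."""
    # Fragments lack transport selectors, so they are passed up unchanged;
    # a lookup miss (entry is None) takes the classic path as well.
    if pkt.is_fragment or entry is None:
        return "classic", pkt
    # Apply the stored NAT configuration, then decrement the TTL before
    # handing the packet to the transmit step.
    mangled = replace(
        pkt,
        src=entry.snat_to or pkt.src,
        dst=entry.dnat_to or pkt.dst,
        ttl=pkt.ttl - 1,
    )
    return "neigh_xmit", mangled


entry = FlowEntry(out_dev="eth1", snat_to="203.0.113.5")
path, out = fastpath(Packet("10.0.0.2", "198.51.100.7", ttl=64), entry)
frag_path, frag_out = fastpath(
    Packet("10.0.0.2", "198.51.100.7", ttl=64, is_fragment=True), entry
)
print(path, out)
print(frag_path, frag_out)
```

In this model the fragment leaves the function untouched, with neither NAT
rewriting nor TTL decrement applied, since both are left to the classic
forwarding path.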

Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain.

        table inet x {
                flowtable f {
                        hook ingress priority 0; devices = { eth0, eth1 };
                }
                chain y {
                        type filter hook forward priority 0; policy accept;
                        ip protocol tcp flow offload @f
                        counter packets 0 bytes 0
                }
        }

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline. This is convenient in case you already have an
nftables ingress chain: make sure the flowtable priority is smaller than that of
the nftables ingress chain, so that the flowtable runs first in the pipeline.
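
The ordering rule can be illustrated with a few lines of Python. The priority
values below (-10 and 0) and the hook names are assumptions chosen for the
example; the only point is that the hook with the smaller priority value runs
first.

```python
# (priority, name) pairs for hooks on the same ingress path (hypothetical values).
hooks = [
    (0, "nftables ingress chain"),
    (-10, "flowtable f"),
]

# Hooks run in ascending priority order, so the smaller value goes first.
run_order = [name for _, name in sorted(hooks)]
print(run_order)
```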

The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP syn-ack packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.
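
The effect on the counter can be mimicked with a small Python simulation. This
is a deliberately simplified sketch: simulate and offload_at are hypothetical
names, all packets of the flow are treated alike regardless of direction, and
the flowtable entry is assumed never to expire.

```python
def simulate(num_packets: int, offload_at: int) -> int:
    """Return how many packets the forward chain counter sees.

    offload_at is the 1-based index of the packet on which the
    'flow offload' rule adds the flow to the flowtable. That packet
    still traverses the rest of the chain, including the counter rule.
    """
    offloaded = False
    counter = 0
    for n in range(1, num_packets + 1):
        if offloaded:
            continue          # flowtable hit: the forward chain is bypassed
        if n == offload_at:
            offloaded = True  # the 'flow offload' rule adds the entry
        counter += 1          # the counter rule that follows in the chain
    return counter


seen = simulate(num_packets=10, offload_at=2)
print(seen)
```

With ten packets and the offload happening on the second one, the counter in
this model stops at 2; the remaining eight packets take the fastpath bypass.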

More reading
------------

This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
made a very complete and comprehensive summary called "A state of network
acceleration" that describes how things were before this infrastructure was
mainlined [3] and it also makes a rough summary of this work [4].

[1] https://lwn.net/Articles/738214/
[2] https://lwn.net/Articles/742164/
[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html