| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515 |
- ================================
- Documentation for /proc/sys/net/
- ================================
- Copyright
- Copyright (c) 1999
- - Terrehon Bowden <terrehon@pacbell.net>
- - Bodo Bauer <bb@ricochet.net>
- Copyright (c) 2000
- - Jorge Nerin <comandante@zaralinux.com>
- Copyright (c) 2009
- - Shen Feng <shen@cn.fujitsu.com>
- For general info and legal blurb, please look in index.rst.
- ------------------------------------------------------------------------------
- This file contains the documentation for the sysctl files in
- /proc/sys/net
- The interface to the networking parts of the kernel is located in
- /proc/sys/net. The following table shows all possible subdirectories. You may
- see only some of them, depending on your kernel's configuration.
- Table : Subdirectories in /proc/sys/net
- ========= =================== = ========== ===================
- Directory Content Directory Content
- ========= =================== = ========== ===================
- 802 E802 protocol mptcp Multipath TCP
- appletalk Appletalk protocol netfilter Network Filter
- ax25 AX25 netrom NET/ROM
- bridge Bridging rose X.25 PLP layer
- core General parameter tipc TIPC
- ethernet Ethernet protocol unix Unix domain sockets
- ipv4 IP version 4 x25 X.25 protocol
- ipv6 IP version 6
- ========= =================== = ========== ===================
- 1. /proc/sys/net/core - Network core options
- ============================================
- bpf_jit_enable
- --------------
- This enables the BPF Just in Time (JIT) compiler. BPF is a flexible
- and efficient infrastructure allowing to execute bytecode at various
- hook points. It is used in a number of Linux kernel subsystems such
- as networking (e.g. XDP, tc), tracing (e.g. kprobes, uprobes, tracepoints)
- and security (e.g. seccomp). LLVM has a BPF back end that can compile
- restricted C into a sequence of BPF instructions. After program load
- through bpf(2) and passing a verifier in the kernel, a JIT will then
- translate these BPF proglets into native CPU instructions. There are
- two flavors of JITs, the newer eBPF JIT currently supported on:
- - x86_64
- - x86_32
- - arm64
- - arm32
- - ppc64
- - ppc32
- - sparc64
- - mips64
- - s390x
- - riscv64
- - riscv32
- - loongarch64
- - arc
- And the older cBPF JIT supported on the following archs:
- - mips
- - sparc
- eBPF JITs are a superset of cBPF JITs, meaning the kernel will
- migrate cBPF instructions into eBPF instructions and then JIT
- compile them transparently. Older cBPF JITs can only translate
- tcpdump filters, seccomp rules, etc, but not mentioned eBPF
- programs loaded through bpf(2).
- Values:
- - 0 - disable the JIT (default value)
- - 1 - enable the JIT
- - 2 - enable the JIT and ask the compiler to emit traces on kernel log.
- bpf_jit_harden
- --------------
- This enables hardening for the BPF JIT compiler. Supported are eBPF
- JIT backends. Enabling hardening trades off performance, but can
- mitigate JIT spraying.
- Values:
- - 0 - disable JIT hardening (default value)
- - 1 - enable JIT hardening for unprivileged users only
- - 2 - enable JIT hardening for all users
- where "privileged user" in this context means a process having
- CAP_BPF or CAP_SYS_ADMIN in the root user name space.
- bpf_jit_kallsyms
- ----------------
- When BPF JIT compiler is enabled, then compiled images are unknown
- addresses to the kernel, meaning they neither show up in traces nor
- in /proc/kallsyms. This enables export of these addresses, which can
- be used for debugging/tracing. If bpf_jit_harden is enabled, this
- feature is disabled.
- Values :
- - 0 - disable JIT kallsyms export (default value)
- - 1 - enable JIT kallsyms export for privileged users only
- bpf_jit_limit
- -------------
- This enforces a global limit for memory allocations to the BPF JIT
- compiler in order to reject unprivileged JIT requests once it has
- been surpassed. bpf_jit_limit contains the value of the global limit
- in bytes.
- dev_weight
- ----------
- The maximum number of packets that kernel can handle on a NAPI interrupt,
- it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
- aggregated packet is counted as one packet in this context.
- Default: 64
- dev_weight_rx_bias
- ------------------
- RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
- of the driver for the per softirq cycle netdev_budget. This parameter influences
- the proportion of the configured netdev_budget that is spent on RPS based packet
- processing during RX softirq cycles. It is further meant for making current
- dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
- (see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
- on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
- Default: 1
- dev_weight_tx_bias
- ------------------
- Scales the maximum number of packets that can be processed during a TX softirq cycle.
- Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
- net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
- Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
- Default: 1
- default_qdisc
- -------------
- The default queuing discipline to use for network devices. This allows
- overriding the default of pfifo_fast with an alternative. Since the default
- queuing discipline is created without additional parameters so is best suited
- to queuing disciplines that work well without configuration like stochastic
- fair queue (sfq), CoDel (codel) or fair queue CoDel (fq_codel). Don't use
- queuing disciplines like Hierarchical Token Bucket or Deficit Round Robin
- which require setting up classes and bandwidths. Note that physical multiqueue
- interfaces still use mq as root qdisc, which in turn uses this default for its
- leaves. Virtual devices (like e.g. lo or veth) ignore this setting and instead
- default to noqueue.
- Default: pfifo_fast
- busy_read
- ---------
- Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
- Approximate time in us to busy loop waiting for packets on the device queue.
- This sets the default value of the SO_BUSY_POLL socket option.
- Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
- which is the preferred method of enabling. If you need to enable the feature
- globally via sysctl, a value of 50 is recommended.
- Will increase power usage.
- Default: 0 (off)
- busy_poll
- ----------------
- Low latency busy poll timeout for poll and select. (needs CONFIG_NET_RX_BUSY_POLL)
- Approximate time in us to busy loop waiting for events.
- Recommended value depends on the number of sockets you poll on.
- For several sockets 50, for several hundreds 100.
- For more than that you probably want to use epoll.
- Note that only sockets with SO_BUSY_POLL set will be busy polled,
- so you want to either selectively set SO_BUSY_POLL on those sockets or set
- sysctl.net.busy_read globally.
- Will increase power usage.
- Default: 0 (off)
- mem_pcpu_rsv
- ------------
- Per-cpu reserved forward alloc cache size in page units. Default 1MB per CPU.
- rmem_default
- ------------
- The default setting of the socket receive buffer in bytes.
- rmem_max
- --------
- The maximum receive socket buffer size in bytes.
- rps_default_mask
- ----------------
- The default RPS CPU mask used on newly created network devices. An empty
- mask means RPS disabled by default.
- tstamp_allow_data
- -----------------
- Allow processes to receive tx timestamps looped together with the original
- packet contents. If disabled, transmit timestamp requests from unprivileged
- processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
- Default: 1 (on)
- wmem_default
- ------------
- The default setting (in bytes) of the socket send buffer.
- wmem_max
- --------
- The maximum send socket buffer size in bytes.
- message_burst and message_cost
- ------------------------------
- These parameters are used to limit the warning messages written to the kernel
- log from the networking code. They enforce a rate limit to make a
- denial-of-service attack impossible. A higher message_cost factor, results in
- fewer messages that will be written. Message_burst controls when messages will
- be dropped. The default settings limit warning messages to one every five
- seconds.
- warnings
- --------
- This sysctl is now unused.
- This was used to control console messages from the networking stack that
- occur because of problems on the network like duplicate address or bad
- checksums.
- These messages are now emitted at KERN_DEBUG and can generally be enabled
- and controlled by the dynamic_debug facility.
- netdev_budget
- -------------
- Maximum number of packets taken from all interfaces in one polling cycle (NAPI
- poll). In one polling cycle interfaces which are registered to polling are
- probed in a round-robin manner. Also, a polling cycle may not exceed
- netdev_budget_usecs microseconds, even if netdev_budget has not been
- exhausted.
- netdev_budget_usecs
- ---------------------
- Maximum number of microseconds in one NAPI polling cycle. Polling
- will exit when either netdev_budget_usecs have elapsed during the
- poll cycle or the number of packets processed reaches netdev_budget.
- netdev_max_backlog
- ------------------
- Maximum number of packets, queued on the INPUT side, when the interface
- receives packets faster than kernel can process them.
- netdev_rss_key
- --------------
- RSS (Receive Side Scaling) enabled drivers use a 40 bytes host key that is
- randomly generated.
- Some user space might need to gather its content even if drivers do not
- provide ethtool -x support yet.
- ::
- myhost:~# cat /proc/sys/net/core/netdev_rss_key
- 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
- File contains nul bytes if no driver ever called netdev_rss_key_fill() function.
- Note:
- /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
- but most drivers only use 40 bytes of it.
- ::
- myhost:~# ethtool -x eth0
- RX flow hash indirection table for eth0 with 8 RX ring(s):
- 0: 0 1 2 3 4 5 6 7
- RSS hash key:
- 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
- netdev_tstamp_prequeue
- ----------------------
- If set to 0, RX packet timestamps can be sampled after RPS processing, when
- the target CPU processes packets. It might give some delay on timestamps, but
- permit to distribute the load on several cpus.
- If set to 1 (default), timestamps are sampled as soon as possible, before
- queueing.
- netdev_unregister_timeout_secs
- ------------------------------
- Unregister network device timeout in seconds.
- This option controls the timeout (in seconds) used to issue a warning while
- waiting for a network device refcount to drop to 0 during device
- unregistration. A lower value may be useful during bisection to detect
- a leaked reference faster. A larger value may be useful to prevent false
- warnings on slow/loaded systems.
- Default value is 10, minimum 1, maximum 3600.
- skb_defer_max
- -------------
- Max size (in skbs) of the per-cpu list of skbs being freed
- by the cpu which allocated them. Used by TCP stack so far.
- Default: 64
- optmem_max
- ----------
- Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
- of struct cmsghdr structures with appended data. TCP tx zerocopy also uses
- optmem_max as a limit for its internal structures.
- Default : 128 KB
- fb_tunnels_only_for_init_net
- ----------------------------
- Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
- sit0, ip6tnl0, ip6gre0) are automatically created. There are 3 possibilities
- (a) value = 0; respective fallback tunnels are created when module is
- loaded in every net namespaces (backward compatible behavior).
- (b) value = 1; [kcmd value: initns] respective fallback tunnels are
- created only in init net namespace and every other net namespace will
- not have them.
- (c) value = 2; [kcmd value: none] fallback tunnels are not created
- when a module is loaded in any of the net namespace. Setting value to
- "2" is pointless after boot if these modules are built-in, so there is
- a kernel command-line option that can change this default. Please refer to
- Documentation/admin-guide/kernel-parameters.txt for additional details.
- Not creating fallback tunnels gives control to userspace to create
- whatever is needed only and avoid creating devices which are redundant.
- Default : 0 (for compatibility reasons)
- devconf_inherit_init_net
- ------------------------
- Controls if a new network namespace should inherit all current
- settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
- default, we keep the current behavior: for IPv4 we inherit all current
- settings from init_net and for IPv6 we reset all settings to default.
- If set to 1, both IPv4 and IPv6 settings are forced to inherit from
- current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
- forced to reset to their default values. If set to 3, both IPv4 and IPv6
- settings are forced to inherit from current ones in the netns where this
- new netns has been created.
- Default : 0 (for compatibility reasons)
- txrehash
- --------
- Controls default hash rethink behaviour on socket when SO_TXREHASH option is set
- to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
- If set to 1 (default), hash rethink is performed on listening socket.
- If set to 0, hash rethink is not performed.
- gro_normal_batch
- ----------------
- Maximum number of the segments to batch up on output of GRO. When a packet
- exits GRO, either as a coalesced superframe or as an original packet which
- GRO has decided not to coalesce, it is placed on a per-NAPI list. This
- list is then passed to the stack when the number of segments reaches the
- gro_normal_batch limit.
- high_order_alloc_disable
- ------------------------
- By default the allocator for page frags tries to use high order pages (order-3
- on x86). While the default behavior gives good results in most cases, some users
- might have hit a contention in page allocations/freeing. This was especially
- true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
- lists. This allows to opt-in for order-0 allocation instead but is now mostly of
- historical importance.
- Default: 0
- 2. /proc/sys/net/unix - Parameters for Unix domain sockets
- ----------------------------------------------------------
- There is only one file in this directory.
- unix_dgram_qlen limits the max number of datagrams queued in Unix domain
- socket's buffer. It will not take effect unless PF_UNIX flag is specified.
- 3. /proc/sys/net/ipv4 - IPV4 settings
- -------------------------------------
- Please see: Documentation/networking/ip-sysctl.rst and
- Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
- 4. Appletalk
- ------------
- The /proc/sys/net/appletalk directory holds the Appletalk configuration data
- when Appletalk is loaded. The configurable parameters are:
- aarp-expiry-time
- ----------------
- The amount of time we keep an ARP entry before expiring it. Used to age out
- old hosts.
- aarp-resolve-time
- -----------------
- The amount of time we will spend trying to resolve an Appletalk address.
- aarp-retransmit-limit
- ---------------------
- The number of times we will retransmit a query before giving up.
- aarp-tick-time
- --------------
- Controls the rate at which expires are checked.
- The directory /proc/net/appletalk holds the list of active Appletalk sockets
- on a machine.
- The fields indicate the DDP type, the local address (in network:node format)
- the remote address, the size of the transmit pending queue, the size of the
- received queue (bytes waiting for applications to read) the state and the uid
- owning the socket.
- /proc/net/atalk_iface lists all the interfaces configured for appletalk.It
- shows the name of the interface, its Appletalk address, the network range on
- that address (or network number for phase 1 networks), and the status of the
- interface.
- /proc/net/atalk_route lists each known network route. It lists the target
- (network) that the route leads to, the router (may be directly connected), the
- route flags, and the device the route is using.
- 5. TIPC
- -------
- tipc_rmem
- ---------
- The TIPC protocol now has a tunable for the receive memory, similar to the
- tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
- ::
- # cat /proc/sys/net/tipc/tipc_rmem
- 4252725 34021800 68043600
- #
- The max value is set to CONN_OVERLOAD_LIMIT, and the default and min values
- are scaled (shifted) versions of that same value. Note that the min value
- is not at this point in time used in any meaningful way, but the triplet is
- preserved in order to be consistent with things like tcp_rmem.
- named_timeout
- -------------
- TIPC name table updates are distributed asynchronously in a cluster, without
- any form of transaction handling. This means that different race scenarios are
- possible. One such is that a name withdrawal sent out by one node and received
- by another node may arrive after a second, overlapping name publication already
- has been accepted from a third node, although the conflicting updates
- originally may have been issued in the correct sequential order.
- If named_timeout is nonzero, failed topology updates will be placed on a defer
- queue until another event arrives that clears the error, or until the timeout
- expires. Value is in milliseconds.
|