mirror of https://github.com/Qortal/Brooklyn
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
98 lines
3.8 KiB
98 lines
3.8 KiB
.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) |
|
|
|
===================== |
|
BPF sk_lookup program |
|
===================== |
|
|
|
BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces programmability |
|
into the socket lookup performed by the transport layer when a packet is to be |
|
delivered locally. |
|
|
|
When invoked BPF sk_lookup program can select a socket that will receive the |
|
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function. |
|
|
|
Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP. |
|
|
|
Motivation |
|
========== |
|
|
|
BPF sk_lookup program type was introduced to address setup scenarios where |
|
binding sockets to an address with ``bind()`` socket call is impractical, such |
|
as: |
|
|
|
1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when |
|
binding to a wildcard address ``INADRR_ANY`` is not possible due to a port |
|
conflict, |
|
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use |
|
case. |
|
|
|
Such setups would require creating and ``bind()``'ing one socket to each of the |
|
IP address/port in the range, leading to resource consumption and potential |
|
latency spikes during socket lookup. |
|
|
|
Attachment |
|
========== |
|
|
|
BPF sk_lookup program can be attached to a network namespace with |
|
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a |
|
netns FD as attachment ``target_fd``. |
|
|
|
Multiple programs can be attached to one network namespace. Programs will be |
|
invoked in the same order as they were attached. |
|
|
|
Hooks |
|
===== |
|
|
|
The attached BPF sk_lookup programs run whenever the transport layer needs to |
|
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet. |
|
|
|
Incoming traffic to established (TCP) and connected (UDP) sockets is delivered |
|
as usual without triggering the BPF sk_lookup hook. |
|
|
|
The attached BPF programs must return with either ``SK_PASS`` or ``SK_DROP`` |
|
verdict code. As for other BPF program types that are network filters, |
|
``SK_PASS`` signifies that the socket lookup should continue on to regular |
|
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the |
|
packet. |
|
|
|
A BPF sk_lookup program can also select a socket to receive the packet by |
|
calling ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a socket |
|
in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes a |
|
``struct bpf_sock *`` to ``bpf_sk_assign()`` helper to record the |
|
selection. Selecting a socket only takes effect if the program has terminated |
|
with ``SK_PASS`` code. |
|
|
|
When multiple programs are attached, the end result is determined from return |
|
codes of all the programs according to the following rules: |
|
|
|
1. If any program returned ``SK_PASS`` and selected a valid socket, the socket |
|
is used as the result of the socket lookup. |
|
2. If more than one program returned ``SK_PASS`` and selected a socket, the last |
|
selection takes effect. |
|
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and |
|
selected a socket, socket lookup fails. |
|
4. If all programs returned ``SK_PASS`` and none of them selected a socket, |
|
socket lookup continues on. |
|
|
|
API |
|
=== |
|
|
|
In its context, an instance of ``struct bpf_sk_lookup``, BPF sk_lookup program |
|
receives information about the packet that triggered the socket lookup. Namely: |
|
|
|
* IP version (``AF_INET`` or ``AF_INET6``), |
|
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``), |
|
* source and destination IP address, |
|
* source and destination L4 port, |
|
* the socket that has been selected with ``bpf_sk_assign()``. |
|
|
|
Refer to ``struct bpf_sk_lookup`` declaration in ``linux/bpf.h`` user API |
|
header, and `bpf-helpers(7) |
|
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section |
|
for ``bpf_sk_assign()`` for details. |
|
|
|
Example |
|
======= |
|
|
|
See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference |
|
implementation.
|
|
|