.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2

Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects. The
command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
to be defined in the UAPI header, aliased to the same value.

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode; in particular, packets
will not actually be redirected or dropped, and the program return code will
just be returned to userspace. A separate mode for live execution of XDP
programs is provided, documented separately below.

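Because the regular test run mode only reports the program's return code, a
userspace test harness typically has to interpret that code itself. The
following is a minimal sketch of such a helper for XDP programs. The
``sketch_`` names are hypothetical; the numeric values are local copies of the
kernel's ``enum xdp_action`` so that the example builds without kernel headers.

```c
#include <stdint.h>

/* Local copies of the values of enum xdp_action from the UAPI header. */
enum sketch_xdp_action {
	SKETCH_XDP_ABORTED = 0,
	SKETCH_XDP_DROP = 1,
	SKETCH_XDP_PASS = 2,
	SKETCH_XDP_TX = 3,
	SKETCH_XDP_REDIRECT = 4,
};

/* Map the return code reported by a test run to a readable name. */
static const char *sketch_xdp_action_name(uint32_t retval)
{
	switch (retval) {
	case SKETCH_XDP_ABORTED:
		return "XDP_ABORTED";
	case SKETCH_XDP_DROP:
		return "XDP_DROP";
	case SKETCH_XDP_PASS:
		return "XDP_PASS";
	case SKETCH_XDP_TX:
		return "XDP_TX";
	case SKETCH_XDP_REDIRECT:
		return "XDP_REDIRECT";
	default:
		return "unknown";
	}
}

/*
 * Return 1 if this return code would cause the packet to be dropped when
 * the program runs on real traffic. In the regular test run mode nothing is
 * actually dropped; the code is only reported back to userspace.
 */
static int sketch_xdp_would_drop(uint32_t retval)
{
	return retval == SKETCH_XDP_ABORTED || retval == SKETCH_XDP_DROP;
}
```

A unit test can then compare the reported return code against the expected
action, for example by checking ``sketch_xdp_would_drop()`` for a packet that
the program is meant to filter out.
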
Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in a way where packets will actually
be processed by the kernel after the execution of the XDP program as if they
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.

The live frame mode is optimised for high performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as the regular test
run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same trace points as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
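
The consequence of page recycling for programs that rewrite packet contents can
be illustrated with a small model. This is plain userspace C, not a BPF
program, and it only imitates one aspect of the behaviour described above: the
buffer is reused between invocations without being restored. All names, the
offset and the marker value are hypothetical. The sketch contrasts a relative
update, whose result depends on what an earlier invocation left behind, with an
absolute write that gives the same result on original and recycled content.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MARK_OFFSET 8u    /* hypothetical byte offset in the packet */
#define SKETCH_MARK_VALUE  0x2au /* hypothetical marker value */

/* Relative update: the result depends on the previous buffer content. */
static void sketch_rewrite_relative(uint8_t *pkt, size_t len)
{
	if (len > SKETCH_MARK_OFFSET)
		pkt[SKETCH_MARK_OFFSET] += 1;
}

/* Absolute write: same result for original and already-modified content. */
static void sketch_rewrite_absolute(uint8_t *pkt, size_t len)
{
	if (len > SKETCH_MARK_OFFSET)
		pkt[SKETCH_MARK_OFFSET] = SKETCH_MARK_VALUE;
}

/*
 * Apply 'rewrite' to the same zero-initialised buffer 'runs' times without
 * restoring it in between, imitating a recycled page, and return the byte
 * at SKETCH_MARK_OFFSET afterwards.
 */
static uint8_t sketch_run_on_recycled_page(void (*rewrite)(uint8_t *, size_t),
					   unsigned int runs)
{
	uint8_t page[64] = { 0 };
	unsigned int i;

	for (i = 0; i < runs; i++)
		rewrite(page, sizeof(page));

	return page[SKETCH_MARK_OFFSET];
}
```

With the absolute write the byte ends up as the marker value no matter how often
the buffer is reused, whereas the relative update drifts by one per invocation.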
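
The argument rules from the list above (no ``data_out`` or ``ctx_out`` in live
frame mode, a default batch size of 64 and a maximum of 256) can be summarised
as a small userspace model. This is only a sketch under stated assumptions, not
the kernel's implementation: the ``sketch_`` names and the struct are
hypothetical, the specific error codes are assumptions, and a ``repeat`` value
of zero is not modelled.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_DEFAULT_BATCH 64u  /* default batch size named in the text */
#define SKETCH_MAX_BATCH     256u /* maximum batch size named in the text */

/* Hypothetical summary of the parameters relevant to live frame mode. */
struct sketch_live_args {
	uint32_t repeat;     /* number of repetitions requested */
	uint32_t batch_size; /* 0 means "use the default" */
	int has_data_out;    /* non-zero if data_out was set */
	int has_ctx_out;     /* non-zero if ctx_out was set */
};

/*
 * Check the arguments and compute the effective batch size.
 * Returns 0 on success or a negative errno value; the specific error
 * codes used here are assumptions of this sketch.
 */
static int sketch_live_effective_batch(const struct sketch_live_args *a,
				       uint32_t *batch)
{
	if (a->has_data_out || a->has_ctx_out)
		return -EINVAL; /* output buffers are rejected in live mode */
	if (a->batch_size > SKETCH_MAX_BATCH)
		return -E2BIG;  /* larger than the documented maximum */

	*batch = a->batch_size ? a->batch_size : SKETCH_DEFAULT_BATCH;
	return 0;
}

/* Number of batches needed for 'repeat' runs (the last one may be short). */
static uint32_t sketch_live_batch_count(uint32_t repeat, uint32_t batch)
{
	if (batch == 0)
		return 0;
	return repeat / batch + (repeat % batch != 0);
}
```

For example, 1000 repetitions with the default batch size would be split into
16 batches in this model, the last of which holds only 40 packets.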