mirror of https://github.com/Qortal/Brooklyn
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
983 lines
32 KiB
983 lines
32 KiB
===================== |
|
BPF Type Format (BTF) |
|
===================== |
|
|
|
1. Introduction |
|
*************** |
|
|
|
BTF (BPF Type Format) is the metadata format which encodes the debug info |
|
related to BPF program/map. The name BTF was used initially to describe data |
|
types. The BTF was later extended to include function info for defined |
|
subroutines, and line info for source/line information. |
|
|
|
The debug info is used for map pretty print, function signature, etc. The |
|
function signature enables better bpf program/function kernel symbol. The line |
|
info helps generate source annotated translated byte code, jited code and |
|
verifier log. |
|
|
|
The BTF specification contains two parts, |
|
* BTF kernel API |
|
* BTF ELF file format |
|
|
|
The kernel API is the contract between user space and kernel. The kernel |
|
verifies the BTF info before using it. The ELF file format is a user space |
|
contract between ELF file and libbpf loader. |
|
|
|
The type and string sections are part of the BTF kernel API, describing the |
|
debug info (mostly types related) referenced by the bpf program. These two |
|
sections are discussed in details in :ref:`BTF_Type_String`. |
|
|
|
.. _BTF_Type_String: |
|
|
|
2. BTF Type and String Encoding |
|
******************************* |
|
|
|
The file ``include/uapi/linux/btf.h`` provides high-level definition of how |
|
types/strings are encoded. |
|
|
|
The beginning of data blob must be:: |
|
|
|
struct btf_header { |
|
__u16 magic; |
|
__u8 version; |
|
__u8 flags; |
|
__u32 hdr_len; |
|
|
|
/* All offsets are in bytes relative to the end of this header */ |
|
__u32 type_off; /* offset of type section */ |
|
__u32 type_len; /* length of type section */ |
|
__u32 str_off; /* offset of string section */ |
|
__u32 str_len; /* length of string section */ |
|
}; |
|
|
|
The magic is ``0xeB9F``, which has different encoding for big and little |
|
endian systems, and can be used to test whether BTF is generated for big- or |
|
little-endian target. The ``btf_header`` is designed to be extensible with |
|
``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is |
|
generated. |
|
|
|
2.1 String Encoding |
|
=================== |
|
|
|
The first string in the string section must be a null string. The rest of |
|
string table is a concatenation of other null-terminated strings. |
|
|
|
2.2 Type Encoding |
|
================= |
|
|
|
The type id ``0`` is reserved for ``void`` type. The type section is parsed |
|
sequentially and type id is assigned to each recognized type starting from id |
|
``1``. Currently, the following types are supported:: |
|
|
|
#define BTF_KIND_INT 1 /* Integer */ |
|
#define BTF_KIND_PTR 2 /* Pointer */ |
|
#define BTF_KIND_ARRAY 3 /* Array */ |
|
#define BTF_KIND_STRUCT 4 /* Struct */ |
|
#define BTF_KIND_UNION 5 /* Union */ |
|
#define BTF_KIND_ENUM 6 /* Enumeration */ |
|
#define BTF_KIND_FWD 7 /* Forward */ |
|
#define BTF_KIND_TYPEDEF 8 /* Typedef */ |
|
#define BTF_KIND_VOLATILE 9 /* Volatile */ |
|
#define BTF_KIND_CONST 10 /* Const */ |
|
#define BTF_KIND_RESTRICT 11 /* Restrict */ |
|
#define BTF_KIND_FUNC 12 /* Function */ |
|
#define BTF_KIND_FUNC_PROTO 13 /* Function Proto */ |
|
#define BTF_KIND_VAR 14 /* Variable */ |
|
#define BTF_KIND_DATASEC 15 /* Section */ |
|
#define BTF_KIND_FLOAT 16 /* Floating point */ |
|
|
|
Note that the type section encodes debug info, not just pure types. |
|
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram. |
|
|
|
Each type contains the following common data:: |
|
|
|
struct btf_type { |
|
__u32 name_off; |
|
/* "info" bits arrangement |
|
* bits 0-15: vlen (e.g. # of struct's members) |
|
* bits 16-23: unused |
|
* bits 24-28: kind (e.g. int, ptr, array...etc) |
|
* bits 29-30: unused |
|
* bit 31: kind_flag, currently used by |
|
* struct, union and fwd |
|
*/ |
|
__u32 info; |
|
/* "size" is used by INT, ENUM, STRUCT and UNION. |
|
* "size" tells the size of the type it is describing. |
|
* |
|
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT, |
|
* FUNC and FUNC_PROTO. |
|
* "type" is a type_id referring to another type. |
|
*/ |
|
union { |
|
__u32 size; |
|
__u32 type; |
|
}; |
|
}; |
|
|
|
For certain kinds, the common data are followed by kind-specific data. The |
|
``name_off`` in ``struct btf_type`` specifies the offset in the string table. |
|
The following sections detail encoding of each kind. |
|
|
|
2.2.1 BTF_KIND_INT |
|
~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: any valid offset |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_INT |
|
* ``info.vlen``: 0 |
|
* ``size``: the size of the int type in bytes. |
|
|
|
``btf_type`` is followed by a ``u32`` with the following bits arrangement:: |
|
|
|
#define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24) |
|
#define BTF_INT_OFFSET(VAL) (((VAL) & 0x00ff0000) >> 16) |
|
#define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff) |
|
|
|
The ``BTF_INT_ENCODING`` has the following attributes:: |
|
|
|
#define BTF_INT_SIGNED (1 << 0) |
|
#define BTF_INT_CHAR (1 << 1) |
|
#define BTF_INT_BOOL (1 << 2) |
|
|
|
The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or |
|
bool, for the int type. The char and bool encoding are mostly useful for |
|
pretty print. At most one encoding can be specified for the int type. |
|
|
|
The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int |
|
type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equals to 4. |
|
The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()`` |
|
for the type. The maximum value of ``BTF_INT_BITS()`` is 128. |
|
|
|
The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values |
|
for this int. For example, a bitfield struct member has: |
|
|
|
* btf member bit offset 100 from the start of the structure, |
|
* btf member pointing to an int type, |
|
* the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` |
|
|
|
Then in the struct memory layout, this member will occupy ``4`` bits starting |
|
from bits ``100 + 2 = 102``. |
|
|
|
Alternatively, the bitfield struct member can be the following to access the |
|
same bits as the above: |
|
|
|
* btf member bit offset 102, |
|
* btf member pointing to an int type, |
|
* the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4`` |
|
|
|
The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of |
|
bitfield encoding. Currently, both llvm and pahole generate |
|
``BTF_INT_OFFSET() = 0`` for all int types. |
|
|
|
2.2.2 BTF_KIND_PTR |
|
~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_PTR |
|
* ``info.vlen``: 0 |
|
* ``type``: the pointee type of the pointer |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
2.2.3 BTF_KIND_ARRAY |
|
~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_ARRAY |
|
* ``info.vlen``: 0 |
|
* ``size/type``: 0, not used |
|
|
|
``btf_type`` is followed by one ``struct btf_array``:: |
|
|
|
struct btf_array { |
|
__u32 type; |
|
__u32 index_type; |
|
__u32 nelems; |
|
}; |
|
|
|
The ``struct btf_array`` encoding: |
|
* ``type``: the element type |
|
* ``index_type``: the index type |
|
* ``nelems``: the number of elements for this array (``0`` is also allowed). |
|
|
|
The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``, |
|
``u64``, ``unsigned __int128``). The original design of including |
|
``index_type`` follows DWARF, which has an ``index_type`` for its array type. |
|
Currently in BTF, beyond type verification, the ``index_type`` is not used. |
|
|
|
The ``struct btf_array`` allows chaining through element type to represent |
|
multidimensional arrays. For example, for ``int a[5][6]``, the following type |
|
information illustrates the chaining: |
|
|
|
* [1]: int |
|
* [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6`` |
|
* [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5`` |
|
|
|
Currently, both pahole and llvm collapse multidimensional array into |
|
one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is |
|
equal to ``30``. This is because the original use case is map pretty print |
|
where the whole array is dumped out so one-dimensional array is enough. As |
|
more BTF usage is explored, pahole and llvm can be changed to generate proper |
|
chained representation for multidimensional arrays. |
|
|
|
2.2.4 BTF_KIND_STRUCT |
|
~~~~~~~~~~~~~~~~~~~~~ |
|
2.2.5 BTF_KIND_UNION |
|
~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 or offset to a valid C identifier |
|
* ``info.kind_flag``: 0 or 1 |
|
* ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION |
|
* ``info.vlen``: the number of struct/union members |
|
* ``info.size``: the size of the struct/union in bytes |
|
|
|
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``.:: |
|
|
|
struct btf_member { |
|
__u32 name_off; |
|
__u32 type; |
|
__u32 offset; |
|
}; |
|
|
|
``struct btf_member`` encoding: |
|
* ``name_off``: offset to a valid C identifier |
|
* ``type``: the member type |
|
* ``offset``: <see below> |
|
|
|
If the type info ``kind_flag`` is not set, the offset contains only bit offset |
|
of the member. Note that the base type of the bitfield can only be int or enum |
|
type. If the bitfield size is 32, the base type can be either int or enum |
|
type. If the bitfield size is not 32, the base type must be int, and int type |
|
``BTF_INT_BITS()`` encodes the bitfield size. |
|
|
|
If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member |
|
bitfield size and bit offset. The bitfield size and bit offset are calculated |
|
as below.:: |
|
|
|
#define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24) |
|
#define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff) |
|
|
|
In this case, if the base type is an int type, it must be a regular int type: |
|
|
|
* ``BTF_INT_OFFSET()`` must be 0. |
|
* ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``. |
|
|
|
The following kernel patch introduced ``kind_flag`` and explained why both |
|
modes exist: |
|
|
|
https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3 |
|
|
|
2.2.6 BTF_KIND_ENUM |
|
~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 or offset to a valid C identifier |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_ENUM |
|
* ``info.vlen``: number of enum values |
|
* ``size``: 4 |
|
|
|
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.:: |
|
|
|
struct btf_enum { |
|
__u32 name_off; |
|
__s32 val; |
|
}; |
|
|
|
The ``btf_enum`` encoding: |
|
* ``name_off``: offset to a valid C identifier |
|
* ``val``: any value |
|
|
|
2.2.7 BTF_KIND_FWD |
|
~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: offset to a valid C identifier |
|
* ``info.kind_flag``: 0 for struct, 1 for union |
|
* ``info.kind``: BTF_KIND_FWD |
|
* ``info.vlen``: 0 |
|
* ``type``: 0 |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
2.2.8 BTF_KIND_TYPEDEF |
|
~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: offset to a valid C identifier |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_TYPEDEF |
|
* ``info.vlen``: 0 |
|
* ``type``: the type which can be referred by name at ``name_off`` |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
2.2.9 BTF_KIND_VOLATILE |
|
~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_VOLATILE |
|
* ``info.vlen``: 0 |
|
* ``type``: the type with ``volatile`` qualifier |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
2.2.10 BTF_KIND_CONST |
|
~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_CONST |
|
* ``info.vlen``: 0 |
|
* ``type``: the type with ``const`` qualifier |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
2.2.11 BTF_KIND_RESTRICT |
|
~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_RESTRICT |
|
* ``info.vlen``: 0 |
|
* ``type``: the type with ``restrict`` qualifier |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
2.2.12 BTF_KIND_FUNC |
|
~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: offset to a valid C identifier |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_FUNC |
|
* ``info.vlen``: 0 |
|
* ``type``: a BTF_KIND_FUNC_PROTO type |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose |
|
signature is defined by ``type``. The subprogram is thus an instance of that |
|
type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the |
|
:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load` |
|
(ABI). |
|
|
|
2.2.13 BTF_KIND_FUNC_PROTO |
|
~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: 0 |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_FUNC_PROTO |
|
* ``info.vlen``: # of parameters |
|
* ``type``: the return type |
|
|
|
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``.:: |
|
|
|
struct btf_param { |
|
__u32 name_off; |
|
__u32 type; |
|
}; |
|
|
|
If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type, then |
|
``btf_param.name_off`` must point to a valid C identifier except for the |
|
possible last argument representing the variable argument. The btf_param.type |
|
refers to parameter type. |
|
|
|
If the function has variable arguments, the last parameter is encoded with |
|
``name_off = 0`` and ``type = 0``. |
|
|
|
2.2.14 BTF_KIND_VAR |
|
~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: offset to a valid C identifier |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_VAR |
|
* ``info.vlen``: 0 |
|
* ``type``: the type of the variable |
|
|
|
``btf_type`` is followed by a single ``struct btf_variable`` with the |
|
following data:: |
|
|
|
struct btf_var { |
|
__u32 linkage; |
|
}; |
|
|
|
``struct btf_var`` encoding: |
|
* ``linkage``: currently only static variable 0, or globally allocated |
|
variable in ELF sections 1 |
|
|
|
Not all type of global variables are supported by LLVM at this point. |
|
The following is currently available: |
|
|
|
* static variables with or without section attributes |
|
* global variables with section attributes |
|
|
|
The latter is for future extraction of map key/value type id's from a |
|
map definition. |
|
|
|
2.2.15 BTF_KIND_DATASEC |
|
~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: offset to a valid name associated with a variable or |
|
one of .data/.bss/.rodata |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_DATASEC |
|
* ``info.vlen``: # of variables |
|
* ``size``: total section size in bytes (0 at compilation time, patched |
|
to actual size by BPF loaders such as libbpf) |
|
|
|
``btf_type`` is followed by ``info.vlen`` number of ``struct btf_var_secinfo``.:: |
|
|
|
struct btf_var_secinfo { |
|
__u32 type; |
|
__u32 offset; |
|
__u32 size; |
|
}; |
|
|
|
``struct btf_var_secinfo`` encoding: |
|
* ``type``: the type of the BTF_KIND_VAR variable |
|
* ``offset``: the in-section offset of the variable |
|
* ``size``: the size of the variable in bytes |
|
|
|
2.2.16 BTF_KIND_FLOAT |
|
~~~~~~~~~~~~~~~~~~~~~ |
|
|
|
``struct btf_type`` encoding requirement: |
|
* ``name_off``: any valid offset |
|
* ``info.kind_flag``: 0 |
|
* ``info.kind``: BTF_KIND_FLOAT |
|
* ``info.vlen``: 0 |
|
* ``size``: the size of the float type in bytes: 2, 4, 8, 12 or 16. |
|
|
|
No additional type data follow ``btf_type``. |
|
|
|
3. BTF Kernel API |
|
***************** |
|
|
|
The following bpf syscall command involves BTF: |
|
* BPF_BTF_LOAD: load a blob of BTF data into kernel |
|
* BPF_MAP_CREATE: map creation with btf key and value type info. |
|
* BPF_PROG_LOAD: prog load with btf function and line info. |
|
* BPF_BTF_GET_FD_BY_ID: get a btf fd |
|
* BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info |
|
and other btf related info are returned. |
|
|
|
The workflow typically looks like: |
|
:: |
|
|
|
Application: |
|
BPF_BTF_LOAD |
|
| |
|
v |
|
BPF_MAP_CREATE and BPF_PROG_LOAD |
|
| |
|
V |
|
...... |
|
|
|
Introspection tool: |
|
...... |
|
BPF_{PROG,MAP}_GET_NEXT_ID (get prog/map id's) |
|
| |
|
V |
|
BPF_{PROG,MAP}_GET_FD_BY_ID (get a prog/map fd) |
|
| |
|
V |
|
BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id) |
|
| | |
|
V | |
|
BPF_BTF_GET_FD_BY_ID (get btf_fd) | |
|
| | |
|
V | |
|
BPF_OBJ_GET_INFO_BY_FD (get btf) | |
|
| | |
|
V V |
|
pretty print types, dump func signatures and line info, etc. |
|
|
|
|
|
3.1 BPF_BTF_LOAD |
|
================ |
|
|
|
Load a blob of BTF data into kernel. A blob of data, described in |
|
:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd`` |
|
is returned to a userspace. |
|
|
|
3.2 BPF_MAP_CREATE |
|
================== |
|
|
|
A map can be created with ``btf_fd`` and specified key/value type id.:: |
|
|
|
__u32 btf_fd; /* fd pointing to a BTF type data */ |
|
__u32 btf_key_type_id; /* BTF type_id of the key */ |
|
__u32 btf_value_type_id; /* BTF type_id of the value */ |
|
|
|
In libbpf, the map can be defined with extra annotation like below: |
|
:: |
|
|
|
struct bpf_map_def SEC("maps") btf_map = { |
|
.type = BPF_MAP_TYPE_ARRAY, |
|
.key_size = sizeof(int), |
|
.value_size = sizeof(struct ipv_counts), |
|
.max_entries = 4, |
|
}; |
|
BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts); |
|
|
|
Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and |
|
value types for the map. During ELF parsing, libbpf is able to extract |
|
key/value type_id's and assign them to BPF_MAP_CREATE attributes |
|
automatically. |
|
|
|
.. _BPF_Prog_Load: |
|
|
|
3.3 BPF_PROG_LOAD |
|
================= |
|
|
|
During prog_load, func_info and line_info can be passed to kernel with proper |
|
values for the following attributes: |
|
:: |
|
|
|
__u32 insn_cnt; |
|
__aligned_u64 insns; |
|
...... |
|
__u32 prog_btf_fd; /* fd pointing to BTF type data */ |
|
__u32 func_info_rec_size; /* userspace bpf_func_info size */ |
|
__aligned_u64 func_info; /* func info */ |
|
__u32 func_info_cnt; /* number of bpf_func_info records */ |
|
__u32 line_info_rec_size; /* userspace bpf_line_info size */ |
|
__aligned_u64 line_info; /* line info */ |
|
__u32 line_info_cnt; /* number of bpf_line_info records */ |
|
|
|
The func_info and line_info are an array of below, respectively.:: |
|
|
|
struct bpf_func_info { |
|
__u32 insn_off; /* [0, insn_cnt - 1] */ |
|
__u32 type_id; /* pointing to a BTF_KIND_FUNC type */ |
|
}; |
|
struct bpf_line_info { |
|
__u32 insn_off; /* [0, insn_cnt - 1] */ |
|
__u32 file_name_off; /* offset to string table for the filename */ |
|
__u32 line_off; /* offset to string table for the source line */ |
|
__u32 line_col; /* line number and column number */ |
|
}; |
|
|
|
func_info_rec_size is the size of each func_info record, and |
|
line_info_rec_size is the size of each line_info record. Passing the record |
|
size to kernel make it possible to extend the record itself in the future. |
|
|
|
Below are requirements for func_info: |
|
* func_info[0].insn_off must be 0. |
|
* the func_info insn_off is in strictly increasing order and matches |
|
bpf func boundaries. |
|
|
|
Below are requirements for line_info: |
|
* the first insn in each func must have a line_info record pointing to it. |
|
* the line_info insn_off is in strictly increasing order. |
|
|
|
For line_info, the line number and column number are defined as below: |
|
:: |
|
|
|
#define BPF_LINE_INFO_LINE_NUM(line_col) ((line_col) >> 10) |
|
#define BPF_LINE_INFO_LINE_COL(line_col) ((line_col) & 0x3ff) |
|
|
|
3.4 BPF_{PROG,MAP}_GET_NEXT_ID |
|
============================== |
|
|
|
In kernel, every loaded program, map or btf has a unique id. The id won't |
|
change during the lifetime of a program, map, or btf. |
|
|
|
The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID returns all id's, one for |
|
each command, to user space, for bpf program or maps, respectively, so an |
|
inspection tool can inspect all programs and maps. |
|
|
|
3.5 BPF_{PROG,MAP}_GET_FD_BY_ID |
|
=============================== |
|
|
|
An introspection tool cannot use id to get details about program or maps. |
|
A file descriptor needs to be obtained first for reference-counting purpose. |
|
|
|
3.6 BPF_OBJ_GET_INFO_BY_FD |
|
========================== |
|
|
|
Once a program/map fd is acquired, an introspection tool can get the detailed |
|
information from kernel about this fd, some of which are BTF-related. For |
|
example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids. |
|
``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated |
|
bpf byte codes, and jited_line_info. |
|
|
|
3.7 BPF_BTF_GET_FD_BY_ID |
|
======================== |
|
|
|
With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf |
|
syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with |
|
command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the |
|
kernel with BPF_BTF_LOAD, can be retrieved. |
|
|
|
With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection |
|
tool has full btf knowledge and is able to pretty print map key/values, dump |
|
func signatures and line info, along with byte/jit codes. |
|
|
|
4. ELF File Format Interface |
|
**************************** |
|
|
|
4.1 .BTF section |
|
================ |
|
|
|
The .BTF section contains type and string data. The format of this section is |
|
same as the one describe in :ref:`BTF_Type_String`. |
|
|
|
.. _BTF_Ext_Section: |
|
|
|
4.2 .BTF.ext section |
|
==================== |
|
|
|
The .BTF.ext section encodes func_info and line_info which needs loader |
|
manipulation before loading into the kernel. |
|
|
|
The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h`` |
|
and ``tools/lib/bpf/btf.c``. |
|
|
|
The current header of .BTF.ext section:: |
|
|
|
struct btf_ext_header { |
|
__u16 magic; |
|
__u8 version; |
|
__u8 flags; |
|
__u32 hdr_len; |
|
|
|
/* All offsets are in bytes relative to the end of this header */ |
|
__u32 func_info_off; |
|
__u32 func_info_len; |
|
__u32 line_info_off; |
|
__u32 line_info_len; |
|
}; |
|
|
|
It is very similar to .BTF section. Instead of type/string section, it |
|
contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details |
|
about func_info and line_info record format. |
|
|
|
The func_info is organized as below.:: |
|
|
|
func_info_rec_size |
|
btf_ext_info_sec for section #1 /* func_info for section #1 */ |
|
btf_ext_info_sec for section #2 /* func_info for section #2 */ |
|
... |
|
|
|
``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure when |
|
.BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a collection of |
|
func_info for each specific ELF section.:: |
|
|
|
struct btf_ext_info_sec { |
|
__u32 sec_name_off; /* offset to section name */ |
|
__u32 num_info; |
|
/* Followed by num_info * record_size number of bytes */ |
|
__u8 data[0]; |
|
}; |
|
|
|
Here, num_info must be greater than 0. |
|
|
|
The line_info is organized as below.:: |
|
|
|
line_info_rec_size |
|
btf_ext_info_sec for section #1 /* line_info for section #1 */ |
|
btf_ext_info_sec for section #2 /* line_info for section #2 */ |
|
... |
|
|
|
``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when |
|
.BTF.ext is generated. |
|
|
|
The interpretation of ``bpf_func_info->insn_off`` and |
|
``bpf_line_info->insn_off`` is different between kernel API and ELF API. For |
|
kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct |
|
bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the |
|
beginning of section (``btf_ext_info_sec->sec_name_off``). |
|
|
|
4.2 .BTF_ids section |
|
==================== |
|
|
|
The .BTF_ids section encodes BTF ID values that are used within the kernel. |
|
|
|
This section is created during the kernel compilation with the help of |
|
macros defined in ``include/linux/btf_ids.h`` header file. Kernel code can |
|
use them to create lists and sets (sorted lists) of BTF ID values. |
|
|
|
The ``BTF_ID_LIST`` and ``BTF_ID`` macros define unsorted list of BTF ID values, |
|
with following syntax:: |
|
|
|
BTF_ID_LIST(list) |
|
BTF_ID(type1, name1) |
|
BTF_ID(type2, name2) |
|
|
|
resulting in following layout in .BTF_ids section:: |
|
|
|
__BTF_ID__type1__name1__1: |
|
.zero 4 |
|
__BTF_ID__type2__name2__2: |
|
.zero 4 |
|
|
|
The ``u32 list[];`` variable is defined to access the list. |
|
|
|
The ``BTF_ID_UNUSED`` macro defines 4 zero bytes. It's used when we |
|
want to define unused entry in BTF_ID_LIST, like:: |
|
|
|
BTF_ID_LIST(bpf_skb_output_btf_ids) |
|
BTF_ID(struct, sk_buff) |
|
BTF_ID_UNUSED |
|
BTF_ID(struct, task_struct) |
|
|
|
The ``BTF_SET_START/END`` macros pair defines sorted list of BTF ID values |
|
and their count, with following syntax:: |
|
|
|
BTF_SET_START(set) |
|
BTF_ID(type1, name1) |
|
BTF_ID(type2, name2) |
|
BTF_SET_END(set) |
|
|
|
resulting in following layout in .BTF_ids section:: |
|
|
|
__BTF_ID__set__set: |
|
.zero 4 |
|
__BTF_ID__type1__name1__3: |
|
.zero 4 |
|
__BTF_ID__type2__name2__4: |
|
.zero 4 |
|
|
|
The ``struct btf_id_set set;`` variable is defined to access the list. |
|
|
|
The ``typeX`` name can be one of following:: |
|
|
|
struct, union, typedef, func |
|
|
|
and is used as a filter when resolving the BTF ID value. |
|
|
|
All the BTF ID lists and sets are compiled in the .BTF_ids section and |
|
resolved during the linking phase of kernel build by ``resolve_btfids`` tool. |
|
|
|
5. Using BTF |
|
************ |
|
|
|
5.1 bpftool map pretty print |
|
============================ |
|
|
|
With BTF, the map key/value can be printed based on fields rather than simply |
|
raw bytes. This is especially valuable for large structure or if your data |
|
structure has bitfields. For example, for the following map,:: |
|
|
|
enum A { A1, A2, A3, A4, A5 }; |
|
typedef enum A ___A; |
|
struct tmp_t { |
|
char a1:4; |
|
int a2:4; |
|
int :4; |
|
__u32 a3:4; |
|
int b; |
|
___A b1:4; |
|
enum A b2:4; |
|
}; |
|
struct bpf_map_def SEC("maps") tmpmap = { |
|
.type = BPF_MAP_TYPE_ARRAY, |
|
.key_size = sizeof(__u32), |
|
.value_size = sizeof(struct tmp_t), |
|
.max_entries = 1, |
|
}; |
|
BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t); |
|
|
|
bpftool is able to pretty print like below: |
|
:: |
|
|
|
[{ |
|
"key": 0, |
|
"value": { |
|
"a1": 0x2, |
|
"a2": 0x4, |
|
"a3": 0x6, |
|
"b": 7, |
|
"b1": 0x8, |
|
"b2": 0xa |
|
} |
|
} |
|
] |
|
|
|
5.2 bpftool prog dump |
|
===================== |
|
|
|
The following is an example showing how func_info and line_info can help prog |
|
dump with better kernel symbol names, function prototypes and line |
|
information.:: |
|
|
|
$ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv |
|
[...] |
|
int test_long_fname_2(struct dummy_tracepoint_args * arg): |
|
bpf_prog_44a040bf25481309_test_long_fname_2: |
|
; static int test_long_fname_2(struct dummy_tracepoint_args *arg) |
|
0: push %rbp |
|
1: mov %rsp,%rbp |
|
4: sub $0x30,%rsp |
|
b: sub $0x28,%rbp |
|
f: mov %rbx,0x0(%rbp) |
|
13: mov %r13,0x8(%rbp) |
|
17: mov %r14,0x10(%rbp) |
|
1b: mov %r15,0x18(%rbp) |
|
1f: xor %eax,%eax |
|
21: mov %rax,0x20(%rbp) |
|
25: xor %esi,%esi |
|
; int key = 0; |
|
27: mov %esi,-0x4(%rbp) |
|
; if (!arg->sock) |
|
2a: mov 0x8(%rdi),%rdi |
|
; if (!arg->sock) |
|
2e: cmp $0x0,%rdi |
|
32: je 0x0000000000000070 |
|
34: mov %rbp,%rsi |
|
; counts = bpf_map_lookup_elem(&btf_map, &key); |
|
[...] |
|
|
|
5.3 Verifier Log |
|
================ |
|
|
|
The following is an example of how line_info can help debugging verification |
|
failure.:: |
|
|
|
/* The code at tools/testing/selftests/bpf/test_xdp_noinline.c |
|
* is modified as below. |
|
*/ |
|
data = (void *)(long)xdp->data; |
|
data_end = (void *)(long)xdp->data_end; |
|
/* |
|
if (data + 4 > data_end) |
|
return XDP_DROP; |
|
*/ |
|
*(u32 *)data = dst->dst; |
|
|
|
$ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp |
|
; data = (void *)(long)xdp->data; |
|
224: (79) r2 = *(u64 *)(r10 -112) |
|
225: (61) r2 = *(u32 *)(r2 +0) |
|
; *(u32 *)data = dst->dst; |
|
226: (63) *(u32 *)(r2 +0) = r1 |
|
invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0) |
|
R2 offset is outside of the packet |
|
|
|
6. BTF Generation |
|
***************** |
|
|
|
You need latest pahole |
|
|
|
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ |
|
|
|
or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't |
|
support .BTF.ext and btf BTF_KIND_FUNC type yet. For example,:: |
|
|
|
-bash-4.4$ cat t.c |
|
struct t { |
|
int a:2; |
|
int b:3; |
|
int c:2; |
|
} g; |
|
-bash-4.4$ gcc -c -O2 -g t.c |
|
-bash-4.4$ pahole -JV t.o |
|
File t.o: |
|
[1] STRUCT t kind_flag=1 size=4 vlen=3 |
|
a type_id=2 bitfield_size=2 bits_offset=0 |
|
b type_id=2 bitfield_size=3 bits_offset=2 |
|
c type_id=2 bitfield_size=2 bits_offset=5 |
|
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED |
|
|
|
The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target |
|
only. The assembly code (-S) is able to show the BTF encoding in assembly |
|
format.:: |
|
|
|
-bash-4.4$ cat t2.c |
|
typedef int __int32; |
|
struct t2 { |
|
int a2; |
|
int (*f2)(char q1, __int32 q2, ...); |
|
int (*f3)(); |
|
} g2; |
|
int main() { return 0; } |
|
int test() { return 0; } |
|
-bash-4.4$ clang -c -g -O2 -target bpf t2.c |
|
-bash-4.4$ readelf -S t2.o |
|
...... |
|
[ 8] .BTF PROGBITS 0000000000000000 00000247 |
|
000000000000016e 0000000000000000 0 0 1 |
|
[ 9] .BTF.ext PROGBITS 0000000000000000 000003b5 |
|
0000000000000060 0000000000000000 0 0 1 |
|
[10] .rel.BTF.ext REL 0000000000000000 000007e0 |
|
0000000000000040 0000000000000010 16 9 8 |
|
...... |
|
-bash-4.4$ clang -S -g -O2 -target bpf t2.c |
|
-bash-4.4$ cat t2.s |
|
...... |
|
.section .BTF,"",@progbits |
|
.short 60319 # 0xeb9f |
|
.byte 1 |
|
.byte 0 |
|
.long 24 |
|
.long 0 |
|
.long 220 |
|
.long 220 |
|
.long 122 |
|
.long 0 # BTF_KIND_FUNC_PROTO(id = 1) |
|
.long 218103808 # 0xd000000 |
|
.long 2 |
|
.long 83 # BTF_KIND_INT(id = 2) |
|
.long 16777216 # 0x1000000 |
|
.long 4 |
|
.long 16777248 # 0x1000020 |
|
...... |
|
.byte 0 # string offset=0 |
|
.ascii ".text" # string offset=1 |
|
.byte 0 |
|
.ascii "/home/yhs/tmp-pahole/t2.c" # string offset=7 |
|
.byte 0 |
|
.ascii "int main() { return 0; }" # string offset=33 |
|
.byte 0 |
|
.ascii "int test() { return 0; }" # string offset=58 |
|
.byte 0 |
|
.ascii "int" # string offset=83 |
|
...... |
|
.section .BTF.ext,"",@progbits |
|
.short 60319 # 0xeb9f |
|
.byte 1 |
|
.byte 0 |
|
.long 24 |
|
.long 0 |
|
.long 28 |
|
.long 28 |
|
.long 44 |
|
.long 8 # FuncInfo |
|
.long 1 # FuncInfo section string offset=1 |
|
.long 2 |
|
.long .Lfunc_begin0 |
|
.long 3 |
|
.long .Lfunc_begin1 |
|
.long 5 |
|
.long 16 # LineInfo |
|
.long 1 # LineInfo section string offset=1 |
|
.long 2 |
|
.long .Ltmp0 |
|
.long 7 |
|
.long 33 |
|
.long 7182 # Line 7 Col 14 |
|
.long .Ltmp3 |
|
.long 7 |
|
.long 58 |
|
.long 8206 # Line 8 Col 14 |
|
|
|
7. Testing |
|
********** |
|
|
|
Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests.
|
|
|