In the past I’ve been very happy using BCC and bpftrace to jump on the BPF wagon. But since I spend a lot of my time on embedded systems (mostly Android phones nowadays), getting these tools onto these systems is a bit of a headache.
The BPF community has been very active and hard to keep up with. I’ve heard about libbpf and CO-RE (Compile Once - Run Everywhere) for a while now but never found the time to get over the learning curve.
But the time has finally come, and I thought I’d share my learning experience to help others ramp up and jump on the wagon too. I found it confusing to learn about the various bits and pieces involved and how they all tie together.
First, Andrii Nakryiko, who’s an active BPF developer/maintainer, has written comprehensive blog posts on the topic, and what I’m doing here is a mere summary of what I’ve learnt, mostly from reading his posts.
The goal of BPF CO-RE is to produce a single executable binary that you can run on any system.
This binary is split into 3 parts:
To understand how all of these should be combined together, let’s start with libbpf.
https://github.com/libbpf/libbpf
libbpf is part of the kernel tree and is mirrored on GitHub for convenience. It requires a kernel built with
Based on my humble understanding it covers the following aspects:
In an ideal world, you should be able to install this library from your favourite distro. But based on my experience at least, and based on what I’ve seen others have to do, you’d need to include and compile this yourself as part of your BPF application.
All you need to do is make libbpf part of your project and compile it, so that when you later compile your BPF program and userspace program you can reference it.
git submodule add https://github.com/libbpf/libbpf libbpf
make -C libbpf/src BUILD_STATIC_ONLY=1 DESTDIR=$(pwd)/bpf install
NOTE: You must install it in the bpf subdirectory; as we’ll see later, there are autogenerated skeleton files that reference headers in <bpf/$FILE.h>.
The purpose of BPF is to attach to points of interest within the kernel and collect some data, or to extend the kernel with new functionality.
To do so, you need to reference and use internal data structures that are not exported in the normal Linux headers. To overcome this, the BPF community introduced BTF (BPF Type Format), which records information about kernel data structures so that BPF programs can reference them safely at runtime.
It also helps with compiling your BPF programs that need to find definitions of whatever structs you’re trying to access.
For the latter, you can generate a vmlinux.h from BTF which contains all definitions of internal kernel structures in a single file that you can include in your BPF program source files.
To generate this vmlinux.h, you need to use bpftool, which is a utility available from the linux-tools package.
BPF uses BTF to find out information about data structures. libbpf is probably hiding lots of this glue logic when it loads and runs the BPF program. But for compiling your CO-RE program, all you need to know is that you need a BTF file and bpftool to generate the vmlinux.h that your program needs to compile.
sudo apt install linux-tools-$(uname -r)
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
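Since everything here depends on the kernel exposing its BTF, it can be handy to check for the file before going any further. The following is a minimal userspace sketch, not part of libbpf: the helper name btf_file_readable() and the KERNEL_BTF_PATH macro are made up for illustration, and the path is simply the one passed to bpftool above.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Path taken from the bpftool command above. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/*
 * Hypothetical helper: returns 1 if the given path exists and is
 * readable by the current user, 0 otherwise.
 */
int btf_file_readable(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK) == 0;
}
```

Calling btf_file_readable(KERNEL_BTF_PATH) early in a userspace loader gives a clearer error than a failed load later on. It only tells you the file is there, not that its contents match what your program expects.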
Kernels must be compiled with these config options
hello_world.bpf.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("raw_tp/pelt_se_tp")
int BPF_PROG(handle_pelt_se, struct sched_entity *se)
{
    int cpu = BPF_CORE_READ(se, cfs_rq, rq, cpu);

    bpf_printk("[%d] Hello world!", cpu);
    return 0;
}
I hope it’s self-explanatory. Andrii’s blog posts delve into great detail; please check them out.
Nonetheless, a few things are worth highlighting.
SEC is a simple macro to put functions/variables in a section. It looks like libbpf uses this to identify what to do.
SEC("raw_tp/pelt_se_tp") tells libbpf to attach the handle_pelt_se() function to the pelt_se_tp raw tracepoint. A raw tracepoint is where the kernel calls trace_pelt_se_tp().
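To get a feel for what "putting things in a section" means, here is a small userspace sketch of the same compiler feature. It is not libbpf code: DEMO_SEC, struct demo_hook and the demo_hooks section name are invented for illustration, and it assumes an ELF toolchain (GNU ld or lld), which defines __start_/__stop_ symbols for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as SEC(): place a symbol in a named ELF section and keep it. */
#define DEMO_SEC(name) __attribute__((section(name), used))

/* Hypothetical record type; one int keeps size and alignment trivial. */
struct demo_hook {
    int id;
};

static const struct demo_hook hook_a DEMO_SEC("demo_hooks") = { 1 };
static const struct demo_hook hook_b DEMO_SEC("demo_hooks") = { 2 };

/* Provided by the linker: the bounds of the demo_hooks section. */
extern const struct demo_hook __start_demo_hooks[];
extern const struct demo_hook __stop_demo_hooks[];

/* Number of records the linker collected in the section. */
size_t demo_hook_count(void)
{
    return ((uintptr_t)__stop_demo_hooks - (uintptr_t)__start_demo_hooks) /
           sizeof(struct demo_hook);
}

/* Id of the i-th record; the order within the section is not guaranteed. */
int demo_hook_id(size_t i)
{
    assert(i < demo_hook_count());
    return __start_demo_hooks[i].id;
}
```

A tool reading the object file can find everything tagged this way by section name alone, without knowing the symbol names in advance. Presumably that is what lets a loader decide what to do with each function based on its section.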
BPF_PROG() is a helper macro to define handle_pelt_se() so that it gets access to the args passed to trace_pelt_se_tp(); otherwise you’d need to do some extra work to access struct sched_entity *se inside handle_pelt_se().
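The "extra work" is roughly this: a raw tracepoint program receives its arguments as an array of 64-bit slots, and each slot has to be cast back to its real type. Below is a userspace analogy of that unpacking, under the assumption that this is the essence of what the macro hides. struct demo_entity, handle_typed() and handle_raw() are hypothetical names, not libbpf API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel structure. */
struct demo_entity {
    int weight;
};

/* The typed handler we would like to write. */
static int handle_typed(struct demo_entity *se)
{
    return se->weight;
}

/* What the raw entry point sees: an array of 64-bit argument slots. */
int handle_raw(unsigned long long *ctx)
{
    assert(ctx != NULL);
    /* Cast slot 0 back to the pointer type of the first argument. */
    return handle_typed((struct demo_entity *)(uintptr_t)ctx[0]);
}
```

With the macro, you write only the typed signature and the cast from each slot is generated for you.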
You can’t just read kernel memory from BPF programs. You must use helper functions from libbpf. BPF_CORE_READ() is the macro to use to access se->cfs_rq->rq->cpu. It expands to multiple bpf_core_read() calls. Andrii’s blog post explains this very nicely.
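To picture that expansion, here is a userspace analogy of the pointer chasing: one checked read per hop, with an early exit if any hop fails. This is only a sketch of the shape of the logic. The demo_ structures are heavily simplified stand-ins for the kernel ones, demo_read() is a made-up substitute for the real helper, and none of the CO-RE field relocation is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct demo_rq { int cpu; };
struct demo_cfs_rq { struct demo_rq *rq; };
struct demo_sched_entity { struct demo_cfs_rq *cfs_rq; };

/* Stand-in for a checked read: 0 on success, -1 if the source is NULL. */
static int demo_read(void *dst, size_t size, const void *src)
{
    if (!src)
        return -1;
    memcpy(dst, src, size);
    return 0;
}

/* One read per hop: se->cfs_rq, then ->rq, then ->cpu. -1 on failure. */
int demo_read_cpu(const struct demo_sched_entity *se)
{
    struct demo_cfs_rq *cfs_rq;
    struct demo_rq *rq;
    int cpu;

    assert(se != NULL);
    if (demo_read(&cfs_rq, sizeof(cfs_rq), &se->cfs_rq) || !cfs_rq)
        return -1;
    if (demo_read(&rq, sizeof(rq), &cfs_rq->rq) || !rq)
        return -1;
    if (demo_read(&cpu, sizeof(cpu), &rq->cpu))
        return -1;
    return cpu;
}
```

Writing this by hand for every field access gets tedious quickly, which is why a single macro call for the whole chain is so convenient.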
bpf_printk() will write into the trace buffer. If you have used BCC or bpftrace before, you’d be familiar with such a call. In practice this is useful only for development/debugging. In reality you’d want to use BPF maps, which we don’t cover in this post.
We must use clang to compile. I think clang/llvm v10 or above is advised. A newer one might be even better, as new features that rely on the latest version of clang keep getting added. gcc should gain support for compiling BPF in the near future, if it hasn’t already!
llvm-strip is required to remove DWARF info from the object file, which otherwise results in a bloated binary. This is important if you want to deploy in a production environment; it can be ignored otherwise.
sudo apt install clang llvm
clang -g -O2 -Wall -target bpf -D__TARGET_ARCH_arm64 -I bpf/usr/include \
-c hello_world.bpf.c -o hello_world.bpf.o
llvm-strip -g hello_world.bpf.o
hello_world.bpf.o can be loaded into the kernel now using the bpf() syscall. But that’s too much hassle. There’s an easier way to get this handled via libbpf.
The BPF skeleton is a .skel.h header autogenerated by bpftool from the compiled hello_world.bpf.o. It simply does all the magic required to load your BPF program from another userspace program. It produces a set of functions that you can just call and tada, your BPF program will be loaded and running!
We shall see what the .skel.h file contains in the next section, but for now just produce it with this command:
bpftool gen skeleton hello_world.bpf.o > hello_world.skel.h
hello_world.c
#include <bpf/libbpf.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include "hello_world.skel.h"

/* sig_atomic_t is the type that is safe to write from a signal handler. */
static volatile sig_atomic_t exiting;

static void sig_handler(int sig)
{
    exiting = 1;
}

int main(int argc, char **argv)
{
    struct hello_world_bpf *skel;
    int err;

    signal(SIGINT, sig_handler);
    signal(SIGTERM, sig_handler);

    skel = hello_world_bpf__open();
    if (!skel) {
        fprintf(stderr, "Failed to open BPF skeleton\n");
        return 1;
    }

    err = hello_world_bpf__load(skel);
    if (err) {
        fprintf(stderr, "Failed to load and verify BPF skeleton\n");
        goto cleanup;
    }

    err = hello_world_bpf__attach(skel);
    if (err) {
        fprintf(stderr, "Failed to attach BPF skeleton\n");
        goto cleanup;
    }

    /* Sleep instead of spinning; a signal interrupts sleep() early. */
    while (!exiting)
        sleep(1);

cleanup:
    hello_world_bpf__destroy(skel);
    return err < 0 ? -err : 0;
}
Again, hopefully the program is self explanatory.
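hello_world.skel.h provides the definitions used above: struct hello_world_bpf and the hello_world_bpf__open(), hello_world_bpf__load(), hello_world_bpf__attach() and hello_world_bpf__destroy() functions.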
I believe the struct hello_world_bpf skeleton just points to where the BPF program is embedded within the final executable.
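If that understanding is right, the idea boils down to carrying the compiled object around as a byte array inside the executable and handing out a pointer to it. Here is a toy sketch of that idea only. demo_blob holds just the 4-byte ELF magic plus padding rather than a real object file, and all the demo_ names are hypothetical, not what bpftool generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the embedded object: ELF magic plus padding. */
static const unsigned char demo_blob[] = { 0x7f, 'E', 'L', 'F', 0, 0, 0, 0 };

/* Hand out a pointer to the embedded bytes and report their size. */
const unsigned char *demo_embedded_object(size_t *size)
{
    assert(size != NULL);
    *size = sizeof(demo_blob);
    return demo_blob;
}

/* Returns 1 if buf starts with the ELF magic number, 0 otherwise. */
int demo_looks_like_elf(const unsigned char *buf, size_t size)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    return buf && size >= sizeof(magic) &&
           memcmp(buf, magic, sizeof(magic)) == 0;
}
```

Because the bytes travel inside the binary, there is no separate .bpf.o file to ship to the target, which fits the goal of a single executable.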
Then we get a reference to it with open(), then load() the BPF program into the kernel with the bpf() syscall. Finally we attach() it to whatever we asked it to attach to, which basically tells the kernel to execute our loaded program when trace_pelt_se_tp() is called.
We don’t do any processing in our simple program. We loop indefinitely until the user interrupts with CTRL+C. libbpf will clean up everything for us when we destroy() before we return.
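The waiting part can be isolated into a small pattern: a flag set from a signal handler and a loop that sleeps between checks, so it doesn’t keep a CPU busy while the BPF program does the real work. A minimal sketch follows. The demo_ names and the step_sec parameter are hypothetical and not part of the skeleton.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* sig_atomic_t is the type that is safe to write from a signal handler. */
static volatile sig_atomic_t demo_exiting;

static void demo_sig_handler(int sig)
{
    (void)sig;
    demo_exiting = 1;
}

/* Install the handler for SIGINT and SIGTERM. 0 on success, -1 on error. */
int demo_install_handlers(void)
{
    if (signal(SIGINT, demo_sig_handler) == SIG_ERR)
        return -1;
    if (signal(SIGTERM, demo_sig_handler) == SIG_ERR)
        return -1;
    return 0;
}

/*
 * Sleep in steps of step_sec seconds until a signal sets the flag.
 * A signal also interrupts sleep() early. Returns the number of wake-ups.
 */
int demo_wait_for_exit(unsigned int step_sec)
{
    int wakeups = 0;

    assert(step_sec > 0);
    while (!demo_exiting) {
        sleep(step_sec);
        wakeups++;
    }
    return wakeups;
}
```

In a real tool, the body of this loop is where you would poll for data coming from the BPF side instead of just sleeping.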
In theory, these examples should be common to most if not all BPF programs that want to use BPF CO-RE. What should be different is what you attach to and how you process the data.
sudo apt install libelf-dev zlib1g-dev
cc -g -O2 -Wall -I bpf/usr/include -c hello_world.c -o hello_world.o
cc -g -O2 -Wall -I bpf/usr/include hello_world.o \
bpf/usr/lib64/libbpf.a -lelf -lz -o hello_world
You can pass -static when compiling hello_world and be able to carry this binary around without having to install libelf and zlib on the target. Useful for working with embedded systems ;-)
From one terminal window run:
sudo ./hello_world
And to observe the output, from another terminal window run:
sudo cat /sys/kernel/tracing/trace_pipe
You should see something like this
<idle>-0 [000] d.h. 414033.085918: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] d.h. 414033.085919: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] d.s. 414033.086875: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] dNs. 414033.086881: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] dNs. 414033.086881: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] dNH. 414033.086883: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] dNH. 414033.086883: bpf_trace_printk: [0] Hello world!
<idle>-0 [000] dNs. 414033.086892: bpf_trace_printk: [0] Hello world!
cat-2337243 [000] d.h. 414033.086904: bpf_trace_printk: [0] Hello world!
cat-2337243 [000] d.h. 414033.086905: bpf_trace_printk: [0] Hello world!
cat-2337243 [000] d... 414033.086910: bpf_trace_printk: [0] Hello world!
kworker/u4:0-2324534 [000] d... 414033.086916: bpf_trace_printk: [0] Hello world!
kworker/u4:0-2324534 [000] d... 414033.086917: bpf_trace_printk: [0] Hello world!
InputThread-1986 [000] d... 414033.086986: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087093: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087095: bpf_trace_printk: [0] Hello world!
hello_world-2337202 [001] d.h. 414033.087139: bpf_trace_printk: [1] Hello world!
hello_world-2337202 [001] d.h. 414033.087139: bpf_trace_printk: [1] Hello world!
firefox-1476923 [000] d.s. 414033.087142: bpf_trace_printk: [0] Hello world!
firefox-1476923 [000] d.s. 414033.087143: bpf_trace_printk: [0] Hello world!
firefox-1476923 [000] d.s. 414033.087143: bpf_trace_printk: [0] Hello world!
firefox-1476923 [000] d.s. 414033.087143: bpf_trace_printk: [0] Hello world!
firefox-1476923 [000] d.s. 414033.087146: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087189: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087191: bpf_trace_printk: [0] Hello world!
QXcbEventQueue-275310 [000] d... 414033.087217: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087254: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087256: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087316: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087318: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087360: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087362: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087396: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087397: bpf_trace_printk: [0] Hello world!
Xorg-1980 [000] d... 414033.087447: bpf_trace_printk: [0] Hello world!
The BPF community is trying hard to ensure these programs can truly run anywhere. But in practice this can be near impossible, depending on what you do.
Almost by definition, you’re hooking into the internals of the kernel, which are not a user ABI and can change at any time. For example, the function signature of sched_switch changed recently and broke many BPF programs.
https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/
Be wary of these issues. I’m not sure if libbpf will fail silently or let you know you’re running on an incompatible kernel. Always keep that in mind while developing and running BPF programs.
You can manage some of these problems though; check Andrii’s post.
BPF CO-RE helps create a standalone binary that acts as a carrier of your BPF program, its loader and its processor.
To generate such a program, you need to take care of 3 elements: the BPF program itself, which needs vmlinux.h to get it to compile; bpftool to generate a skeleton that will make the job of loading and attaching your BPF program a piece of cake; and the userspace program that drives the skeleton.
For a real use case, you’re likely to need BPF maps to store data in the BPF program and then access this data from the userspace counterpart to do something with it.
If you just want to examine what’s happening inside some parts of the kernel, the recipe in this blog should be adequate as a quick and dirty way to peek inside some kernel functionality, which is useful if you’re a kernel developer like me.
You can also extend the kernel with new functionality.
To compile and run the BPF CO-RE program, you need a kernel with these configs:
The kernel you compile your BPF CO-RE program on does NOT have to be the same as the one you’re running on. As long as they both have BTF, libbpf will handle the complexity. You’d still need to be wary of compatibility issues, as you’re hooking into the internals of the kernel, which are not a user ABI.
You can find the sample hello_world program here