Qais Yousef

Geek/Engineer - Kernel Hacker - Low Level Software - Views my own


github: qais-yousef twitter: @_qyousef

Intro to BPF CO-RE

Apr 23, 2022

In the past I’ve been very happy using BCC and bpftrace to jump on the BPF wagon. But since I spend a lot of my time on embedded systems (mostly Android phones nowadays), getting these tools onto these systems is a bit of a headache.

The BPF community has been very active and hard to keep up with. I’ve heard about libbpf and CO-RE (Compile Once - Run Everywhere) for a while now but never found the time to get over the learning curve.

But the time has finally come, and I thought I’d share my learning experience to help others ramp up and jump on the wagon too. I definitely found it confusing to learn about the various bits and pieces involved and how they all tie together.

References

First, Andrii Nakryiko, who’s an active BPF developer/maintainer, has written comprehensive blog posts on the topic, and what I’m doing here is merely a summary of what I’ve learnt, mostly from reading his posts.

Structure

The goal of BPF CO-RE is to produce a single executable binary that you can run on any system.

This binary is split into 3 parts:

  1. BPF program that is embedded in the final executable binary.
  2. Userspace counterpart that loads the BPF program and processes its output.
  3. Glue logic that is smartly hidden inside libbpf and aided by bpftool.

To understand how all of these should be combined together, let’s start with libbpf.

libbpf

https://github.com/libbpf/libbpf

libbpf is part of the kernel tree and is mirrored on GitHub for convenience. It requires a kernel built with

Based on my humble understanding it covers the following aspects:

What do you need to know and do

In an ideal world, you should be able to install this library from your favourite distro. But based on my experience at least, and based on what I’ve seen others have to do, you’d need to include and compile this yourself as part of your BPF application.

All you need to do is make libbpf part of your project and compile it, so that you can reference it later when you compile your BPF program and userspace program.

Clone

git submodule add https://github.com/libbpf/libbpf libbpf

Compile

make -C libbpf/src BUILD_STATIC_ONLY=1 DESTDIR=$(pwd)/bpf install

NOTE: You must install it in the bpf subdirectory because, as we’ll see later, the autogenerated skeleton files reference headers as <bpf/$FILE.h>.

bpftool, BTF and vmlinux.h

The purpose of BPF is to attach to points of interest within the kernel and collect some data, or to extend the kernel with new functionality.

To do so, you need to reference and use internal data structures that are not exported in the normal Linux headers. To overcome this, the BPF community introduced BTF (BPF Type Format), which records information about kernel data structures so that BPF programs can reference them safely at runtime.

It also helps with compiling your BPF programs that need to find definitions of whatever structs you’re trying to access.

For the latter, you can generate a vmlinux.h from BTF which contains all definitions of internal kernel structures in a single file that you can include in your BPF program source files.

To generate this vmlinux.h, you need bpftool, a utility available from the linux-tools package.

What do you need to know and do

BPF uses BTF to find out information about data structures. libbpf is probably hiding lots of this glue logic when it loads and runs the BPF program. But to compile your CO-RE program, all you need to know is that you need a BTF file and bpftool to generate vmlinux.h.

Install bpftool

sudo apt install linux-tools-$(uname -r)

Generate vmlinux.h

bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

Pre-requisites for BTF

Kernels must be compiled with these config options

My first BPF Program

hello_world.bpf.c

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("raw_tp/pelt_se_tp")
int BPF_PROG(handle_pelt_se, struct sched_entity *se)
{
	int cpu = BPF_CORE_READ(se, cfs_rq, rq, cpu);

	bpf_printk("[%d] Hello world!", cpu);
	return 0;
}

I hope it’s self-explanatory. Andrii’s blog posts delve into great detail, so please check them out.

Nonetheless, a few things are worth highlighting.

SEC() is a simple macro that places a function or variable in a named ELF section. libbpf uses the section name to work out what kind of BPF program it is and where to attach it.
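To make the "put things in a section" idea concrete, here is a small userspace sketch, assuming GCC or clang on Linux (ELF). It is not how libbpf itself reads sections (libbpf parses the BPF object file). It only shows that a SEC()-style attribute groups items by section name so that other code can find them later. The struct demo_entry type, the demo_entries section name and both helper functions are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Simplified form of the SEC() macro from libbpf's bpf_helpers.h. */
#define SEC(name) __attribute__((section(name), used))

/* Hypothetical record type, used only for this demo. */
struct demo_entry {
	const char *name;
};

/*
 * "demo_entries" is a made-up section name. The linker only provides
 * __start_<sec>/__stop_<sec> symbols when the section name is a valid
 * C identifier, which is why it is not "raw_tp/..." here.
 */
SEC("demo_entries") static const struct demo_entry entry_a = { "entry_a" };
SEC("demo_entries") static const struct demo_entry entry_b = { "entry_b" };

extern const struct demo_entry __start_demo_entries[];
extern const struct demo_entry __stop_demo_entries[];

/* Number of records the linker collected into the section. */
size_t demo_entry_count(void)
{
	return (size_t)(__stop_demo_entries - __start_demo_entries);
}

/* Returns 1 if a record with this name was placed in the section. */
int demo_entry_exists(const char *name)
{
	const struct demo_entry *e;

	for (e = __start_demo_entries; e < __stop_demo_entries; e++)
		if (strcmp(e->name, name) == 0)
			return 1;
	return 0;
}
```

Calling demo_entry_count() from a main() walks whatever was tagged with SEC("demo_entries"), without the caller having to list the entries by hand.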

SEC("raw_tp/pelt_se_tp") tells libbpf to attach the handle_pelt_se() function to the pelt_se_tp raw tracepoint. A raw tracepoint is where the kernel calls trace_pelt_se_tp().

BPF_PROG() is a helper macro to define handle_pelt_se() to get access to the args passed to trace_pelt_se_tp(), otherwise you’d need to do some extra work to access struct sched_entity *se inside handle_pelt_se().

You can’t just read kernel memory directly from a BPF program like this one. You must go through helper functions, which libbpf wraps in convenience macros. BPF_CORE_READ() is the macro to use to access se->cfs_rq->rq->cpu. It expands to a chain of bpf_core_read() calls, one per field in the chain. Andrii’s blog post explains this very nicely.
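As a rough mental model of that chain, here is a userspace mock of reading se->cfs_rq->rq->cpu one hop at a time. Everything in it is an assumption made for illustration: the mock_* structs contain only the fields needed (not the real kernel layouts), and mock_read() is a memcpy-based stand-in for a checked kernel read. The real macro also records CO-RE relocations so that field offsets are fixed up for the running kernel, which this sketch does not attempt.

```c
#include <stddef.h>
#include <string.h>

/* Mock types: only the fields the walk needs, not the kernel layouts. */
struct mock_rq { int cpu; };
struct mock_cfs_rq { struct mock_rq *rq; };
struct mock_sched_entity { struct mock_cfs_rq *cfs_rq; };

/*
 * Stand-in for a checked kernel read: copy `size` bytes from `src`
 * into `dst`, failing (and zeroing `dst`) instead of faulting when
 * `src` is NULL.
 */
static int mock_read(void *dst, size_t size, const void *src)
{
	if (!src) {
		memset(dst, 0, size);
		return -1;
	}
	memcpy(dst, src, size);
	return 0;
}

/* Hand-written version of reading se->cfs_rq->rq->cpu hop by hop. */
int mock_read_cpu(const struct mock_sched_entity *se)
{
	struct mock_cfs_rq *cfs_rq;
	struct mock_rq *rq;
	int cpu;

	/* Hop 1: fetch the se->cfs_rq pointer. */
	if (mock_read(&cfs_rq, sizeof(cfs_rq), se ? &se->cfs_rq : NULL))
		return -1;
	/* Hop 2: fetch the cfs_rq->rq pointer. */
	if (mock_read(&rq, sizeof(rq), cfs_rq ? &cfs_rq->rq : NULL))
		return -1;
	/* Hop 3: fetch the final rq->cpu value. */
	if (mock_read(&cpu, sizeof(cpu), rq ? &rq->cpu : NULL))
		return -1;
	return cpu;
}
```

Each hop copies one field into a local variable before following it, which is the part BPF_CORE_READ() saves you from writing by hand.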

bpf_printk() writes into the trace buffer. If you have used BCC or bpftrace before, you’ll be familiar with this kind of call. In practice this is useful only for development/debugging. For real use you’d want maps, which we don’t cover in this post.

What do you need to know and do

We must use clang to compile. I think clang/llvm v10 or above is advised. A newer version might be even better, as new features that rely on the latest clang keep getting added. gcc should gain support for compiling BPF in the near future, if it hasn’t already!

llvm-strip is needed to remove the DWARF info from the object file, which would otherwise bloat the binary. This is important if you want to deploy in a production environment and can be skipped otherwise.

Install clang and llvm

sudo apt install clang llvm

Compile

clang -g -O2 -Wall -target bpf -D__TARGET_ARCH_arm64 -I bpf/usr/include \
	-c hello_world.bpf.c -o hello_world.bpf.o

llvm-strip -g hello_world.bpf.o

hello_world.bpf.o can now be loaded into the kernel using the bpf() syscall. But that’s too much hassle. There’s an easier way to handle this via libbpf.

BPF Skeleton

A BPF skeleton is a .skel.h header that bpftool autogenerates from the compiled hello_world.bpf.o. It does all the magic required to load your BPF program from a userspace program. It produces a set of functions that you can just call and, tada, your BPF program will be loaded and running!

What do you need to know and do

We shall see what .skel.h file contains in the next section, but for now just produce it with this command:

bpftool gen skeleton hello_world.bpf.o > hello_world.skel.h

Userspace Counterpart: The Carrier, The Loader and The Processor

hello_world.c

#include <bpf/libbpf.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

#include "hello_world.skel.h"

/* Set from the signal handler, so it must be a volatile sig_atomic_t. */
static volatile sig_atomic_t exiting;

static void sig_handler(int sig)
{
	exiting = 1;
}

int main(int argc, char **argv)
{
	struct hello_world_bpf *skel;
	int err;

	signal(SIGINT, sig_handler);
	signal(SIGTERM, sig_handler);

	/* Open the BPF object embedded in the skeleton. */
	skel = hello_world_bpf__open();
	if (!skel) {
		fprintf(stderr, "Failed to open BPF skeleton\n");
		return 1;
	}

	/* Load the program into the kernel, where it gets verified. */
	err = hello_world_bpf__load(skel);
	if (err) {
		fprintf(stderr, "Failed to load and verify BPF skeleton\n");
		goto cleanup;
	}

	/* Attach to the tracepoint named in SEC(). */
	err = hello_world_bpf__attach(skel);
	if (err) {
		fprintf(stderr, "Failed to attach BPF skeleton\n");
		goto cleanup;
	}

	/* Sleep rather than spin so the loader does not burn a CPU. */
	while (!exiting)
		sleep(1);

cleanup:
	hello_world_bpf__destroy(skel);
	return err < 0 ? -err : 0;
}

Again, hopefully the program is self-explanatory.

hello_world.skel.h provides the following definitions for us.

I believe struct hello_world_bpf skeleton just points to where the BPF program is embedded within the final executable.

Then we get a reference to it with open(), then load() the BPF program into the kernel with the syscall. Finally we attach() it to whatever we asked it to attach to, which basically tells the kernel to execute our loaded program when trace_pelt_se_tp() is called.

We don’t do any processing in our simple program. We loop indefinitely until the user interrupts with CTRL+C. libbpf will clean up everything for us when we destroy() before we return.

What do you need to know and do

In theory, these examples should be common to most if not all BPF programs that want to use BPF CO-RE. What should be different is what you attach to and how you process the data.

Install libelf and zlib

sudo apt install libelf-dev zlib1g-dev

Compile

cc -g -O2 -Wall -I bpf/usr/include -c hello_world.c -o hello_world.o

cc -g -O2 -Wall -I bpf/usr/include hello_world.o \
		bpf/usr/lib64/libbpf.a -lelf -lz -o hello_world

You can pass -static when linking hello_world and then carry this binary around without having to install libelf and zlib on the target. Useful for working with embedded systems ;-)

Run and examine the output

From one terminal window run:

sudo ./hello_world

And to observe the output, from another terminal window run:

sudo cat /sys/kernel/tracing/trace_pipe

You should see something like this

          <idle>-0       [000] d.h. 414033.085918: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] d.h. 414033.085919: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] d.s. 414033.086875: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] dNs. 414033.086881: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] dNs. 414033.086881: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] dNH. 414033.086883: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] dNH. 414033.086883: bpf_trace_printk: [0] Hello world!
          <idle>-0       [000] dNs. 414033.086892: bpf_trace_printk: [0] Hello world!
             cat-2337243 [000] d.h. 414033.086904: bpf_trace_printk: [0] Hello world!
             cat-2337243 [000] d.h. 414033.086905: bpf_trace_printk: [0] Hello world!
             cat-2337243 [000] d... 414033.086910: bpf_trace_printk: [0] Hello world!
    kworker/u4:0-2324534 [000] d... 414033.086916: bpf_trace_printk: [0] Hello world!
    kworker/u4:0-2324534 [000] d... 414033.086917: bpf_trace_printk: [0] Hello world!
     InputThread-1986    [000] d... 414033.086986: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087093: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087095: bpf_trace_printk: [0] Hello world!
     hello_world-2337202 [001] d.h. 414033.087139: bpf_trace_printk: [1] Hello world!
     hello_world-2337202 [001] d.h. 414033.087139: bpf_trace_printk: [1] Hello world!
         firefox-1476923 [000] d.s. 414033.087142: bpf_trace_printk: [0] Hello world!
         firefox-1476923 [000] d.s. 414033.087143: bpf_trace_printk: [0] Hello world!
         firefox-1476923 [000] d.s. 414033.087143: bpf_trace_printk: [0] Hello world!
         firefox-1476923 [000] d.s. 414033.087143: bpf_trace_printk: [0] Hello world!
         firefox-1476923 [000] d.s. 414033.087146: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087189: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087191: bpf_trace_printk: [0] Hello world!
  QXcbEventQueue-275310  [000] d... 414033.087217: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087254: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087256: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087316: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087318: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087360: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087362: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087396: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087397: bpf_trace_printk: [0] Hello world!
            Xorg-1980    [000] d... 414033.087447: bpf_trace_printk: [0] Hello world!
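If you want to post-process these lines while debugging, for example to count hits per CPU, a few lines of C are enough. The sketch below is only an illustration: parse_trace_line() is a hypothetical helper, and it assumes the layout seen in the sample above, where the first "[...]" on the line is the CPU and the message follows the "bpf_trace_printk: " marker. A task name containing '[' would confuse it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the CPU number and the bpf_printk() message from one
 * trace_pipe line. Returns 0 on success, -1 if the line does not
 * look like a bpf_trace_printk record.
 */
int parse_trace_line(const char *line, int *cpu, char *msg, size_t msg_len)
{
	static const char marker[] = "bpf_trace_printk: ";
	const char *bracket = strchr(line, '[');
	const char *text = strstr(line, marker);

	/* The CPU field must exist and come before the marker. */
	if (!bracket || !text || bracket > text || msg_len == 0)
		return -1;
	if (sscanf(bracket, "[%d]", cpu) != 1)
		return -1;

	/* Copy everything after the marker, truncating if needed. */
	text += sizeof(marker) - 1;
	snprintf(msg, msg_len, "%s", text);
	msg[strcspn(msg, "\n")] = '\0';	/* drop the trailing newline, if any */
	return 0;
}
```

Feeding it lines read with fgets() from /sys/kernel/tracing/trace_pipe gives you the CPU and message as separate values to aggregate however you like.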

Compatibility Issues

The BPF community is trying hard to ensure these programs can truly run anywhere. But in practice this can be near impossible, depending on what you do.

Almost by definition, you’re hooking into the internals of the kernel, which are not a user ABI and can change at any time. For example, the function signature of sched_switch changed recently and broke many BPF programs.

https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/

Be wary of these issues. I’m not sure if libbpf will fail silently or let you know you’re running on an incompatible kernel. Always keep that in mind while developing and running BPF programs.

You can manage some of these problems though; check Andrii’s post.
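One cheap check the userspace side can make before loading anything is whether the running kernel exposes BTF at all. Here is a minimal sketch, assuming the usual /sys/kernel/btf/vmlinux location. btf_available() and check_kernel_btf() are hypothetical helper names, not libbpf APIs, and this only catches a missing BTF file. It does nothing about changes like the sched_switch one.

```c
#include <stdio.h>
#include <unistd.h>

/* Usual location where a kernel built with BTF exposes it. */
#define KERNEL_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Returns 1 if `path` exists and is readable, 0 otherwise. */
int btf_available(const char *path)
{
	return access(path, R_OK) == 0;
}

/* Hypothetical pre-flight helper: report a missing BTF file early. */
int check_kernel_btf(void)
{
	if (!btf_available(KERNEL_BTF_PATH)) {
		fprintf(stderr, "%s is not readable: no kernel BTF?\n",
			KERNEL_BTF_PATH);
		return -1;
	}
	return 0;
}
```

Calling check_kernel_btf() at the top of main(), before the skeleton is opened, turns a confusing load failure into a clear message.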

Summary

BPF CO-RE helps create a standalone binary that acts as a carrier of your BPF program, its loader and its processor.

To generate such a program, you need to take care of 3 elements:

  1. Compile libbpf, which might become unnecessary as the tools mature.
  2. Compile your BPF program. You most likely need to generate vmlinux.h to get it to compile.
  3. Compile your userspace program, using bpftool to generate a skeleton that makes loading and attaching your BPF program a piece of cake.

For a real use case, you’re likely to need maps to store data in the BPF program and then access that data from the userspace counterpart to do something with it.

If you just want to examine what’s happening inside some parts of the kernel, the recipe in this blog should be adequate as a quick and dirty way to peek inside some kernel functionality. That is useful if you’re a kernel developer like me.

You can also extend the kernel with new functionality.

To compile and run the BPF CO-RE program, you need a kernel with these configs:

The kernel you compile your BPF CO-RE program on does NOT have to be the same as the one you’re running on. As long as they both have BTF, libbpf will handle the complexity. You’d still need to be wary of compatibility issues, as you’re hooking into the internals of the kernel, which are not user ABI.

You can find the sample hello_world program here