This was written in November 2024, as a final project for Dartmouth’s class COSC059 - Principles of Programming Languages, with small modifications for this blog. I want to thank Professor Sergey Bratus for his guidance, and a big part of the credit of this piece goes to him and his class.
Introduction
As the Linux kernel and ecosystem grew, user mode applications began to develop a growing need for adding and running safe kernel extensions. Consequently, the Berkeley Packet Filter (which, as its name indicates, was originally created for packet filtering in the kernel) was extended into eBPF, which allows safe execution of untrusted user-defined extensions in the kernel. These eBPF programs can be used for a wide variety of applications, including packet capturing and filtering, as well as tracing, instrumentation, and debugging, among others. They are typically written in high level programming languages, such as some subsets of C, Python, or Lua [7], and then compiled into a circa-x86 bytecode for the eBPF VM. This VM has eleven 64-bit registers (plus a special auxiliary register for ALU sanitation), a program counter, and a 512 byte fixed-size stack. This code is then loaded into the kernel with the bpf()
syscall, and depending on the configuration, is either JIT-compiled into native instructions or ran in the eBPF interpreter [9]. In order to allow more flexibility and for optimization reasons, the eBPF programs also interact with the system using helper functions, which hook to other parts of the kernel.
However, as running user-defined code in the kernel can obviously be dangerous, some analysis needs to be done on the programs to ensure their safety. This is where the eBPF verifier comes in: given a program, it can statically analyze it and determine whether it is okay to run in kernel space, according to a list of constraints of what an eBPF extension can do, and more importantly, cannot do. In this article, we will explore how this verifier works, some of the problems it has, and some of the proposed solutions to fix it.
The eBPF verifier
The basic idea of the eBPF verifier is to statically analyze all possible permutations of the program before it is compiled [13]. The verifier builds a graph of all possible paths, based on branching instructions, and checks the safety of each one. If unreachable dead code is found, it will be rejected, as it could be a link in an exploit chain [13]. It will then look at the possible flows of the program top-down, while maintaining the possible states for each register, which includes the minimum and maximum possible values for the register as used in signed/unsigned and 32-/64-bit instructions. When branching instructions are found, the two paths update their states accordingly. For example, if we know the value of R2 is between 10 and 30, the instruction “if R2 > 20” will produce two branches with ranges 10-20 and 21-30 [9] [13]. It can also keep track of linked registers (if we have R3 = R2, and we know R2 > 10, then R3 > 10 too), and type and nullity information (which it uses to assert that the parameters passed to helper functions are of the correct type).
Using this information, the verifier attempts to enforce a set of constraints on what a program can do. Here is a (non-exhaustive) list of what a program can and cannot do [9] [13]:
- Programs must terminate. In order to avoid DoS attacks, all eBPF programs must terminate within a reasonable amount of time. Infinite loops and recursion are not allowed. Loops used to be disallowed too (a common workaround was having the compiler unroll the loops), until bounded loops were introduced. This also means no back edges are allowed.
- Pointers are not allowed to read arbitrary memory. In order to avoid leaking sensitive information, pointers are restricted to reading within the “safe” bounds of a map.
- Pointers cannot be returned, and pointer comparisons cannot be performed. This is meant to avoid leaking kernel addresses to user space. With this purpose, the verifier keeps track of what registers hold pointers and which hold scalar values, and only scalar values can be added or subtracted to a pointer.
- No dead code. As previously mentioned, this is to avoid it being used as a link in an exploit chain.
- Programs are not allowed to deadlock. Any held spinlocks need to be released, and only one lock can be held at a time.
- Programs are not allowed to read uninitialized memory. This is to avoid leaking sensitive information.
These guarantees should, in principle, allow the kernel to safely compile and run any code which is approved by the verifier. However, as we will see now, this is far from a perfect solution.
Problems with the eBPF verifier
To explain the main problems with the eBPF verifier, we can divide them into three main categories.
Flexibility issues
Some of the constraints the verifier tries to impose are hard or even impossible to statically determine on arbitrary code. The most obvious example is the termination requirement, as it is famously undecidable. For this reason, arbitrary loops and recursion are disallowed, and only bounded loops are allowed. In general, this translates to programmers having to “heavily massage their code for the verifier to accept it” [5]. Programmers might need to, for example, add unnecessary checks or redundant accesses. This creates a frustrating experience for developers, especially when trying to write longer programs.
Scaling issues
The problem of scaling is present in two separate ways: the difficulties of writing longer eBPF programs, and the verifier itself growing rapidly in complexity. For the former, as the verifier needs to check every possible execution path, this makes the longer programs become exponentially slower to verify. The verifier has a limited amount of storage for states [13], so developers are forced to break their programs down into smaller pieces to fit into this limit [8]. This has improved since the introduction of function-by-function verification [13] (which allows the usage of global functions that only need to be verified once), but it is still an issue. The latter, the growth of the complexity of the verifier itself, means there is a larger codebase for bugs to appear in, with logic that is harder to debug. At the time of writing, verifier.c
is over 22700 lines of code [14]. This is also manifested in the growth in complexity of helper functions. As these functions don’t need to go through the verifier, they are an attractive attack surface.
Vulnerabilities
The previous two problems could potentially be acceptable if the verifier guaranteed that no dangerous programmed would be accepted. However, this is not the case. Many exploits have been found, based on different vulnerabilities of the verifier. A couple illustrative examples are:
- CVE-2021-3490 [9]: This exploit attacked an error in how the range verifier considered 32 and 64 bit possible values. The verifier keeps a mask describing which bits of each register are known, which is updated with each instruction. However, the
BPF_AND
instruction skipped updating the 32 bit bounds if the lower bits of the source and destination registers are known, citing that the 64 bit counterpart of the function would take care of it. This assumption was broken if the lower 32 bits are known, but the upper 32 bits are unknown, and it turns out this is enough to create an exploit (even being able to leak kernel space pointers). A great explanation of this can be found in this blog post [9]. - CVE-2023-2163 [2] [4]: This exploit, which was found by the Buzzer eBPF fuzzer, concerns the verifier’s path pruning. This is an optimization strategy that is able to determine whether a state is equivalent to another one which was already proven safe and skip its verification. This includes marking some registers as “precise” (which forces the verifier to explore all state said register is involved in) when it is used for pointer arithmetic, as a constant to a helper function, etc. [2]. However, in some cases the verifier will fail to mark a register as contributing to the precision of another, which can cause it to incorrectly mark two paths as equivalent. This allows a register to hold a value outside the bounds the verifier thinks it could have, and can be used for exploits (including local privilege escalation). A more detailed explanation of the bug and the exploit can be found here [2].
Possible solutions
There are many proposals for different paths that can be taken to address these shortcomings. Some of them are detailed below:
PREVAIL
This project is the most developed among the ones we’ll discuss here. It was first introduced in 2019 [5], and has since grown significantly, even being adopted by the eBPF for Windows project [10]. The main idea behind it is to introduce Abstract Interpretation for the static analysis of programs. In particular, the authors propose the Zone abstract domain as a good middle ground between expressiveness and cost [5]. This allows tracking not only the range, as is done by the Linux verifier, but also bounds on the differences between pairs of registers. The motivation behind this is that this allows verifying cases such as whether the addition of two registers would be within memory bounds [3]. This solution offers better asymptotic performance (polynomial) than the Linux verifier, as it can perform abstract operations (such as joining ranges) which allow it to avoid branching unnecessarily for every possible path. This approach, however, has many of the same issues as the Linux verifier. It lacks flexibility, even showing more false positives than the current verifier, which means the developer experience is unlikely to be much better. It also could be subject to similar vulnerabilities due to possible errors in its logic. A good explanation to this project can be found here [3].
A Rust-based Approach
This proposal suggests using the Rust programming language to write eBPF programs. The motivation behind this is that if a module is written in safe Rust (no unsafe
blocks), then the compiler would handle most of the verification process. This is because of Rust’s effective elimination of undefined behavior and the compiler’s borrow checker, which together ensure memory safety and could guarantee many of the desired constraints (such as type safety and no arbitrary control-flow transfer) [8]. This pattern could be used with a trusted “kernel crate”, which would ensure resources are properly released when they go out of scope (by the same ownership model). The other protections could be ensured by lightweight runtime protections, for guaranteeing things like stack protection, termination, or safe resource management. This would also help improve the helper function situation, as functions meant to improve the eBPF language expressiveness (such as bpf_loop
or bpf_strncmp
) would no longer be necessary, and others would be made safer by being rewritten in Rust.
The way this could be implemented from a technical perspective can be compared to some Rust-based Zero Knowledge VMs. In particular, the implementation of helper functions in the eBPF context could benefit from looking at how these VMs handle external or optimized functions; in many cases, this is handled through the creation of syscalls in the VM to represent them [15], and which can then be made to link to the desired code. A similar approach could be beneficial here. By having a common semantic interface for both actual syscalls and helper functions, which can be properly distinguished when loading and linking the program, we reduce the potential attack surface of the system.
This approach moves most of the verification process to the compiler, and as such, requires a trusted compilation step. This is problematic, as the Rust toolchain is quite large and cannot be required for users to use eBPF extensions. The authors propose piggybacking on kernel support for signed kernel modules [8]. This would mean a trusted compiler has to sign the compiled extension module, and this signature would be validated when the program is loaded, before performing some load-time fix up (such as resolving helper function addresses and other relocations). Thus, we introduce a new trust assumption, as the user no longer has the ability to verify the modules it loads and would need to trust a third party to compile it.
exoBPF
This approach suggests having each module carry with it a proof of its correctness [1]. This moves the burden of proving a program is correct from the kernel while still maintaining minimal trust assumptions (as the kernel would perform a light-weight proof checking step when loading the module) and has a strong correctness guarantee. At the same time, it also allows more flexibility for application development. Applications can choose different methods of proof generation to show that more complex behavior is correct, as long as the generated proof is compatible with the checker. For example, the authors use Lean to specify the eBPF VM and generate proofs for the safety and correctness of some programs. They also choose Lean for the proof format, as many independent proof checkers have been developed (C++, Scala, Rust, Haskell).
In theory, this provides the best possible compromise. The expensive proving stage is moved to the developer and does not need to be performed in the kernel, while not making the trust assumptions weaker as the verification step is still performed there. In practice, however, the prototype they provide generates large proofs (28MB) and takes about 7.5s to verify using a Rust-based proof checker. The authors say this could be reduced by having the checker only verify the part of the proof specific to the particular eBPF program (and not having to prove the whole eBPF model every time), which would reduce this to ~8kB proofs and take about ~1.3s to verify. Even then, this procedure could be very expensive (specially at proof time), and requires maintaining a complex proof generator for the eBPF model. More information on this approach, as well as links to the prototype source code, can be found here [1].
Conclusion
As the eBPF ecosystem continues to advance, the verifier plays a crucial role in ensuring its safety. This is a very complex task, which has made the verifier grow considerably and introduced many high-risk vulnerabilities. As such, and considering the issues outlined above, it is necessary for the verifier to be fixed or replaced entirely with a module that avoids these same issues. While none of the outlined solutions is perfect, they show promising paths forward in the verification of eBPF programs.
References
[1] A proof-carrying approach to building correct and flexible BPF verifiers. (2021). Linux Plumbers Conference 2021. https://lpc.events/event/11/contributions/944/
[2] Blog: A deep dive into CVE-2023-2163: How we found and fixed an eBPF Linux Kernel Vulnerability. (n.d.). https://bughunters.google.com/blog/6303226026131456/a-deep-dive-into-cve-2023-2163-how-we-found-and-fixed-an-ebpf-linux-kernel-vulnerability
[3] Chaignon, P. (2023, October 30). PREVAIL: Understanding the Windows EBPF verifier. pchaigno. https://pchaigno.github.io/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier.html
[4] eBPF & Cilium Community. (2024, September 12). Lessons from the Buzz - Juan José López Jaimez [Video]. YouTube. https://www.youtube.com/watch?v=iMEjhl0V_Rw
[5] Gershuni, E., Amit, N., Gurfinkel, A., Narodytska, N., Navas, J. A., Rinetzky, N., Ryzhyk, L., & Sagiv, M. (2019). Simple and precise static analysis of untrusted Linux kernel extensions. ACM SIGPLAN. https://doi.org/10.1145/3314221.3314590
[6] google. (n.d.). Linux Kernel: Vulnerability in the eBPF verifier register limit tracking. GitHub. https://github.com/google/security-research/security/advisories/GHSA-hfqc-63c7-rj9f
[7] Iovisor. (n.d.). GitHub - iovisor/bcc: BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more. GitHub. https://github.com/iovisor/bcc
[8] Jia, J., Sahu, R., Oswald, A., Williams, D., Le, M. V., & Xu, T. (2023). Kernel extension verification is untenable. HOTOS ’23: 19th Workshop on Hot Topics in Operating Systems. https://doi.org/10.1145/3593856.3595892
[9] Kernel Pwning with eBPF - a Love Story - chompie at the bits. (n.d.). https://chomp.ie/Blog+Posts/Kernel+Pwning+with+eBPF+-+a+Love+Story
[10] microsoft. (n.d.). GitHub - microsoft/ebpf-for-windows: eBPF implementation that runs on top of Windows. GitHub. https://github.com/microsoft/ebpf-for-windows?tab=readme-ov-file
[11] NCC Group. (2024). eBPF Verifier Code Review. https://www.nccgroup.com/media/4lilthtf/ncc_group_nccgroup_e015561_report_2024-11-11_v10.pdf
[12] Unprivileged eBPF disabled by default for Ubuntu 20.04 LTS, 18.04 LTS, 16.04 ESM. (2022, March 9). Ubuntu Community Hub. https://discourse.ubuntu.com/t/unprivileged-ebpf-disabled-by-default-for-ubuntu-20-04-lts-18-04-lts-16-04-esm/27047
[13] Verifier - eBPF Docs. (n.d.). https://docs.ebpf.io/linux/concepts/verifier/
[14] verifier.c. (n.d.). GitHub. https://github.com/torvalds/linux/blob/master/kernel/bpf/verifier.c
[15] Zero Knowledge VM - SP1 Book. (n.d.). https://docs.succinct.xyz/writing-programs/precompiles.html