Pitfalls of relying on eBPF for security monitoring (and some solutions) by jruohonen

By Artem Dinaburg

eBPF (extended Berkeley Packet Filter) has emerged as the de facto Linux standard for security monitoring and endpoint observability. It is used by technologies such as BPFTrace, Cilium, Pixie, Sysdig, and Falco due to its low overhead and its versatility.

There is, however, a dark (but open) secret: eBPF was never intended for security monitoring. It is first and foremost a networking and debugging tool. As Brendan Gregg observed:

"eBPF has many uses in improving computer security, but just taking eBPF observability tools as-is and using them for security monitoring would be like driving your car into the ocean and expecting it to float."

But eBPF is being used for security monitoring anyway, and developers may not be aware of the common pitfalls and under-reported problems that come with this use case. In this post, we cover some of these problems and provide workarounds. However, some challenges with using eBPF for security monitoring are inherent to the platform and cannot be easily addressed.

Pitfall #1: eBPF probes are not invoked

In theory, the kernel should never fail to fire an eBPF probe. In practice, it occasionally does: on rare occasions, the kernel will not fire a probe when user code expects it to. This behavior is not explicitly documented or acknowledged, but you can find hints of it in bug reports for eBPF tooling.

This bug report provides valuable insight. First, the issues involved are rare and difficult to debug. Second, the kernel may be technically correct, yet what the user observes is still missing events, even when the proximate cause is something else (e.g., too many probes). Comments on the bug report present two theories for why events are missing:

First, there is a set limit on the number of kRetProbes that the kernel can have active at once. As of kernel 6.4.5, the default limit is 4,096. Attempts to create more kRetProbes will fail, resulting in a missed event.
Second, the callback logic for a kProbe and a kRetProbe is slightly different, which means that sometimes a kProbe will not see a matching kRetProbe, resulting in a missed event.
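
The first theory can be illustrated with a toy user-space model. This is not kernel code: the probe_pool type and its functions are hypothetical names invented for this sketch, and the pool size is just a parameter. It only shows why a fixed number of in-flight return-probe instances turns a burst of overlapping calls into silently dropped events.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not kernel code): a return probe can track
 * only `max_active` calls that have entered but not yet returned. */
struct probe_pool {
    size_t max_active; /* fixed number of instance slots */
    size_t in_flight;  /* calls currently being tracked */
    size_t missed;     /* calls that found no free slot */
};

void pool_init(struct probe_pool *p, size_t max_active)
{
    assert(p != NULL);
    p->max_active = max_active;
    p->in_flight = 0;
    p->missed = 0;
}

/* Called on function entry. Returns 1 if the call is tracked, 0 if the
 * pool is exhausted and the matching return event will never be seen. */
int pool_on_entry(struct probe_pool *p)
{
    if (p->in_flight >= p->max_active) {
        p->missed++;
        return 0;
    }
    p->in_flight++;
    return 1;
}

/* Called when a tracked call returns; frees its slot. */
void pool_on_return(struct probe_pool *p)
{
    if (p->in_flight > 0)
        p->in_flight--;
}
```

With a pool of two slots, a third overlapping call is counted in missed and produces no return event. The kernel's kretprobe structure carries a comparable miss counter (nmissed), though whether a given eBPF tool surfaces it is a separate question.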

More of these issues are likely lurking in the kernel, either as documented edge cases or as surprising emergent effects of unrelated design decisions. eBPF is not a security monitoring mechanism, so there is no guarantee that probes will fire as expected.

Workarounds

None. The callback logic and the maximum number of kRetProbes are hard-coded into the kernel. While one can manually edit the kernel source and rebuild it, doing so is neither advisable nor feasible in most scenarios. Any tool relying on eBPF must be prepared for an occasional missing callback.
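
One way a consumer can be prepared is to stop assuming that every entry event is followed by an exit event. The following is a minimal user-space sketch, not part of any eBPF library: the pending_table type, its fixed capacity, and the per-thread keying are assumptions made for illustration. It also assumes the monitored function is something like a system call, which a thread cannot re-enter before returning.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 64  /* arbitrary capacity for this sketch */

struct pending_table {
    uint32_t tid[MAX_PENDING];   /* thread id with an open entry event */
    uint8_t  used[MAX_PENDING];  /* 1 if the slot holds an open entry */
    unsigned orphaned_entries;   /* entry seen, exit never arrived */
    unsigned unmatched_exits;    /* exit seen, entry never arrived */
    unsigned dropped;            /* entries dropped because the table was full */
};

void table_init(struct pending_table *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Returns the slot index holding `tid`, or -1 if there is none. */
static int find_slot(const struct pending_table *t, uint32_t tid)
{
    for (int i = 0; i < MAX_PENDING; i++)
        if (t->used[i] && t->tid[i] == tid)
            return i;
    return -1;
}

/* Record an entry event. Under the no-re-entry assumption, a second
 * entry for the same thread means the earlier exit event was lost. */
void record_entry(struct pending_table *t, uint32_t tid)
{
    int i = find_slot(t, tid);
    if (i >= 0) {
        t->orphaned_entries++;  /* keep the slot for the new call */
        return;
    }
    for (i = 0; i < MAX_PENDING; i++) {
        if (!t->used[i]) {
            t->used[i] = 1;
            t->tid[i] = tid;
            return;
        }
    }
    t->dropped++;  /* table full: count the loss instead of hiding it */
}

/* Record an exit event. Returns 1 if it completed a pending entry. */
int record_exit(struct pending_table *t, uint32_t tid)
{
    int i = find_slot(t, tid);
    if (i < 0) {
        t->unmatched_exits++;
        return 0;
    }
    t->used[i] = 0;
    return 1;
}
```

The counters do not recover the lost events, but they let a tool report that its view is incomplete rather than silently presenting a gap as normal activity.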

Pitfall #2: Data is truncated due to space constraints

An eBPF program’s stack space is limited to 512 bytes. When writing eBPF code, developers need to be particularly cautious about how much scratch data they use and the depth of their call stacks. This limit affects both the amount and kind of data that can be processed using eBPF code. For instance, 512 bytes is less than the longest permitted file path length, which is 4,096 bytes.
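
To make the arithmetic concrete, the sketch below is ordinary user-space C, not an eBPF program. It takes the 512-byte stack limit and the 4,096-byte path limit from the paragraph above; the constant and function names are made up for this example, and it is optimistic in that it gives the entire stack to a single buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EBPF_STACK_LIMIT 512   /* stack limit quoted in the text */
#define MAX_PATH_BYTES   4096  /* longest path length quoted in the text */

/* Copies `path` into a stack-sized buffer the way a naive probe might,
 * and returns how many bytes of the path did not fit. */
size_t copy_path_truncated(const char *path, char out[EBPF_STACK_LIMIT])
{
    assert(path != NULL && out != NULL);
    size_t len = strlen(path);
    size_t room = EBPF_STACK_LIMIT - 1;  /* keep one byte for the NUL */
    size_t kept = len < room ? len : room;
    memcpy(out, path, kept);
    out[kept] = '\0';
    return len - kept;
}
```

For a maximum-length path of 4,095 characters plus the terminator, only the first 511 characters survive and 3,584 are lost. Because the copy starts at the beginning of the string, the part that disappears is the tail of the path, which is where the file name itself sits.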

Workarounds

There are multiple options to get more scratch space, but they all involve cheating. Thanks to the bpf_map_lookup_elem helper, it’s possible to use a map’s memory d
