Hello everyone. Happy to be speaking here, and happy that all of you are here. I'm Anna. I work at Isovalent on observability things related to networking and security use cases; I'm a software engineer building solutions on top of eBPF for network and security visibility. Today I will talk about using eBPF for observability, what the common issues with it are, and show some examples of how we see eBPF being commonly used for observability and the common use cases we see.

eBPF has been present in the observability space for quite a while, and it promises a lot. It promises no-instrumentation observability: we just drop something in and magically get observability for our applications. It promises complete visibility, because eBPF runs in the kernel, so it sees all the applications running on the machine. That is especially important in environments like Kubernetes, where applications can be scheduled anywhere and application developers generally don't need to think about where they are running, but eBPF programs can still see them. eBPF also promises low overhead for the visibility we are getting, and reliability. Reliability is the part that in my opinion is a bit underappreciated: eBPF programs are verified, they have to be checked to be safe to run, which means they are pretty reliable. Not bug-free, of course, but more reliable than most software, right?

For all of these things, like getting observability with no instrumentation, there are other approaches: we can install agents, we can inject sidecars. But I'm sure many of us have experienced that dreadful incident that, for example, affected connectivity between the sidecar and the application, so that exactly when we need the data the most, we lose it. With eBPF, because the programs run in the kernel, in general they just stay there and run. They don't cause as many issues as user-space agents or sidecars that can crash, so they add an extra layer of reliability.

So these are all the things eBPF promises us for observability. How does it work? I assume not everybody is familiar; eBPF has been around for a while, but most of us use it indirectly without really writing eBPF code, so here is a short overview of how it works. There is usually a user-space agent of some sort that loads the eBPF programs. The programs are then checked by the verifier, a very interesting piece of software that I'm not going to go into, but it essentially checks that the programs are safe to run: that they won't use all the memory, won't crash the machine, things like that. Then the programs are just-in-time compiled; this is the piece that gives us minimal overhead, because everything runs in the kernel and is compiled efficiently. Finally, the eBPF program is hooked. On this picture the example is a syscall, but it doesn't have to be a syscall, there are many different hook points. It is hooked somewhere, and then it runs on every execution of that hook point. This is the bit that gives us complete visibility, because the eBPF program doesn't care what triggered the hook point. We have applications written in many languages, maintained by many different teams; no matter what language an application is written in, the hook point will be executed, and there is no way for the application to stop it.
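To make this load-verify-attach flow concrete, here is a minimal user-space loader sketch using libbpf. This is only an illustration of the steps I just described, not Tetragon's actual loader, and "probe.bpf.o" is a hypothetical compiled BPF object file.

```c
// Minimal loader sketch with libbpf (illustrative only; "probe.bpf.o"
// is a hypothetical compiled BPF object, not a Tetragon artifact).
#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
    // Open the compiled BPF object file.
    struct bpf_object *obj = bpf_object__open_file("probe.bpf.o", NULL);
    if (!obj)
        return 1;

    // Loading hands the programs to the kernel, where the verifier
    // checks they are safe to run and they are JIT-compiled.
    if (bpf_object__load(obj)) {
        fprintf(stderr, "verifier rejected the object\n");
        bpf_object__close(obj);
        return 1;
    }

    // Attach each program to the hook point named in its SEC()
    // annotation; from now on it runs on every execution of that hook.
    struct bpf_program *prog;
    bpf_object__for_each_program(prog, obj) {
        if (!bpf_program__attach(prog))
            fprintf(stderr, "failed to attach %s\n", bpf_program__name(prog));
    }

    // ... the agent would now read events and maps; see later examples ...
    bpf_object__close(obj);
    return 0;
}
```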
I mentioned hook points, so what are they? Here you have a sneak peek of the tool I'm working on at the moment, Tetragon. This is an example of a Tetragon configuration, a tracing policy, which is a pretty low-level configuration where you can define exactly which hook points you want to hook into, which ones you want to observe.

We have kprobes. Kprobes are kernel probes, which can be practically any kernel function: essentially you can hook into any kernel function. There are kretprobes, which are like kprobes but executed on the return of the function rather than when it starts. Very often in an observability context we use kprobes and kretprobes together, because we usually want to understand what the function did exactly, not only that it executed. There are tracepoints, which are like kprobes but static, so a bit more stable. And there are uprobes and uretprobes, again like kprobes but user-level probes, where we can hook into user application symbols. These are also very commonly used in observability; I don't have them in this example, but they are commonly used, for example, for auto-instrumentation for distributed tracing.
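As a rough illustration of these hook point flavors, here is how they look as libbpf section names in BPF C code. The specific targets here (tcp_connect, the openat tracepoint, the binary path and symbol) are arbitrary examples of mine, not taken from the talk's policies.

```c
// Sketch of the main hook point flavors as libbpf SEC() names.
// The targets are arbitrary illustrative choices.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("kprobe/tcp_connect")      // kprobe: entry of (almost) any kernel function
int enter_tcp_connect(struct pt_regs *ctx) { return 0; }

SEC("kretprobe/tcp_connect")   // kretprobe: return of the same function
int exit_tcp_connect(struct pt_regs *ctx) { return 0; }

SEC("tracepoint/syscalls/sys_enter_openat")  // tracepoint: static, more stable
int on_openat(void *ctx) { return 0; }

SEC("uprobe//usr/bin/myapp:handle_request")  // uprobe: user application symbol
int on_handle_request(struct pt_regs *ctx) { return 0; }

char LICENSE[] SEC("license") = "GPL";
```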
Now, this promises a lot, but you might be thinking: how can this actually help us with observability? It's cool that we can see what the kernel is doing, but that is not exactly what we call observability. What would we actually call observability, then? In the past, at this point we would quote somebody with something smart to say about it; these days we ask ChatGPT for things like that, so I did. This is what ChatGPT told me observability is: "Observability is the capability to inspect and understand a system's state based on its external outputs such as logs, metrics and traces. It involves collecting and analyzing these data points to diagnose issues, optimize performance and ensure a system's reliability. Observability provides deep insights into the system, facilitating proactive problem solving and improving overall system understanding." Okay, this sounds reasonable to me; it's not different from what I actually hear from humans.

The key points are: there is a system that exposes some data, be it logs, metrics, traces or something else, but there is data exposed by the system. We collect this data in some database, and then we expect that this data will let us understand the system. Without touching the system, we just need to query the data; we probably need some query language, maybe some visualization, but the goal is to understand the system, for whatever we are doing.

All right, and now if we think back to what we are doing with eBPF: we are hooking some kernel functions, or some uprobes, whatever. How does that actually help us understand the system? The fact that some kernel functions are executing, yeah, of course they are executing, they are doing something, but this is something happening in the kernel, and we want to understand not the kernel but the whole system, our applications and the wider environment. So in my opinion, the two key things to really ensure system observability are context and correlation. What we need in an observability tool is context: we need the user context to understand not only that the kernel is doing something, but to understand it in the context of user requests, of some business application, of whatever the application is doing. And we need correlation, because the system is doing so much that if we just had individual events about what is going on, understanding the system from those individual events would be very, very difficult; how would we build the full picture from all of them?

And the one feature in eBPF that really allows us to use it for observability, and for anything else really, is eBPF maps. eBPF maps are the data store of the eBPF world; there are many different kinds of them, but they are key-value stores living in kernel memory. They are used for a few things. First of all, they allow us to pass high-level context to the kernel. This is the bit that lets us build an observability tool with eBPF, rather than only observing what the kernel is doing and informing the user about it: we can pass in high-level context, Kubernetes metadata would be the standard example, or any sort of business context, really. Second, we can correlate between different events: different eBPF programs can use the same map, and thanks to that we can correlate, for example, between a kprobe and a kretprobe, which we do very often, but also between completely different events, like a network event and a file event, why not? We can correlate them to understand the full picture of what is going on. And then there is the basic usage of eBPF maps, fundamental to everything we do: we need to somehow pass the data from the kernel to the user, so we use an eBPF map for that too, because both the kernel and the user-space agent can access it.

Let's zoom in on this bit a little. In general, in observability tooling there are two modes of retrieving data: we either pull things or push things. Typically in Prometheus, for example, we pull metrics from the targets; other tools might push events. So there are two typical ways we transfer data between kernel space and user space: we push events through a ring buffer, where the user-space agent has to keep up with the ring buffer and read the events, and we can also pull metrics from a map. I will dive into this a little later.
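To make these map mechanics concrete, here is a hedged sketch, not Tetragon's code, of correlating a kprobe with its kretprobe through a shared hash map and pushing the result to user space through a ring buffer. vfs_read is my illustrative hook point, and the latency measurement is just an example of "what the function did, not only that it executed".

```c
// Hedged sketch (not Tetragon's code): correlate kprobe and kretprobe
// via a hash map, push events to user space via a ring buffer.
// vmlinux.h can be generated with:
//   bpftool btf dump file /sys/kernel/btf/vmlinux format c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
    __u64 pid_tgid;
    __u64 latency_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, __u64);   // thread id
    __type(value, __u64); // entry timestamp
} start SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

SEC("kprobe/vfs_read")
int on_entry(struct pt_regs *ctx)
{
    __u64 id = bpf_get_current_pid_tgid();
    __u64 ts = bpf_ktime_get_ns();

    // Remember when this thread entered the function.
    bpf_map_update_elem(&start, &id, &ts, BPF_ANY);
    return 0;
}

SEC("kretprobe/vfs_read")
int on_return(struct pt_regs *ctx)
{
    __u64 id = bpf_get_current_pid_tgid();
    __u64 *ts = bpf_map_lookup_elem(&start, &id);
    if (!ts)
        return 0;  // missed the entry, nothing to correlate

    // Push one event; the user-space agent reads the ring buffer.
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (e) {
        e->pid_tgid = id;
        e->latency_ns = bpf_ktime_get_ns() - *ts;
        bpf_ringbuf_submit(e, 0);
    }
    bpf_map_delete_elem(&start, &id);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```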
All right, so: the data problem. This is a graph from a recent Grafana Labs survey about the biggest concerns around observability. Working on eBPF-based observability tooling, what I would like to see here is that the biggest concern is that instrumentation is difficult, and we go "bam, eBPF, no instrumentation and full observability", or that the overhead is too big, and we go "bam, eBPF, low overhead". But it doesn't look like that exactly. What we see is that the biggest concern is cost, and eBPF can help with this: all of the aspects I talked about contribute to cost and can help reduce it. But cost is a complex thing; there are a million factors to it, and it's pretty hard to measure how we are actually affecting the cost of observability tooling across the whole pipeline. The next biggest concerns are complexity, cardinality, signal-to-noise ratio and data retention time. So essentially we see a lot of things related to data: the cardinality of the data, how much of this data is signal, how much is noise that we need to filter out to actually find the signal, and how much data we can actually store to make troubleshooting possible while still keeping the cost reasonable.

eBPF-based observability tools produce different kinds of data. Here are a few of these tools. There is Tetragon, which produces events; by events I mean something like logs, but structured, so essentially logs with a schema that are exported in JSON format, and also metrics. There is Pixie, which exports metrics and profiles. There are a few other eBPF-based profilers too, like Parca or Pyroscope. And there is the more recently released Grafana Beyla, which provides metrics too, and trace spans: it does auto-instrumentation for distributed tracing and exports trace spans. Currently I work on Tetragon, so it's the one I know best, and I will show a few examples from it.

This graphic describes Tetragon in more detail. It was built as a security tool, so the main use cases are focused on security, but in reality the core functionality is a pretty generic loader of probes. So we can use it for a million different use cases from the security domain, but also from many other domains, because if you can load any kprobe or uprobe, then you can do whatever you want with the events you are getting from it.

Here is an example configuration for Tetragon. Tetragon is generally configured with a TracingPolicy CRD, and this example is a tracing policy for monitoring file operations. It is pretty low level: you can see that it hooks into the security_file_permission kprobe. If you are wondering how I came up with this particular kprobe, I did not. I generally rely on kernel engineers who know the kernel very well and know exactly where to hook to, for example, observe file operations, because it turns out there are many different ways an application can actually write to files. If you, like me, are not a kernel developer, the Tetragon repository contains a quite big directory with examples of tracing policies for many, many different use cases, from the security domain but also some networking use cases, so I invite you to go to the examples directory in the Tetragon repository and browse these examples.
All right. If we load this policy to observe all file operations, you can imagine it would produce a lot of data, because there are applications doing a lot of file operations. Application developers don't often need to think about it, because libraries abstract it away, but most applications are doing a ton of file operations all the time. So what do we do with all this data, and how do we generally approach using eBPF for observability?

The first step is to understand why we are even using eBPF for observability and how that typically looks. Most companies, when building their observability pipeline, go through a few stages, and this is a very, very simplified view of it. First, when we first build applications, we usually don't put any instrumentation in, and there is a very good reason for that: instrumentation is additional overhead, and if the code is evolving rapidly, that overhead either slows us down or we end up with very, very incorrect instrumentation, which is worse than no instrumentation because it is completely misleading. So it's not bad that we don't write instrumentation from day one. But then we end up with painful incidents, and very often the second step is to do something, to go from nothing to something, from zero to one. This step is usually a game changer. There are plenty of tools now that provide some sort of auto-instrumentation, an out-of-the-box solution, so that we get some visibility with one command, for example. This is great: it already lets us troubleshoot incidents more easily, while still being a low-maintenance solution, because we just installed something and don't have to have a whole team maintaining it. Then there are probably a few other stages that I skipped on this slide, but at some point the company has an actually mature observability platform, a team maintaining it, and some business-specific instrumentation that allows teams to troubleshoot incidents efficiently. However, usually there are many teams and many applications at different maturity levels, at different stages of this whole journey, and we still want to keep overhead low. In particular, in mature, big organizations we want to keep overhead very low, because at scale it really adds up, and it adds up very quickly.

eBPF can help a lot with step one, because we get no instrumentation, and many eBPF-based solutions provide an out-of-the-box experience: you install something and you get the whole solution. But it can also help in mature organizations with many applications, where the full-visibility aspect of eBPF really kicks in: it can see everything that is happening, not only the applications that are well instrumented, and we get low overhead, which matters a lot at scale. Many eBPF-based tools are not really providing something completely new that is impossible with any other tools; they are providing a more efficient, easier to use, easier to maintain replacement. Here is one example of this: with Tetragon we built some Grafana dashboards for networking and security use cases, and this example shows TCP traffic. This kind of data is also possible to get with the cAdvisor metrics built into Kubernetes and the kube-prometheus dashboards. With Tetragon it is likely to be more efficient, because the data is aggregated in the kernel, and we also get extra information, like the association with the binary. So you might need both or not, but it probably doesn't make sense to keep both of these tools, and that is a common cause of overhead in the observability space: we just have so many tools duplicating each other.

All right, step two: deciding what we want to observe, so filtering. I will now show a few variations of the Tetragon tracing policy we saw previously. We saw the example with file operations, and the solution to the flood of data we would get with it is to filter, for example by only some files. Here is something that security teams like very much, because Tetragon was built with security teams in mind: filtering only writes to /etc/passwd. We can write a configuration like that, and Tetragon will pass it to the kernel, so the filtering actually happens in the BPF programs, not in user space. This is what allows us to have very, very low overhead: everything happens in the kernel, and only relevant events are pushed to user space.
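As a hedged illustration of what that in-kernel filtering can look like, here is a simplified sketch of mine, not Tetragon's implementation: an fentry program on security_file_permission that drops everything except write accesses to /etc/passwd before anything reaches user space. It relies on bpf_d_path being allowed on this particular hook.

```c
// Hedged sketch of in-kernel filtering (not Tetragon's code): keep only
// write accesses to /etc/passwd; user space never sees other events.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define MAY_WRITE 0x2  // from include/linux/fs.h

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

SEC("fentry/security_file_permission")
int BPF_PROG(file_perm, struct file *file, int mask)
{
    if (!(mask & MAY_WRITE))
        return 0;  // drop reads in the kernel

    // Resolve the path; bpf_d_path is allowed on this hook.
    char path[32];
    if (bpf_d_path(&file->f_path, path, sizeof(path)) < 0)
        return 0;

    // Compare against the target, including the terminating NUL.
    const char target[] = "/etc/passwd";
    for (int i = 0; i < sizeof(target); i++)
        if (path[i] != target[i])
            return 0;  // some other file: filtered out in the kernel

    // Only matching events are pushed to user space.
    __u64 *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (e) {
        *e = bpf_get_current_pid_tgid();
        bpf_ringbuf_submit(e, 0);
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```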
The second example is something pretty new; I think part of it is still in review: stack traces on crashes. We can hook into the function that is called when an application is segfaulting and get a stack trace for it. Many companies use tools that collect all the errors for you and group them for you. What we have seen to be very beneficial with eBPF is that it provides this extra layer of reliability: when an application is segfaulting, that is something you really need to investigate, you really need this information, these stack traces. So this kind of policy can be used to provide that extra layer of visibility and reliability, and be plugged into some sort of stack trace parser.

The next example is external traffic. Very often, traffic in a Kubernetes cluster is observed by, for example, a service mesh, but very often external traffic is something people care about more, for various reasons. We have worked with a few air-gapped environments where people wanted to detect attempts at external connections, even unsuccessful attempts, not only ongoing external traffic, which shouldn't be happening at all. Cost monitoring is a big thing: egress traffic usually costs more, cloud providers usually charge more for it, so monitoring it to monitor cost is usually important. And external requests are generally a common attack vector. The tracing policy for this looks like this: we hook into the TCP function that is called when a TCP packet is sent, but we filter by the cluster CIDRs, so we filter out all the traffic that happens within the pod and service CIDRs and localhost, and then we get events only for external traffic.

And just because I like showing BPF code on slides, and to show that the filtering is happening in the kernel: this is actual code from Tetragon. You can follow the link in the slides and see the full code; this is heavily truncated, because otherwise it would fill the slide. What is happening is that we store this filter from the tracing policy in a map, and here you can see the map lookup, which compares the address we are currently handling to the range in the map and decides whether we should emit an event, do something, for this IP or not.
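The real code is linked from the slides; as a hedged, heavily simplified stand-in of mine, a map-based CIDR filter could look like the following, where the agent writes the pod and service CIDRs from the tracing policy into an LPM trie and the probe drops in-cluster destinations. This sketch is IPv4 only; as mentioned in the Q&A below, the real code also handles IPv6.

```c
// Hedged sketch of in-kernel CIDR filtering (heavily simplified, not
// Tetragon's actual code). The agent fills the LPM trie with the
// cluster CIDRs; connections to in-cluster addresses never become events.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct cidr_key {
    __u32 prefixlen;  // LPM trie keys start with the prefix length
    __u32 addr;       // IPv4 address in network byte order
};

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(max_entries, 256);
    __type(key, struct cidr_key);
    __type(value, __u8);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} cluster_cidrs SEC(".maps");

SEC("kprobe/tcp_connect")
int on_tcp_connect(struct pt_regs *ctx)
{
    struct sock *sk = (struct sock *)PT_REGS_PARM1(ctx);

    __u32 daddr;
    bpf_probe_read_kernel(&daddr, sizeof(daddr),
                          &sk->__sk_common.skc_daddr);

    // Longest-prefix match against the configured cluster ranges.
    struct cidr_key key = { .prefixlen = 32, .addr = daddr };
    if (bpf_map_lookup_elem(&cluster_cidrs, &key))
        return 0;  // destination is inside the cluster: filter out

    // ... external connection: emit an event to user space ...
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```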
Okay, and then the last thing, maybe the thing we first think of when we want to reduce the amount of data: aggregation. Aggregation in general means metrics. This is how a metrics flow usually works: first we collect metrics and expose them; then the metrics solution, for example Prometheus, pulls these metrics and stores them somewhere; and then we query and visualize. The part on the right is usually Prometheus, or the different stages there can be replaced by different, more efficient solutions. The part on the left is the instrumentation part, and it's usually thought of as one thing: collecting metrics and exposing metrics is one instrumentation library. However, we can separate these parts; in particular, we can defer the collection part to eBPF.

Here is how we do it, with an example where we collect metrics per pod. We want metrics with labels from Kubernetes metadata, because this gives us the high-level context and allows us to associate events with, for example, particular teams in the organization. In the Tetragon agent we watch pods; this can be done in Kubernetes by watching the API server or the container runtime. We actually switched from the API server to the container runtime at some point for security reasons, but in general both are possible for observability tools. Then we pass the metadata we got about the pods to the kernel: we write it into a BPF map that maps container ID to Kubernetes metadata. The next step is that the BPF program, on every event where it's hooked, gets the labels from this map and updates the metrics stored in another map. And as simple as that, the only thing we need to do in user space, this is again slightly truncated code, is to read these metrics: iterate over the metrics map to collect the metric for all the Kubernetes labels and expose them with Prometheus.
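Here is a hedged sketch of the kernel side of this, my simplification rather than Tetragon's actual maps: the agent fills pod_meta with container identity mapped to Kubernetes metadata, the probe bumps a per-pod counter, and user space only has to iterate the metrics map (for example with bpf_map_get_next_key) and expose the counters to Prometheus. I use the cgroup id as a stand-in for container identity here.

```c
// Hedged sketch of in-kernel metric aggregation per pod (not
// Tetragon's actual maps; the cgroup id stands in for container
// identity). The agent populates pod_meta from its Kubernetes pod
// watcher; user space scrapes the metrics map and exposes it to
// Prometheus.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct pod_labels {
    char ns[64];   // Kubernetes namespace
    char pod[64];  // pod name
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, __u64);              // cgroup id of the container
    __type(value, struct pod_labels);
} pod_meta SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 4096);
    __type(key, struct pod_labels);  // metric label set
    __type(value, __u64);            // event counter
} metrics SEC(".maps");

SEC("kprobe/tcp_sendmsg")
int count_event(struct pt_regs *ctx)
{
    __u64 cgid = bpf_get_current_cgroup_id();

    // Attach the high-level Kubernetes context passed in by the agent.
    struct pod_labels *labels = bpf_map_lookup_elem(&pod_meta, &cgid);
    if (!labels)
        return 0;  // not a tracked pod

    // Aggregate in the kernel: bump a counter per label set.
    __u64 *count = bpf_map_lookup_elem(&metrics, labels);
    if (count) {
        __sync_fetch_and_add(count, 1);
    } else {
        __u64 one = 1;
        bpf_map_update_elem(&metrics, labels, &one, BPF_ANY);
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```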
All right, I think I'm out of time, so thank you very much for listening. Do we have time for questions? If anybody has any questions, feel free to find me here or anytime at the conference.

Q: Sorry, quick question, over here: IPv6 support?

A: Do we support IPv6 with this? Yes. One thing I actually truncated in the filtering code I showed was IPv6, because it wouldn't fit on a slide, but yes, we absolutely support IPv6.

Q: Thank you for your talk. I have a question about eBPF. You were showing some C code; I played a bit with eBPF, and I was wondering if maybe someone is working on another language on top of it, to make it more mainstream. It could be Lua, or... I mean, why is it C? Do you know if some people are trying to also make eBPF available to non-kernel developers?

A: It's C because the kernel is written in C. I know there are people writing BPF code in Rust; I know it's a thing, but I haven't dug into it that much. So that is another language where I know it's possible; please check it out.

Q: Hello. Do we have a bunch of default policies or blueprints that companies who want to move to Tetragon can get for free, by default, or does everything have to be created from scratch? For example, the example you showed where /etc/passwd is accessed or changed, is that a warning alert we get for free, or do we have to create those rules ourselves?

A: You have to create those rules yourself, but you don't necessarily have to write them yourself: you can use the examples directory in the Tetragon repository. That is the current solution for this, because writing these tracing policies is difficult for non-kernel developers, we all know it is. So the current solution is this examples library, which has documented policies for many different use cases; you just take them and apply them. By default, when you just install Tetragon and don't apply any of this configuration, I think you get events for process execution only. This is all done to ensure low overhead; that is the main goal of this decision: by default you don't get events for things you are not interested in, or things you monitor with other tools, because Tetragon is often used alongside other tools that can duplicate this. So the answer is: use the examples.

Q: And I'm going to assume that there are also no community-driven rules, like "these rules are good for a REST app", "these rules are good for Postgres databases"? There's no community-driven set for certain scenarios?

A: I would say the examples directory in the Tetragon repository is community driven. Most of the examples at this point are security focused; there are not that many for monitoring user applications, so not that many application policies using uprobes, for example. But it's absolutely possible. I guess most of them were written by kernel developers, not Postgres developers, and when you have rules for monitoring a specific user-space application, writing that rule always requires a certain expertise and knowledge of the internals of that application. This is something we can't really work around. So yes, it is community driven, and we would absolutely love to have more examples from the community for specific applications and specific use cases.

Q: Thank you. Hi, congratulations on your presentation. Is there some integration with auto-instrumentation for traces, some integration with OpenTelemetry? For example, collecting the data, sending it to OpenTelemetry, and implementing some kind of sampling, maybe head sampling or tail sampling, because the volume of data, I imagine, can be high.

A: So, for distributed tracing specifically, among the tools that I know about: OpenTelemetry itself provides auto-instrumentation, but I think it doesn't have built-in eBPF-based instrumentation.

Q: As far as I know, the Go language is instrumented by OpenTelemetry using eBPF, for example.

A: Yeah, maybe. I am not sure how the sampling configuration looks there, because I don't recall any sampling actually happening in BPF, but I also haven't looked into it that much. You can sample for sure somewhere in the pipeline, wherever you want; you can sample when you collect the traces. The most efficient thing would be to sample in the kernel, so that you don't produce the trace at all, but I haven't seen that happening. I guess the people building those tools would be better to ask.
Q: Okay, thank you so much.

Q: Hello, Dominik from EPFL, École Polytechnique Fédérale de Lausanne in Switzerland. Congrats on your awesome presentation and your talent for synthesis; you were able to present a very complex topic in a very clear and efficient fashion, thank you very much for that. I've got a question regarding... you probably know of DTrace, you probably remember it. DTrace was the thing that worked a bit like eBPF but with no passive effect at all, because it was able to mutate some code inside the kernel when a probe was active and then remove itself when it wasn't. And there was a feature in it that was really interesting: it had user-space support. You could put the very same kind of probes inside programs that didn't even consent to it. Is there an equivalent in the eBPF world? Do you have a libc with eBPF probes or something like that?

A: I mean, there are uprobes, but honestly I don't know exactly how this works in DTrace, so I'm not sure how close it is. Sorry.

Q: All right, that's fine, thank you.

[Audience member]: Maybe I can answer this question. DTrace was used back in the past, but it had a higher overhead than eBPF. You can cover the tracing that you were able to do with DTrace with eBPF, with a lower overhead.

Q: Even better, lower overhead. Thank you.