So yeah, my name is Aviv Zohari. I'm a founding engineer at Groundcover, where we're developing an eBPF-based observability platform. And today we're going to be talking about just that: how we can use eBPF to get observability data, specifically logs. And like Adiato just said, logs are not something we hear a lot about in this context. We'll talk a little bit about why, and how that might be about to change.

A little bit about myself. I was a security researcher for quite a few years, and I did a lot of low-level kernel work, which maybe explains how I got to eBPF later on when moving to the observability space at Groundcover. Today I'm working mostly with eBPF, trying to see how we can redefine some of the things that are already in place with this really cool new technology. And just so you know, this is my first time speaking at a CNCF event, which is exciting enough as it is, but about an hour ago my dad showed up at the conference to surprise me. Yeah, sitting right there. So the excitement levels are a bit higher than expected; sorry if that translates in some way. But yeah, let's get going.

So today we're going to start with the why: why we'd even want to look at eBPF for logs. Then how log collection works today with the current implementations, what the problems or inefficiencies in those methods are, why and how eBPF could solve them, and we'll conclude with some benchmarks and what's coming in the future.

So let's start with the why. We already have a lot of ways to collect logs. But I think logs are probably the most basic pillar there is in observability. Every developer knows how to write logs. So you have logs from your applications, from your third-party software, from everything, basically. And that generates a lot of data, and it's also very unpredictable in volume. So it's very challenging to build a system that will always collect logs, even when there's suddenly a spike. There are a lot of tools for doing that, and we're going to talk about some of them. Mostly they operate in the same way, which is not to say it's a bad way. But I do feel that when there's a consensus on how things work, maybe there's an opportunity to reinvent that when new tools like eBPF come into play.

Now, collecting logs is hard, and it gets harder the more logs you have, right? We've seen a lot of nodes and clusters that can reach 1,000 log lines per second, maybe even tens of thousands of log lines per second. That's a real engineering challenge; it's a bandwidth problem that you need to know how to solve. And the thing with collectors that often run as DaemonSets, for example, in a Kubernetes cluster, is that even though the CPU of one instance might be low, when you multiply that by the number of nodes you have, it adds up very fast. And that translates directly to cost. So this is one very good reason to try and challenge this, right? Logs are everywhere, collecting them is hard, and it might be costly. But I think the more important reason, for me at least, is that when we started looking at eBPF for log collection, it became very clear to me that this is the right way to do it. And we're going to see why, and why it fits the challenge of log collection perfectly.
And even if you're not convinced just because it's eBPF, we're going to see how this actually improves collection overhead by up to 40x. And I think that eBPF might be a really new technology to a lot of people, but for logs specifically it can actually be a drop-in replacement: you can replace just the collection part of logs, and we'll see exactly that. Most people won't even know the difference, except the resources are going to go down drastically. And this works on Kubernetes, on ECS, everywhere there's a Linux kernel, basically.

OK, so I hope I got you a bit pumped up to see just how this is going to work. But we'll have to start with how things work today. And we're going to limit our talk to containers on Linux, on Kubernetes specifically, and specifically to tailers, programs that continuously fetch logs from all of your containers, and not, for example, OpenTelemetry, where applications push their own logs. By the way, there's another session happening right now that is comparing OpenTelemetry with Fluent Bit. It's a shame it's at the same time, I would have loved to be there myself, but I would definitely check that out later on.

So how do we get logs from containers? Containers are secluded, right? There are Linux namespaces; you can't see what's going on inside the container. That's the whole point. But luckily for us, the container runtimes, which actually manage those containers, do that for us. And the way that it works, I'd like to think about it as: we have our containers being run by the container runtimes, and these containers are secluded in this purple world. Every time a container writes logs to standard output, it's basically saying something out loud, and hopefully someone will be able to hear that and somehow pass it on. Just as an analogy, I like to think of mail pigeons. The runtimes are actually employing those pigeons, and every time a container writes something to standard output, it gets delivered to a mailbox that is specific to that container. We're going to get more technical in a minute, don't worry, but just to lay out the ground. The cool thing about those mailboxes is that they live outside of the purple world, so they are accessible to other containers, for example. So if you have tailers like Promtail and Fluent Bit, they can now go ahead and get the mail from the mailboxes.

But how do we know where to look? It depends on the environment, but in Kubernetes specifically there's a well-known path on every node, on every host machine, where, given the pod details, the namespace, the name, the ID, et cetera, you can find those logs. So if you know what you're looking for, you can get them, for example when you use kubectl logs. But what tailers actually do is scan the whole directory, and that way they can find our pods, which is very cool. So we have some pods running, and we have those long paths, but at the end of the day, we can find them. Awesome.

There's one important optimization that I have to talk about, which is that when tailing, you don't usually just busy-poll the mailboxes, because that would be very intensive. Instead, we can use what's called inotify, which is a Linux kernel facility that lets us ask the kernel, hey, every time there's new mail coming in, please let me know.
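Just to make that concrete, here's a minimal sketch of how a tailer might watch one container's log file with inotify. This isn't taken from Promtail or Fluent Bit, and the path is just an illustrative example of the usual Kubernetes layout under /var/log/pods:

```c
// Minimal tailing sketch (illustrative, not any specific tailer's code):
// watch one container log file with inotify and print appended lines.
// Example path shape: /var/log/pods/<namespace>_<pod>_<uid>/<container>/0.log
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void) {
    const char *log_path =
        "/var/log/pods/default_my-pod_0000/my-container/0.log"; /* example */

    int in_fd = inotify_init1(0);
    if (in_fd < 0) { perror("inotify_init1"); return 1; }

    // Ask the kernel: "let me know every time this file is modified".
    if (inotify_add_watch(in_fd, log_path, IN_MODIFY) < 0) {
        perror("inotify_add_watch");
        return 1;
    }

    FILE *log = fopen(log_path, "r");
    if (!log) { perror("fopen"); return 1; }
    fseek(log, 0, SEEK_END);           // start tailing from the current end

    char events[4096], line[8192];
    for (;;) {
        // Block until the kernel reports new data instead of busy-polling.
        if (read(in_fd, events, sizeof(events)) <= 0) break;
        while (fgets(line, sizeof(line), log))
            fputs(line, stdout);        // hand the new lines to the pipeline
        clearerr(log);                  // clear EOF so the next fgets works
    }
    return 0;
}
```

A real tailer would of course watch the whole directory tree, handle file rotation, and remember offsets, but the core idea is the same: block until the kernel says there's new mail in the box.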
And we can see that process here. And then we know where to read from, and this makes a lot of difference. This concept of only doing things when needed kind of puts us on the track of this being an event-based thing, right? Logs are event-based: a log is being written, and then this chain of events leads to someone reading it later on.

So that was very high level, of course, but I think it's good enough to start understanding why this method, even though it works really well — if you go ahead and install those tailers, you will have all your logs — also has so many inefficiencies. And I'll give you about five seconds to try and think of those yourself, and then you can see if we've thought of the same ones. OK, you ready? Awesome.

So there's actually a lot going on here that we haven't discussed, once we start looking at the technical details. First off, we're using files. We just saw that: we saw the files, we saw the path. And files are not a very good middleman in terms of performance. Well, you do have to use files here, because you're writing something and the reading is asynchronous, right? Someone else is going to read it sometime in the future, so you need someone to store the logs. But file I/O is CPU intensive, especially when you're doing it continuously, for every log line, all the time. And think about the amount of disk you need on each of your nodes to store all of the logs. You're probably storing them for quite a while as well, because you don't know when someone is going to go ahead and query them. So that was one point.

The second point has to do with not batching the reading of the logs across containers. What do I mean by that? Well, when you're trying to optimize pipeline efficiency, the one thing you always want to focus on is doing as much reading or writing as you can every time you perform an I/O operation. This is not new, right? It shows up in every pipeline problem there ever was. And even though we can do that per mailbox — say, some tailer decides, I'm going to wait until I have this many bytes before I actually do the read operation — you can't do that across mailboxes. It's just not the way it works. So the more containers you have, the less efficient you get: you have to read from different files, and you're doing a lot of reads, a lot of syscalls, to do that.

And lastly, there's a lot of serialization and deserialization happening here. You're probably wondering, where? I mean, it's just text, right? But in reality, there are other things we want to share about logs. We want to know, for example, the timestamp: when was this log written? Maybe which stream it was written to: was it standard output or standard error? So there's a format defining how the container runtimes actually write these logs out, and it looks like this. We have a timestamp, we have the stream, stdout or stderr, flags that we won't get into right now, and the actual content of the message. And this is a very simple format, right? It's just a few fields with spaces in between. But again, these things tend to blow up at scale.
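For reference, a log line in that style looks roughly like the lines below (the timestamps and messages are made up); this is the layout used by CRI runtimes such as containerd and CRI-O: an RFC3339 timestamp, the stream, a partial/full flag, and then the message.

```
2024-03-20T10:15:03.983712345Z stdout F GET /index.html 200 2ms
2024-03-20T10:15:04.112398765Z stderr F upstream timed out, retrying
```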
So for example, taking a timestamp, the number of seconds or nanoseconds that have passed since the epoch, and converting it into roughly 30 bytes of text, and doing that 1,000 times per second, 5,000 times per second, is very costly. More than you'd expect, I think. OK, so I know that some of you have probably thought of other things, and there are other things that could always be improved here, but these are the key points I would like you to remember when we talk about eBPF logs.

And we have to start with a very short intro to eBPF. Now, this is not an intro lecture, of course, but you need a very basic understanding of how it works so we can explain how we're going to use it. eBPF is a kernel technology that lets us run custom code in the kernel. Well, that's not very new; we've always had kernel modules. But the really cool thing about eBPF is that it lets us do it in a very safe way. And what does safety mean exactly? It means the kernel makes sure that the code we are running cannot crash the system, for example. But it's not just that: it also makes sure that the amount of time we add with our programs is limited, so we can't interfere with the system's operation too much, at least. And eBPF is a very, very powerful tool. It lets us look at kernel activity — here you can see VFS, files, network, storage, very classic kernel stuff — and it also lets us look at the user processes themselves. We can actually put probes on the processes themselves as they are running, at specific points.

And the cool thing about eBPF is that it changes the way we can think about a problem. Because we're used to thinking in terms of: well, we have the container runtime that is exporting those logs into those files, what are we going to do about it? We can't change that. But suddenly we have a way to rethink the problem from the system's perspective, from the place that is actually managing the whole thing. And eBPF is very, very good at — I think it was mainly meant for — event-driven scenarios. Every time X happens, do Y. Every time this function is called, please let me know. Every time this event happens. And we just discussed how logs are one of the most event-driven things there are: it's just a process writing logs to its output. And lastly, it breaks the container/collector boundary, in the sense that containers are actually managed by the kernel, right? So if we are thinking with the kernel in mind, if we are sort of the kernel, then we see things differently.

So what does that mean? Well, we've seen this image before, but now we need to add on top of it the actual Linux kernel, which is managing the whole thing; it's actually responsible for the containers being secluded, for example. And when we think in eBPF, things look a bit different. We no longer have containers, we just have processes; and we no longer have mailboxes, we have a disk. And whenever a container is trying to write logs, it does a very, very simple thing. Does anyone know what happens when you write? It doesn't matter what language you're running, you're just going to issue a syscall: write or writev. Those are the two Linux syscalls for writing to file descriptors. And that's all there is to it. So in the kernel's eyes, there's not a lot going on here that is too complex.
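For orientation, these are the standard prototypes of those two syscalls, as documented in the Linux man pages:

```c
#include <unistd.h>
#include <sys/uio.h>

// Write count bytes from buf to the open file referred to by fd.
ssize_t write(int fd, const void *buf, size_t count);

// Gather-write: write iovcnt buffers (each a pointer plus a length) to fd.
ssize_t writev(int fd, const struct iovec *iov, int iovcnt);
```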
We just have those syscalls happening. And how do they look? Well, the write and writev syscalls are pretty similar. They take the file descriptor we are writing to, the buffer, the text we are writing, and the number of bytes we are writing. And that's it; that's what you would expect from writing logs. It's very simple.

So what can we do with eBPF here? We can attach a probe, which means: every time this syscall happens in the kernel, please let me know so I can run this small piece of code. And in this code, we can check if the file descriptor matches standard output or standard error, which are 1 and 2. If not, we don't care about it for now; we're only dealing with those outputs. If we do care about it, we can get the identity of whoever did that, which is the process ID or the cgroup, something that will help us understand later on who made this call. And then we just send it back to our collector in user mode. We're going to see that in just a moment. And that's it. There are no mailboxes; there's nothing else in the middle. So we've seen this picture before, but the only thing we need to add now is the eBPF collector on the right. Every time a write or writev event happens, the data is sent to it using eBPF primitives; ringbuf submit basically means, send this data over a ring buffer. It's that simple.

So with this in mind, with this very different approach to getting logs from applications, let's take a look at the points we've seen before. We talked about how logs are stored in files. Well, there aren't any files anymore, because the collection now happens in a completely synchronous way: every time an event happens, it magically appears in our collector, in our tailer. And the memory is directly shared, which basically means we don't have to copy the buffers anywhere; the original log buffer used by the process is available to us, and we copy it directly through the ring buffer to the collector. And this means less CPU overhead, because we no longer have any file I/O. In theory, we could remove the files completely, remove the storage. Of course, other things still rely on being able to read those files, so maybe we won't do that right away, but we could.

The second problem we discussed is that we are not batching across containers. Well, now we can batch from all containers together. And why is that? Because the kernel, like we've seen, doesn't really care which container issued the syscall; it sees all processes doing all syscalls in the same way. So as long as we can identify which container did it in some way — and we've discussed that, with the PID and cgroup — then we have all the logs together in one place, and we can batch them together. That's actually what the ring buffer lets us do: it lets us aggregate data and only wake the reader once we've reached some threshold, for example. But by its nature, this works for all containers together, and we get very high efficiency when copying our data. This also results in less CPU overhead, and in indifference to the number of containers, which we previously saw makes our life harder as the count grows.

And the last point, of course, was the formatting. This is a tricky one, but the cool thing is that we no longer need a textual format, because we actually share the code. It's the same code base.
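To make the kernel side concrete, here's a minimal sketch of what such a probe could look like: a tracepoint on the write syscall that keeps only stdout/stderr, records who wrote, and pushes the raw bytes onto a ring buffer shared with the user-space collector. The struct layout, map size, and names here are illustrative assumptions, not Groundcover's actual implementation.

```c
// eBPF kernel side, libbpf/CO-RE style. Illustrative sketch only.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_MSG 1024

// This struct is shared as-is with the user-space collector: no text
// formatting, just raw fields ("C-style" sharing, as described above).
struct log_event {
    __u64 ts_ns;          // kernel timestamp, nanoseconds
    __u64 cgroup_id;      // identifies the container
    __u32 pid;
    __u32 fd;             // 1 = stdout, 2 = stderr
    __u32 len;
    char  msg[MAX_MSG];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 24);     // one shared buffer for ALL containers
} events SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_write")
int handle_write(struct trace_event_raw_sys_enter *ctx)
{
    __u32 fd = (__u32)ctx->args[0];
    if (fd != 1 && fd != 2)
        return 0;                     // not stdout/stderr: ignore early

    struct log_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;                     // buffer full: drop (or count drops)

    e->ts_ns     = bpf_ktime_get_ns();
    e->cgroup_id = bpf_get_current_cgroup_id();
    e->pid       = bpf_get_current_pid_tgid() >> 32;
    e->fd        = fd;

    const char *ubuf = (const char *)ctx->args[1];
    __u64 count      = ctx->args[2];
    __u32 len        = count < MAX_MSG ? count : MAX_MSG;
    e->len = len;
    bpf_probe_read_user(e->msg, len, ubuf);   // copy the log bytes as-is

    bpf_ringbuf_submit(e, 0);         // hand the event to user space
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

A writev variant would additionally walk the iovec array, and a real sensor would also account for drops and oversized writes; the point here is just the shape of the event path, from syscall to ring buffer, with no files in between.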
The user-space code base, the collector, and the kernel code that formats the data are written together. And they have this way to share data that I call C-style, because it reminds me of parsing protocols where you send something on a socket and get the raw bytes exactly as they were. And this is exactly what's happening here. So yes, we still have a timestamp, we still have the stream, but it's no longer text. It's as simple as it can possibly get: you're just sending the data as-is, using shared memory. So that's pretty cool. And of course, this also results in less CPU overhead.

OK, so we've talked about why eBPF is a new way, a cool way, to do things. But of course, we want to make sure that we're actually improving our CPU here, right? So what we did is compare different tools and measure the amount of CPU they used. And by the way, I left memory out here, not because it's not important, but because all the collectors are pretty good with memory; they're all very consistent, at about the same memory levels, so that's not something I set out to improve with eBPF. What we did is — sorry for the Slack notification there — we used Promtail, we used Fluent Bit, and we used Flora, which is Groundcover's eBPF sensor. And something's going to pop up again, sorry for that, exactly at the moment of truth; you'll have to excuse me. So we had those three collectors running, and we had a very simple vanilla EKS setup. We had a log generator that just generated whatever volume of logs we'd like, very simple NGINX-like logs. And we also had sanity metrics, which basically let us make sure that we're actually collecting the number of logs we'd expect; otherwise, we're just not sure it actually works.

And I'm going to give you a moment to take a look at the results here. Sorry, there you go. OK, so on the left side, we have the number of logs being generated per second, going up from 100 to 10,000 at the very end. And in green we have Flora, our eBPF sensor, which is consistently the lowest CPU user, and I would say by a pretty great margin. On the right side, we have the minimum multipliers — not even the maximum; I just didn't want to put the maximum up, but the gap goes even higher if you compare against that. And we can see that Fluent Bit and Promtail trade places. By the way, the reason Promtail suddenly became more CPU expensive, I think, is that in order to generate 10,000 logs per second I had to use more than one container, and then we hit the problem we discussed before: how can you group together logs from different containers? So yeah, you see, the numbers are pretty intense. And I was very happy to have the sanity metrics here, because at first I was thinking this just can't be right.

So, just a few thoughts to wrap this up. Well, first of all, I'm sure I closed Slack; I'm not sure why it keeps popping up, sorry about that. There's actually more we can do with eBPF; it's not just pipeline efficiency for logs. First off, imagine that you want to filter logs, that you don't want all of the logs from all of your applications. You only want error logs, or you only want logs from a specific service, anything like that. Well, with the current implementations, you would have to first get the logs into your tailer, and only then can you decide whether you want them or not.
And you still have to get them all the way to you. But with eBPF, we've seen that the sending happens very early. Since we're doing things right when the logs are being written, we can decide at the kernel, at the eBPF probe: hey, we don't care about this log. It's not an error, it's not from a container I care about, just don't send it. And we're going to save a lot of resources just by doing that. And I think the really cool thing is what would happen if we could use eBPF with the container runtimes directly. We haven't discussed that at all: how do the pigeons work, how do the container runtimes get the logs? It's pretty similar to what we've seen here, but there's a lot of I/O going on over there as well, and of course we could do that with eBPF too. It's just something to think about.

OK, so wrapping this up, I really think that eBPF is the right way to get logs. It's not just another way; this technology fits this problem perfectly. It's a very event-driven thing, it's a high-load problem, and these are things we've seen eBPF handle before with networking. But I think that logs, this text, is basically the same problem: it's a bandwidth problem. And we've seen how we can improve CPU by a lot. And keep in mind, this is per node. We discussed this at the beginning, but obviously, in a cluster you can have 100 nodes, 500 nodes, whatever, and this scales up. And I will leave you with this one question: how many eBPF-based log collectors do you know? Well, I'm guessing the answer, for most of you at least, is that this is probably the first time you've heard about this concept. And I have a feeling this is going to change soon. I think the community is going to pick up on this; we're going to see more projects. It could be just replacing some parts of existing projects, which is a really cool notion, and it could be completely new tools. But I'm pretty sure we're going to see this evolving very soon. So thank you very much. I think we're out of time, but if you have any questions, please feel free to find me or any of the Groundcover team at our booth later, about this, about eBPF observability, and about how to turn off Slack notifications. Please don't come to me, I'm not the one for that. Thank you very much.

You mentioned that you use eBPF to send logs to a ring buffer. But the size of this ring buffer is limited; it's a constant size. And the one advantage we have with files is that you can store a lot of data in them. So if your collectors are down, the data is stored in files, and when the collectors come back up they can pick it up, and you won't lose it. But if we send data to the ring buffer, we can lose the data. So what's the solution for that?

Yeah, so that's a good question. I think there are two things that are interesting to talk about. First, the ring buffer size is limited, but in terms of bandwidth it's manageable: as long as your agent is running, you'll be fine just keeping up the pace all the time. I mean, it depends on doing it right; I'm not saying it's easy, I'm saying if you do it right, then you can do that. In terms of recovery, in terms of what happens if you go down, I don't have a good answer to that. That's probably the reason why files will still be with us for some time.
But I think that most of the time, as things should be up and running, you could maybe use files as, I would say, cold storage, something you don't keep a lot of — maybe 30 minutes back, I don't know — just not as much as we keep today. But yeah, that's a good question. We're just seeing the start of how this could work. OK, thank you. Thank you.

I can't hear you — maybe the mic... OK, it's fine. Hello. Yeah, go ahead. Go ahead, yeah, sorry.

So thank you for your great presentation. My question is: if a developer followed improper logging practices, and some fields are missing, like a client ID or a developer ID, things that are very important, then because of that we could not find who was affected by a failure or by anything that happened. So is there any way, using eBPF, that we can identify those clients or developers or containers, or anything like that?

So you mean adding data if it was not added by the developer. So the downside of eBPF — I would say in quotes — is that the safety guarantees make sure that you can't touch the data. So you won't be able to modify the data with eBPF, unfortunately, I would say; probably more of a benefit than a disadvantage. But yeah, a developer forgetting to add things is not something that eBPF can solve yet, I think. Thank you. Unfortunately.

Great talk, thank you very much. You mentioned you were augmenting the data with the container ID. I wondered whether there's potential to augment it with, perhaps, the place in the application stack where the log came from. And I also wondered what the overhead of looking up that information is, and whether you're doing any caching there, or something like that.

Yeah, those are two really good questions. To answer the second one first: again, one of the cool things about eBPF, on the safety side, is that it guarantees that the program you're running can't run for too long. And just to put it in numbers, I haven't seen a program that takes more than a couple of microseconds, tops, because the kernel just won't let you run that kind of program. So even if you fetch the data, whether you use caching or not, it's still not going to be a very intensive CPU overhead on your system. And to answer the first question: to get that, you need to understand that loggers often tend to batch things together. So it's not that every time you write something in your code, you immediately get a syscall; there's batching inside the logging runtime as well. So when you actually see the syscall, you might be looking at logs that were collected over the last 10, 20, 100 calls. But the answer to that — and I mentioned this when talking about eBPF — is that you can also do things with eBPF inside the user code. So if, for example, we did this on the actual logging libraries, then we would maybe be able to get that information. But that would require a lot of work, basically making this work for each and every type of library there is. Whereas this approach is kind of a drop-in replacement that works really well. But yeah, definitely something to think about. Thank you very much, Sean.