So, folks, how many of you are familiar with eBPF here? Okay, wonderful, wonderful. About half of you, that's great. Well, before we get to that, let me say a few words about myself to manage your expectations. I'm a performance geek who turned into a CEO, so these days I would guess I have to work more on company performance than on system performance, but you know what, I still keep my passion, and that's why I still go and talk at a lot of technical conferences. In this presentation, we'll go through an overview of eBPF and how it works on Linux, and then look at practical examples of eBPF-based tools, which I hope you'll find helpful and be able to apply in your own environments.

So, eBPF history and its support in Linux. eBPF stands for Extended Berkeley Packet Filter, which is a weird name for a tool we want to use for monitoring, isn't it? Well, the reality is that it originated from the Berkeley Packet Filter, which was designed exactly to be an efficient virtual machine for packet filtering, and was then extended — the "e" pretty much means it was extended to be able to do more stuff, right? So the Linux kernel has support for that virtual machine, called eBPF. In general, eBPF is an event processing framework which is used a lot for monitoring, but it can indeed also be used for other things, for example in the network stack. In modern versions we have a JIT compiler, which compiles those programs for higher efficiency.

eBPF first appeared in Linux in 2014, so it has been out there for quite a while, but it is still being very actively improved. In recent years it has also been integrated with the perf tooling, which is this wonderful framework for interactive performance analysis in Linux. If you look at the improvements in recent kernels, the list of changes is large, and pretty much every major or minor kernel release includes some additional eBPF features, which can range from new instrumentation points to various new map types, right? What kinds of data structures we can use, how we can process and aggregate those events.

So, as I mentioned, what eBPF is, is this kind of special bytecode: small programs which you can go ahead and insert into the kernel, and which are executed when different events are triggered. What is great about eBPF is that there are checks done before those programs are loaded. So it's not impossible, but it is hard to screw yourself up, right? It is hard to make the classic mistake where you insert a probe at a tracepoint and, boom, your system completely crashes. For example, if you just put a tight loop in an eBPF program, it would not pass the verifier and would not be loaded. Now, even though eBPF itself is based on bytecode, LLVM/Clang can compile C programs into that eBPF bytecode, and that is what's typically used as part of the eBPF platform. What is interesting here is that this compilation tends to be kernel dependent, and that is why the installation of eBPF tools is a little more involved in a lot of cases. But the good thing about LLVM is that it means few of us actually need to write eBPF programs in bytecode directly.
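To make that lifecycle concrete, here is a minimal sketch, assuming the BCC Python bindings are installed (packaged as bcc or python3-bpfcc on most distributions). The small C fragment is compiled by LLVM/Clang into eBPF bytecode, checked by the kernel verifier, and only then attached to a kprobe; the choice of vfs_open() as the hook point is just for illustration.

```python
from bcc import BPF
import time

prog = """
// NOTE: an unbounded loop in here would fail verification and never load.
BPF_HASH(counts, u32, u64);              // map: pid -> call count

int count_open(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);               // aggregation happens in-kernel
    return 0;
}
"""

b = BPF(text=prog)                       # compile via LLVM/Clang, load, verify
b.attach_kprobe(event="vfs_open", fn_name="count_open")

print("Counting vfs_open() calls per PID for 5 seconds...")
time.sleep(5)
for pid, count in sorted(b["counts"].items(), key=lambda kv: kv[1].value):
    print("pid %-6d %d opens" % (pid.value, count.value))
```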
Here, if you're curious, is an eBPF code example, which actually comes from tcpdump — another tool which uses BPF, in its case for packet filtering — and it has a neat option where you can dump the BPF programs in an assembly-like form representing the bytecode. Okay.

eBPF has support both on the kernel side and on the user side. Here I have an image from a wonderful, wonderful resource, Brendan Gregg. If you are interested in eBPF on Linux and in using it and its tools, Brendan Gregg's website is an absolutely fantastic resource, and I think this is a very nice illustration from there. What we can see is that, one way or another, we have a user program generate bytecode, which is verified by the kernel; through eBPF it can connect to different interfaces in the kernel: kprobes, uprobes, tracepoints, perf events. And then a map is the kind of data structure which can be used to accumulate performance statistics, right? The probe runs and accumulates data in the map, and then in user space some program reads that data from the map and displays it, as you will actually see in the tools I'll show you.

This is a very wonderful image which shows all kinds of different tools which can be used to troubleshoot or analyze different parts of the Linux system. You can see that the eBPF ecosystem is large and there are already many tools using eBPF, so you don't have to write your own probes, right? Here is another wonderful image from the same source which shows, in a different illustration, which Linux kernel versions support which eBPF features. I'm not spending a lot of time on that because I think these things are self-explanatory and you can research them later from the slides. Here, also from Brendan Gregg, is an interesting picture which plots the different tools by how easy they are to use against how mature they are. You can see there have been tools like raw BPF, for example, which was pretty mature but very hard to use. What's nice is the tools in this corner, which have become easy to use, powerful, and mature at the same time. So eBPF, I think, is now at that very great point in the Linux kernel where it is very powerful but at the same time very mature to use, okay?

Now, one thing I see from looking at eBPF tools is that not all of them are available in a single package. The first two packages which you may want to look at would be perf-tools-unstable — well, in the unstable version there is more stuff for you to use — and then there's also the IO Visor project, which has tools like BCC, and ply, which can be used for a lot of these kinds of simple scripts. These are both pretty mature toolsets, and you can get them installed relatively easily; in this case that's just an example — there is a repository with packages which makes it relatively painless. Now, I mentioned already that eBPF depends on the kernel; that is why you can see it has to pull in the kernel headers, to understand exactly the data structures you're working with, where exactly the offsets for different data are, and so on and so forth.
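Before moving on to the tools, here is what that kernel/user split from the diagram looks like in practice — a minimal sketch, assuming BCC. The kernel side fills a map on every openat() syscall; the user side is a plain Python loop that reads and displays the map, which is essentially what all the tools below do.

```python
from bcc import BPF
import time

prog = """
BPF_HASH(calls, u32, u64);

TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    calls.increment(pid);
    return 0;
}
"""

b = BPF(text=prog)
try:
    while True:
        time.sleep(2)
        print("--- openat() calls over the last 2s ---")
        for pid, n in b["calls"].items():
            print("pid %-6d %d" % (pid.value, n.value))
        b["calls"].clear()   # reset so each interval stands alone
except KeyboardInterrupt:
    pass
```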
Okay, let's now look at some eBPF-based tools in action. The first one is profile — and that's the name of the tool — which allows us to trace a given program and look at where its different threads are in the stack. This is different, for example, from perf, because tools like perf typically show you CPU usage. Here you will see where threads are getting stuck: whether a thread is doing some CPU compute, or stuck waiting on mutexes, waiting on I/O, or whatever, which can be very helpful to understand, okay, what exactly is going on in the program and why it's not running so fast.

Another cool tool, biolatency, is for understanding your I/O latency. Especially with modern storage, in the cloud and so on and so forth, you often have to deal not with the averages but with the outliers, and this tool pretty much allows us to see the latency of your block device as a histogram. In this case it looks quite natural — we're not seeing a lot of outliers — but this is actually interesting: we can see that for this block device, one of the I/O requests took — what is it, the scale is in microseconds, I think — so around two seconds to complete. If you are, for example, running a database server and that happens to be something like your log flush request, then it's bad, right? It will have a very bad impact.

Another very nice tool, biosnoop, allows us to snoop the block device I/O. So if you ever wondered whether your I/O is sequential or random or whatnot, you can trace that rather easily with this tool, and then you can easily run some program to analyze, plot, and visualize your I/O pattern in more detail.

That is the block device. There are also tools to give you the same information for file systems, with different tools per file system — this one is for ext4. We can see how long the main file system operations — read, write, open, fsync — take, because for all kinds of storage reasons as well as internal kernel reasons, we may see the file system causing stalls which impact the operations of our programs, and which may be hard to troubleshoot I/O-wise, because you probably do not instrument every single I/O operation in your program.

There is another tool related to that: if you do not want to trace everything to build a histogram, you can use it to find which I/O operations do not meet, let's call it, your service level objective. So in this case we can say, hey, which I/O operations took more than 10 milliseconds — you can obviously put any number you want there — and it tells you: well, you know what, these are the requests which have taken that much time.

Now, another interesting statistic. For me, I was always wondering: Linux has this page cache, but how efficient is it really? How do I really see whether I get a lot of hits? If I give the cache more memory, does it really get more hits, or fewer? Well, with eBPF you can do that. There is the cachestat tool, which shows me how many total requests have been reading from the cache, how many hits and misses, how many dirty blocks have been added, and so on — and that can really help you to understand your cache performance.
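The histogram tools above (biolatency, ext4dist, and friends) all share one pattern: timestamp on entry, delta on return, log2 bucket into a map. Here is a minimal sketch of that pattern, assuming BCC; it hooks vfs_read() rather than the block layer, whose symbols vary between kernels, so it is illustrative rather than a biolatency replacement.

```python
from bcc import BPF
import time

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);
BPF_HISTOGRAM(dist);

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                         // missed the entry event
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));  // log2 buckets, like biolatency
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

print("Tracing vfs_read() latency... hit Ctrl-C to print the histogram")
try:
    time.sleep(999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")
```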
Another interesting thing: if you look at your applications which are CPU bound, one of the critical items in terms of whether they will be responsive and perform well is, when the application is ready to run, how long does it take the scheduler to find a CPU to actually schedule it on? Because if all the CPUs are busy, then it has to wait until a CPU becomes available or gets preempted, and so on and so forth. This tool, runqlat, allows you to actually see that as a run queue latency profile: how long it takes from the point the program became ready to run to when it is scheduled. Which is, I think, a very cool way to spot CPU saturation — much better than just looking at your CPU usage averages, right?

Okay, the next one. If you ever wondered, for your program — maybe not a shell script, but something in a more complicated language — what tools and other things it executes, there is this wonderful tool called execsnoop, where you can look, for the whole system or for a specific program, at what programs are being started. That can help you understand better what the program does, or if there is some problem, something fails, you can often figure that out from this tool.

The next one is opensnoop, for opened files. That, I think, is another very helpful one. I find it helpful when programs do not have their diagnostic error messages set up very well: it says, oh, there is an I/O error, or can't find a file — what file, right? I mean, some Linux programs give error messages which are as bad as Windows errors. This tool allows you to deal with those programs, because you can see which files the programs are accessing, and from that you can often infer what could possibly go wrong and troubleshoot it.

Besides files, you can also trace what the programs connect to in your system. This tool helps us to trace the outgoing TCP connections in your system, which, again, can be very helpful to understand how programs work and how to troubleshoot them as needed.

This one I find very cool. In many cases, what we see with database troubleshooting is, for example, somebody complains, hey, the query is slow, but the query should not be slow — because if you look at the logs on, let's say, the MySQL side, it's all fast; on the application side, it's slow. Possibly that is the communication, because of some network issue. In a lot of modern networks the latency you would see is quite good, but if you have a packet being dropped and a retransmit happens, that can introduce huge delays into the communication. With this tool, tcpretrans, you can actually see exactly which network retransmits happened — not just the aggregates, which you can get from netstat, but for specific connections. And for those specific connections I can also see which program, which application, and which user was involved.

The next one is DNS lookup latency measurement, which is also sometimes not easy to figure out, and I think it's often missed and not taken into account — but DNS lookups may have a huge performance impact when you're connecting to a service.

Here are all the tools which are available in the version of BCC I used, and those tools are constantly being developed and extended.
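As a flavor of the snooping tools just described, here is a stripped-down sketch in the spirit of execsnoop, assuming BCC: instead of aggregating in a map, each exec() is streamed to user space as an event. The real execsnoop also captures the arguments and return code; this only grabs the pid and command name.

```python
from bcc import BPF

prog = """
struct event_t {
    u32 pid;
    char comm[16];
};
BPF_PERF_OUTPUT(events);

TRACEPOINT_PROBE(sched, sched_process_exec) {
    struct event_t ev = {};
    ev.pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(&ev.comm, sizeof(ev.comm));
    events.perf_submit(args, &ev, sizeof(ev));
    return 0;
}
"""

b = BPF(text=prog)

def handle(cpu, data, size):
    ev = b["events"].event(data)
    print("exec: pid=%d comm=%s" % (ev.pid, ev.comm.decode(errors="replace")))

b["events"].open_perf_buffer(handle)
print("Watching new processes... Ctrl-C to stop")
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```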
Moving on to ply: that is another tool, which is wonderful for doing some additional scripting, if you want, beyond running the canned tools. For example, you can run this kind of simple probe which says: hey, I'm attaching to this kernel function, give me the histogram of its return values. Relatively simple, and it gives us a beautiful chart. That is an example of the ply language. It uses a custom language, but if you program, it is not too hard a language to learn and understand.

Okay, all the tools I showed you so far are for this kind of real-time troubleshooting and analysis, which is wonderful — but what if you want stats that you can analyze after the fact? Yes, there is a tool for that too, and the state of the art here, I believe, is Cloudflare's ebpf_exporter, which, as the name says, is an exporter for Prometheus. So you can get pretty much any data from eBPF into Prometheus, where you can plot it, use Alertmanager, and do all the other good stuff.

Now, here is a nice graph, which I took from a Cloudflare talk, which tells us that while all of these datasets have the same average value on both axes, they are completely different pictures. If you haven't seen that image, I think it's quite cool. That is why we need to go beyond the averages, and even the medians, to really understand the histogram.

If you look at ebpf_exporter, it uses the BCC library to get the data: pretty much, you can take the data from BCC programs and output it in the Prometheus format. One thing you have to be careful about when running the exporter in production is the overhead. If you put, for example, a very complex probe on a very common operation, one which runs millions of times a second, you can have quite an overhead — as you can see, the Cloudflare guys measured it can be way over 100%. So be careful about what you run in production versus what you run in a test, or only enable it when a system is not healthy.

For ebpf_exporter there are binaries available; you can get it, provide the configuration, and of course plug it into a standard Prometheus. I'll also show you how to integrate it with our open source monitoring tool called PMM, where you can run it pretty much as an external exporter. What you get in this case, as an example, is an I/O latency histogram — the same one I showed you earlier in the presentation — which we can then go ahead and plot as an I/O histogram over time and visualize in Grafana. That can be used, for example, to help us understand the outliers: like, well, you know what, in this case we had some requests which have taken more than 33 milliseconds, which is kind of slow for SSD storage in this day and age.
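For a sense of how small the exporter idea really is, here is a toy version of what ebpf_exporter does, assuming BCC plus the prometheus_client Python package; the metric name and port are made up. Note the overhead caveat from above: raw_syscalls:sys_enter fires on every syscall, so this is exactly the kind of probe to be careful with in production.

```python
from bcc import BPF
from prometheus_client import start_http_server, Gauge
import time

prog = """
BPF_HASH(syscalls, u32, u64);

TRACEPOINT_PROBE(raw_syscalls, sys_enter) {
    u32 nr = args->id;
    syscalls.increment(nr);
    return 0;
}
"""

b = BPF(text=prog)
g = Gauge("toy_syscalls_total", "Syscalls since start, by syscall number", ["nr"])

start_http_server(9435)          # Prometheus scrapes http://host:9435/metrics
while True:
    time.sleep(5)
    for nr, count in b["syscalls"].items():
        g.labels(nr=str(nr.value)).set(count.value)
```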
Okay, so what's coming in BPF? What I'm very excited about is bpftrace, which is, in this style, a very featureful DTrace replacement. Right now it is not quite mature yet in terms of having packages available, but it is really getting there, and I would say it will probably be one of the most powerful BPF tools in the next year or two. Here is a reading list — as I prepared, I went through a lot of articles, and I thought I'd just leave them on the slide. And that's it. I think I have exactly a minute and a half for questions. Please remain seated. Do we have any questions? Any questions? Any questions? That was so unclear, no questions, right? Yeah.

[Audience] Yes, so new tools keep appearing, and there are DTrace, ftrace, LTTng — tons of them, like fifteen. Which ones are likely to go away? Maybe you have compared them.

Well, that is a good question. I mean, somebody earlier was saying, hey, Peter, why don't you talk about DTrace on Linux? And I said, well, DTrace was wonderful, but it's kind of dead, right? I mean, DTrace is available, for example, in Oracle Linux — but how many of you here are running Oracle Linux? Which proves the point, right? So I believe that eBPF, at this point, is the leading monitoring and tracing instrumentation framework for Linux for this kind of advanced instrumentation use case, and it is there to stay. I think that is where the future is. Makes sense?

[Audience] I have a question. A lot of these tools seem very, very useful to run on production systems, because that's where you find the really weird problems that you cannot debug and reproduce anywhere else. But I always see these things running as root and being advertised as unstable. So is there a better way for me to not be running things that are marked as unstable, under root, on my production system?

Oh, well... there are many ways, right? But in the end, if you are inserting some custom code into the kernel, you are doing something relatively dangerous — at least in terms of the performance overhead, as I mentioned. While eBPF is designed to keep you from crashing the kernel, you can make it run multiple times slower. So these are advanced monitoring tools, which have to be handled accordingly. Now, what I would imagine in this case: if you want your developers to use them, build safe tools and let them run only those, via sudo or something similar, for the particular use cases they are actually authorized for. That's what I would think.

[Audience] You have presented a couple of tools which are quite similar, in the targets they cover, to strace, for instance. So how do they compare to strace? And do you think strace should be rewritten in order to leverage eBPF capabilities?

The great question is how this compares to strace, right? The overhead: eBPF has orders of magnitude less overhead than what strace does. Should we rewrite strace to leverage eBPF? Well, you know, possibly — I'm not sure. Maybe there is actually work to do that already. But in general, for these tools, the overhead, I think, is right now the main difference from what you can do with strace.

Any more questions? Oh, you're getting your exercise here, right? Up and down, up and down. — It's good for me. — How do you have that? — We actually plot histogram data of the audio coming off the microphones. — Oh, that's very cool.

[Audience] Yeah, so my question is to the audience and to the speaker: is anybody using this in production now? The exporter?

No. Well, Cloudflare has the exporter, right? I mean, if you look around, there are a lot of people writing about using this in production. I know that Brendan Gregg uses these very actively at Netflix — but he's using the tools individually at runtime, not the exporter.

[Audience] No exporter. Yeah, I'm actually running the exporter in production. — Oh, you were asking about the exporter. — Yeah, that is a good... I would have raised my hand to my own question, but I was just wondering if anybody else... — Oh, you use the exporter in production. — Yes, and it's not 23% slower; I've benchmarked it as well. It's pretty close.

Yeah, what I'm saying again is that you have to be smart about the probes you set up, right?
You can set up a probe to really hurt yourself, right? As with many other things. Any other questions? Okay, well, thank you.