Okay, well, hello everyone. Good morning. It is interesting how time zones work: it is still late evening on Thursday here in North Carolina in the United States, where I am speaking from, but in Japan you are on Friday already and the sun must be shining bright. Today we will talk about bpftrace, which is a fantastic tool and a great replacement for DTrace. More specifically, we will start with a quick review of what BPF and DTrace are, then talk about the BPF tools landscape in general and what the bpftrace tool gives us specifically.

Let's first look at instrumentation and observability basics at a very high level. Observability is our ability to look inside a running system and see what is going on. If you are dealing with complex systems, as we all increasingly are, it is extremely important for monitoring, troubleshooting, performance optimization, and so on. And for us to see inside that running system, we need to make sure it is instrumented: that there are points in the system which capture information from it. There are typically two kinds of instrumentation, static and dynamic, and I will talk about those two in more detail a little later.

At a high level, there are two instrumentation approaches. One is tracing: you have actual places in the code, and when the program executes those places, they emit an event when the code point is reached. That event may just tell us that something happened, or it may give us timing, error codes, and a whole wealth of additional information as part of that event. The other approach, which is also sometimes used, is sampling. For example, tools like OProfile, or pprof more recently, sample the state of the system very frequently, and through that they can understand what is going on and which states the system spends the most time in.

Now, looking specifically at tracing, as I mentioned, there are two approaches: static and dynamic. With static instrumentation, you have specific counters or logging points introduced throughout the code, which compute the statistics or log the data you want logged. In this case, we have to be mindful about overhead, and because of that we can only afford so many data points: if you compute too many, it becomes expensive, and frankly you cannot precompute every stat anyone could potentially use. At the same time, static instrumentation is very useful. On Linux, procfs contains a lot of stats derived through static instrumentation, and many very good basic Linux performance monitoring and observability tools, vmstat, top, iostat, and so on, use that static instrumentation.

Dynamic instrumentation means we do not know in advance what we want to instrument; we can take a look at the running system and instrument pretty much anything we want. Obviously, that means with dynamic instrumentation we can focus on the very particular detail we want to investigate, if we have to.
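To make that distinction concrete (jumping ahead to the bpftrace syntax this talk builds up to): a tracepoint is a static instrumentation point that kernel developers maintain, while a kprobe dynamically instruments an arbitrary kernel function. Both sketches below count events per process; vfs_read is a real kernel function, but as with any dynamic probe, whether it exists depends on your kernel version:

```
# Static instrumentation: attach to a maintained tracepoint (stable name)
bpftrace -e 'tracepoint:syscalls:sys_enter_read { @[comm] = count(); }'

# Dynamic instrumentation: attach a kprobe to an arbitrary kernel function
bpftrace -e 'kprobe:vfs_read { @[comm] = count(); }'
```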
At the same time, dynamic instrumentation is more complicated for users, because they have to figure out what exactly they want to instrument, rather than relying on basic tools which just show the right stuff.

So what is DTrace, and how does it relate to what I just explained? DTrace is a dynamic tracing framework, focused on dynamic tracing as the name implies. It was developed at Sun Microsystems for Solaris, starting in 2001, so it has roots going back almost 20 years. It was first released in Solaris 10, and it defines specific trace points in the kernel and in userland, so you can say "I want to trace something" by a human-readable name. Additionally, you can trace arbitrary functions and many more data points.

The key, absolutely genius thing about the DTrace design, and it was the first of its kind in this, is that it has no overhead when not enabled. A lot of those special names and functions were pretty much like debug symbols, saying where to insert the trace, but carrying no overhead themselves. The other ingenious DTrace innovation was the D language, a language inspired by C and awk, which allowed you to write very cool, simple programs to compute and analyze results.

DTrace proved to be so good that it went well beyond Solaris. It was added to macOS, FreeBSD, and NetBSD. Oracle, when it acquired Sun, even ported DTrace to Oracle Linux as one of its differentiators, and later on even re-licensed the code to be available for mainstream Linux, but by that time it was too little, too late. DTrace is now even available on Windows, so it has pretty broad adoption.

If you look at DTrace on Linux, though, it is not available in the stock Linux kernel, and it is not available from major Linux distributions outside of Oracle. That recent GPL code release was likely too little, too late, and over the last decade Linux pretty much figured out a way to leave DTrace in the dust.

So how is tracing implemented in Linux? The Linux approach to tracing, as to many other things, is to let multiple competing frameworks and approaches exist in the kernel and see which of them wins over time, rather than the Solaris design, where one approach is mandated and everybody uses it. For more details about this, I really like a fantastic article by Julia Evans about Linux tracing systems; you can read a lot there, with some fantastic illustrations.

If you think about it, the Linux tracing infrastructure can be seen as three different layers. One is the kernel interface, which is pretty much what you connect to. The second is, once you connect, what kind of program is going to consume those events. And the third is, now that you have those events produced and processed, what kind of front end, command line tool or otherwise, you can use to work with that information as a user. At every one of those layers there are multiple different approaches. For example, for consuming events, we can store data in the kernel buffer and then consume raw events in user space; we can load a kernel module, which can be pretty much any program compiled and inserted into the kernel; or it can be an eBPF program, which we will be talking about more.
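The first of those paths needs no special tooling at all. A minimal sketch of consuming raw events straight from the kernel trace buffer through tracefs (these are the standard tracefs locations, though on newer systems it may be mounted at /sys/kernel/tracing; root is required):

```
# Enable a static tracepoint and read raw events from the kernel ring buffer
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
cat /sys/kernel/debug/tracing/trace_pipe
# ...and switch it off again when done
echo 0 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
```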
Now, as I mentioned, eBPF is really the emerging Linux standard when it comes to dynamic tracing, observability, and actually many other things. But where does eBPF come from, and what is it? What the hell is this BPF? Well, eBPF stands for extended Berkeley Packet Filter, and its origins go back even further than DTrace, almost 30 years. BPF was designed as a tool for efficient packet filtering; that is what "Berkeley Packet Filter" stands for, and eBPF is the extended version found in Linux. Over time, and almost 30 years is a lot of time, it evolved into a general event processing framework rather than just a packet filter. It also acquired a JIT compiler, so it really can be a high-performance, high-efficiency solution.

If you compare eBPF and classic BPF at a high level, you can see that eBPF has some advanced features: it has 64-bit registers, it has a stack, and, most importantly, it has what are called maps. Maps are special-purpose data structures in which information can be accumulated and processed, which makes eBPF much more powerful than plain BPF for all kinds of use cases.

eBPF has been in Linux kernels since 2014, so it has been there for about six years now, and it is pretty mature. It is at the stage where it has made it into most Linux distributions and is available in many production installations, which is great. It is still being actively developed, and it is also integrated with the perf tooling, which is really one of the standard performance optimization tools we all use on Linux. The work is ongoing: if you check out the URL I show on this slide, you will find that BPF continues to be developed, and even the very latest kernels keep adding more trace points, map types, and other functionality.

Okay, so as I mentioned already, an eBPF program is essentially a program which connects to a certain point in the kernel. How does that work? An eBPF program is written in a custom bytecode which the Linux kernel can load. The fact that it is loaded as this custom bytecode also means those programs can be verified to prevent misuse. Maybe they do not get verified enough to protect you from yourself completely, but verification removes a lot of the risks which come from loading general machine code into the kernel and running it with kernel-level privileges. For example, eBPF programs cannot contain loops, because loops are dangerous: they could get to the point where they never complete.

Now, how do we get that custom bytecode? The LLVM compiler can target eBPF, so you can pretty much write in C and get eBPF bytecode as the output. The challenge is that this compilation is kernel dependent, because a lot of the data structures you may access in the kernel change from kernel to kernel; that is one of the challenges of eBPF adoption, and I know it is being worked on right now to make things more kernel independent. The good news is that while eBPF really is this kind of scary custom bytecode, very few of us will ever need to write eBPF programs directly.
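Before moving on, the maps mentioned a moment ago are worth one concrete example. In the bpftrace syntax we will get to shortly, any variable starting with @ is backed by an eBPF map: the kernel-side program aggregates into it on every event, and user space reads the result out asynchronously. A sketch (raw_syscalls:sys_enter is a standard tracepoint, though availability depends on kernel configuration):

```
# @calls is an eBPF map: aggregation happens in the kernel on every event;
# bpftrace reads and prints the map contents when you hit Ctrl-C
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @calls[comm] = count(); }'
```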
So here is how the general interaction between eBPF and the kernel works. The user program, probably using one of the tools, generates the eBPF bytecode and passes it to the kernel, which verifies it and loads it as a program attached to one of the tracing interfaces: kprobes in the kernel, uprobes in a user program, tracepoints, perf events, and so on. As it runs, it stores data in maps, as I mentioned; a map could be something like a histogram, for example if I want to collect a response time distribution, which can then be fetched asynchronously by the tooling.

Here is a chart which shows different events across kernel versions: what has been instrumented with tracepoints, corresponding to different functionality in the Linux kernel. You can see that the majority of the interfaces and subsystems are instrumented, and there is also very good instrumentation of the various CPU performance metrics: you can track cycles and instructions, you can track where different cache misses happen, and so on.

If you are really interested in eBPF, you may want to check out the IO Visor project. It contains a lot of great eBPF stuff, and that is where the latest innovation is happening, because while eBPF tooling exists in many Linux distributions, it is typically not the latest version.

Okay, now let's talk a little bit about eBPF overhead. The overhead, of course, is not nil, but it is great to see it measured in nanoseconds per call; not milliseconds, not even microseconds. What that means is that if you are running a medium-complexity eBPF program, you can typically have hundreds of thousands or maybe even millions of triggers of that program per second per core before the overhead becomes significant, which is great. Of course, if you really want to hurt yourself and you create some very complicated eBPF program and connect it to very frequently triggered kernel functions, you can make kernel performance crawl to a halt, but that pretty much has to be deliberate; it is not an easy mistake to make.

The next thing to talk about is eBPF front ends. Front ends pretty much means the tools which let us not write that pesky eBPF bytecode ourselves, but instead use command line tools which do all of that underneath. If you look at the landscape from 2019 (which has not changed all that much), you can see a whole bunch of tools available. I took this slide from Brendan Gregg; it places the tools by ease of use and scope, and how far each bar is filled shows how complete that tool's development is. What you can see is that bpftrace really fits into the sweet spot: on one hand it has very broad scope, it can really do a lot; on the other hand it is also rather easy to use, and it is very mature. In fact, at this point it is generally available and has been at that stage for more than a year.

Another front end, which came to market slightly before bpftrace, is BCC. It is also good and provides a lot of very cool stuff, but it is not as easy to use, especially when you think about developing your own BCC tools. So let's compare those two front ends. From a user's standpoint, BCC has a great set of ready-made tools which are very easy to use and convenient, but developing your own tools, or extending and modifying them, is not as easy.
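As a preview of the comparison coming next: the bash command tracer ships with BCC as a tool of a few dozen lines of Python plus embedded C, while the bpftrace version is essentially a one-liner hooking the return of bash's readline function (a sketch, assuming your shell binary lives at /bin/bash):

```
# Print each command line a bash user types, with a timestamp
bpftrace -e 'uretprobe:/bin/bash:readline { time("%H:%M:%S  "); printf("%s\n", str(retval)); }'
```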
bpftrace has a smaller collection of tools so far, at least last time I checked, but its language makes it much easier to extend, improve, and modify your tools. Let's compare, for example, the same program, something very simple which traces bash commands, in bpftrace and in BCC. You can probably see how much easier to read and how much less brutal the bpftrace code is compared to the BCC code. BCC is still much better than working with raw eBPF bytecode, but not as easy as bpftrace.

If you look at the BCC tools available, there are a lot of them, as I mentioned, and I absolutely love those, especially when they came out, compared to the standard top, vmstat, and so on. Those tools give you so much more visibility, without adding any overhead when you are not using them, which is just absolutely fantastic.

The next thing to cover is DTrace versus bpftrace: how do those compare? If you think about getting things done, which here means various performance analysis and troubleshooting work, then in the DTrace ecosystem you would use a lot of either DTrace directly or DTrace plus shell scripting. In the BPF space, you would use bpftrace for relatively simple one-liners and simple tools, but if you want to write complicated tools, chances are you would need BCC. At the same time, you can see that BCC, because of what it offers, is actually much more powerful than anything you could ever do with shell plus DTrace.

bpftrace was of course inspired by DTrace, and it is similar in spirit, but there is no direct compatibility. Also, because there are many years between the designs of those tools, bpftrace is really more powerful. But if you have experience with DTrace, you will probably pick up bpftrace in no time. For example, here is how different functions and variables are named in DTrace and bpftrace. If you look at this comparison, you will see that while it is not the same, they are similar enough, so converting those would not be the major problem. In fact, if you want to port a program, say from Solaris and DTrace, to do the same thing on Linux, it is typically harder to find the proper names for the trace points than to convert the rest of the syntax.

Let's look at this in more detail: here is the same program implemented in the bpftrace scripting language and in the DTrace language, D. Again, as you can see, the syntax is not exactly the same, but a lot of things are similar, and I would say the bpftrace syntax is designed to be slightly more compact and more clean, if you like.

Okay. So bpftrace: how do we get it, and what is required? Here are the requirements for bpftrace; you can see that the majority of bpftrace functionality is available with most of the kernels you will find running in the wild today. And here is where the different probe types, tracepoints, and other attach points hook in with bpftrace; again, you can see it gives us very good coverage of the different areas of the system and the hardware.

Here is how bpftrace works internally. You give bpftrace a program and it compiles it: the parser runs, the program then goes through Clang processing, and in the end you get eBPF bytecode.
As we discussed, that bytecode then gets loaded into the kernel and so on. It is interesting that when you run a bpftrace program, in most cases you have two pieces working at the same time: the eBPF program in the kernel, which is doing all the data gathering, and your bpftrace process, which essentially just consumes the results. When the bpftrace process running the script terminates, the kernel eBPF programs are unloaded as well.

If you want to run bpftrace on Linux distributions, you will find that not all distributions currently have packages, and development is fast-paced, so many may have outdated packages. If you are really into BPF, you may want to consider building newer packages yourself.

One way to get bpftrace is to use a Snap package, though we will talk about the limitations that brings. Once you have it, you will be able to run nice one-liners like this one, which pretty much traces I/O request latency by process. In this case we see a quite interesting example: MySQL has this kind of double-hump latency distribution on the log scale shown. It comes from the fact that some requests are served from a cache and are generally fast, while others require reading from disk. That gives us a bimodal distribution, which I think is quite cool. Anyway, if you figure out something and say, hey, that is a cool one-liner I want to keep, you can save it in a .bt file and then run that file instead of passing the one-liner; it works just as well.

If you look at bpftrace in detail, you will find the general syntax is quite simple. You have a probe, or several probes, that you attach to. Then you have a filter, which lets you narrow down more specifically what to attach to; for example, you can say, I want to attach to this system call, but only for a given process ID, not for everything. And then you have an action, which is the mini program to run: it can do some math, store data in maps, and so on.

As I mentioned, a number of the tools initially created with BCC are now being ported to bpftrace, with the same functionality just implemented using the bpftrace framework. Here are examples of some of the cool tools, which I think are not only great to use by themselves, but also provide good insight into how to build additional tools.

Now, if you ever want to see what the eBPF code looks like, you can do this: run bpftrace with -v and it will show you the bytecode. I do not think that is super interesting unless you are a developer, but if you really want to look into how different things work inside, you can do that.

Okay, now let's do some practice. Let's say we want to use bpftrace to trace MySQL, specifically the execution of the different queries. In MySQL there is this dispatch_command function, which is pretty much the function responsible for executing a query (or other commands). So generally, if you print the second argument of that function, that should be the query. If you run this command, though, we hit a problem: it tells us that the file mysqld does not exist, even though it actually kind of does.
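The invocation in question looked roughly like this (a sketch: the binary path, and which argN actually holds the query text, vary between MySQL and MariaDB versions, so treat both as assumptions):

```
# Attempt to print the query argument on each dispatch_command call
bpftrace -e 'uprobe:/usr/sbin/mysqld:dispatch_command { printf("%s\n", str(arg1)); }'
```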
Well, the problem in this case is that, as I mentioned, if I install the standard Snap package without additional options, it will not have access to arbitrary files on the file system, by design. You can work around that, but I pretty much decided not to mess with it and instead installed bpftrace from an apt package.

When you do that, though, you get a second problem: now it says the symbol does not exist, even though if you have ever looked at the MySQL source code, you know there is a dispatch_command function in there. So let's look at the MySQL, or actually in this case MariaDB, symbols and see whether the dispatch_command symbol actually exists. You will see it does, but it has a funny, cryptic name. That cryptic name comes from C++ name mangling: C++ allows multiple functions with the same name but different signatures, so to avoid conflicts it mangles the names in this funny way. Once we find the right function name, which we did, we can attach to the function, and we see in the output the actual queries the system is running, which is pretty cool.

Okay, well, this was a brief introduction to eBPF. If you are really interested in eBPF, I would suggest you check out Brendan Gregg's website; he has a fantastic set of materials about eBPF, articles, tutorials, and so on. And actually, I can't believe I forgot to include a picture here, but Brendan also published a book about BPF, which you can easily find wherever you get your books. That is also a fantastic way to learn about it.

Well, with that, that is pretty much all I have, and I would now be happy to answer your questions if you have any. Okay, let's see if I have somebody on chat. Okay, Q&A. The question is: how do we get your presentation material? Well, I will share the material with the conference organizers, and I would assume it will be shared along with all the other slides.

Okay, let's see. Anybody else? Any questions? Everybody being quiet. Have you used eBPF? Anybody? You can just type your answer, yes or no, in the chat. Nobody home? Let's see, I have a little bit more time still. Okay. Yes. Thank you, Tessuke; I hope I pronounced your name correctly. Well, I probably should not even hope; I am quite sure I butchered your name. So the question Tessuke has is: do you go with bpftrace straight away, or do you use other tools first and end up at bpftrace?

Well, in my opinion, in many cases when you want to get some good insight, you can use the tools bpftrace already provides. For example, there are tools to look at the I/O latency distribution, which is a fantastic way to troubleshoot performance problems, and that is where I would start. But then you may say: you know what, I am not only interested in the performance of all requests in general; I am interested in the performance of requests which are more than 20 kilobytes in size, or something like that. Then you can go ahead and modify the script to fit your needs. That is how I would approach it. And again, bpftrace is very typically a second step, after you start by looking at the built-in standard Linux tools you are already very familiar with: top, vmstat, netstat, and so on. Yes, well, that's cool.
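To illustrate the kind of modification mentioned in that answer, here is a sketch in the spirit of the biolatency.bt tool, restricted to block I/O requests above a size threshold. The block:block_rq_issue and block:block_rq_complete tracepoints and their fields are standard, but the 20 KB cutoff and keying in-flight requests by device and sector are my assumptions for illustration:

```
#!/usr/bin/env bpftrace
// Latency histogram for block I/O requests larger than 20 KB

tracepoint:block:block_rq_issue
/args->bytes > 20 * 1024/
{
	// Remember when this request was issued
	@start[args->dev, args->sector] = nsecs;
}

tracepoint:block:block_rq_complete
/@start[args->dev, args->sector]/
{
	// Microsecond latency into a power-of-two histogram map
	@usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000);
	delete(@start[args->dev, args->sector]);
}
```

Run it as root; hitting Ctrl-C prints the @usecs histogram.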
I hope, if you have used DTrace before, that this was helpful for you. AJ was sharing with us that he has experience using DTrace, but not eBPF. And Shunsuke is saying that it is very interesting, but it seems you need to read the material to understand it. Yes, that is surely the case, and I would definitely recommend you download and check out the slides when they become available. I kind of skipped that slide, but I also have a further reading section in the slide deck; if you want to read more about eBPF, there is additional information available there.

So there is a question here, and I won't even try to read that name: does bpftrace have a significant impact on the performance of a system? As I mentioned in the presentation, the cool thing about bpftrace and all these tools is that they only add instrumentation overhead when you actually instrument something. That instrumentation overhead in most cases is relatively low, a fraction of 1%, maybe 1%, so it is actually safe to use in production. In some cases, if you trace something like, hey, I want to trace all the system calls in the system, and put some very complicated programs on them, then the overhead will be high, but only while it is on. So as you set out to instrument and trace something, it is typically a good idea to think about how frequent the event you are trying to instrument is.

Takanori is asking: how difficult is it to write userland applications with uprobes supporting both eBPF and DTrace? This one is hard for me to answer; I am not quite sure what the point would be. If you are saying, hey, I want some custom probe and I want to support users who use both eBPF and DTrace, you can probably do that without too much effort, because the syntax is quite similar. But what we typically see is that users either use one tool or the other, or in many cases even some higher-level tool or graphical user interface.

Okay, let's see. Any other questions? Or maybe observations to share? Oh, okay. So Takanori is asking: if MySQL supports both Solaris and Linux, does MySQL need to support both eBPF and DTrace? Okay. Yes. I am sorry, I misunderstood the question earlier. That is not quite how it works: MySQL itself does not need to support either eBPF or DTrace. Both tools are designed so they can instrument any program without that program having any support for them. As I showed in my example, I pretty much picked the function I was interested in tracing in MySQL, and I did that with eBPF; I could do the same with DTrace. Now, at some point in the past, MySQL used to have specific trace points, some better, human-readable names for DTrace events, but I think those have since been removed, because the vast majority of MySQL deployments are on Linux these days.

Shunsuke is asking if there is a slide deck URL for this presentation. No, there is no slide deck URL for this specific one; there may be one for a similar talk I have presented before. I typically like to give the slides to the organizers so they can distribute them according to their distribution policies, as a courtesy, rather than just making them accessible on my side.
Well, Tessuke is asking about explicit trace points in the middle of a function: are they deprecated, or can they be used as a complement? Well, I think the point is that if you have an explicitly defined trace point, in many cases it is quite convenient to use, more convenient than a function. Another value of a specifically named trace point is that it can stay stable while the source code changes between major versions. Functions may be renamed and so on, and then your instrumentation programs break, whereas trace points are something developers typically take it on themselves to maintain, much better than function names, which they consider their own business.

Okay, well, any more thoughts or questions? I think we have just about a minute left of our generously allotted time. Okay, well, thank you for all your questions and for your wonderful participation. I know it is not as easy to participate in a virtual event as in a face-to-face event. So that was a pleasure. Thank you.