Hi everyone, thanks a lot for having me. I'm presenting from the West Coast, in California, so it's nighttime over here, but good morning to everyone in Bangalore. Today I'm going to talk about no-instrumentation Golang logging with eBPF. We'll get into exactly what that means, how we do it, and some open source code that makes it happen.
Before that, a little background about myself. I'm Zain Asgar. I was the co-founder and CEO of Pixie Labs, and I'm currently an adjunct professor of computer science at Stanford. We recently got acquired by New Relic, so I'm now the general manager of Pixie Labs at New Relic, and I'm continuing to build out Pixie as an open source project. Pixie is an observability tool for Kubernetes that works without any instrumentation: it lets you access your application data without modifying your code or even your infrastructure.
With that in mind, I'll get started on the talk: how do you log without manually adding log statements to your Go code? If there are any questions, feel free to ask in the Q&A and I'll try to get to them.
Okay, so the problem we're going to address today: you're an application developer and you're having some issues with your application. What do you typically do? Normally you go and check whether you have logs. If you've got logs and they're in the right spot, things are great. Unfortunately, we've all been in the situation where the logs are just not where you need them to be. You're left thinking, "I really want to see the value of this variable every time this function is called," but that log statement wasn't added beforehand.
So what typically happens, if you're in development mode, is that you jump in and use a debugger. This talk applies more broadly than just Go, but we're picking Go as the example here. If you're using Go, you'd typically use a debugger like Delve. You can also use something like GDB, but GDB has less Go support. Keep in mind that for languages like C or Rust, you can still use their native debuggers, such as GDB.
For the purposes of this talk, we're going to look at a very simple application, understand how to use a debugger on it, and then see how to accomplish the same thing without manually working through the code in a debugger. There's a GitHub link on the slide with the entire source code for this demo application. But believe me when I say it's a trivial application. There's an HTTP GET request where you pass in an argument called iters, whose value is the number of iterations. Every time the function computeE is called, it's called with the number of iterations that was passed into the HTTP call. The function computes an iterative approximation of the value of e and produces a number, and that number gets returned by the HTTP handler. So it's a very small, straightforward program: an HTTP handler computes the value of e and returns it.
Now let's say we were running this and hit problems, and we really want to understand what the value of i was in one of these loops, or what the value of iterations was.
Okay, so normally when you're just debugging, you would go in here, add a printf, and print out iterations. If this code is already running, you can't really do that, so you'd use a debugger instead. In a debugger, you can stop the application at any point and then print the variables you're interested in. Here the application gets stopped, and you can print the number of iterations or the value of i. It's pretty straightforward: you stop the application and get the values.
You can also get all sorts of other information, like how you actually got here. By looking at the stack, you can confirm what I said: there's a ServeHTTP function, which calls the handler, which calls computeE. It's a very straightforward call stack, but you can get all of this information in a debugger. So, to recap, a debugger lets you inspect the current state of the application, and also inspect it in context, so you know how you got there.
One of the nice things is that you get all the information you need pretty easily, without having to recompile your code and add all sorts of statements. Another cool thing is that you can use this to debug remote binaries, including things running in frameworks like Kubernetes, where you can attach to a remote pod and get the data you want. There are tools that make this really easy. One of the tools I want to call out here is something called Solo.
If you use Solo, you can run a debugger on a Kubernetes pod pretty easily. Then the next question comes up: you can run this on remote machines, and you can even run it on things like Kubernetes, but what you really want is to run it on production code. If you're really willing to risk it, you can attach your debugger to a production binary; it's just used as a remote attachment. Unfortunately, the problem is that most debuggers mutate the state of the application. They stop the execution of the code when you hit a breakpoint, and that can lead to really unexpected failures. For example, you might hit timeouts, or you might break other parts of your application, just because you're stopping execution. This is something you probably don't want to do in production.
So what would you do? You can go back to option one: add the log statement to your program, recompile, redeploy, and see what happens. You can consider investing in a more comprehensive solution like OpenTracing. You can say, "Well, whatever, I need this data, I'll just run a debugger." That's your option number two. You can try other Linux tracing utilities. But we're all here to learn how eBPF does this, because that was the title of the talk, so we'll take a look at option four.
For that, let's understand what eBPF is. The best way to think about it is that eBPF is a sandboxed virtual machine that runs inside your Linux kernel. What I mean is that you take some very restricted code, usually written in C, but really a small subset of C, and you compile it to a target called BPF bytecode. Then you tell the kernel to run the BPF bytecode.
What the kernel does is verify that the BPF bytecode will not break the kernel. It makes sure that all the accesses to memory are guarded correctly and that the program will terminate within some bounded number of instructions. The exact number of instructions depends on the kernel version, but the kernel will determine that this program terminates. Some of you are thinking back to CS theory: in general, that involves solving the halting problem, which isn't possible. That should tell you that you can't write an arbitrary program here; it's actually pretty restricted. So while BPF is very powerful and lets you write programs in C, it's a subset of C and gives you a subset of the functionality.
If you look at the diagram, what really happens? Normally, when you insert code into the kernel, you have to write a kernel module. It's a lot of work, and you have to get someone to install the kernel module. With BPF, you inject this code into the kernel, the kernel deems it safe, and then it lets you access various surface areas inside the Linux kernel.
BPF was originally created for packet filtering, or network filtering, and it was designed for that use case. Hence eBPF stands for the extended Berkeley Packet Filter. But today, because of this functionality, it's been extended to many different use cases. One of the things you can do with eBPF is monitor system calls, which means any interaction with the Linux kernel API. You can even interact with applications through a thing called uprobes, which is what we'll look at in a second. A uprobe is basically a user-space probe.
What that allows you to do is this: while your application is executing, when it gets to a particular point, the kernel runs a BPF program for some small period of time and then jumps back to your application. In this short talk, we're going to build a uprobe that captures data from the application and then reads it out in a separate binary.
There are a few different boxes on this slide, and this isn't a precise diagram, but approximately what's happening is this. Your application is running on top of the Linux kernel. You create a uprobe hook that runs every single time you get to some part of the application. It writes the data to a thing called a perf buffer, which you can think of as a memory buffer that's made available to you. Then there's a second binary, called the tracer, which reads data out of the perf buffer. So we're going to read data from the application in a second binary by using these uprobes.
Let's walk through how that works. To do that, we need to understand how binaries work and where the data comes from. There's a cool tool called objdump, which lets you dump object files and their symbols. You can say, "For this binary, give me all the symbols." If you look here, there's the computeE function we're interested in, and we know it's located at address 6609A0. You don't have to memorize that address, because I'll bring it up again, but we know that's where it's located.
If I dump this out in a little more detail with objdump -d, which disassembles the code, I can see that some instructions read values relative to the stack pointer.
Without going into the details of what's happening here, believe me when I say that it's moving the iterations argument, so we know where the iterations argument is located. The thing that defines where the arguments are located is called the application's ABI, and there's a spec for that.
Cool, so now we're at the point where we use uprobes. Let's figure out exactly what happens. Here is the application binary. Remember that address, 6609A0: that's where the computeE function is located. main is located a little bit lower; we don't care about it, I just put it here for reference. What we're going to do is take that address, 6609A0, and add a uprobe hook there. Every time this uprobe hook gets executed, it runs our BPF program, which then writes the data to a perf buffer.
So far this is pretty simple. The way we install the uprobe hook is that we tell the Linux kernel we want to modify this binary and have it call a uprobe at a specific address, and the Linux kernel very nicely does that for us.
So what does this BPF program look like? It's actually pretty straightforward. I said there's a perf buffer, so we declare a perf buffer called trace. Then this function gets called every time computeE is called. It takes a thing called pt_regs, which is effectively a pointer to the registers, or the context in which the function was called. Again, like I said earlier, the ABI defines where the values are located. We know that the input argument is stored in AX, so we read the value of the input argument and then submit it to the perf buffer. The full source code is linked on the slide, but this is the meat of what's happening. The rest is mostly boilerplate.
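A BPF program matching this description might look roughly like the sketch below. It is reconstructed from the spoken description and not copied from the slide. It assumes the BCC toolchain's C dialect (the BPF_PERF_OUTPUT macro and perf_submit call) and an x86-64 machine; the function name computeECalled is made up for illustration. The C source is held in a Go string constant, on the assumption that a loader library would compile it at run time; this sketch only prints it and does not load anything into the kernel.

```go
package main

import "fmt"

// bpfProgram holds restricted C in BCC's dialect. Nothing here compiles or
// loads it; a BPF loader library would be needed for that.
const bpfProgram = `
#include <uapi/linux/ptrace.h>

// Perf buffer that the user-space tracer reads from.
BPF_PERF_OUTPUT(trace);

// Runs each time the uprobe at the start of computeE fires.
// The name computeECalled is illustrative only.
inline int computeECalled(struct pt_regs *ctx) {
  // Per the talk, the input argument is held in the AX register.
  long val = ctx->ax;
  trace.perf_submit(ctx, &val, sizeof(val));
  return 0;
}
`

func main() {
	fmt.Print(bpfProgram)
}
```

Where the argument actually lives depends on the compiler version and its calling convention, so reading it from AX should be treated as specific to the binary shown in the talk.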
Oh, look, it's time for a demo of how all of this works. I'm going to jump over here; hopefully my screen is still visible. Cool.
I have the app binary here. I'm going to open the source quickly just to show it. Here's the computeE function, and here's the HTTP handler for e. It does exactly what I told you: it parses out the argument and then calls computeE with the number of iters. To keep this simple, I'm just going to run this binary. Then over here I can hit it with curl. When I call it with iters equals 1, it says the value is 2. When I call it with iters equals 10, it says the value is 2.7183. So it's much better: you get a better approximation with more iterations.
Now I'm going to look at this thing called the trace example. Sorry, it's a little small, so let me make it bigger. Okay, the BPF program has to be passed in as a string, but you can see it's pretty small. The reason it's a string is that it gets compiled to BPF bytecode; it isn't Go code that gets translated directly. Then there's a bunch of boilerplate: create a new BPF program, load a uprobe. This is the key part: when you call attach uprobe, it modifies the binary and sticks the function in there. Then we look up the perf buffer called trace and say, okay, this table is a perf buffer. Finally, in Go, we write a loop that keeps reading from the perf buffer and prints out each value. So it's a pretty straightforward example of how the perf buffer works.
Now I'm going to run trace. Let me make this window a little smaller. Trace needs to know which binary is being used, and it's the app binary.
BPF programs need to be run as root. Okay, cool, the program's running now. I'm going to run the request with iters equals 1. And look, it told me that I called this thing with value equals 1. I didn't do anything to the application; it has been running the entire time. Just by starting this program, I'm able to intercept the values. If I run it with iters equals 1,000, you'll see it now says the value equals 1,000. If I run it without any iters, you'll see the value equals 100, which is the default value if you look at the code.
So what we've done here is use BPF to build a very, very complicated debugger. We can now write code for a debugger and use it to intercept values. I wouldn't really recommend doing this in practice, but it does show what BPF can do. Using this for debugging lets you pretty easily add instrumentation to production binaries, but it's not trivial to do because of all the complexities of the ABI. Especially when you consider all the fancy abstractions like interfaces and channels, this is pretty difficult to get right.
One thing I did want to leave everyone with is that you can build fairly complicated things. For example, I'll take the exact same app that's running here, but instead of tracing the value of the argument, I'm now going to start tracing the HTTP requests. There are a few different ways to do it, and I'm going to show you one of them. I'm not going to get into the code; it's available online for you to try out. This example needs the process ID, so I'm going to grab that. And again, I should remember to run it as root. Okay, now I'm going to run this, and look over here.
Now I'm tracing entire HTTP requests. It says here's the status code, here's the length, and here's the response body, which matches what we see up here. If I run this with iters equals 1, you'll see the response body is 2. If I hit some random endpoint, it should say 404. One of the cool things about this is that I'm now tracing generic HTTP without writing any code specific to this application.
Before I go into what we do at Pixie, I wanted to call out some related projects that we've been inspired by. Kinvolk has done some really interesting work with eBPF, and they have a tool called Inspektor Gadget that's worth checking out. Sysdig has also done a lot of phenomenal work with eBPF, and we've been inspired by a lot of it ourselves.
As I mentioned, I work at Pixie, and we're working on a lot of eBPF-related things. We'd love for everyone to try out Pixie. Pixie will be open source in the next few months, but as of now there's a completely free version to try. You can go to work.withpixie.ai and sign up for an account. It won't show me the sign-up page since I already have an account, but if you go there, you can sign up and do a very simple install. On our docs page there's a simple command you can use to run Pixie on your clusters. If you do that, you'll be able to run this on any Kubernetes cluster and get instant visibility.
Since we automatically instrument all the code for you, you can go pretty deep into your services. For example, I'm diving into a service here. I'm sorry, I went to the wrong link. If I go into PLC, which is our Pixie cloud, you can see that we've discovered the service map for Pixie. There's a proxy service, an API service, and several other related services.
We automatically get all the information about them. You can dive a level deeper and figure out things like every single slow HTTP request, including all the errors and messages you see here. So yeah, please give Pixie a try. Like I said, it'll be open source soon. We even have capabilities for dynamic logging without having to edit your code or write massive amounts of BPF code, so we'd love for everyone to try it out. That's all I have. Thanks, everyone, and please let me know if there are questions.
Okay. Thanks, Zain, that was a great talk. A lot of insights and a lot of hardcore demoing; love that about it. I think we have a couple of questions. I'll first take one from the Q&A channel here on Zoom, and then go over to YouTube and address a couple of queries there. First, someone anonymous asks a simple question about the tool: how are the tracer and the BPF program able to access the same memory addresses?
Sorry, how are the tracer and the BPF program...? Okay, cool, I think I understand the question. The question was how the tracer and the BPF program are allowed to access the same memory addresses, and I think the answer is that they're not. What happens is that the BPF program runs in a kind of intermediate kernel space. When the probe gets invoked in your application code, it jumps into the BPF program, and the kernel runs the BPF program in the context of the application. The data then gets written to the perf buffer, which is a separate entity that the kernel gives you access to; think of it as a shared file. The tracer is then reading from that shared file. So the two programs never actually see each other directly.
Makes sense. Yeah, I think that answers the question. There's a related question about the perf buffer, which I'll take up right now. This is asked by Madhav Jivrazani.
Is the perf buffer related to the Linux perf utility, or is it something eBPF-specific?
The perf buffer is actually used by the Linux perf utility, so they're related. The Linux perf command also supports some eBPF-related things.
Cool. Okay, there's a question here. Amitek from YouTube asks: is it a good practice to log as much as possible? It might increase I/O on the server. Also, to debug issues that only occur in production, should we enable more logging on the go to debug the problem? I think this is a more high-level, generic question, not anything eBPF-specific, but if you want to answer it, go ahead.
Yeah, I mean, in general I'm a pretty big fan of logging. But the pros and cons are: logging is great, but it also adds a lot of noise, in the sense that it adds a lot of data you have to sift through later, and it increases the I/O load on the server. So at some point there's a trade-off. There is such a thing as too much logging, where you're going to have a pretty significant performance impact. There are also second-order effects: logging can introduce technical debt and issues in your code, because the log statements have to match what's actually happening. So it does introduce some maintainability problems. In general, it's always good to have high-level logging that you can enable, but at some point it gets too onerous, and that's part of the reason we like things like the ability to automatically inspect the code without having to modify it.
Makes sense. I think that answers the question asked by Amitek. They also want us to add links to the YouTube description. Yes, Amitek, we'll be posting the links and the PPTs after the talk, and you'll be able to grab the links from the description. Okay.
I think there is another question. Actually, I don't think there are more relevant questions here, so let me fall back to Zoom; maybe we have time for just one more. Darshan asked this question: is it possible to write a BPF program in Golang?
I don't know if there's a serious effort to get the actual BPF programs written in Go. There might be; I just don't know. Most people write them in restricted C because that's the easiest and most verified way of doing it. Usually you don't want to write really complicated BPF programs, because they're hard to understand and maintain, so it's probably not worth trying to write them in something else. One thing I should mention is that in Pixie, we use Python to drive everything. We have a single command that says, "Print out the arguments for this function," and then we generate the BPF code for you. So it is a little more abstracted.
Yeah, there's a question I've personally been wanting to ask for a very long time. When will we see, maybe this year or next year, some sort of DSL built on top of the BPF APIs? Something slightly higher level, where we could instruct the BPF program to start giving us insights without knowing the low-level syscalls and details, because the directives in a BPF program are very syscall-specific, right? So is it possible to write a DSL that abstracts that part out, on top of whatever API, whether it's Python or Golang? Would we see that in the near future?
Yeah, I just posted two more links. In Pixie we let you do that using our DSL. Our DSL is basically Python; it's based on pandas, and we have some additional pieces that let you mix data processing in with the instrumentation side.
So if you get a chance, try it out, because it's completely free to use and it will be open source shortly. We do have the ability to very rapidly grab data through our DSL without having to write BPF code.