Hello, everyone. My name is Philip Kuznetsov, and I'll be presenting "Root-Causing Incidents Without Redeploying Prod." A little bit about me: I'm a software engineer at New Relic, where I spend most of my time building Pixie.

Let's jump right into it. Here's the situation: imagine we're working at an e-commerce company called Online Boutique. We sell a bunch of hip, trendy items, and things have been chugging along pretty well recently. We've had no problems, code is shipping, everything is great, until today: the frontend service is panicking, and we don't know what's going on. What's worse, we haven't released code to the frontend service in weeks, so there's no way a change in the frontend itself is responsible. This is a classic problem in microservices: you have dependency trees across multiple services, and a change in one service breaks another. One thing we did change recently, and we highly suspect it's the culprit: we added a new product, a simple Online Boutique sticker. We don't see why that should cause any problems.

The search. Looking in the logs, we see a panic that says "one of the specified money values is invalid." That's a bit weird: our money values are invalid? We keep seeing this error, and it keeps being returned by this sum function. Before I dive deeper into the sum function, you should understand why we need a sum function in the first place. Why don't we just add the values together? The way money is represented inside our services (let me jump ahead), it's split into a units field and a nanos field. Units holds the value before the decimal point; nanos holds the value after the decimal point, represented in nano units, where a billion nano units equal one unit.

Unfortunately, whenever we see this error from the sum function, we don't know which value is actually invalid, so we don't know what's actually causing the problem. A valid money value, as I mentioned, is one where units and nanos have matching signs, both positive or both negative, and where the nanos are less than a billion in absolute value. For some reason, a value flowing through our system is violating one or both of these conditions.
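Concretely, the money type looks roughly like this. It's a minimal sketch based on the description above, not the service's verbatim code; the real type comes from a protobuf definition, and the names here are illustrative.

```go
package money

// Money splits an amount into two integers: Units is the part before the
// decimal point, and Nanos is the part after it, in billionths of a unit.
// For example, $1.99 is Money{Units: 1, Nanos: 990000000}.
type Money struct {
	Units int64
	Nanos int32
}

// nanosMod is the number of nano units in one whole unit.
const nanosMod = 1000000000

// IsValid reports whether m is well-formed: Units and Nanos must agree
// in sign (a zero field is compatible with either sign), and |Nanos|
// must be less than one billion.
func IsValid(m Money) bool {
	signsMatch := m.Units == 0 || m.Nanos == 0 ||
		(m.Units > 0) == (m.Nanos > 0)
	return signsMatch && m.Nanos > -nanosMod && m.Nanos < nanosMod
}
```

Keeping both fields integral is what lets the service avoid floating-point rounding in currency math, at the cost of needing a careful sum function.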
It's not easy to tell what happened in our environment, though. Ideally we'd just add a log line in here and print the value out, but in many environments, especially production environments, that's not really an option. We might try to reproduce the bug locally, but some bugs don't show up locally; they're hard to reproduce, and we're still hunting for the root cause. And sometimes deploying to prod just takes too long. There are situations where you might just add the log line and hope you discover what's going wrong, and the severity of this incident is low enough that it would probably be okay, but that brings risks of its own. Maybe while we're adding the log line, we sneak in some code that actually breaks things. And many production environments have to comply with certain rules: deploying to production willy-nilly, even just to add a log line, is out of compliance and something we don't want to do. If only there were another way.

Okay, so I'm going to show you one. bpftrace is a great tool that lets us add these logs in production. bpftrace gives you kernel-level visibility into your running applications; as its documentation puts it, it's a high-level tracing language for Linux eBPF. eBPF, in turn, is a way to run sandboxed programs inside your kernel, where "sandboxed" means the kernel guarantees the safety and security of the programs it executes. eBPF was designed to let kernel developers, and people who aren't kernel developers, extend the capabilities of the kernel and get access to the high-quality data and privileged context the kernel provides.

The basics, for our purposes: we add a probe called a uprobe that intercepts the running program, runs an eBPF program, and ships out the data we want to collect, in our case the arguments to this isValid function. You can think of it as analogous to a breakpoint in a debugger: you set the breakpoint, the code runs until it pauses there, you poke around the variables and gather whatever data you want, and then you resume execution and it continues onward. bpftrace and eBPF are very similar in that regard, except that instead of you poking around interactively, a program specifies the poking around.

Here's an example of a bpftrace script on the isValid function we want to log (a rough sketch follows below). We specify the type of probe, a uprobe; we specify the symbol we want to collect data on; and the body of the probe figures out how to grab the data we want and prints it out.

Now, typically when you deploy bpftrace, you SSH into a node and run the script with the bpftrace CLI. In a Kubernetes environment with many nodes, that can be really difficult, or at least tedious: you have to find the specific node running the specific pod you want to trace, find the binary path for that pod's server, specify it, and only then deploy your probe. At Pixie, what we've been working on recently is making this process a lot easier. Instead of SSHing into a specific node in a specific cluster and hunting down the specific binary you want to instrument, we provide a way to specify a set of pods to attach to: you give the pod labels and the namespace, and Pixie takes care of attaching your bpftrace program to those pods. Together, bpftrace and Pixie make it easy to add print statements in your production Kubernetes environment.
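That isValid script looked roughly like the sketch below. To be clear, this is a reconstruction, not the slide verbatim: the binary path and package name are placeholders, and how you read Go arguments depends on the toolchain (Go 1.17 and later pass arguments in registers, while the sarg builtins here read the older stack-based ABI).

```
// isvalid.bt: a sketch of the probe, not the slide's exact script.
// Fires on every entry to the frontend's IsValid function and prints
// the units and nanos fields of its Money argument.
uprobe:/path/to/frontend:"money.IsValid"
{
    // sarg0/sarg1 read the first two stack-passed arguments; on a
    // register-based Go ABI you would read registers via reg() instead.
    printf("units=%lld nanos=%lld\n", sarg0, sarg1);
}
```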
All right, let's try a demo. Actually, before I do that: this is Online Boutique, so you can see what's going on, just an e-commerce site. What I'm going to do is add this bpftrace program to check our isValid calls. First, here's the data we get out. Remember the structure I mentioned that splits the money value into two integers: we have our units values here and our nanos values here. Right at the top we have 99-cent items, and that's the sticker. So that looks good: when we put in the 99-cent item, things work, which means the problem probably isn't at that entry point. Something in our code is probably manipulating the value and causing trouble.

Okay, now I have to trigger the error. Here's what happens when you try to add the sticker to your cart: you add it, and there's a crash. You see an error, a runtime panic. No fun, no stickers for us. But if we run the trace again (hold on, let me make this a little bigger and run it again so it looks better), we see a really weird representation: units and nanos whose signs don't match. isValid reports that this value is not valid, and we see the error in our logs. That's bad, but it also gives us a big clue about what's happening.

I mentioned that a billion nanos equal one unit. If you convert this negative 10 million nanos into units, it's negative 1 cent. So we have 1 unit minus 1 cent: a dollar minus a cent is 99 cents. Now I have a better idea of what's happening: this 99-cent value is entering one of our sum calls and being converted the wrong way.

I'll skip ahead a little here. I've seen that sum is called a number of times, and in one of those calls, one argument is a zero value and the other is the 99-cent value we added. When I was digging into this earlier, I noticed that this condition right here, which is basically another validity check after we've summed the values together, is actually incorrect. When the units value is zero but the nanos value is greater than zero, we don't pass into this if statement, even though that value, 0 units and 99 cents, is actually correct. So we miss this branch and end up down here, where the units get incremented (zero becomes one) and the nanos get shifted over to that negative 10 million.
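Putting that into code, the sum logic I just walked through looks roughly like the sketch below, continuing the hypothetical money package from earlier. It's a reconstruction of the behavior described, not the service's verbatim source. The same-sign check is the broken condition: zero units with positive nanos matches none of its clauses.

```go
package money

import "errors"

// ErrInvalidValue is the error behind the log line we chased.
var ErrInvalidValue = errors.New("one of the specified money values is invalid")

// Sum adds two Money values, normalizing the result so Units and Nanos
// agree in sign.
func Sum(l, r Money) (Money, error) {
	if !IsValid(l) || !IsValid(r) {
		return Money{}, ErrInvalidValue
	}
	units := l.Units + r.Units
	nanos := int64(l.Nanos) + int64(r.Nanos)

	if (units == 0 && nanos == 0) ||
		(units > 0 && nanos >= 0) ||
		(units < 0 && nanos <= 0) {
		// Same sign: carry whole billions of nanos into units.
		// BUG: units == 0 with nanos > 0 (our $0.99 = {0, 990000000})
		// matches none of the clauses above, so a perfectly valid
		// value falls through to the different-sign handling below.
		units += nanos / nanosMod
		nanos = nanos % nanosMod
	} else if units > 0 {
		// Units positive, nanos negative: borrow one unit.
		units--
		nanos += nanosMod
	} else {
		// Units <= 0, nanos positive: {0, 990000000} becomes
		// {1, -10000000}, the invalid value we saw in the trace.
		units++
		nanos -= nanosMod
	}
	return Money{Units: units, Nanos: int32(nanos)}, nil
}
```

The next call that receives {1, -10000000} fails isValid, and that's the panic we saw in the logs. A fix is to let a zero field count as sign-compatible in that same-sign check.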
So here's that script again. Takeaways: we were able to insert logs into a running Kubernetes pod, and just by watching those logs come out we determined the root cause of a tricky incident, visibility that would otherwise have taken a long time or a lot of effort to get.

There's a lot more to learn about these tools. I've shown you one example of where bpftrace can be used: specifically, a uprobe, which stands for user-space probe. But there are other places you can probe as well, such as the kernel itself, static tracepoints that kernel developers have added, and static probes that library developers have added (see the quick taster below). And there are many great tools, already listed here, that you can go try today. On top of that, I talked a little about Pixie, showed you a bit of it, and showed how you can combine the tools to get a nice Kubernetes experience with bpftrace. The eBPF application landscape is pretty interesting, too: there are great tools spanning networking, observability, and security, and you should check those out as well. They give you the kernel-level visibility eBPF provides and add a bunch of great features on top of it. Finally, there are good links here on learning to write bpftrace in a bunch of different contexts. Today I showed you some Go code, which has its challenges; writing bpftrace against Go has some rough edges, but they are completely surmountable, and it's very valuable once this tool is in your tool set.
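As a quick taster of those other probe types: bpftrace can enumerate what's available on a machine and attach to kernel tracepoints with one-liners. The flags below are standard bpftrace; the exact probe names vary by kernel version.

```
# List syscall tracepoints matching a pattern:
sudo bpftrace -l 'tracepoint:syscalls:sys_enter_open*'

# List kernel functions you can attach kprobes to:
sudo bpftrace -l 'kprobe:tcp_*'

# A kernel-side one-liner: print which process opens which file.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
```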
I guess with that, I'll be at the Pixie booth in the Project Pavilion, and you can come talk to me and my colleagues about what Pixie does and what bpftrace does. I think that's all. Thank you. Questions? Just raise your hand.

I was hoping you could go back to your stack trace code slide. Sorry, the stack trace, or this one? I can also show it in the application, in Pixie, if you want. Sure. Yeah. Yep. That's it. Thank you.

Oh, okay. So do you have to run a Pixie agent on every node? And how privileged does it need to be within the cluster? I'd have to check the exact names of the permissions, but yes, you need pretty high privileges to run Pixie. And we use a daemon set, so you run a pod on every single node of your cluster.

Are there any security risks associated with that? Can you see it being abused as an attack vector? It's always possible, I guess. But we try to provide guarantees around access control and the like, and you can host Pixie entirely inside your own cluster so that everything stays within that cluster.

What should I keep in mind to leave my applications debuggable? I see that you have some symbols there, with the variable names, and when I'm producing a production-quality binary, I typically strip the debug symbols. So what should my future me wish that my current me would do? Oh, with the symbols and everything like that: we recommend you keep symbols around. That's the biggest thing, and DWARF info is typically helpful as well. I actually don't know the exact difference between the two offhand, but having both around is very helpful for integrating with a tool like bpftrace.

And what about other technologies? If I want to do something similar for Java, is it that easy, or what else is required? Java is a bit more challenging; there are extra wrinkles because the programs are JIT-compiled, and I don't know the details of what you'd need for Java. We have implemented a separate feature for Java, not the bpftrace feature but a profiling feature, so one option is to go look at our commits there. And we have a Slack channel where you can talk to us about how we did that; maybe that will help you figure out bpftrace on Java or something like that.

Wonderful. Thank you. We have a lot of time for questions.

Oh, thanks. This might be a totally insane request, but could we see what bpftrace on the node would look like, for contrast with the Pixie thing? I can try that. Okay, well, the hard part is I have to go and find that binary. I mentioned it was hard; I didn't lie. Let me see, I think there's an easy way for me to grab this, process stats... I think this is going to take too long, so I'll just show you the rough idea. Let me copy this part out. We have this uprobe here, and Pixie doesn't require this, but vanilla bpftrace does: you'd typically add the path to the app binary. Then you'd SSH into your node, make sure bpftrace is installed, and run the script. I'll just save it as a .bt file. I can do it on this dev machine; it just won't produce any data. Oh man. Okay, I thought this would be easy. Before we do that, I'll just show you how to run the bpftrace script first. This is an approximation. Let's see: isvalid.bt, and you'd run something like that. I actually have to adjust it so that it attaches... okay, that's attached now. And if I can get the frontend service running, it should work.
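For reference, the vanilla flow being demonstrated here boils down to something like this. The node name and script path are illustrative; the "Attaching" line is what bpftrace prints once the probe is in place.

```
# SSH to the node that hosts the pod, then run the script as root:
ssh node-1
sudo bpftrace ./isvalid.bt
Attaching 1 probe...
# ...prints one line per isValid call until you Ctrl-C it.
```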
In the meantime, I can take another question. How long will the injected trace last? We set a TTL for Pixie. But if you noticed, when I was running it locally just now, it simply runs until you Ctrl-C it, until you send it a signal.

So if the pod crash-loops and restarts, will the trace get injected again? That's the next thing we're working on for this particular feature: making sure it comes back on. We just built this, and I think we're actually releasing it on Tuesday, so follow us on Slack and we'll keep you updated. But to be precise about the release: we already have a kprobe feature; what ships on Tuesday is user-space probes with this pod-selector capability.

About your daemon set: can you start it after you've found an issue and want to debug, or does it have to run on the cluster all the time? You can deploy the daemon set after you've run into the issue; it doesn't require you to run it continuously. If you want to add it afterwards, you can totally do that, and we've made our deploy process pretty fast, so you can deploy Pixie and get bpftrace data within about three minutes, plus the time it takes to write the bpftrace script.

That can address a lot of security concerns: if it's normally not installed, and you install it only for the short time you need it to debug a serious problem in production, the security folks are much happier. Right, right, because you can delete it afterwards and make sure nobody sees it as an opportunity.

Do we need to make any changes to running pods for Pixie to be able to observe them? As long as your pods have symbols, both Pixie and bpftrace should work. There might be a few more caveats; they're in our docs. I know that symbols are important, and DWARF info is helpful to some degree, but I don't think DWARF info is a hard requirement. And to be clear, if you're running Go or something like that, your symbols should already be included, as long as you use the default build.
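To make the Go point concrete: the default build keeps the symbol table and DWARF info, and it's the common release-hardening linker flags that strip them. The binary name here is illustrative.

```
# The default build keeps the symbol table and DWARF info that
# uprobe-based tracing relies on:
go build -o frontend .

# These common "release" flags strip them and break symbol lookup:
#   -s  omit the symbol table
#   -w  omit DWARF debug information
go build -ldflags="-s -w" -o frontend .
```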
Oh, I think up here. Really just a follow-up on the symbol thing: I was wondering if there's a way to use an external symbol store, so you don't have to embed symbols in your production images but can still have them available for tracing. We haven't designed anything like that, but we've thought about it, and I think it's a really cool idea, so it might be something. Actually, we have Pete back here; he works closely on this, and he can talk to you about it afterwards.

I think there's a question back there. Here? Here? Of course, sorry, routing people. So if I understand correctly, if you have access to the Pixie agent, or at least that UI, you can run any code you want. I'm curious how you control or restrict access to those agents. What's the security story there? There are a few different options. On the data-collection side, we offer the ability to redact PII-containing information, so you can run in a restricted mode. For bpftrace specifically, we have more access controls on the roadmap. As a temporary solution, to get the observability power of Pixie while we work on the access-control side, you can pipe the data into an OpenTelemetry collector and build a view on that side: send it to Prometheus, say, and use Prometheus or Grafana or some other tool on top. That way you get the data while preserving your permission layer. You can also remove certain users from your system, hide certain users, or prevent certain users from signing up.

Okay, I can respond about symbol stores. I'm also on the Pixie team, so this goes back to the earlier question. We're aware that symbol-store technologies already exist. It's not on the roadmap yet, but it would be a really, really neat thing to enable, both for bpftrace and for the profiler, and in general it would be super useful. So should anybody join us on Slack and want to discuss symbol stores, we're absolutely interested and looking to find the capacity to improve along that dimension.

No one else? Okay. Oh, there are some questions over there. By the way, we have t-shirts at the door, I think. Is that true, Michel? Yeah, if people want t-shirts, feel free to grab them.

Can you explain a little how you target what you want to instrument? In the probe here, will it probe every method with that name, or is Pixie smart enough to use labels to target specific pods, and maybe a specific container within a pod? Yeah, first of all, we use the full symbol name, so it's a bit more specific than just the method name. And you can specify the exact pods you want to attach to. There's this label selector down here: we have the label name and then the label value, app equals frontend, so it only attaches to the frontend service that was having the issue. We actually have this code in a lot of the other services as well, but we wanted to ignore those, because they weren't having the same problem.

Okay. Yep. Any other hands on the other side? We still have five minutes. Right here. Folks who are leaving, please leave quietly, okay?

Is there a performance penalty to having this run as a Kubernetes daemon set, processing things in the cluster, compared to doing something like bytecode weaving as part of my build process? You're saying, instead of applying it to a running pod, use the same script during the build pipeline so the instrumentation is injected into the binary itself before it runs? We've never done a toe-to-toe comparison between the two, as far as I'm aware. But I imagine that just as you can keep optimizing bytecode injection, you can keep optimizing the eBPF injection; at the end of the day, you're doing essentially the same operation, so I'd expect the performance to be comparable. I don't have a great answer to that, unfortunately.

That's good. Just to be clear, then, you're doing that process only once? You're not reapplying the weaving to the bytecode continuously; when you click run, it happens only once? Yes, it deploys only once, and then it's kept around for the TTL you specify down here, five minutes there. And then does it process the pod or the binary again to remove the probes? Yes, we have a manager service, you could call it that, which goes and removes the running probe. Okay, thank you.

I think we're done. All right, thank you all again, and thanks for the great questions.