Thanks, Zane. So this is an exciting one. We did give a bit of a teaser in the last Pixienaut meeting about this feature, and we're really excited to show off the real deal today. Before I dive in, I did want to reiterate what Zane said: this has truly been a team effort. So a few shout-outs. Pete is the newest member of our BPF team and has done some amazing work getting the profiler up and running, and then Michelle James on the UI. And the whole team; we can't call out everyone, but truly a team effort.

With that said, let's talk about the profiler. Our vision, or our motivation, for the profiler was that we wanted to make debugging performance issues easier. It's hard to do even locally, with the tooling you need and the expertise you need: installing things, recompiling things, redeploying. It gets annoying, but it's an order of magnitude more difficult once you start talking about debugging performance issues in production, on a cluster that you don't control. As application developers ourselves, we always wanted to be able to debug our own things, but found it very painful. So our goal was to be able to create these on-demand flame graphs, essentially visualizations of what your application is doing, without having to jump through any hoops. No instrumentation, no redeploying, no recompiling; it just works. That's always kind of our vision at Pixie: how can we make it super easy? No instrumentation, you don't have to do anything, it just works.

And so what we're really proud to show off this time is the always-on profiler. It works for compiled languages, so as Zane said, Go, C++, and Rust. We haven't tested other compiled languages, but it probably works for those as well. Support for more languages, things like interpreted languages and Java, will be coming later. The basic idea behind the profiler is this: you're looking at your pod in Pixie, and you have all the other information about your pod, like the request throughput, the latency, all the other great metrics we give you. But wouldn't it be really neat if you could also just see exactly what your application is doing? And so you can actually see these flame graphs and see what functions your application is spending its time in, where your hotspots are, things of that nature. That's what we're going to show off today.

Before we get into an actual demo, I did want to talk a little bit about the basics of profiling, just to level set everybody. The way the Pixie profiler works is not all that complicated. We periodically sample what the CPU is doing, what function it's in. So about every 10 milliseconds or so, we'll interrupt the CPU and say, where are you right now? And we can figure out not only what function you're in, but the entire call stack. So we can figure out, for example, that you're in DB send currently, but that was called from run query, and that itself was called from main. So we get an entire stack trace of where your code currently is. That gives us one sample, and then the application resumes. The interruption is very short, and since we only do this every 10 milliseconds, the performance impact is hardly noticeable. But over time, we collect enough samples that we get a pretty good picture of where the hotspots in your application are and where your application is spending its time.
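To make that concrete, here's a minimal Go sketch of what one captured sample looks like: the leaf function plus every caller above it. This is our own illustration, not Pixie's eBPF code, which samples from the kernel without touching the application.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// captureStack returns the current call stack, leaf first, e.g.
// "main.dbSend <- main.runQuery <- main.main".
func captureStack() string {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and captureStack itself
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		f, more := frames.Next()
		if strings.HasPrefix(f.Function, "runtime.") {
			break // stop at the Go runtime for readability
		}
		names = append(names, f.Function)
		if !more {
			break
		}
	}
	return strings.Join(names, " <- ")
}

// Pretend the profiler's timer fired while we were inside dbSend.
func dbSend()   { fmt.Println(captureStack()) }
func runQuery() { dbSend() }

func main() {
	runQuery() // prints: main.dbSend <- main.runQuery <- main.main
}
```

Pixie does this from outside the process, roughly every 10 milliseconds, and simply counts how many times each distinct stack shows up.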
So if you collect enough samples, you'll get that signal out. There are a number of different ways to visualize these results, but one that has been popularized by Brendan Gregg, and has a really nice visualization, is called the flame graph, and that's the one we're using as well. A flame graph is really good for visualizing hierarchical resource use, things like disk usage or CPU utilization; it's great at showing that sort of thing. I have a very simple example here on the page. If you look at the bottom bar, that represents the entire pie: all of our application's time is in the main function, because that's main, right? Then from our samples, we found that about 30% of the time, when we sampled what the program was doing, we were actually in the run query function, which was called from main. And 60% of the time we were in the write results function, also called from main. So we're starting to get an understanding of where the application is spending its time.

You'll note that that doesn't add up to 100%, and that's actually correct. What that tells us is that the remaining 10% of the time, we were in neither of those functions; we were actually in the body of main itself. Main might have a loop or some other code in there that's actually doing work, and 10% of the time when we sampled, we were in main itself and not in any of its child functions. And this is hierarchical, so we can take a look at run query and realize that two-thirds of the time, run query is actually in this function called DB send, and so on and so forth, and you get an understanding of your application's behavior.

Now, when you collect enough samples and build up the visualization, if you were to color it with reds and oranges, it would kind of look like a flame, which is where the name flame graph came from. That's what Brendan Gregg popularized. We're using Pixie colors here, so it doesn't look as much like a flame, but we're still going to call it a flame graph because that's the standard. When looking at these flame graphs, typically what you're looking for is wide bars, because if something's really narrow, it didn't eat up much time, so you typically don't care about it. Generally your eyes should gravitate towards the really wide bars. That tells you you're spending a lot of time there, and that's someplace there might be an issue, or maybe somewhere you want to optimize; at the very least, something you might want to investigate.

So with that, I wanted to show off a real live demo. I'll switch to the tab here, and we'll start off at Pixie's homepage. Here we have all the namespaces. For this demo, we're going to pretend that we're an application developer, and we're an owner of the online boutique application. It's an e-commerce site, one we've seen before, but it doesn't really matter all that much what it is. It's an e-commerce application, and we want to investigate what's going on with it. Right now everything's good, but we just want to go check on the state of our application. So we click on the online boutique namespace, and let's say we're interested in a particular service called the product catalog service.
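Before I go find the pod, to make those percentages concrete, here's a tiny sketch of how sample counts turn into the bar widths in a flame graph. The function names are just the toy ones from the slide.

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Folded stacks ("parent;child;...") with sample counts, matching the toy
	// example: out of 100 samples, 10 landed in main's own body, 30 under
	// run_query (20 of those inside db_send), and 60 under write_results.
	samples := map[string]int{
		"main":                   10,
		"main;run_query":         10,
		"main;run_query;db_send": 20,
		"main;write_results":     60,
	}

	// A frame's width in the flame graph is the share of all samples that
	// passed through it: its own samples plus everything called beneath it.
	total, widths := 0, map[string]int{}
	for stack, n := range samples {
		total += n
		frames := strings.Split(stack, ";")
		for i := range frames {
			widths[strings.Join(frames[:i+1], ";")] += n
		}
	}

	for stack, n := range widths {
		fmt.Printf("%-24s %3d%%\n", stack, 100*n/total)
	}
	// Output (order may vary):
	//   main                     100%
	//   main;run_query            30%
	//   main;run_query;db_send    20%
	//   main;write_results        60%
}
```

The widest bars, main aside, are the ones your eye should go to first.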
I'm gonna find the pod for that in the list, and here it is. I'll just click on that, and it's gonna load a whole bunch of information, a whole bunch of really useful metrics. We see that we're serving about 130-ish requests per second out of this service. That's great; we know from previous experience that's about right. We can see the latency, and okay, that's also healthy. We're consuming about 8% CPU. Maybe not great, maybe something we want to optimize, but for now let's just assume that's okay, that's normal for us. And then if you keep scrolling down, along with all the other information you can see, at the bottom you'll just automatically see the flame graph. So if you go visit Pixie today, this has been deployed and you'll get this feature.

Let's look at this for a little bit. At the bottom we always have this "all" bar, and then next up we have the pod. We're looking at the pod view, so there's one pod here: the product catalog service. In this flame graph, at the bottom, we're including all this kind of metadata information about Kubernetes. So we say you're in the product catalog service pod. It has one container, and we see that container here; it's called the server container. If you had more than one container in the pod, you would see the other containers here, but we only have one. And then this container has one process, and that's running this product catalog service slash server process. Again, if there were more than one process running within the container, you'd see the different ones listed here. These dark colors all represent those metadata fields. Once we get to the light blue, what you're seeing is actually inside our product catalog service code, what it's doing. Light blue represents user space code; the yellowish color represents kernel code, so time spent in the kernel.

You can see here, first of all, that we have this runtime.goexit function; that's kind of like our main here. You can also look at the tooltip and see that we're spending about 65% of our time here. And then you can go up. Again, as we said, we really want to look at the wide bars, and you can hover over and see that we're spending 22.5% of our time in this gRPC function that's handling the connection. And we can look around for interesting things within the flame graph. There's also some mouse navigation: you can scroll, you can pan through here, see what's happening, hover over different symbols. You can also zoom. So if, let's say, we can't see what's going on here, you can use your mouse scroll to zoom in and see exactly what's happening in the code. This is all kernel code, so the symbols look a little bit weird for those who aren't kernel-initiated, but you can go and look at the code for your application as well and figure out what it's doing. You can see here there's some protobuf marshaling going on, so that might be something you want to take a look at. So you get all this information, and again, I'm gonna repeat myself a number of times: you get it for free. You didn't have to do anything. The only requirement is that you have debug symbols in your application so that we can actually show you these human-readable symbols. Otherwise you'll still get all the information, but it'll look like addresses like this and won't be as useful.
Now, while we were showing off this flame graph, just to play out a little story here: we're the product catalog service owners, and we got alerted that there's an issue. So let's go back to the top of this page; I'm going to rerun the query and see what's going on. Oh, shoot. We were at 130 requests per second, but something has happened in the meanwhile and our throughput has plummeted. So that's very disturbing. The latency has shot up through the roof; it's taking 16 seconds to serve a request. Interestingly, even though our throughput has plummeted, we're actually burning more CPU now. We're looking at around 18% CPU. So something's on fire here, right? We have to fix something. In the other information, you can see the bytes read and bytes written, so a lot of IO traffic is going on. That looks weird; we're not expecting that. What's going on? Maybe the flame graph can help us out here.

So we come back down to the flame graph and see what's going on in the last five minutes. And you'll notice this flame graph looks very different from the one I showed earlier, even though it's the same application. What we notice immediately, where my eye is going, is: what's going on here with get product, parse catalog, read catalog file? This was not in the flame graph before. And as the application developer, I know that the catalog shouldn't be read that often. We should only read it once at the beginning, and we should never read it again unless somebody has triggered a reload, like the catalog file has changed and it's being reloaded. So what's going on? This has given us a big clue about what's happening in the application.

Then we pull up our application code and take a quick look at this parse catalog function, since that was in the stack trace. You don't need to understand all this code, but there's a flag here that says reload catalog, and that's really the only way we would go in here and actually read the catalog file; either that, or the product list is currently empty, which we don't believe to be the case. Otherwise we should just return the last result and not reread the file over and over again. So we're a little bit suspicious that maybe this reload catalog flag has been raised. And we know there's a way to trigger that: there's a signal we can send to say the catalog has been updated and tell the application to reload it. But there's actually a bug with that. The code for that is up here. You can send the signal, and it'll set the reload catalog flag to true. But somewhere along the way, a bug was introduced where this flag has become sticky. So once you trigger a catalog reload, it reloads the catalog, but it doesn't set the flag back to false. And then it just keeps continually reloading, reloading, reloading the catalog, and you're gonna eat up all your CPU and your throughput's gonna drop. So the fix is obvious here. In the short term, there's actually a way to send a signal that says disable catalog reloading; we'd probably do that right away to stop the fire, and then we'd go and modify the code so that when you do trigger this, it reloads the catalog once and doesn't leave the flag sticky. And then we would have fixed the problem.
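For anyone who wants the shape of that bug in code, here's a simplified sketch. It's just an illustration of the pattern described above, not the actual Online Boutique source; the names are stand-ins.

```go
package main

import "sync"

type product struct{ ID, Name string }

var (
	mu            sync.Mutex
	reloadCatalog bool      // flipped to true when the reload signal arrives
	catalog       []product // cached catalog
)

// parseCatalog is called on every request that needs the product list.
func parseCatalog() []product {
	mu.Lock()
	defer mu.Unlock()
	if reloadCatalog || len(catalog) == 0 {
		catalog = readCatalogFile() // expensive: re-read and re-parse the file
		// BUG: the flag is never cleared, so every subsequent call takes this
		// branch and re-reads the file.
		// FIX: reloadCatalog = false
	}
	return catalog
}

func readCatalogFile() []product {
	// Stand-in for reading and unmarshaling a products file from disk.
	return []product{{ID: "1", Name: "example"}}
}

func main() {
	_ = parseCatalog()
}
```

Once the flag sticks, every request pays the file read, which is exactly the wide read catalog file bar we saw in the flame graph.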
So the story would go that, within a matter of minutes, you figured out what the issue is. You got code insight into where your application is spending its time, you jumped to your code, you fixed the issue, and crisis averted, all through this profiler feature. Now, granted, this is a bit of a toy example, right? We understand that. We were just trying to show off the feature a little bit, but I do wanna note that we've actually used this feature ourselves internally in a number of cases. We've used it to optimize our own code's CPU use, and we've used it to debug functional issues. So there are more realistic examples of what you can do with this sort of stuff. But again, I'm just gonna repeat myself: our vision for this was to get it into the hands of application developers so they can figure out what's happening with their code without having to lift much more than a finger. You just have to scroll down in the Pixie platform and see what your application's doing. So that's the end of the demo; gonna switch back here.

I just wanted to wrap up with a few points, some frequently asked questions, and then we can open it up to more questions. We'd love to hear what people think.

When is this running? It's always on. We really wanted to go with the no-hassle approach. Continuous profiling is kind of a buzzword that's gaining some traction. People just wanna leave this stuff on so it's always profiling. It's sometimes too late if you've hit an issue and then you have to go figure out how to turn on the profiler, and that just adds more burden to the application developer. When there's an issue and you're firefighting, you just want the feature to work.

What languages does it support? We touched on this. Go, C, C++, and Rust today. Debug symbols are required; that's really the only catch. I will note that languages like Go leave the debug symbols in by default. So if you compile with the defaults, unless you go to extra effort to explicitly strip out debug symbols, Go applications will have their debug symbols there, and it'll just work out of the box. Support for more languages is in the works.

What is the expected sampling frequency? We touched on this. We grab a sample roughly every 10 milliseconds. That's infrequent enough that the overhead is really low, and over time we accumulate enough samples to give you the insight you need. The next question is really about how many we batch together. So even though we're sampling every 10 milliseconds, we'll collect for a 30-second window, batch those all together in one bundle, and then publish that up to the UI. So you get 30-second visibility at the UI level.

And then I think the big one is: what is the overall performance overhead? Currently we're at less than half a percent overhead, and in most cases it's much lower than that. We expect this overhead to decrease as we continue to optimize the feature. So it's fairly low overhead, and that's really what justified the decision to leave this thing always on.

So, yeah, that's it for this feature. I encourage everyone to go to the Pixie site, play around, just go to the pod views, take a look at it. It's also available in the node views.
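Just to put rough numbers on that sampling and batching FAQ: a back-of-the-envelope sketch, where the per-sample budget framing is our own arithmetic rather than a quoted spec.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	samplePeriod := 10 * time.Millisecond // roughly one sample every 10ms
	window := 30 * time.Second            // samples are batched into 30s windows

	fmt.Println(int(window/samplePeriod), "samples per CPU per window") // 3000

	// Staying under the quoted ~0.5% overhead means each sample has to cost
	// well under 0.5% of its 10ms period, i.e. under ~50µs of work per CPU.
	fmt.Println("per-sample budget:", time.Duration(float64(samplePeriod)*0.005)) // 50µs
}
```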
You can see the profiles for nodes, so go play around with it and let us know what you think.

Okay, this looks pretty cool, by the way. I have a couple of questions. It looks like this is the CPU profile, right? Do you have any plans for heap profiles? I've seen a couple of other folks getting heap profiles from eBPF, although they said it's more expensive, and I have absolutely no idea what's on your mind in terms of future plans. Are you planning heap profiles as well?

Yeah, it's on our radar. It's one we have our eye on. We've been busy with the CPU profiler for now, but it's definitely something that we're looking at.

Yeah, makes sense. The other question I was about to ask: are you planning to make the sampling rate configurable? I'm talking to a lot of people, and they'd be okay with dynamically increasing it for a while and then reducing it again in order to get better samples. I think everybody's fine as long as the overhead is under a single-digit percentage, at least if they want to enable it for a while. So I'm curious.

Yeah, that's one we've debated. It's one we're waiting to see. We're trying to figure out whether it actually gives you more insight if you sample more frequently. Typically you're looking for those very wide bars, and sampling more frequently doesn't really help with that; you're going to catch the big issues with the sampling rate that we already have. But if there's enough legitimate demand for increasing the sampling rate, then that's obviously something we'll consider.

Cool, thanks.

I have a quick question. What's the overhead of this? If people start deploying tons of stuff and it's on by default, what is the expected overhead in general, just for someone taking a glance at this?

Oh, do you want to take that? Yeah. You're talking specifically about the profiler feature, correct? Yep. Yeah. So the great thing about the profiler is that since we're sampling every 10 milliseconds, to first order it doesn't matter whether you have more applications or fewer applications running. It's a fixed overhead, a fixed cost that we take. We're going to interrupt the CPU every 10 milliseconds anyway, whether there's something running on there or not. So if you throw more applications on, it doesn't really change. If you have more CPU use, CPUs going at 100%, to first order that doesn't really change what we have to do. And so when we say we're less than half a percent overhead, that's under a number of different environments. It can be a light load, it can be a higher load; we're going to stay under that half a percent.

Thanks. Before we wrap up, can I ask another question? Do you have any plans to also collect profiles for the whole node, just overall, not for a specific application? People sometimes want to see a broader view of what's going on on a particular node and then jump into what a specific application is doing. I just wonder if this is on the roadmap or not.

Yeah, I'll pull up the node view. Great question. So I'm going to come back to the demo here. This is the pod view, right? This is the example I had at the end of the demo.
If you scroll up here for this pod, there's actually a link to the node that it's on. So I'm just going to click on that and give it a second to load. And here, this is actually a different page now; we're on the node view. We have an embedded flame graph for that as well. Here, what you see is that "all" is the entire node, and then you can see all the different things running on this node. So you'll see that in the online boutique namespace there's a product catalog service, there's a currency service, there's a recommendation service, there's an ad service, the frontend service. The big one that's eating up this node's CPU is clearly the product catalog service in this particular namespace. We also have a sock shop application running, and kube-system itself is there. So you can take the node view and just see everything that's happening there.

This is pretty cool.

Yeah, for the whole node. One thing I wanted to add is that since we have our entire distributed data system, we can actually process this data in many different ways. So we can take a look, for a given service, at how much it's consuming on different nodes, and then also view that in the flame graph. We can basically merge data from many different sources and different nodes and connect it all together.

Do you have any plans to export this data in another format, maybe for people who are collecting that data for long-term retention? Or is it that, right now, everything is here on this dashboard and there's no plan to export it?

We do have the Pixie API, and all this information is actually in one of our tables. So if you want to access it via the Pixie API instead of through the UI, you can pull it all. And it wouldn't take too much massaging to get it into other formats; we natively store the stuff in a format very close to the pprof format. We use what's called a folded stack trace. So it's all pretty much there, and with a minimal amount of work, you could probably get it piped into some other tools if you really wanted to.

Oh, you can probably pull up the table on the bottom just to show the data versus the flame graph view, yeah.

Yeah, that's a good point. So the raw data that you could access through the API has things like the node and all the metadata, but the main one here, which you're seeing in this column, is the stack trace. And this is the folded pprof format, so it's essentially symbol, semicolon, symbol, semicolon; that's how the stack trace is written.

Yeah, it's very similar, yeah.

Yeah, so you could pretty much just take that, and then there's a column with the count, how many times this particular stack trace has been observed. So you can essentially take this stuff, and we can sort by this and see that some stack traces are more common than others. That's what I was saying: with just a little bit of massaging, you could get it back into the pprof format.

Yeah, pretty cool.

Yeah, so you can export this either using the Pixie API, or the CLI can also export this and write it out in JSON or other formats. Yeah, thanks.
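As a rough sketch of the massaging mentioned above: assuming you've pulled the stack trace string and count columns out through the Pixie API (the field and function names here are made up), turning the rows into Brendan Gregg's folded format, which flamegraph.pl consumes directly, only takes a few lines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stackSample mirrors the two columns described above: a semicolon-folded
// stack trace plus how many times it was observed. Field names are
// illustrative; check the actual table schema through the Pixie API.
type stackSample struct {
	Stack string // e.g. "runtime.goexit;main.main;main.runQuery;main.dbSend"
	Count int
}

// toFolded emits "frame;frame;frame count" lines, most common stacks first.
func toFolded(rows []stackSample) string {
	sort.Slice(rows, func(i, j int) bool { return rows[i].Count > rows[j].Count })
	var b strings.Builder
	for _, r := range rows {
		fmt.Fprintf(&b, "%s %d\n", r.Stack, r.Count)
	}
	return b.String()
}

func main() {
	rows := []stackSample{
		{"main.main;main.runQuery;main.dbSend", 20},
		{"main.main;main.writeResults", 60},
		{"main.main", 10},
	}
	fmt.Print(toFolded(rows))
}
```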