So as I said, I'm going to be talking about two different topics today. The first is DNS, and the second is Go tracing. So let's start with DNS. First, a shout-out to Yasheng, my partner in crime on this stuff; we work very closely together, but I'll be the one presenting it.

DNS can be a real pain to debug. We all kind of hope we never run into DNS issues, that it just works, and oftentimes it does. But when it doesn't, it can be a real headache. It doesn't take much Googling around to see all the frustration with DNS issues on Kubernetes, how to configure it, and so on. So at Pixie, what we wanted to do is provide a little more visibility into what's happening with DNS on your cluster.

Just a quick overview of how we do this: it's the same way we trace the other protocols you can get from Pixie, things like HTTP, MySQL, Postgres. We've added DNS to that list. Essentially, we use eBPF to trace the message transfers. When your application makes a DNS request, it has to go through the Linux kernel via a syscall, so we probe the send, receive, and related syscalls to see what traffic is flowing through your system and what your application is sending. If we detect a protocol we trace, HTTP, DNS, or anything else on that list, in this case DNS, we start tracing that data and ship it to the Pixie Edge Module, which stores it in its tables and makes it queryable. So you don't have to instrument anything; we detect it automatically, and boom, you can see all the DNS traffic on your cluster. If there are any issues, performance or functional, you have some visibility to help figure out what's going on.

With that, I'm going to jump straight into an actual demo so we can see what's going on in real time. I'll start here: this is the main page of Pixie, and I'm also going to pull up this shell. The first thing I want to show is the DNS flow graph script. If you go through the scripts here, this is where you find all your scripts in Pixie. The DNS flow graph gives you an overview of what's happening in your system in terms of DNS messages. I'm going to click "enable hierarchy," which gives us a better visualization by keeping things in a hierarchical layout. Just by running this one script, you learn a lot about what's actually happening in your cluster's DNS. On the left we see a whole bunch of pods. These are application pods I've deployed: a currency service, a payment service, a recommendation service. All of them appear to be talking to this central service here, kube-dns, which is part of the kube-dns service in the kube-system namespace. And then kube-dns itself talks to two other entities; it's making further DNS requests of its own, some of them to localhost and some of them to another DNS service called metadata.google.internal. And metadata.google.internal is actually off the cluster, so we're tracing traffic that's leaving the cluster and going out to metadata.google.internal.
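For readers who want to poke at the same data themselves, here's a rough PxL-style sketch of the kind of aggregation a flow-graph script does over the traced DNS events. This is an illustration only, not the real px/dns_flow_graph script: the table name 'dns_events' and the 'remote_addr'/'latency' columns are assumptions about the schema.

```python
# Minimal PxL-style sketch (assumptions: a 'dns_events' table with
# 'remote_addr' and 'latency' columns; the real px/dns_flow_graph is richer).
import px

# Pull the last five minutes of traced DNS messages.
df = px.DataFrame(table='dns_events', start_time='-5m')

# Resolve the requesting pod from the tracing context.
df.pod = df.ctx['pod']

# One edge per (pod, DNS server) pair, with request count and average latency.
df = df.groupby(['pod', 'remote_addr']).agg(
    num_requests=('latency', px.count),
    avg_latency=('latency', px.mean),
)

px.display(df, 'dns_flow_edges')
```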
The other interesting thing to see at a high level is that these Prometheus pods are bypassing kube-dns and going straight to metadata.google.internal. Presumably Prometheus does this deliberately as a performance optimization, because the job of kube-dns is essentially to answer, "if you want to talk to another pod on the cluster, what's its IP?" For example, if the checkout service wants to talk to the currency service, it consults kube-dns to ask for the currency service's IP address. Prometheus doesn't have that use case, so it has optimized itself to talk directly to metadata.google.internal. The other thing we learn just from looking at this at a high level is that some requests go out to metadata.google.internal, while others kube-dns forwards to localhost, keeping them internal and essentially serving them itself.

So this one script, which you'll have available, you can run on your own cluster and get a high-level overview of how your cluster's DNS is set up. It's interesting just to look at it and see how things are wired together. The second thing I wanted to do is go into a little more detail. Actually, one more thing I want to highlight first: you can pull up the drawer here and look at individual DNS requests. You can scroll through, sort by latency, and look at the high-latency ones along with the body of each request. For example, this one was trying to resolve monitoring.googleapis.com; it did get resolved, but with high latency. So you can see what's going on in individual requests as well. And finally, one other thing I'll show on this graph: you can hover over the edges to see summary stats. For example, the average latency of this Prometheus pod talking to metadata.google.internal is 300 milliseconds across six requests, which is actually quite slow. Or you can hover here and see these are much faster: they're going internally at 2.7 milliseconds.

Moving on, I'm going to show a second script now that goes into a little more detail, and we're going to look at a problem that's common on Kubernetes clusters, known as the ndots problem. I'm going to pull up a different script called DNS data. I'll run it, and initially it's empty, and that's okay. I'll pull up the drawer here so you can see the script. In this script, I'm looking at all the DNS events, but inside there's a filter saying: just show me the requests that are trying to resolve "foo". We're going to change that in a second. Meanwhile, in the terminal here, I'm going to make a specific DNS request. I have this one ready: news.ycombinator.com. So let's try to resolve news.ycombinator.com from within a Kubernetes pod. I'll run that; you can see I already ran it before, so you know what the answer is going to be. Give it a second, and it finally resolves and tells us the IP address is 209.216.230.240.
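The filter in that script is conceptually just a substring match over the traced DNS events. Here's a hedged PxL-style sketch of it; the 'req_body' column and the px.contains helper are assumptions about the schema, not the exact DNS data script. In the demo the filter starts out as "foo" and then gets switched to the Hacker News domain.

```python
# PxL-style sketch of the filter in the DNS data script (column names and
# px.contains are assumptions; adjust to the actual dns_events schema).
import px

df = px.DataFrame(table='dns_events', start_time='-5m')

# Keep only requests whose query body mentions the name we're debugging.
df = df[px.contains(df.req_body, 'news.ycombinator.com')]

# Attach the requesting pod so each row is easy to attribute.
df.pod = df.ctx['pod']

px.display(df)
```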
But what's really more instructive is to come back to the Pixie platform and trace all of the requests that went to news.ycombinator.com. So I'm going to change the filter to that, rerun the script, and okay, cool, we captured a whole bunch of data. Interestingly, there's a lot more DNS activity going on than you might suspect. Let me sort this by time and full-screen it so we get a better view. What's actually happening here is that initially there was a request trying to resolve news.ycombinator.com.default.svc.cluster.local. So it wasn't trying to resolve news.ycombinator.com directly; it appended a suffix because it was searching inside the cluster first. And the answer to that DNS query is empty: it's saying, I can't resolve it, there's no such entity. You also get to see that when the kube-dns server received it, it made a follow-up request to localhost, trying to serve it internally, and it couldn't either; its answer is also empty. You can see the latency too: this first one took roughly three milliseconds. As you follow the trace through all these requests, you see the next one goes to news.ycombinator.com.svc.cluster.local. If you pay attention, this is slightly different from the original, which had .default in it. So it's trying the second entry in its search list: well, what if it was news.ycombinator.com.svc.cluster.local? That takes another 1.2 milliseconds and also comes back with an empty answer; the kube-dns system says, I don't know. Then the third one goes to news.ycombinator.com.cluster.local. That also comes up empty, and it's wasting time. Then it starts trying different things: it tries news.ycombinator.com with our own GCP project's internal suffix, which is Pixie's own internal network. So it's now going outside our Kubernetes cluster and asking for external domain name resolution, and it's still failing; that wasted another 7.9 milliseconds. Then it makes another request, this time to news.ycombinator.com.google.internal, which also fails. And eventually it gets down to trying just news.ycombinator.com, the most straightforward thing. That takes another 23 milliseconds and finally comes up with the answer: 209.216.230.240. We finally got our answer.

So we got all this visibility without instrumenting anything; we just traced the requests through and saw what they were doing. As I mentioned at the outset, this is known as the ndots problem. It's a consequence of how Kubernetes DNS resolution is configured, which causes the resolver to try all these different suffixes. And there are fixes to optimize around it: there are blog posts on how to get your DNS to stop making so many requests so you get much better performance. I can send out those blog posts later, and you can try them out and see how they go. Cool. Oh, just to pause there before we switch to the next section: we'll be sharing tutorials around this, so there will be instructions on how to use these scripts.
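For readers following along, here's a small plain-Python sketch (not Pixie code) of the ndots behavior we just traced: how a stub resolver with ndots:5 expands a lookup against a search list like the one a Kubernetes pod on GKE typically gets in /etc/resolv.conf. The project-specific suffix below is hypothetical; the point is the order of the candidate names, which matches the sequence we saw in the table.

```python
# Sketch of glibc-style search expansion with ndots:5, assuming a GKE-like
# search list. The GCP project suffix is hypothetical.
NDOTS = 5
SEARCH = [
    'default.svc.cluster.local',
    'svc.cluster.local',
    'cluster.local',
    'c.my-gcp-project.internal',   # hypothetical project-specific suffix
    'google.internal',
]

def candidate_names(name: str):
    """Yield the names the resolver will try, in order."""
    if name.endswith('.'):
        # Fully qualified (trailing dot): only the literal name is tried.
        yield name.rstrip('.')
        return
    if name.count('.') < NDOTS:
        # Fewer dots than ndots: try every search suffix before the literal name.
        for suffix in SEARCH:
            yield f'{name}.{suffix}'
    yield name

print(list(candidate_names('news.ycombinator.com')))
# ['news.ycombinator.com.default.svc.cluster.local',
#  'news.ycombinator.com.svc.cluster.local',
#  'news.ycombinator.com.cluster.local',
#  'news.ycombinator.com.c.my-gcp-project.internal',
#  'news.ycombinator.com.google.internal',
#  'news.ycombinator.com']
```

The fixes those blog posts typically describe follow directly from this: use a fully qualified name with a trailing dot so the search list is skipped, or lower ndots for the pod (for example through the pod spec's dnsConfig options) so external names go straight to the literal lookup.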
But first off, I think we've heard a lot about this from our customers as well. So Ashwin, was there a scenario recently where something like this would have been helpful? And if so, do you have feedback on specific visualizations that would help you consume this data in a lower-friction way?

Yeah, I don't have any ideas on how you could represent it better. I think just surfacing this information at all was extremely challenging to begin with. We did face DNS resolution problems that we weren't able to immediately identify, so just having this information would have been a huge help in understanding what was going on.

Yeah, I think one of the things we'll probably add to this is some kind of waterfall visualization, so you can actually see which requests are being made to which DNS server. Right now you're essentially trying to traverse a graph through a table, so we can probably make that a lot easier by building a better visualization. Look out for that in the next few weeks. Cool.

We're at 10:45, and a lot of people are joining in; most of you are new to this. We just went through the first demo. No worries if you joined late: we'll document this, share the tutorial, and post snippets of the demos so you can recap. So with that, let's move to the next one.

I have a quick question, if you don't mind. Yeah, Rupa, go for it. Sure. When you're doing this, are you able to pull up this table for every DNS query that goes through? Are you recording all of it, or are you sampling it somehow?

No, this is all recorded. And just to make that point, if we come back to the flow graph we had here with all the summary stats, the first one I showed, you can pull up the drawer and see there are tons and tons of requests that we traced. That other script was filtering just for Y Combinator, but here you can see every single request happening in the system: which IP it's going to, what its latency was, whether it came back with an answer or not. Everything is traced. Yeah, and as with most things in Pixie, we don't really sample the data; we just make it easier for you to access the data you're interested in. So we keep pretty much all of it around for at least some period of time. Makes sense, thanks. Cool.

So let me go back to the presentation. Just to wrap up the DNS part: be on the lookout, because we're going to be adding more features to this. This is really just the beginning. We're going to add more DNS fields so you get even better visibility into what's happening in your DNS requests.