We've been focusing on adding more protocols so you get more out-of-the-box observability using Pixie, and we've added two new protocols that we wanted to announce today. I'm super excited to say that we now have support for Kafka and NATS. Both of these protocols are essentially message-based systems. You can post messages in a pub-sub kind of way, and then you have consumers which can read the messages off. We trace these two protocols the same way that we trace all the other protocols we have inside Pixie, as part of our eBPF-based approach. You don't have to instrument anything. You don't have to change any lines of code. You don't even need to redeploy anything. We automatically use eBPF to snoop all the traffic and detect whether that traffic is Kafka or NATS. When we detect that, we automatically start tracing it for you, and then we surface it up to the Pixie platform so that you can see all the good data.
Before I get into the demo, I want to talk a little bit about what we're actually tracing under the hood. At the bottom of this slide I have a screenshot of some of the raw data that we're capturing. This is the data table that drives everything else inside of Pixie. Each row is an event that we've traced. For every event we have a timestamp of when that message occurred, and we have a source and destination: which pod did it come from, and which pod was it going to? In this case, since it's Kafka traffic, we also have a command. You can see some fetch commands here. A fetch is when a consumer is trying to get data from Kafka. There are also some produce commands, which is when a producer is writing data to Kafka. So you can see all the traffic that's going on. In addition, because of the eBPF-based approach, you can actually get all of the content of the messages as well.
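To make the shape of those rows concrete, here is a minimal Python sketch of a traced-event record and a simple summary over it. The class name, field names, and sample rows are all hypothetical illustrations based on the description above. They are not Pixie's actual table schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TracedEvent:
    """One row of a hypothetical trace table (field names are assumptions)."""
    time_ms: int       # when the message was observed
    source: str        # pod the message came from
    destination: str   # pod the message was going to
    command: str       # Kafka command, e.g. "Fetch" or "Produce"
    body: str          # captured message content


# Made-up sample rows: two consumers fetching, one producer writing.
events = [
    TracedEvent(1000, "shipping", "kafka", "Fetch", "topic=order"),
    TracedEvent(1005, "invoicing", "kafka", "Fetch", "topic=order"),
    TracedEvent(1010, "order", "kafka", "Produce", "topic=order"),
]


def commands_by_source(rows):
    """Count how many times each (source pod, command) pair appears."""
    return Counter((row.source, row.command) for row in rows)


counts = commands_by_source(events)
```

Grouping by source and command like this is one way to see at a glance which pods are producing and which are fetching.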
So we grab a bunch of information: metadata and the actual body. You can see some of that in here. If you look at the produce request, which is the third record, you can see that in the response there's a topics entry, order. So we know that this particular produce request published something to the topic called order. Kafka organizes all of its messages into topics, and this one was posted to the order topic. You can also click on any record to see more details about it. It's the same information, just zoomed in. Here, for example, is a different record, where the request command was a fetch. So we've automatically traced that some consumer dispatched a fetch to the Kafka server, and again it's to the same topic. If you look down near the bottom you see name order, which is the order topic, and it gives us all the information like the partition and the message number and all sorts of other goodies about what's happening, so you can debug your applications. Finally, on the right we have the request latency. This is the latency at the protocol level, so it's for individual messages from the client to the server. If there are any issues with that, you can monitor the latencies as well.
Now, it's great that we can trace all the messages, but the real question is how you can use this to actually debug real issues with Kafka or any sort of message bus system. We're going to focus on Kafka here and use a demo, and I've got a slide showing what the demo setup is. We have a simple Kafka broker in the middle, which is our message bus, and this is an e-commerce kind of website where you can place orders.
So there's an order service, which is our front end. It takes orders from the website, and when an order is made it produces a message to the Kafka broker in the order topic. Then we have two consumers on the right-hand side. One is called the shipping service and the other is the invoicing service. These two services are constantly polling to see if there are any new orders inside the message bus, inside of Kafka. They're constantly trying to fetch from Kafka, and when they get something new they take some action. The shipping service would initiate the shipping process, and the invoicing service would take the order information and generate an invoice out of it.
Now, when you have an application with something like a message bus in it, one important question that often arises is: are my consumers keeping up with the data that's being produced? Is there any lag in my system? For example, say the producer creates a message with offset seven and pushes it to the Kafka broker. Offset is Kafka's terminology, so just think of it as message number seven. Ideally the shipping service and the invoicing service immediately pull that same order out of Kafka and start working on order number seven. But if there's any issue in your application, then you might see a lag, and it's important to know if any of your services are running behind. So this is often a question when you have a message bus: are my consumers lagging, are they running behind? The demo we're going to go into tries to answer this question, and we'll see how we can use Pixie to answer questions like this.
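The offset example above can be sketched in a few lines of Python. This is only an illustration of the idea of lag measured in messages, assuming we already know the latest offset the producer wrote and the last offset each consumer fetched. The function name and the sample offsets are hypothetical, and this is not how Pixie computes its lag metric.

```python
def consumer_lag(latest_produced_offset, last_fetched_offsets):
    """Return how many messages each consumer is behind the producer."""
    return {
        consumer: max(0, latest_produced_offset - offset)
        for consumer, offset in last_fetched_offsets.items()
    }


# The producer has written offset 7. Shipping has already fetched it,
# while invoicing (in this made-up example) has only reached offset 4.
lag = consumer_lag(7, {"shipping": 7, "invoicing": 4})
```

A lag of zero means the consumer has caught up with the producer, and anything above zero means it is running behind.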
So with that, I'm going to switch over to Pixie. I'm going to go to the Pixie homepage, and here's my cluster. On the main page, what you generally see is a service map of everything inside your cluster and some other information, but in this case we're interested in Kafka, right? There are a number of Kafka scripts, and if you click on the script button here you can search for them by typing Kafka. Where I usually like to start, when I'm just trying to understand what's happening in an application, is with the flow graph scripts that we have for various protocols. In this case we're going to look at the Kafka flow graph. The flow graph script gives you a nice overview of what's happening inside your system.
When you click on that, the first thing it tells you is that it requires a namespace: which namespace are you trying to look at the flow graph for? In this case our demo application is in a namespace called Kafka demo, so I'm going to click that. Immediately you see all the Kafka traffic in your namespace, in your application. On the right-hand side here we have the Kafka server itself, and I'm going to move that into the middle. These other circles are different pods. We have an order pod, which was our producer, and I'm going to drag this over to the right just to make it a little bit more clear. Then we have our two consumers, the shipping service and the invoicing service, and they're the ones fetching from Kafka. So on the left we have order, which is writing to Kafka, and on the right we have our two consumers, which are reading. This matches exactly what we had in the diagram on the slide, and we're able to see it instantly with Pixie: who's talking to whom, what's my setup, and what are all the things that are reading and writing to the Kafka message bus. I find it a really useful place to start. At the bottom there are various metrics that you can monitor too, in terms of latencies, throughput, and such.
But again, the question we were trying to answer, if you remember, is: are my consumers keeping up with the producer? As we produce orders, are we able to keep up? There's a different script dedicated to that. I'm going to type Kafka here again, and we have a dedicated script called Kafka producer consumer latency, so I'm going to click on this one. When we come here, it first shows you the list of producers that it knows about and the list of consumers that it knows about. What we want to do immediately is focus on a particular topic, so I'm going to populate that right now and say we're interested in the order topic. We know from our application that our consumers and producers are reading and writing the order topic. Then we want to check for a lag between a producer and consumer combination. We only have one producer in this case, so I'll put producer one, which is the name I got from the table here. On the right we see our two consumers, so let's first check consumer shipping one. We'll enter that in, and now we have a graph at the bottom.
We look at this and, honestly, it doesn't look all that interesting. There's actually data there if you look carefully: it's a blue line at the bottom. What this is telling us is that the lag is zero all the time. This is good news. If you don't have a lag, it means that as soon as the producer made an order and pushed it into Kafka, the consumer was able to fetch it and start working on it immediately. So it's telling us everything is healthy with our Kafka setup and our application is doing just fine.
Now I'm going to switch over to the other service that we have, which is the invoicing service. I'm going to type consumer invoicing one here and check that one. Here we see something different. I'm going to extend the time window a little bit so we can see what's going on. And what we see is, whoa, it's not zero. This should be setting off alarm bells. It started off at zero, but it's gradually been creeping up over time.
What's happened is that as I've been talking through this demo, my colleague Hannah has started sending traffic to our web application, and it started placing orders. The shipping service that we looked at initially is efficient, so it's able to keep up with the rate that Hannah set up and process all the orders that were coming in and being posted to Kafka. But this other service, our invoicing service, seems to have some sort of performance issue, and it's falling further and further behind. It just can't keep up with the rate of orders that are coming in. It started off at zero lag, but then it went up to two seconds, then four seconds, then eight seconds, then 16 seconds. It fetches data and works on it, and when it finishes that work it goes back to Kafka and says, what else do you have for me? And then Kafka hands it even more work, because it's falling further and further behind. By the end of this 10-minute window it has already fallen about 60 seconds behind in its work. This is super alarming. It means we have a problem with our application.
We can also look at the application itself. In this tab I have the application. It's a very simple application, just meant for the demo, so it's nothing fancy. If you click on shipping, you'll see all the shipments that have been made. I'm going to scroll to the bottom here, and we see the last thing it knows about is shipment number 1253. If I go to the invoicing page, it thinks the last order that was made is 811. So we can see this problem manifested in the web application as well. We were able to catch it in Pixie, but we also see the effect in the website: it's running behind. If I refresh this page, the invoicing service is trying really hard to keep processing these things, and it's now gone up to 827, but it's just not able to keep up with the rate and it's falling behind. This can be a frustrating experience. Imagine you were a customer using this website. You wouldn't be happy, because you're seeing conflicting information in the shipping and invoicing pages.
So what do you do in a situation like this? Once you catch this sort of issue with Pixie, you would want to go back and see what's wrong with your invoicing service, since it's clearly not able to keep up. You probably want to go back and maybe thread it so it can process more orders per second, or maybe there's an actual performance issue in there that you want to fix. You may want to study it to see whether there's a bottleneck somewhere that you need to relieve. Either way, you do want to go fix it and make sure that this line goes back to zero.
So that's it for the demo. Coming back to the slides, we have a bunch of different scripts for Kafka. We covered the flow graph, and we covered the last one, which is the producer consumer latency. There are other ones that you can take a look at on your own time to see if they're useful for you, and these are the sorts of questions that you can answer with them.
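The growing lag seen for the invoicing service in the demo can be reproduced with a small Python simulation. This is a simplified sketch under stated assumptions: orders arrive at a fixed interval, and a single consumer handles one order at a time with a fixed processing time. The function name and the numbers are hypothetical, and the real demo services and Pixie's measurement are more involved.

```python
def simulate_lag(num_orders, produce_interval_s, service_time_s):
    """Return, per order, the seconds between produce and fetch."""
    lags = []
    consumer_free_at = 0.0  # time at which the consumer finishes its current order
    for i in range(num_orders):
        produced_at = i * produce_interval_s
        # The consumer can only fetch once the order exists and it is idle.
        fetched_at = max(produced_at, consumer_free_at)
        lags.append(fetched_at - produced_at)
        consumer_free_at = fetched_at + service_time_s
    return lags


# A consumer that is faster than the producer stays at zero lag.
fast = simulate_lag(5, produce_interval_s=1.0, service_time_s=0.5)
# A consumer that is slower than the producer falls further behind each order.
slow = simulate_lag(5, produce_interval_s=1.0, service_time_s=1.5)
```

In this model, anything that lowers the effective processing time per order below the produce interval, such as processing orders in parallel, keeps the lag from growing.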