So, just an overview, because there isn't a single topic. We both worked on the Pixie Grafana plugin to make general improvements: Cartiq worked on the operator auto-recovery feature, and I added replica set and deployment metadata to the metadata that Vizier can display.

So what is the Grafana Pixie plugin? It allows Pixie to act as a data source for Grafana, so you can send data to Grafana and visualize it in Grafana's UI. Grafana is one of the most used dashboarding services on the market, and having Pixie as a data source generally improves how useful Pixie is in that space. The existing plugin takes a single cluster ID and an API key to query data from a single cluster on Pixie, and users had to input PxL scripts to visualize the data Pixie collects.

This is what the old panel looked like. As you can see, there's an editor where users type in whatever PxL script they have, and the resulting data pops up in the panel right there; this one happens to be a time series.

Some of the problems with that design: there were no predefined scripts, so users had to manually type out PxL scripts for whatever data they wanted to visualize. There was also no column selection or group-by feature; Pixie has those, but users in Grafana didn't get them in the UI. Also, as you saw, there's no code highlighting, so if the script is more than five or ten lines you get lost, and no line numbers, so if there's an error you can't actually find where it is. There were no dashboard variables either; for example, and I'll show this later, if you have a dashboard with five panels displaying pod information, you can't change the pod easily, because you'd have to go into each script, and it doesn't update the whole dashboard. That's something we added as well. There was also a need for predefined dashboards covering the general Pixie scripts, like the cluster script, the pod script, and so on; users usually don't want to define those manually because they're such common use cases, so we wanted to supply them. And as a demo for our Grafana plugin, we made a Kubernetes deployment, so with a few commands you can deploy a managed Grafana instance and see the plugin in action right away.

So I believe we're moving to the demo now. This is what the new panel looks like; aesthetically it's a lot better than the previous one, and there are multiple options. Users can choose from pre-written scripts; in this case we have a bunch of different scripts already written for the user to select from. Depending on the script they choose, the relevant columns are displayed, and users can toggle which columns they actually want to see. And depending on the script, if it's a tabular one, they can group by certain columns.
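(Just to make that concrete: here is a rough sketch of the kind of PxL that the panel's column, group-by, and aggregate selectors boil down to. The table and column names here are only illustrative; this is not the plugin's literal output.)

    import px

    # Pull the last five minutes of HTTP traffic from one of Pixie's built-in tables.
    df = px.DataFrame(table='http_events', start_time='-5m')
    df.service = df.ctx['service']

    # Roughly what choosing a "group by" column plus an aggregate function maps to.
    df = df.groupby(['service']).agg(
        requests=('latency', px.count),
        avg_latency=('latency', px.mean),
    )

    px.display(df)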
So in this case I can group by timestamp, and the visualization changes appropriately. They can also aggregate: they choose a column they want to aggregate by, and a function as well. It takes a little while, but now you have the relevant data showing up, and you can add more columns and functions to group by too. The whole idea is that users can modify the data they're visualizing without actually touching the PxL script, just by playing around with the different selectors they have in front of them.

Using this new functionality, we also pre-made a bunch of different dashboards, and I'll just demo the service dashboard. My demo cluster is running the Pixie Sock Shop, and this is the front-end service. This dashboard pretty much replicates what we have in the Pixie UI, but in Grafana, and each of these panels is defined with a separate script. You can change which service you want to look at; it automatically pulls the names from our Viziers, so you don't actually need to go in and find them yourself. For example, let's pick the Kelvin service, and it updates the whole dashboard. It takes a while. Well, it seems there's no information about Kelvin at all, but as you can see the pod panel updated; maybe there's some traffic, and there are a lot of slow requests. As another example, there's a pod dashboard, which has information about a pod. You can do the same thing here: pick which pod you want, and it has the container list and basically everything we have in the UI.

As you can imagine, this should drive usage of our plugin in Grafana up a lot, because instead of just a simple text editor, which isn't really even an editor (no code highlighting, nothing), so people probably couldn't use it with much comfort, it's now way easier. As a proof of concept I tried it myself, and it was quite enjoyable, which is a good sign that we did a pretty good job with this.
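(To tie the dashboard-variable idea back to PxL for a second: conceptually, a service-scoped panel script just filters on whatever the selected variable resolves to. This is only a sketch; the $service placeholder, the latency threshold, and the column names below are assumptions for illustration, not the shipped dashboard scripts.)

    import px

    # Grafana substitutes the dashboard variable (assumed here to be called $service)
    # into the script before the query is sent to Pixie.
    df = px.DataFrame(table='http_events', start_time='-10m')
    df.service = df.ctx['service']
    df = df[px.contains(df.service, '$service')]

    # Count slow requests per service, as one of the panels might.
    # The latency column is in nanoseconds; 250ms is an arbitrary threshold.
    df = df[df.latency > 250 * 1000 * 1000]
    df = df.groupby(['service']).agg(slow_requests=('latency', px.count))
    px.display(df)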
And now we'll move on to some of the work we did after we finished with our Grafana improvements. The next thing I worked on was having the operator auto-recover from failure states.

So what does the Vizier Operator do? The Vizier Operator deploys Vizier into your cluster, manages Vizier, monitors its state, and can also restart different pods based on their failed dependencies. Here's a diagram showing how the Vizier Operator interacts with different parts inside the cluster. Once a Pixie user deploys Pixie into their cluster, you have your Vizier Operator and Vizier, and the operator monitors Vizier and its other components, such as NATS, the metadata store, the PEMs, the cloud connection, and the control plane. It periodically tries to figure out the state of the different components within Vizier.

Some common Vizier states that result in unhealthy clusters, and which I addressed with auto-recovery, were a NATS pod failure and a persistent volume (PV) failing to mount. The current way of fixing these two issues was basically to delete your cluster and redeploy it, which seemed pretty tedious; instead, the operator can go in and do that work for you. The way to auto-repair a NATS pod failure is just to have the operator watch for the NATS pod failing and simply delete it, and the way to repair a persistent volume failing to mount is for the operator to watch for the PV claim failing and then switch to an etcd-backed metadata store instead.

This diagram shows how the new functionality plays in with all of the other systems. The main difference is that when the Vizier Operator receives a status from Vizier and that status is a failed state, the failed state triggers this functionality, which checks whether the failure is something it can repair. If it can, it repairs it, updates the status sent back to the operator, and hopefully the cluster stays alive.
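(Here's a minimal sketch of those two repair paths, written with the Kubernetes Python client just to illustrate the idea; the real logic lives inside the Vizier Operator itself, and the namespace, label selector, and helper below are assumptions.)

    from kubernetes import client, config

    NAMESPACE = 'pl'  # assumed Vizier namespace

    def switch_to_etcd_metadata_store():
        # Placeholder: in the real operator this would amount to updating the
        # Vizier spec to use the etcd-backed metadata store instead of a PV.
        print('falling back to etcd-backed metadata store')

    def try_auto_repair(core_v1):
        # Repair path 1: a failed NATS pod is simply deleted so it gets rescheduled.
        pods = core_v1.list_namespaced_pod(NAMESPACE, label_selector='name=nats')  # label is an assumption
        for pod in pods.items:
            if pod.status.phase == 'Failed':
                core_v1.delete_namespaced_pod(pod.metadata.name, NAMESPACE)

        # Repair path 2: if the metadata PV claim never binds, stop waiting on it
        # and switch to the etcd-backed metadata store instead.
        pvcs = core_v1.list_namespaced_persistent_volume_claim(NAMESPACE)
        for pvc in pvcs.items:
            if pvc.status.phase == 'Pending':
                switch_to_etcd_metadata_store()

    if __name__ == '__main__':
        config.load_kube_config()
        try_auto_repair(client.CoreV1Api())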
All right, and I also worked on some metadata service updates. The motivation is that deployments and replica sets are some of the most common resources that people use in their Kubernetes clusters, and since we didn't support them before I added them, users weren't able to see the state of their deployments, for example, and they didn't have dashboards for that.

I'll just give a quick overview of how the metadata service works, as a bit of an educational thing, because I don't think everybody here knows how it works. The Kubernetes API can generate different events, for example a pod update or a deployment update, and our service subscribes to those updates and receives them whenever something changes. The service then processes the update from Kubernetes and sends it to storage, which gives us some fault tolerance: if the PEMs and Kelvins fail, we can just pull the updates back in from storage, so the metadata updates are kept long term. The metadata service also sends the updates to the PEMs and Kelvins, so they can query them and display them when you run PxL scripts; in the PEMs and Kelvins, the updates are stored in internal state. The way we expose this to users is through the UDF API: basically you can say "give me replica sets" or "give me deployments," and the PEM or Kelvin will query its state and display, for example, the replica set name, the replica set ID, or the deployment. That's pretty much how it works: a pipeline, plus exposure through the UDF API.

Anyway, if you're interested in what an update might look like, it might have a name, an ID, start and stop times, the replicas in the replica set update, and maybe some conditions. You can then query that from the UI, and here's an example of a UDF: you can query replica sets from the context, or get the status of replica sets using a specific function.

Okay, and I also made a few dashboards to actually see that it works. The first one is a dashboard with replica sets. You can query the usual things: how many pods are ready in the replica sets, the names of the replica sets, and perhaps which replica sets are performing the slowest in your cluster; in this example the front-end is the slowest one. You may also wonder why this specific replica set uses over a terabyte of resident set size. There's also a dashboard for a specific replica set; I decided to look into the front-end replica set because that seemed quite interesting, maybe to see why it's so peculiar. And then there's a dashboard for deployments, which is very similar: you can see how many pods are ready, up to date, and available, you can see the names, the whole deployment's throughput, error rates, and so on. Basically this lets you get the status of your deployments, as well as the pods and services we had before.

So that's it; that was the demo and our presentation. Thank you so much for tonight.