Hi everyone. Today I want to tell you about some of the latest things we've been doing at Intuit around machine learning and analytics at scale on Kubernetes. At Intuit we have a major initiative to continuously improve operational excellence, particularly the reliability, performance, and cost of all of our systems. Our strategy is to leverage technologies like observability, analytics, progressive delivery, and most recently generative AI to accomplish this. The goal is to enable our systems and our operations people to make better decisions and to automate more.

What do we mean by better decision making? Basic questions like: I have a lot of applications running on my cluster; how much memory and CPU do they need? How do I autoscale an application so that I don't have to manually configure the memory? When I do a deployment, how do I know the deployment is good and I should roll forward rather than roll back? To make these kinds of decisions, you need data, right? You need a lot of, ideally real-time, data. Then you have to analyze that data and derive insights from it in order to take action, whether that means better human decisions or automating these operations.

So the approach at Intuit is to leverage the Argo CD metrics extension, a feature that was contributed by Intuit about a year ago. Also Argo Rollouts, particularly with custom analysis steps, because the key to getting the full power of Argo Rollouts is using custom analysis steps, where you analyze your data and make the best decision. And also a new project, which we have just released as a 1.0 GA version and which Intuit has contributed to open source, called Numaflow. Numaflow runs the core analytics pipeline and enables in-cluster analytics, so you can automate all of this inside your cluster without having to send metrics out of it. So first of all, what is it?
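To make the custom-analysis idea concrete, here is a minimal sketch of an Argo Rollouts AnalysisTemplate that checks an error-rate query against Prometheus during a rollout. The metric names, Prometheus address, and threshold are illustrative placeholders, not Intuit's actual configuration:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check        # hypothetical template name
spec:
  args:
    - name: service-name
  metrics:
    - name: error-rate
      interval: 1m              # re-evaluate every minute
      failureLimit: 2           # abort the rollout after 2 failed measurements
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090   # placeholder address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
      successCondition: result[0] < 0.05   # pass while 5xx rate stays under 5%
```

A canary Rollout would reference this template in an `analysis` step, so the rollback decision is driven by the measured data rather than by a human watching dashboards.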
Well, if you go to the Argo CD UI, you'll notice there's a tab called Metrics. It lets you access any metrics you're currently sending to Prometheus in-cluster. So whenever developers do a deployment, they can click on the Metrics tab and see what the metrics look like for that deployment, and whether it was a good deployment or a bad one.

Argo Rollouts. How many people have used Argo Rollouts? Quite a few. When I asked that question a few years ago, very few hands went up, but today it looks like about 50%. You may also have noticed, I think it was CircleCI, they recently started supporting Rollouts as a default option. And the user comments are things like: hey, really love it, we can now deploy with confidence; we used to be scared when we deployed, and we're less scared now. So people like the automated failure detection and rollback capabilities that Argo Rollouts provides, especially the custom analysis steps.

Okay, Numaflow. What is Numaflow? Numaflow is a Kubernetes-native real-time streaming and data processing engine. It runs on Kubernetes just like Argo Workflows. It provides exactly-once semantics: when you get events, no event will be duplicated or lost, even when pods are going up and down, even when your cluster is autoscaling, even when your cluster is being upgraded. No matter what's happening: exactly-once semantics. You create the pipeline and it runs reliably, forever, until you terminate it. And all vertices, the steps in the pipeline, autoscale. You don't have to say how many pods to run; it will autoscale from zero to whatever you need.

We've used this technology, Numaflow and Argo Rollouts and so on, to implement progressive delivery using machine learning. We use Numaflow to curate the data that's needed to train the ML models, to actually train the ML models, and to do the in-cluster inference on the cluster itself.
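As a rough illustration of what "vertices and autoscaling" means in practice, here is a minimal Numaflow Pipeline manifest in the style of the project's quick-start. The UDF image name is a placeholder for whatever processing container you deploy; the source, processing, and sink vertices each scale independently:

```yaml
apiVersion: numaflow.numaproj.io/v1alpha1
kind: Pipeline
metadata:
  name: simple-pipeline
spec:
  vertices:
    - name: in
      source:
        http: {}                # ingest events over HTTP
    - name: score
      udf:
        container:
          image: example.com/anomaly-scorer:v1   # placeholder user-defined function
      scale:
        min: 0                  # scale to zero when there is no traffic
    - name: out
      sink:
        log: {}                 # write results to the pod log
  edges:
    - from: in
      to: score
    - from: score
      to: out
```

Each vertex runs as its own set of pods, and the edges carry events between them with the exactly-once guarantees described above.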
So if you look at this extended Argo CD pipeline, you can see we're using the Argo CD metrics extension to show the metrics. It can also generate anomaly scores, and then you can have Argo Rollouts make roll-forward or roll-back decisions based on them. And at the bottom, you see a summary of the Kubernetes errors, logs, and events. That summary is being generated: you could use an OpenAI model, or, in our case, we trained our own model, a much smaller 300-million-parameter model, far cheaper to run than something like OpenAI, and if you train it properly, it works just as well. We just run that in-cluster. It uses data from in-cluster, trains the model in-cluster, and does the inference in-cluster. You don't have to rely on any external resources outside the cluster.

Okay, why did we create Numaflow? Well, it was actually born from the Argo Workflows and Argo Events projects; the initial creators of Numaflow were maintainers on those projects. We noticed that the community started having use cases that were closer to event processing: a steady stream of, in many cases, high-frequency events. It was just inefficient to spin up an entire workflow to process one event and then destroy all of the pods. That also creates a huge load on etcd in your Kubernetes cluster; do that too much and your cluster can grind to a halt. Numaflow instead lets you configure a set of pods that run forever and scale up and down as necessary: very efficient for high-volume event processing, and completely reliable. With Argo Workflows, if certain pods die, your workflow can break, you may have to manually restart it, and you may lose events. With Numaflow, you never lose events, and you don't have to scale anything; it does it all automatically. Okay, so that's the talk. I hope you stop by our booth.
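To sketch the kind of anomaly scoring a processing vertex could perform on deployment metrics, here is a small, hypothetical Python example. It uses a simple z-score over a sliding window of a metric (say, error rate); the function names and the threshold are illustrative, not Intuit's actual model:

```python
# Hypothetical sketch of in-cluster anomaly scoring for a rollout decision.
# A real system would use a trained model; this illustrates the idea with a
# z-score over a sliding window of recent metric values.
from statistics import mean, pstdev


def anomaly_score(history: list[float], current: float) -> float:
    """Z-score of the current metric value against the recent window."""
    if len(history) < 2:
        return 0.0
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0
    return abs(current - mu) / sigma


def should_rollback(history: list[float], current: float,
                    threshold: float = 3.0) -> bool:
    """An analysis step could fail the canary when the score spikes."""
    return anomaly_score(history, current) > threshold
```

A custom analysis step would feed live metrics into a function like this and translate the score into a pass/fail result for Argo Rollouts.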
You can learn more about the technology, watch cool demos, and participate in our photo booth to get a commemorative photo taken. Please enjoy the conference.