Perfect. So welcome, everyone. Thank you so much for attending our lightning talk. We're really excited to be presenting today at KnativeCon; this is a lightning talk on building machine learning inference on a Knative serverless infrastructure and architecture. So without wasting any further time, let's get started.

Quick introductions, and thank you so much for doing that already. I'm a researcher, and I primarily focus on Rust, WebAssembly, and machine learning inference with MLOps. And I'm Rishith; I'm in high school, I just completed it.

All right. So first, let's go over the standard infrastructure through which we operate and create a machine learning tech stack. You have the general MLOps life cycle: you prepare your data, you train a model, you evaluate the model, and then you productionize it. You also run things such as metrics to monitor your model after deployment, once it actually goes into production. So there's a lot more to it than just getting the model into production; there's an entire tech stack you need in order to evaluate and understand how the machine learning inference is going. Basically, that involves deploying your machine learning model, scaling it up, and updating it in case you need to change certain parameters.

You can see here the standard tech stack: your machine learning model can come from TensorFlow, PyTorch, or scikit-learn, and then you need things such as Kubernetes, Istio for traffic management, and Knative, which is the serverless infrastructure for running serverless functions.

And if you take a look at how machine learning deployments are done on Kubernetes, you need expertise in all of these different things: configuring HTTP and gRPC requests; deployments, with YAML files where you define your services and your pods; persistent volumes for model updates and model changes, and of course for storing the model files themselves; CPUs, TPUs, or GPUs for doing that inference on the fly; and dedicated model servers. (A rough sketch of what that raw Kubernetes YAML looks like follows below.) So to deploy your machine learning model on Kubernetes, you'd need to set up all of these things, and that can take a lot of time, especially for someone who is just getting into the machine learning world. As data scientists, you don't really want to focus on all of these; they are primarily DevOps concerns. So that's where we're going to take a look at how we can add a layer on top of our existing inference stack and reduce all of these dependencies with the help of KFServing, which Rishith will talk about. Thank you.

So I'll be talking more about KFServing, which is also bundled with Kubeflow, and what you can do with KFServing is simplify this process of deploying your machine learning workloads easily on Knative.
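To give a concrete sense of the boilerplate described above, here is a minimal sketch of a bare-Kubernetes model-server deployment. This is not taken from the talk's slides; the names, image, and resource numbers are illustrative assumptions.

    # Hedged sketch: one of several manifests a raw Kubernetes ML deployment needs.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sklearn-model-server        # hypothetical name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: sklearn-model-server
      template:
        metadata:
          labels:
            app: sklearn-model-server
        spec:
          containers:
          - name: model-server
            image: my-registry/sklearn-server:latest   # hypothetical model-server image
            ports:
            - containerPort: 8080                      # HTTP/gRPC inference endpoint
            resources:
              limits:
                cpu: "1"                               # CPU (or GPU/TPU) for inference
            volumeMounts:
            - name: model-storage
              mountPath: /models                       # where the model files live
          volumes:
          - name: model-storage
            persistentVolumeClaim:
              claimName: model-pvc                     # persistent volume for model updates
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sklearn-model-service
    spec:
      selector:
        app: sklearn-model-server
      ports:
      - port: 80
        targetPort: 8080

And this is before adding Istio routing, autoscaling, or a model-update workflow, which is exactly the dependency pile the talk is about reducing.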
And both of us have actually been contributing quite a lot to Kubeflow, and we have been using KFServing for quite a while. Some of the things KFServing allows you to do right off the bat: it brings all the pieces together, so you have an intuitive and consistent environment. It's simple enough that you can deploy a machine learning model without having to worry about all the logistics, such as how scheduling or hardware would be provided to run the machine learning jobs. Designating hardware for your machine learning jobs is itself a big problem: what kind of hardware do you need? What kind of hardware do the processes in your model need? You might need to use a mix of GPUs and CPUs, such that a lot of your data processing or data ingestion happens on the CPU. And all of these are problems when you are running inference with the model too, not just in the training part, where it is pretty apparent.

Because this is a short talk, I'll just try to give a quick overview of how KFServing works. I'll also share a couple of demos which are pretty easy to run, but we won't run them right now due to time constraints. We'll just show a sample example of serving a scikit-learn model, which is about as basic as it can get. But you can apply the same ideas of KFServing to PyTorch models, to TensorFlow models, or to probably more complex models where you don't use prebuilt layers at all. But right now we'll start with a simple example.

So we start with a scikit-learn example, and I want to highlight some things, because we don't have time to go through all of it. (A minimal sketch of such a manifest appears below.) If you look at it, this is very similar to the experience you have running workloads on Kubernetes, so many of the ideas are pretty similar. I simply specify the minimum amount of hardware my model needs, I specify the number of replicas I need, and I also have canary deployment options, all of this in a simple YAML file.

Now, you can do all of this with Kubernetes as well, which is interesting, right? If you can do all of this with Kubernetes in some way, then why bring KFServing into the picture, and why talk about all these YAML files and a new tech stack? Well, a lot of the things I talked about, even in the previous example, were machine-learning-focused ideas which you would have to go to great lengths to implement with Kubernetes. Some of them, like replicas, you can do very easily, but the machine-learning-specific ideas, which are probably not very applicable to other kinds of software, are well supported with KFServing.

Here is also an example of using TensorFlow or PyTorch, and since this is quite some code, all I want you to see is that we are simply changing the model we have here. Another thing I showed is that you can make canary deployments just as easily in the context of machine learning, and this is an example where I do it while also changing some of the settings for the canary deployment, customizing how I want my deployments to be. (A hedged sketch of that combination follows as well.)
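The actual manifest from the slides isn't in the transcript, so here is a minimal sketch of a KFServing InferenceService of the kind described, based on the public scikit-learn iris sample; the replica counts and resource numbers are assumptions.

    # Hedged sketch of a KFServing InferenceService for a scikit-learn model.
    apiVersion: "serving.kubeflow.org/v1alpha2"
    kind: "InferenceService"
    metadata:
      name: "sklearn-iris"
    spec:
      default:
        predictor:
          minReplicas: 1                 # the number of replicas I need
          maxReplicas: 3
          sklearn:
            storageUri: "gs://kfserving-samples/models/sklearn/iris"
            resources:
              requests:
                cpu: 100m                # the minimum hardware my model needs
                memory: 256Mi

You apply this with kubectl, the same as any Kubernetes resource, and KFServing wires up the model server, routing, and autoscaling behind the scenes.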
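For the framework swap and the canary deployment, a hedged sketch along the lines of the public KFServing TensorFlow sample; the 10 percent split is an illustrative assumption.

    # Hedged sketch: swapping in a TensorFlow predictor plus a canary rollout.
    apiVersion: "serving.kubeflow.org/v1alpha2"
    kind: "InferenceService"
    metadata:
      name: "flowers-sample"
    spec:
      default:
        predictor:
          tensorflow:                    # swapping sklearn for tensorflow (or pytorch) is the main change
            storageUri: "gs://kfserving-samples/models/tensorflow/flowers"
      canaryTrafficPercent: 10           # route 10% of traffic to the canary model
      canary:
        predictor:
          tensorflow:
            storageUri: "gs://kfserving-samples/models/tensorflow/flowers-2"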
So, would you like to talk a bit about explainability? Sure. Since we are of course talking about inferencing, another part of inferencing is model explanation: you want to describe what exactly your model does once you start to infer, and you can do that with the help of Alibi Explain, which is by Seldon. Kubeflow and KFServing were built with engineers from Google, Bloomberg, and Seldon as well. So you can use Alibi Explain to understand the model's patterns and metrics and how the model performs when you run these inferences. You can also do pre-processing and post-processing around your model, to understand in more depth how the model behaves under a given load of data being sent to it. And all of this can be defined in those same simple YAML files with the help of KFServing. (A sketch of an explainer and transformer section follows below.)

So what you're seeing is that KFServing supports multiple types of machine learning models, it can do model explanation, it lets you do pre-processing and post-processing of your data, and of course you don't have to worry too much about setting up all of the infrastructure you would normally require with a plain Kubernetes architecture.

And then, to summarize: we are almost at the end of the talk, as you might have seen from the time. We didn't go into a lot of depth on how KFServing works; we just wanted to give you a quick overview of what KFServing does and how it can be integrated if you want to deploy machine learning on Knative. We also talked a bit about explainability, which we think is pretty interesting given how easily it can be done with KFServing, and about some other aspects of KFServing, such as canary deployments. So, to summarize the ideas: KFServing has been built on top of the Knative framework. Knative allows you to create these serverless functions, and for your machine learning inferencing, KFServing adds a layer on top of it to give you the ability to deploy your machine learning model very easily and infer from it, with all the different functionalities that KFServing provides. With that we'll conclude, and thank you so much. You can connect with us, and we'll be open to any questions. Thank you.
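As referenced above, here is a hedged sketch of what the explainer and transformer sections of a KFServing InferenceService might look like. The predictor and explainer storage URIs follow the public KFServing income-classifier sample; the transformer image is a hypothetical placeholder for custom pre-/post-processing code.

    # Hedged sketch: predictor + Alibi explainer + custom transformer in one manifest.
    apiVersion: "serving.kubeflow.org/v1alpha2"
    kind: "InferenceService"
    metadata:
      name: "income"
    spec:
      default:
        predictor:
          sklearn:
            storageUri: "gs://seldon-models/sklearn/income/model"
        explainer:
          alibi:                         # Alibi Explain (by Seldon)
            type: AnchorTabular          # anchor-based explanations for tabular data
            storageUri: "gs://seldon-models/sklearn/income/explainer"
        transformer:
          custom:
            container:
              image: my-registry/income-transformer:latest   # hypothetical pre/post-processing container

With this in place, requests to the model's :explain endpoint return Alibi explanations, while :predict serves normal inferences.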