Hello, I'm Rishit, and I'm Shivay. First of all, sorry for keeping you waiting; we've been juggling a bunch of talks today, so I hope you're able to get something out of this one. Over the next 30 minutes we'll be talking about Prometheus in the MLOps lifecycle. Both of us have been working closely with machine learning, and we want to share some of the things that have worked for us with Prometheus, especially around monitoring in the context of machine learning. So, I'm Rishit, a student at the University of Toronto. And hi, I'm Shivay, a contributor and maintainer at Layer5, which is a service mesh community.

When we talk about traditional software monitoring, a lot of you may already have done it with Prometheus or some other tool. If you haven't, monitoring in the context of software usually means thinking about SLOs, checking whether those SLOs are being met, watching for system failures, and so on; monitoring covers a lot of things. Today's talk, though, is about MLOps. If you look at this diagram of the MLOps, or machine learning, lifecycle, there are multiple steps: getting your input data and cleaning it up, using that data to train your model, evaluating and testing the model, and finally productionizing it. After a model is productionized, you keep testing it from time to time to make sure its performance does not degrade while it is being used in production, and you continue to monitor it after deployment. All of these steps are part of the MLOps lifecycle, and it's on the monitoring and observability side that Prometheus plays a really key role. So we want to talk about monitoring in the context of ML. Shivay, would you like to say a bit about that?

Sure. If we look at the important use cases: as mentioned, once you productionize your model, a lot of hidden variables can crop up. Working with production data, you may run into edge cases where the model simply does not perform. Over time, there's also the possibility that the data distribution your model was originally trained on shifts. We'll describe this in two ways, in terms of time and in terms of the data; if you're in the ML ecosystem, concept drift and data drift are two terms you'll come across again and again. There can also be situations where a model is not configured properly: it will still make a prediction, but it's not always going to be a good one, and that's something we also want to monitor, so the quality of the predictions doesn't silently go down. So those are the particular challenges I want to highlight: monitoring in the context of machine learning, and what we want from it, is quite different from how you might have been using Prometheus so far.
So the ideas are a bit different here; what we want to achieve is different from traditional software. I also want to highlight the case where the system itself runs fine: there is no error as such, no system failure, no missed SLO, nothing at all. The model still makes a prediction, and yet something is wrong, because the predictions it makes are not sensible. That is a very interesting use case: everything works, there is no error, but you still need to be able to monitor in that situation. So what we want is not just to monitor system metrics and resource metrics, which we already do; we introduce a third term here, model metrics, to capture exactly this. The model keeps giving you predictions and nothing errors out, but you have to make monitoring work purely from the model metrics. In this presentation we'll mostly stress the model metrics part, because system metrics and resource metrics are things you may have been collecting for a long time, and some of the talks before ours covered them; we want to focus on the ML context.

To give you an example of why you might still want to monitor when the system works and there is no error: environment changes can affect the model. The environment at deployment time may simply be different from the one the model was created in, with different environment variables, and in that case the model runs, but it doesn't behave as you expected and doesn't give you the right predictions. There can also be a change in data distribution. What we want to do with Prometheus in the MLOps lifecycle, in a simplified view, is train a model on some data, deploy it, and make predictions with it. A lot of you might say that's not exactly the whole story -- you would also keep iterating on the model -- but let's stick with the simplified picture for now. One thing that can go badly wrong while the model still appears to work, and that we want to catch in our monitoring, is that the data distribution the model was trained on differs from the data distribution it now sees. Think of it this way: the data your model is making predictions on should come from the same distribution as, or at least be very close to, your original training data. There are multiple mathematical ways to measure this, and we won't go into the machine-learning-specific details in this talk, but the two phenomena are popularly known as concept drift and data drift. We want to monitor these as well; think of them as the edge cases we still want to cover.
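Just to make the drift idea a little more concrete, here's a minimal sketch, in Python, of one way to turn a drift measure into something Prometheus can scrape. This is illustrative rather than the exact code from either of our demos: it assumes numpy, scipy, and prometheus_client are available, and the file name, metric name, and the choice of a Kolmogorov-Smirnov test are all placeholders for whatever measure you prefer.

```python
# Minimal sketch (not our demo code): compare a sliding window of recent
# production feature values against a reference sample saved at training time,
# and expose the result as a gauge that Prometheus can scrape.
from collections import deque

import numpy as np
from prometheus_client import Gauge, start_http_server
from scipy.stats import ks_2samp

# Hypothetical reference sample persisted during training.
reference_sample = np.load("training_feature_sample.npy")

# Kolmogorov-Smirnov statistic between training and recent production data;
# values near 0 mean the two distributions still look similar.
drift_gauge = Gauge(
    "model_feature_ks_statistic",
    "KS statistic between training data and recent production data",
)

recent_values = deque(maxlen=1000)  # sliding window of one production feature

# Called once in the serving process; Prometheus scrapes <host>:8000/metrics.
start_http_server(8000)

def record_request(feature_value: float) -> None:
    """Call this for every prediction request the service handles."""
    recent_values.append(feature_value)
    if len(recent_values) >= 100:  # wait for a reasonably sized window
        statistic, _ = ks_2samp(reference_sample, np.array(recent_values))
        drift_gauge.set(statistic)
```

Once a gauge like this is exposed, you can alert on it like any other Prometheus metric, for example when the statistic stays above a threshold for some time.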
So, Shivay, can you talk a bit about how Prometheus comes into play? Sure, this is where Prometheus comes in. What we saw so far about system metrics was probably already covered in some of the earlier talks, and those of you who are Prometheus users might be doing it already. Here is a quick look at how you can use Prometheus, since it's one of the most promising tools for monitoring your model performance and model health. You can use a combination of Prometheus and Grafana, and that's what we'll show in today's demos as well. Primarily, Prometheus scrapes your metrics as time series data, and by looking at those series you can see how the model's performance varies over time. That gives you the insight to understand whether the performance of a model is actually degrading as you analyze the metrics you get from Prometheus. On top of that you can set up alerts, routed to PagerDuty or email, and you can view everything on a Grafana dashboard, which we'll also cover in today's demonstration. So it makes it easy for us, as data scientists or MLOps engineers, to integrate Prometheus and see how the model performs over time, especially because there may be edge cases that weren't considered when the model was first deployed. As part of the model serving process, it plays a really important role in determining how and what changes you'll make to your model after it goes into production.
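And once those metrics are in Prometheus, you don't have to stay in the web UI to analyze them. As a small aside, here's a sketch of pulling them into Python for exactly the kind of over-time analysis I just mentioned; it assumes the prometheus-api-client package, and the URL and the metric name in the query are placeholders for your own setup.

```python
# Small sketch of pulling model metrics back out of Prometheus for offline
# analysis. Assumes the prometheus-api-client package; the URL and the metric
# name in the query are placeholders.
from prometheus_api_client import PrometheusConnect

prom = PrometheusConnect(url="http://localhost:9090", disable_ssl=True)

# Per-second prediction rate averaged over the last 5 minutes.
results = prom.custom_query(query="rate(model_predictions_total[5m])")
for series in results:
    print(series["metric"], series["value"])
```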
Great. Next up we'll take a look at two demos, and we'll give some more context for each. We'll try to show how to do this different kind of monitoring, covering the aspects we just talked about. So, Shivay, would you like to start with the demo of Prometheus on Seldon? Sure, let me just quickly connect.

A quick introduction to Seldon itself: Seldon provides a set of open source tools specifically for model serving, model performance, model monitoring, and model deployment, and it works closely with Kubeflow, so you can definitely check it out. What we're going to cover is a demonstration of how you can use Prometheus with it. It's a little difficult for me to reach over to the screen, so let me just mirror my screen once again. I guess now it should be better. Yeah, perfect.

So what we'll cover is an example of viewing Prometheus metrics with the help of Seldon. As prerequisites, in case you want to follow along, you have to install Seldon Core, and for that you need a Kubernetes cluster: you can use kind if you're running locally, or a service like Civo or GKE or any other cloud provider. In my case, and I'll open up my terminal, let me quickly show you my kubeconfig -- I'll go into my .kube directory -- and you can see that I'm running a Kubernetes cluster on GKE, so it's on Google Cloud. I've already configured everything, and I'll quickly show you the pods as well.

While he does that: what's the right way to say kubectl? Anyone want to go first? I say it as "kube-cuttle"; if you say it another way, please don't hate me. Go ahead, Shivay.

So as you can see, I currently have a number of different services running in my cluster. I hope everyone is able to see the screen. You can see I have Istio, because an Istio gateway is used for looking at all the traffic that is going on, so we have an Istio gateway. Apart from that, we have a bunch of Seldon-related pods running: the core Seldon operator, and a few monitoring pods as well, including the Seldon Prometheus operator. So if you go ahead and deploy and run any machine learning model, you can use the Prometheus operator that works on top of Seldon to monitor all your requests and see them in a dashboard.

Since I already have monitoring running, you can see this is my Prometheus dashboard, and so far I've just sent a few sample requests. The idea is that if you follow along with this documentation -- and we'll share the link -- first you set up Seldon Core, then you install the Prometheus operator and wire it up with Seldon, and once that's done you deploy this example model; it's just a simple model that echoes its input. You can then see the results as we run it, and as you check the docs out you'll see how, as you run your machine learning model from time to time, the changes show up in this live time series data.

Over here I already have one of my windows open with a curl request heading to my endpoint, which is running on localhost:8003 -- I've used port forwarding to reach it locally, and you'd see the same thing in the GKE cluster. As I run this, you can see it succeeded. Now let's take a look at our Prometheus dashboard; let me quickly refresh. You can see new requests have come in compared to last time: we were at 12 requests and now we're at 14. You can put any query into the Prometheus dashboard to get a summarized view; in this case I'm looking at the per-second request rate, so how my model is being called and how many requests have been sent so far. But you can also create custom dashboards to monitor the actual live prediction rates for a particular model. The example we showed is a simple echo model, but you can run this with any model you want -- scikit-learn, TensorFlow, or PyTorch based models -- and this way you can monitor your live machine learning performance on the Prometheus dashboard.
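If you'd rather reproduce that request from Python than copy the curl command, it looks roughly like this; the port comes from our port-forward, the namespace and deployment name in the path are placeholders for your own setup, and so is the metric name in the query at the end.

```python
# Rough Python equivalent of the curl request from the demo. The port comes
# from our port-forward, and the "seldon"/"example" namespace and deployment
# name in the path are placeholders for your own setup.
import requests

url = "http://localhost:8003/seldon/seldon/example/api/v1.0/predictions"
payload = {"data": {"ndarray": [[1.0, 2.0, 3.0, 4.0]]}}

response = requests.post(url, json=payload)
response.raise_for_status()
print(response.json())

# On the Prometheus side, a query along these lines gives the per-second
# request rate we were looking at (the exact metric name depends on the
# Seldon Core and Prometheus versions you are running):
#   rate(seldon_api_executor_server_requests_seconds_count[1m])
```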
You can expand this further by creating a Grafana dashboard to monitor your metrics, which we'll see in the FastAPI demo that Rishit will show you. But apart from this, I also wanted to quickly showcase another integration. How many of you are aware of machine learning pipelines, or have heard the term ML pipelines? Has anyone heard of Kubeflow? There are a number of data orchestration platforms, like Kubeflow and Flyte, that let you take your entire MLOps cycle and turn it into specific workflows: you can think of model training as one workflow, and model testing and model deployment as their own workflows, basically dividing everything into tasks. So another example I want to quickly showcase is Flyte. Let me go back to the docs section and search for Prometheus. Just as we showed an example for Seldon, if you were using Flyte to run your machine learning workflows, Prometheus is directly embedded into them: you get metrics out of the box, and there are published dashboards for monitoring Flyte deployments. Again, this is about your MLOps workflows, here with the help of Flyte, so you can take a look at this as well, and we'll be more than happy to share some examples. What we want to show through these examples is that Prometheus plays an extremely important role in the MLOps lifecycle today; if you're looking to embed it, there are a number of integrations available out of the box that you can set up on your own systems and use to start monitoring performance. And we'll also see an example with plain FastAPI, which Rishit will show you now.

Okay, so I'll show another demo; let me get set up. I think it's up now. Yes, it is. We'll take a look at a FastAPI demo, and the idea of these two demos is to show two very popular ways of deploying machine learning, or doing MLOps, and how we have integrated Prometheus into them. You're not limited to what we show in the demos; you can make this work with other platforms too. We just want to share how we've done it here, and you can apply the same ideas elsewhere. So we'll look at FastAPI, which is probably one of the most popular ways to deploy a machine learning model.

Setting some context for this demo: first we want to create a REST service to expose the model -- not something we'll do live in this talk, so let's just say it's been done for us. Next, we want to instrument the server to collect metrics, which will be exposed on a separate metrics endpoint alongside the REST API we created for the model; for that we'll use the Prometheus FastAPI Instrumentator, which lets us collect metrics from a FastAPI deployment. We'll also show how to compare the data distribution your model is currently seeing with the data distribution it was trained on -- essentially all the aspects we just covered in the slides.
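To give a rough idea of what that looks like in code, here's a minimal sketch of a FastAPI service instrumented this way. The route, the payload shape, the scoring logic, and the histogram buckets are placeholders rather than our exact demo code, but the Instrumentator().instrument(app).expose(app) call is how the prometheus-fastapi-instrumentator package is wired in.

```python
# Minimal sketch of the instrumented FastAPI service described above -- the
# route, payload shape, scoring logic, and bucket edges are placeholders, but
# the Instrumentator and prometheus_client calls are the real package APIs.
from typing import List

from fastapi import FastAPI
from prometheus_client import Histogram
from prometheus_fastapi_instrumentator import Instrumentator
from pydantic import BaseModel

app = FastAPI()

# Adds standard HTTP request metrics and exposes a /metrics endpoint.
Instrumentator().instrument(app).expose(app)

# Custom model metric: the distribution of prediction scores, which is the
# kind of series a prediction-distribution heat map can be built from.
prediction_score = Histogram(
    "model_prediction_score",
    "Distribution of model prediction scores",
    buckets=[i / 10 for i in range(11)],
)

class PredictRequest(BaseModel):
    features: List[float]

def fake_model_score(features: List[float]) -> float:
    # Stand-in for the real trained model; returns a value clamped to [0, 1].
    return min(max(sum(features) / (10.0 * max(len(features), 1)), 0.0), 1.0)

@app.post("/predict")
def predict(req: PredictRequest):
    score = fake_model_score(req.features)
    prediction_score.observe(score)
    return {"score": score}
```

The Instrumentator gives you the HTTP request metrics and the /metrics endpoint for free; the custom histogram is the kind of model metric the prediction-distribution panel is built on.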
And then, of course, we want to use Prometheus to collect and store the metrics, and we'll add a layer of Grafana on top to visualize them -- not something we'll focus on a lot in this talk, but we do have the code for that and we'll show it. If you want to try this yourself: some things, especially the data distribution your model is operating under, are pretty hard to see from a single request. So I've also been running some code since this morning that uses Locust to simulate requests, hundreds of them, and that will let us see some of the other machine learning aspects in action, which is exactly what we wanted to show with this talk. Let's go to the demo.

Let's start with the model; we don't want to talk a lot about it, but we'll start there and work our way up. I'll start directly with the Dockerfile: I have a FastAPI deployment, my model is already trained, and I have a Dockerfile to build an image out of it, so I'll just use that FastAPI deployment -- we don't really want to spend the demo on training a model.

Next is the model.yaml file. We have a Deployment here that runs the FastAPI service for our model, which is pretty standard, and we also have a Service, which lets us expose the REST API and use it. Then we have a ServiceMonitor; the ServiceMonitor should of course be in the namespace that Prometheus is watching, so Prometheus can automatically discover it and collect the data. So that's the third resource we deploy.

Where are these three resources deployed? You can use something else, of course; I'm just using Civo because I've been using it. I created a Kubernetes cluster and deployed the resources I talked about -- the REST API, the Service, and the ServiceMonitor -- on this Civo cluster. Use whatever cloud you like; for this demo it's Civo. With that, let's go full screen.

So we have our model.yaml file, and then we have the dashboard itself. One of the things we do in the dashboard to make sense of the data Prometheus gives us is identifying which distribution the incoming data falls under, and for that we leverage some functionality in Grafana, which is pretty simple: you make a heat map out of it and then you can compare it with the original distribution the model was trained on. That's one of the things we do on the Grafana dashboard. So with that, we have set up a REST API deployment for our model, we have a ServiceMonitor, and I've already deployed Prometheus, using the kube-prometheus stack, in the cluster. All of this is pretty intuitive up to this point; this is the deployment part that I just showed on screen, and deploying all of these pieces is straightforward.
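The Locust piece I mentioned is just another small Python file running as a pod; a rough sketch of what such a file can look like, with a placeholder endpoint and payload, is:

```python
# Rough sketch of a Locust load generator (locustfile.py); the endpoint path
# and payload are placeholders for whatever your model service expects.
import random

from locust import HttpUser, between, task

class ModelUser(HttpUser):
    wait_time = between(0.5, 2)  # seconds to wait between simulated requests

    @task
    def predict(self):
        payload = {"features": [random.random() for _ in range(4)]}
        self.client.post("/predict", json=payload)
```

You point it at the service with something like locust -f locustfile.py --host <service URL>, or run it headless when it's deployed as a pod, which is roughly what we're doing here.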
So that's what we have done so far. We have also created a dashboard from model.json, which is what Grafana uses to make sense of the metrics collected by Prometheus. I'm port-forwarding here to reach Grafana, so let's take a look at the Grafana dashboard; I particularly want to show one part of it. All of this is, by the way, open source, so feel free to try everything out and see how we've implemented the rest; right now we'll focus on the part I talked about, which distribution the requests fall under, and there are some other things on the dashboard we won't go into due to time.

One more thing we've deployed, to make this easier to see, is Locust itself. Let me close this and open our load test: the Locust file runs as another pod making continuous requests, so we can actually see data on the Grafana dashboard and watch the heat map being built for the different distributions. So let's go to the Grafana dashboard I was talking about. You can see requests have been coming in for quite a while -- it has been running since this morning -- and this shows the model score, and especially the model prediction distribution. These are all random requests I'm sending, so I can see the different bins the data falls into, and the different bins the model has been predicting the data is in. As we discussed in the slides, this is pretty useful for machine learning use cases, which is why we want to see it.

These are what we would call the model metrics, and then we also have the service metrics. Within the model metrics we also show a model score, which is essentially a way to estimate how well the model is performing. Of course, this is data the model hasn't seen during training, but the model score at least gives us an estimate of how things are going; we won't go into the mathematical models behind it right now, just as we didn't for the prediction distribution, but all of this is open source, so you can take a look. And with that, those are the model metrics; we have also been monitoring service metrics and resource metrics, which are pretty straightforward, so I won't talk much about them. With that demo I come to the end of my talk, and this time for real, thank you.

Just to summarize one last point: you can use Prometheus as a very effective tool across the different phases of the lifecycle, during model training and then after training, once the model is in production, to view the live metrics and of course monitor the overall health of your machine learning models. So if you have any questions, we'd love to answer them now, and if not right now, you can always find both of us on Twitter. Thank you.