So we have a lot of stuff to cover today, so if I go a little bit too fast, I apologize. I'm going to give you my email address at the end; if you have any questions, feel free to ask. So today we're going to talk about end-to-end MLOps with MLflow and Kubeflow, but you already knew that, and that is why you are here. Otherwise, you are in the wrong room, and you are welcome to go. I will not hold it against you, nor will I embarrass you on your way out. So let's look at what we are going to talk about. We are going to talk a little bit about why you should be thinking about MLOps and why you should even care about it. We will talk about what MLflow is and how you can use it. We'll do the same thing for Kubeflow, and then we'll talk about the two of them together and how you can get the best of both of them by integrating them together. Yes? I would not, and I'll tell you why. Thank you. We are going to be doing some coding here today. So thank you very much, Alex, I appreciate it. So this is going to be part slides, part coding, and part looking at interfaces, and rather than giving you death by PowerPoint for 20 minutes and then a breakneck demo where you can't keep up with everything, we're going to essentially just do the demo as we go through. So all right, now I see people are filtering in. This is great. Come on in. Sit down. We won't embarrass you. It's okay. All right. Let's talk a little bit about who I am and why I'm here. My name is Nick Chase. I am Senior Director of Product Management for Cloud Geometry. I am also the head of our AI practice and in charge of building out our AI/ML offerings. So I am very interested to hear what your needs are. I come from an enterprise background. I've worked with other companies, helping to integrate AI into what they do and, mostly, to work with their infrastructure and so on.
I am also the product manager for CGDevEx, which is our platform engineering solution; it helps you deploy Kubernetes and all of the things that go with it to make your life easier, so that you don't have to do it yourself. Now, on the AI side, I've been working with AI and ML for a long time. I'm the author of a bunch of technical books, including Machine Learning for Mere Mortals. I also work with the Generative AI Commons, where I'm the head of the Models and Data work stream, and I also work with the Education and Outreach work stream. Okay, so a real quick show of hands. How many of you are data scientists? Almost nobody. How many of you are developers? Excellent, because that's what I was expecting when I put this together. So we should be in good shape. Perfect. To you data scientists, I apologize if I say things you already know. Just forgive me. All right. So let's talk a little bit about what MLOps is and why we should care about it. As developers, you're probably all familiar with DevOps, because you have not been living in a hole. So you know that it involves keeping track of the code, making sure that you do automated promotion with testing, and so on. It's a secure supply chain. And it also involves taking care of things both before and after your application goes to production: before, obviously, when you build it, and after as well. You have the shift left that takes care of doing planning and all that stuff (come on in and sit down, you don't have to stand in the back, it's all right), and then there's shift right, which means that you should also be looking at things like monitoring to make sure that your application is still up. Is there something wrong with it? Is it taking down other people's applications, which is always a bad thing? But at the same time, MLOps is not just DevOps for machine learning. For one thing, it is not just code.
It's code and data and models, oh my. There's a lot more involved than just the code. In addition, we have to think about things like regulated industries and scientific pursuits. These are situations where things must be extraordinarily precise, particularly when it comes to things like provenance of the data. If you're training a model, you need to be able to know where it came from, where that data came from, and how that all happened. Because at the end of the day, we had, for example, a customer who does scientific research, medical research, say to us: look, this is science. The hallucination rate has to be pretty much zero. So these are things that you've got to understand when it comes to MLOps. There are tons of tools out there on the market. These are just a few. I just went to Google, searched for MLOps tools, and started randomly pulling logos, and I was like, this is going to take forever, forget it, especially because we're really only going to deal with two today: Kubeflow and MLflow. These are two of the biggies, particularly in the open source space. Not to say that there isn't good stuff here; we're just not talking about it today. So today, we are going to start by talking about MLflow. All right, let's start by saying: what is it? MLflow is an open source project. It's managed by Databricks, and it's sort of like a digital lab notebook. I don't know if any of you had science in school to the point where you had to have a lab notebook, where you wrote down literally everything that you did, and you did that so that you could duplicate your results, and more specifically, so that other people could duplicate your results. And that is exactly what it's like in the world today. We want people to be able to duplicate what we do. So I've gone ahead and started up an instance of MLflow, so we can take a quick look at it and what it does.
And basically, MLflow gives us a place to track the experiments that we do, the training that we do, and what the results were, so that we can compare them and sort them and filter them and search them and so on. They're grouped into experiments. So for example, this experiment has five runs in it: I ran this training job five different times. I can see it, I can see the results of each one, I've got all the artifacts, and here are the parameters that were used. In this case, I may have been doing hyperparameter tuning, so I was looking for where I started, what the results of my metrics were, and so on. It also has a registry; we're going to talk a little bit more about that, but that's the general idea. The idea is that it's sort of like a digital lab notebook where everything that you do is there, so that you can pass it on to other people. The basic functions of MLflow, the ones that we're going to focus on, are these first four. There we go. Tracking: the first and most important thing it does is track what's going on with your experiments so that you can see everything that's happening. Then it saves the models that are the results of those training runs. It also provides a way for you to put them together into projects, which you can then distribute and even, quote, unquote, run. And (maybe I should go into presentation mode) it provides a registry for those models, so that you can search them and serve them and provide them for yourself and for other people to run predictions against. There are also ways to provide plugins for MLflow, and it does authentication, but we're not really going to cover that today. Okay, so let's look at how it does tracking, and let's also look at models. So I went ahead and installed MLflow before we started, as you can tell, because I've got it running over here, on port 5000.
So I'm going to go ahead and import the library, and I'm also going to import infer_signature, because we're going to need that. Then I'm going to set the tracking URI. This is going to tell the rest of the script where to send the results and where to look for the results later on. Just a little tip here: you don't have to set this tracking URI, but if you don't, you'll have problems later. It's a long story; don't worry about it right now. Just remember that I said that, for when you can't get your models back out. Okay, so let's look here. We're just going to look at setting up a basic routine to run. What we're going to do here is use these hyperparameters; this is what we're going to test out. We're just going to put them into an array, for convenience's sake. And then we're going to do the simple thing: this is just the iris data set. You've probably seen this a thousand times, and the reason is because it's easy, so let's use it here too. We're going to do logistic regression. Here are our parameters. We're going to train it, and we want to get back out the accuracy. Okay, so let that run. All right, now we want to set an experiment to group these runs into. So let's do... I'm going to say 'live demo' now. I should spell it correctly, however. There we go. So I'm going to say it's 'live demo'. So it said, all right, it didn't exist, we created a new one. There we go. Lovely. Now, we're going to go ahead and do the run. With MLflow as the context, these parameters are what we want to log; that's what we want MLflow to keep track of. And the metric is the accuracy, which we specified earlier. We're also going to go ahead and set this tag, training info. This is completely arbitrary information; you can set it to whatever you want. But the important thing is that you can search on it later. So for example, you could set it to be 'after we stained the cells with Pepsi'. I don't know, whatever.
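The training routine described above is plain scikit-learn. Here's a minimal sketch; the parameter values are my own guesses, not necessarily the ones from the demo:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hyperparameters to test, gathered in one place for convenience
# (illustrative values, not the demo's actual settings).
params = {"solver": "lbfgs", "C": 1.0, "max_iter": 1000, "random_state": 8675}

# The classic iris data set: easy and familiar.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=8675
)

# Logistic regression with our parameters; what we want back out is accuracy.
model = LogisticRegression(**params).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

The accuracy that comes back out is the metric that gets logged to the tracking server.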
Okay, so we're going to get the signature, and then we're going to go ahead and log the model. In this case, we're doing sklearn. MLflow supports something like 20 different flavors of frameworks, or you can use your own; we'll see a little bit about that. But we're basically passing in everything that we need. And we're going to go ahead and register this model as well. In fact, let's just call this 'live'. We don't have to do that, we can register it later, but let's save ourselves some time. Okay, so that is going to take a second to run. So let's look at... there we go. It successfully registered the model and created version one, because there wasn't a version of it yet. Okay, so that... whoa. So now we'll come back here, and if we look... now we have our new experiment, and we have this run that we just did 31 seconds ago. It's got the tags, it's got the metrics, it's got everything that we just put in it. It's got all the artifacts. Okay, so that is tracking. That's the basics of tracking. Obviously we can't possibly cover everything, but that's the basics. Now, we can then go ahead and take that model, which we trained using sklearn, and pull it out as a generic Python function. So I'm going to go ahead and say, all right, this is where I'm storing it. I didn't actually have to print that out, but I did for my own edification. I've loaded the model as a Python function, as you can see here, and now I'm just going to make predictions with it, just like I would have done with anything else. Okay, and we're running this... or not. There we go. And there we go: there are our results. Okay, so that is the general idea. Okay? All right, now, the other thing that we can do is package all of this stuff up. And one of the things about MLflow that is great is that it allows you to share your models with other people, because it's a standard format.
So we can package it up into a project, and that project can be a directory, a Git repo, and so on. You can have an entry point that is, say, a Python script or something like that, so that you can literally run the project. The environments that it can run in can be local, or you can even run it in Kubernetes, which, as I'm sure you have guessed, is going to come in useful later. Okay, the MLflow registry is one of the places where MLflow really shines. As you saw, we automatically created a new version when we registered that model. You can then call them from anywhere. For example, let me get the right one... for example, we could say, all right, mlflow models serve, and in this case it would be live-iris-demo, version one, because we only have one version right now. Okay? All right, and the nice thing is you can do this sort of remotely, across environments, to kind of tie things together. You can bridge between people who are using different frameworks, different environments, and so on. All right, so that is the basics of MLflow, and I know that I'm kind of barreling along here, because we only have 13 minutes left, actually 12. Kubeflow: let's see if we can get through the whole thing. All right, so what is Kubeflow? It is essentially MLOps from Google. The people who gave you Kubernetes now give you MLOps based on Kubernetes, which is great if you are using Kubernetes. If you are Kubernetes-phobic, maybe not so much. It is based on custom resource definitions, which, if you have ever used them, you know can be somewhat complex, creating those operators, but it's not the end of the world, especially because the Kubeflow people have done all that for you. All right, so the basic components that we're going to look at in this case are notebooks. These are the same as your Jupyter notebooks. I wasn't going to talk about them, because I was like, well, everybody knows what notebooks are.
But when I talk about them, you'll know why I'm going to mention them. Obviously, models are important here as well. Katib, which I've probably mispronounced, for AutoML. Training operators, which are how Kubeflow actually does the machine learning training. KServe for serving out those models. And Pipelines. All right, so let's talk about notebooks really quickly. As I said, I wasn't going to talk about them, but these are actually notebooks that are based in the Kubernetes cluster itself, which doesn't sound like a big deal, except that it means a couple of things. One, you can work with and create and manage Kubernetes resources. And also, because you are basically just pulling up an image, it doesn't have to be the JupyterLab image. Theoretically, supposedly, it can be any IDE image. I haven't tried it; if any of you have, I would love to hear whether it worked. But that's pretty awesome. And basically, it runs in a pod. Okay. Katib is an AutoML tool. What that means is that it tries to automate some of the processes that you normally spend a lot of time doing, like hyperparameter tuning, which is doing things like setting the learning rate, et cetera. It also does neural architecture search, which is related to hyperparameter tuning, but I don't have a PhD in math, so I really can't explain it. But it's there. If you want it, it's there. Okay. Training operators: as I said earlier, this is where the rubber hits the road in terms of actually doing the training. These are the six operators that Kubeflow has: TensorFlow, PaddlePaddle, PyTorch, MXNet, XGBoost, and MPI. That is far fewer than are natively supported by MLflow, but there's a caveat there, which we'll get to in a second. That is the caveat. So KServe is a serverless framework for serving out these models. By the way, it can also be a standalone tool. You don't need to use it with Kubeflow; you can use it on its own.
But one of the things that it can do is create CRDs for other frameworks, besides the ones that are natively supported by Kubeflow. So that kind of gets around that somewhat. KServe also does two cool things. It helps you with pre-processing, which is something we all obviously have to do, and explainability, which I don't know how it does or how well it does it, but it's something that is becoming increasingly important as we go along, and something that we need to pay way more attention to. All right, so, Kubeflow Pipelines. Pipelines are to Kubeflow what the model registry is to MLflow, in that this is sort of what Kubeflow is known for and best at. It works through a tool called KFP, which can also be a standalone tool. These pipelines are native Python pipelines, but they can be specified in a sort of intermediate YAML format so that you can pass them around, and you can load and share components between those pipelines. So let's take a quick look at that. I also have Kubeflow running. I used CGDevEx to fire up a Kubernetes cluster, and then... oh, that's me. That's not... oh, don't do this to me. Okay, well, screenshots it is. Well, yeah, that's true, but at least I had enough sense to take screenshots of what I needed to show, because I've done this before. Yeah, so what I did was I took CGDevEx, I fired up a Kubernetes cluster on AWS, and then there's a set of manifests which you can run against the cluster that then create all the Kubeflow objects. So at that point, Kubeflow is running in your cluster. So I did that, and we wind up with the Kubeflow tool. Here's a brief look at some of the things that you will find in Kubeflow. Like I said, I took screenshots; I've been here before. Train wreck, I told you. So you can see some of the things that are here. It's a little bit more extensive than MLflow, but probably a little less intuitive. So: notebooks, tensorboards, volumes, models, and so on. You can see runs.
You can even set up recurring runs. We're going to look at pipelines. One nice thing is that it comes with several pipelines already installed and deployed when you get it, so you can see the source code and see how everything fits together, which is a good thing. What we're going to do is create a pipeline of our own. Now, they can actually be fairly complicated. For example, this is one of the ones that comes pre-installed. You can see there's lots of branching and circling back and this and that, and it would be a nightmare to code this on your own. Fortunately, you don't have to. What you can do is create a Python script and then compile that script into a pipeline. For example, here is my pipeline script; I'm going to compile a pipeline from the function. My pipeline has a set of logic, and that set of logic has two functions: a web downloader op and a merge CSV task. This one is really simple. When I compile it, it turns into that, which is just a YAML file with two workflow objects: download the data, and merge CSV. If I then upload that into Kubeflow, it will give us something that looks like what we had before. Then we can go ahead and run it. If this were live, I would click 'create run', and then you would see that this would disappear, and then it would pop up 'download data' while it was doing that, and then it would come up green, and then it would come here and do 'merge CSV', and I'm sorry that we're not seeing that, because I'm proud of the fact that I got it working, by golly. Now, moving right along. That is Kubeflow. Now, let's talk about the two of them together. It is silly to compare them. They are two different things. They do two different things. It's not which one is better; it's which one is better for whatever you're doing right now. MLflow is great if you have an existing process and you want to add a layer of trackability to it.
Kubeflow is good when you have sort of a green field, you don't need to integrate with anything, and you're okay adapting your process to your MLOps. MLflow is a fairly lightweight thing; it's easy for one person to put together and use. Kubeflow is good if you're already on Kubernetes. If you're a Kubernetes shop, more power to you; go for it. I've been using Kubernetes for a long time, and yeah, it's a good thing. Okay. MLflow: great for model management; you've got the registry. Not so good for pipelines. Kubeflow: great for pipeline management; it's got your KFP. Not so good for model management. It stores metadata, and they call that a registry; maybe not so much. MLflow helps bridge different environments by letting you share your models around, and Kubeflow helps bridge different environments by serving different frameworks. Kubeflow is also great for scaling. Okay, how can we use them together? We can run MLflow on Kubernetes. We can package MLflow models as Docker containers, which I didn't show you; it's here, but you can do it, and you can deploy them straight as containers, as deployments. Here we go: build the Docker image, grab your container, grab your model, run it, push it, deploy it. Or you can run your KFP pipelines from within your Python code. Simple tips: understand what you're trying to do. Just track your stuff, people. Start simple. And the main thing: keep testing your models, because they will drift, okay? The real world will get out of sync with your models, and then you will need to retrain them, and you will not know that unless you keep testing them. And that is the end of my presentation, and it is exactly on time. Thank you so much for being here. I am Nick Chase, I am with Cloud Geometry, and let us know if we can help you in any way. Thank you.