Hi, can you all hear me? All right, I'm just going to talk loud. Hi, I'm Trevor Grant, and this is Machine Learning from Lab to Production with Kubeflow. We're going to talk about us for a second. This is a 20-minute talk and we started a little late, so you can read that slide yourself. I'll do a proper About Me slide in my next talk, which is right after this one. I'm Trevor, I work at IBM, and I'm a PMC member on a few Apache projects. If you're interested in visiting lovely Chicago, we're putting on a roadshow there in May; the CFP closes on the 15th, so please submit something. This is the first time I've ever done one of these, and it's terrifying.

So, Kubeflow. Have you ever gone to a job interview and been told, "You don't know about some esoteric data science tool you've never even heard of, so we can't take you"? Well, prepare for those days to be behind you once you start working with Kubeflow, because every single data science model, library, and pipeline tool you can name goes into Kubeflow.

Some background. What is statistics? For the purposes of this talk: long ago, there were some very wise people who were able to answer questions that couldn't be answered with tarot cards and tea leaves by using something called statistics. The problem with statistics is that it was really just a thin veil over math. So they came up with machine learning, which let you do all the cool things of statistics with a line of code. But the problem with machine learning was that it was still very much like math, and the only people who aren't terrified of math are the people who hate math. So we needed something more. The marketing people went up to the mountain, and they came down and gave us artificial intelligence. And artificial intelligence is fully removed from math, because it's just an ill-defined abstract concept.
You might be wondering, OK, what do these things have to do with Kubeflow? Well, the upshot is that Kubeflow can work with all of these types of models. They're basically magic on GPUs, and Kubeflow can handle that. Also, a call-out for Google: if you're on Google Cloud, you can use a Tensor Processing Unit (TPU), and that's what they look like.

How many GPUs do you need? On this very scientific graph, you'll see that when you first start out, the young data scientist doesn't have any idea what they're doing, so they think they need lots of GPUs. As they grow and become an intermediate data scientist, they realize they don't need GPUs about 99.9999% of the time. Then a very few data scientists reach the level where they actually do need GPUs: they understand the use cases where they need them, and they need a lot of them, but only for a little bit of time.

Kubernetes, for the purposes of this talk, is magic in shipping containers. Here we see a data science dev at work. Oh, it's not animated; you have to imagine it. On the slide version the cat types, and it's really good; when we post the slides later, I recommend checking it out. But you have data scientists who want to code on their MacBooks in coffee shops, because the whole point of being a data scientist is the image: you want to look cool and make sure everyone knows how cool you are. The problem is that it's hard to train models on a MacBook. These things don't have a lot of power; they're weak at best. So you want to push that training off into a server farm somewhere else, and that can be tricky at best. And we haven't even talked about deploying to production yet. Now, good data scientists are smart, and they don't want their models deployed to production, because the data scientist's natural enemy is the manager who wants quantifiable results.
And then when you go to production and see how terrible the model is, it's bad news bears for the data scientist. So, unfortunately for data scientists but good for everyone else, Kubeflow makes all of these things very easy.

So what is Kubeflow? Besides a large collection of buzzwords, which is very useful, it is notably not just magic plus containers plus TensorFlow, even though the name might have some of you thinking that's all it is. It is much more than that; it has so many more buzzwords. It is the buffet of machine learning. And that's really good, but much like a buffet, you probably shouldn't eat all of it. Buffets are delicious, but if you attempt to eat the entire buffet, you will have a very bad time. Similarly, if you attempt to use all of the components inside Kubeflow at the same time, you will have a very bad time, and you will be asked to leave your cloud provider. Or at least your manager will come to you and ask you to justify your budget, and you'll be like, "Well, I wanted to train a PyTorch model and a TensorFlow model at the same time." So Kubeflow gives us the flexibility to choose just the pieces we need, and that is awesome and cool.

So our container buffet is not, unfortunately, simply delicious meats. It is a collection of various machine learning tools. Some of them are more like data prep tools, and some are more like traditional machine learning tools, like the PyTorch job operator or Seldon, which can do the serving of these trained models. So it's cool; there are lots of options here, you can totally put them into your pipelines together, and you can train all kinds of different models. If one allows a certain amount of flexibility around the word "production," you can use any of your favorite Python libraries, provided you don't want to parallelize them. Probably not actually good for real big data.
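To make that "single node, no parallelism" point concrete, here's a minimal sketch of the kind of training loop you'd run in one container: plain Python standing in for your favorite library, fitting a toy linear model by gradient descent. The dataset and hyperparameters are made up for illustration.

```python
# A minimal single-node "training job" sketch: gradient descent on a
# tiny linear-regression problem. Plain Python stands in here for your
# favorite library (scikit-learn, XGBoost, ...); nothing is distributed.

def train(data, lr=0.05, epochs=500):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy dataset drawn from y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(points)
print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

Anything shaped like this runs happily in a single Jupyter or notebook container; it's only when you want to split the work across machines that you need the operators discussed next.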
But you can totally use whatever you want. Other frameworks like PyTorch, TensorFlow, and MXNet have their own operators, which provide distributed implementations in Kubeflow. You can use these operators and train on your own arbitrary data. There are also add-ons from other people that you can totally use with Kubeflow; it's not just the packages that ship with it. Although the one example I did find had a failing CI test. But to be fair, my PR on the Kubeflow repo also has a failing CI test, because Kubeflow is under what's politely referred to as "active development."

We can also do data prep. Unfortunately, and this is the really sad part of machine learning, it's nowhere near as cool as you think: you end up spending most of your time getting your data together. I did a talk this morning with Nova, and we spent most of our time, besides fighting with gRPC, getting the data together to train our model. The actual training of the model was not that complex, but getting all the data to train it was really painful. Right now you can use Pachyderm or shell scripts. And there are other tools coming as well: there's a pull request for Apache Spark. It's failing in CI, but that should not stop you from using it. I have a test shell script which totally works on my machine, so it will totally work in production, right? Just check out that pull request, rebase it on master, and have a party. TensorFlow Transform is also a really cool tool. Unfortunately, for now, if you're not on Google Cloud it only supports local mode, but it's getting better there as well, with Beam's Flink support coming.

Once you've prepared your data and trained your model, if you want to put it somewhere, you can just stick it in a storage bucket and be like, hey, what's up. That's normally what I do, because I don't have to put stuff into real production.
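For the distributed case, those operators work off Kubernetes custom resources. Below is a rough sketch of what a TFJob manifest looked like around this era. Treat it as an assumption-laden example: the API group version (`v1alpha2` here) changed between Kubeflow releases, and the job name and image are hypothetical, so check the docs for the version you're running.

```yaml
# Hypothetical TFJob sketch -- API version, job name, and image are
# illustrative only and varied across Kubeflow releases.
apiVersion: kubeflow.org/v1alpha2
kind: TFJob
metadata:
  name: mnist-train                  # hypothetical job name
spec:
  tfReplicaSpecs:
    PS:                              # parameter server(s)
      replicas: 1
      template:
        spec:
          containers:
            - name: tensorflow
              image: example.com/mnist-train:latest   # hypothetical training image
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: tensorflow
              image: example.com/mnist-train:latest
              resources:
                limits:
                  nvidia.com/gpu: 1  # only if you're one of the rare folks who truly need GPUs
```

The operator watches for resources like this and spins up the parameter-server and worker pods for you, which is the "distributed implementation" part.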
But if you have real production, you probably want to put it in something like ModelDB. And I don't know why I put Pachyderm here. Well, whatever; you could use Pachyderm to copy your model somewhere. That would be cool.

OK. Now we have our model, and we have to serve it. There's a whole bunch of built-in options for serving. Python Flask, if you just want to write a bunch of Python code by hand. But there's also the built-in TensorFlow model server, OpenVINO, the NVIDIA inference server, and all sorts of other things. Seldon Core has a bunch of really cool features if you want to go that way.

OK, so maybe you want to use this. Maybe you're thinking, I could use six of those components to train some machine learning pipeline, and the other seven sound really useful to have on my resume. So let's go ahead and use it. What's next? The first thing you want to do is reconsider how badly you want to keep your job, given that you'd be putting developmental products into production. But let's say you've decided you really don't care about your job that much and you're willing to roll some dice. Then think about the types of models you want to use. Look at the components directory; there are new components coming online all the time. That's why everything is failing the CI build; when you have a very actively developed project, that's a problem you run into. You can use Jupyter too.

This is how easy it is to get up and going. If you're familiar with Kubernetes, you're probably familiar with something called ksonnet, which handles deploying applications. It's a few lines at the shell and bada-bing, bada-boom, you're good. But wait, Trevor, we have a special foobaz initializer, or a special yak-shaving tool, that we have to work into our pipeline. Well, don't worry: by building a container, again with just a couple of lines, you can add whatever special tool you need.
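To give a feel for the roll-your-own end of that serving spectrum, here's a dependency-free sketch of a hand-written prediction endpoint. A real one would more likely use Flask or one of the model servers above; this one uses only the Python standard library, and the "model" is a hypothetical stand-in that just sums the feature vector.

```python
# A hand-rolled model-serving sketch using only the standard library.
# The "model" is a hypothetical stand-in: it just sums the features.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a real trained model loaded from your storage bucket.
    return sum(features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run the "model" on it.
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))["features"]
        body = json.dumps({"prediction": predict(features)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# To actually serve:
#   HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```

POST `{"features": [1, 2, 3]}` to it and you get back `{"prediction": 6}`. Everything a real model server adds on top of this, like batching, versioning, and monitoring, is exactly why the pre-built options exist.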
And if you're feeling very generous, you can contribute that back as a PR, and everyone can use the special yak-shaving tool in their pipelines as well. Like anything in open source, pull requests are always appreciated; don't just take, it's always nice to give something back.

If you want to see some live demos, we have them recorded so we wouldn't have to worry about the Wi-Fi, though I guess that means they're not really live. Well, we have recorded demos. And realistically, there's no way we could demo how to use Kubeflow in the remaining five minutes of the talk; even the whole 20 minutes on this Wi-Fi would be a little touch and go. And unlike all the machine learning stuff, the cool thing isn't the thing that we do; the cool thing is that in 20 minutes, you can take an entire pipeline from training to a deployed, production-grade model. That's the real magic here.

So, questions? How are we doing on time, 15 minutes? That was unexpected; we should have told more jokes. We can try to get the GIFs working. If anyone wants to ask questions while I'm working on the GIFs, please feel free. Yeah, you have a question.

The question was: do you have anything for monitoring the model once it's been deployed? Seldon Core, one of the model-serving layers, has a lot of really, really cool monitoring: how well the model's doing, A/B testing, redirecting traffic from model to model. You could have, like, 12 models out there and be constantly trying the different ones and seeing how they're doing. So yes, there is a pluggable tool that will give you model monitoring if that's something you need.

Yes? Sorry, what resource manager do you have on your cluster? Slurm? OK. I don't know how to run Kubernetes on top of Slurm.
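To illustrate the A/B testing idea from that answer, here's a toy sketch of weighted traffic splitting between two model variants. This is not Seldon Core's actual API, just the underlying idea; the model names and weights are made up.

```python
# A sketch of the A/B traffic-splitting idea behind a serving router:
# each request is sent to one of several models according to a weight,
# and per-model hits are tallied so you can compare how variants do.
# Model names and the 90/10 split are hypothetical.
import random
from collections import Counter

class ABRouter:
    def __init__(self, weights, seed=None):
        # weights: e.g. {"model-a": 0.9, "model-b": 0.1} for a 90/10 split
        self.models = list(weights)
        self.weights = [weights[m] for m in self.models]
        self.rng = random.Random(seed)  # seeded for reproducibility
        self.hits = Counter()

    def route(self):
        # Pick a model for this request, weighted by the split.
        model = self.rng.choices(self.models, weights=self.weights)[0]
        self.hits[model] += 1
        return model

router = ABRouter({"model-a": 0.9, "model-b": 0.1}, seed=42)
for _ in range(1000):
    router.route()
print(dict(router.hits))  # roughly 900 / 100
```

A real router like Seldon's does this at the network layer and also feeds the per-model outcomes into its monitoring, which is what lets you compare those 12 models in flight.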
I don't know if that's a thing you can do. Kubeflow depends heavily on Kubernetes, so if you can't get Kubernetes on your HPC cluster, you probably don't want to use this; it's going to be a bad time.

What? Oh, can you include a feature store? I mean, there's ModelDB if you want to store the features from your libraries and models and such. Wait, sorry, by feature store do you mean some pre-canned, off-the-shelf thing that people can just drag and drop in? Oh, a feature store. OK, right, sorry. No; you can put it in GCS or S3, sure. Sorry, I was thinking "model store" in my head; my brain is a little fried. But yeah, you pick your favorite cloud vendor slash NFS file system and you store your data there.

Cool. OK, we've got two more minutes, so I'm going to show you this picture. Yeah, this is what deploying to production looks like; it goes about that well on a good day. Does anyone have a question about deploying to production? Any more questions?

Yeah, that's a great question. So ksonnet is quite complex; is Kubeflow relatively friendly about that? Kubeflow has actually gotten a lot better at hiding the complexity. The site we showed had you do a ksonnet init and so on, but there is also another entry point, which abstracts a lot of that away from you. It's called kfctl, for Kubeflow Control. It's still terrifying underneath the hood, but it hides all of the monsters under the bed.

OK, cool. One minute; I feel like we're good. One last question? OK. Thanks for attending our talk.
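For reference, the two entry points mentioned, ksonnet and kfctl, both boil down to a few lines at the shell. Commands and flags changed frequently between Kubeflow releases, so treat this as a period sketch from memory rather than exact syntax, and check the docs for the version you're actually installing.

```shell
# Deploying Kubeflow, roughly as of this talk. Exact commands and
# flags varied by release -- verify against the docs for your version.

# The ksonnet route: init an app, pull in the Kubeflow packages,
# generate the core components, and apply them to your cluster.
ks init my-kubeflow && cd my-kubeflow
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks generate kubeflow-core kubeflow-core
ks apply default -c kubeflow-core

# The kfctl route, which hides most of the ksonnet machinery:
kfctl init my-kubeflow && cd my-kubeflow
kfctl generate all
kfctl apply all
```

Either way, the point from the talk stands: a few shell lines get you from nothing to a running Kubeflow deployment, and kfctl just keeps more of the monsters under the bed.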