OK, I'm going to get started. I hope everyone's here to learn about Kubeflow. My name's Trevor Grant. I don't know if any of you have been to a lot of the talks here, but I work for IBM, and a lot of times speakers from big cloud vendors will shamelessly plug their company's product. I feel like anybody can do that, but it happens a lot with big cloud vendor speakers. I will be plugging a lot too, but I'm going to be plugging this book we're writing. So you're going to keep seeing that website. Make sure to go to that website, put in your email address, and get updates about our up-and-coming Kubeflow book.

So with that, this is a beginner-level talk. I've been hanging out here, and there have been some really, really good deeper-level talks. If you don't like beginner-level, welcome-to-Kubeflow sorts of things, and you want to go somewhere else, that's OK. I get that. This is just to help get you started. Why is Kubeflow a thing? Why are we doing this? I'll talk about me, then what Kubeflow is, and then go over some of the big components.

If you looked at the schedule, you might have seen that I had a co-presenter, Holden Karau. She couldn't make it. She had some house troubles. She was the one who knew all the Spark stuff, so there are a couple of things we're going to have to adjust. First of all, I'm not going to try to fake my way through the Spark stuff. I don't know that. Second of all, Holden and I tell lots of jokes, and without all of our jokes, that significantly shortened our presentation, so we had to add some other stuff back in. I still have some jokes, but if you were hoping to see Trevor and Holden's comedy hour, again, it will not hurt my feelings if you leave.

OK, so I'm Trevor. I'm from Chicago. I do IoT at IBM. I'm not doing IoT today, but I know there are some IoT fans in here. Or hopefully there are.
You should go to ApacheCon North America. It's in Las Vegas in September, and I'm putting on the Apache IoT track. So check that out. I'm on the PMC of a few Apache projects. That's like a community leader. Check all of those out. I was a medic in Afghanistan. That's relevant, because I will be making some analogies that tie back into military history, and if it seems like that's coming from a really weird place, that's why. I blog, though I haven't done so much recently. You can pretty much find me all over the internet at rawkintrevo, is how we pronounce that. And yeah, check out my website. Hopefully I'll start blogging again soon, and you can read all about me and my adventures in technology.

So, some background. What is Kubernetes? How many people here aren't really familiar with Kubernetes? Haven't heard of it? OK, we've got a couple. Yeah, right. Sorry, this guy back here also gave a talk on Kubeflow earlier today. Kubernetes, in general, is an orchestration thing for Docker containers. You've got a bunch of Docker containers. You don't want to keep track of the DNS and all that. It's a thing that's worth looking into, but for the purposes of this talk, suffice it to say it's magic in Docker containers.

What is machine learning and artificial intelligence? Throughout this talk, I have a, I don't think it's a bad habit, but it certainly is a habit of ripping on data scientists and artificial intelligence. I got a master's in applied math, statistics, and probability. This was my job for several years. It's all in good fun. So if you're a data scientist and you think I'm being too mean, I'm sorry, I'm just gently ribbing you. But since the dawn of time, people in positions of power have wanted to find someone with some esoteric knowledge who could see things that they couldn't see, ask them for advice, and then blame them when things went wrong. Originally, it was something like an oracle.
And then we kind of got into tarot card readers and numerology. And in the more recent iterations, say the Industrial Revolution onward, it got scientific, with mathematicians and statisticians and probability and all that fun stuff. But then we had machine learning. A statistician or a probability kind of person needs four to eight years of schooling and a lot of training. A data scientist, you send them off to a Metis boot camp. Eight weeks later, they come back. Boom, data scientist. You can churn them out 10 at a time. People are coming from all these different prior career paths, and boom, they're data scientists with just a little bit of work.

There is an analogy in the past to this: archery and the invention of the musket. It took 10, 15 years, a long time, to train a really accurate archer. A musket man needs about two weeks of drill, and that's just so they march right, they don't run into each other, and you can control them as they come out on the battlefield. The musket man is the data scientist of today, whereas the statistician is like the archer. They take a long time to train and they're more accurate, but if you lose one, it takes forever to get a new one.

Well, you could say, but Trevor, wait, we don't use archers anymore. There's a reason the musket took over. Well, yeah, it's because we came up with better and better tools. We came up with better muskets, is what happened. We came up with rifling, and this is my Flickr image of modern infantry. There were a lot of advancements along the way, but really the idea remains the same. Two months of basic training and you've got yourself an infantryman, versus 20 years to train an archer. And the only difference is that today's infantryman has much better tools at their disposal. And that's where Kubeflow fits into this ecosystem.
Kubeflow is one of a number of tools. It's actually a package of several tools that make that data scientist more efficient, and we're giving them the tools they need to be better with their eight weeks of Metis boot camp training.

Now, problems that data scientists have. The first problem is that they don't want to be locked in a cubicle, working on a big mainframe or a big powerful computer. They want to be in a coffee shop working on their MacBook, making models so everyone can see how cool they are. And this is a data scientist in a coffee shop building models. The problem with that is, well, OK, I'm going to jump to this one. Laptops, MacBooks, don't have a lot of power. They never have GPUs, as far as I know. They're just not great machines for building deep learning models on. So that is a problem.

Also, model serving. You can't really serve your models from a MacBook in your coffee shop. They need to get pushed somewhere into some sort of production environment so other systems can hit them. And that gets into a whole other rabbit hole of problems associated with model serving. Most of this is so far beyond the data scientists' purview that they don't even realize the problems they have. But those of us who have to build systems for them now have to put up with these problems and solve them for the data scientists.

So with all of that, what is Kubeflow? Let me check my time. OK, so what is Kubeflow? Kubeflow is like a big delicious buffet of all these different components that will help you solve the aforementioned problems that the data scientists have. And by "that they have," I mean that you have, because you have to support them. Though you might be a data scientist who also has to solve these problems. More succinctly, here's this chart, with a source for the person who made it. But it is, again, kind of a big package.
You've got pipelining, being able to run experiments, model serving, training, notebook serving. There's just a lot that goes into it, and Kubeflow is this sort of big tent that covers all these different packages. The upshot of Kubeflow is that instead of having to install all these different things separately, you can install it once, and let's hope that my slides are where I thought they were. Nope, they're not. About three lines of code is all it takes to do that, and we'll get into that in a little bit. But then you get the nice web UIs. It all goes on Kubernetes, which is great because then it can be scalable. You can deploy it to various different clouds or on-prem as needed. So that's what that is.

More specifically, if you want to see the actual components, here they are. But that's not nearly as pretty of a thing. You bring your own libraries in this. Anything that's in pandas, you can run in Kubeflow. Jupyter libraries, PyTorch, TensorFlow, your MXNet, all these kinds of models, because every data scientist is like a unique snowflake. They have their own favorite library, and they can't possibly do their job unless they have the one library that they like. So they can't sneak out of doing work because they don't have the right library. Whatever it is, you install it. And there's also some model management or library management in there.

There's also data prep. Another problem, if you've ever had data scientists come and go, is that they won't log how they got their data or how they prepped it. They're like, here's the model and here's my results. And I'm like, OK, cool. And then a new data scientist comes in, and I have no idea where any of these numbers came from. So you can also build in some ways to keep track of where this data came from, how it was prepped, and what transformations were done, so that there's a little bit more actual science in what data scientists do. And by that, I mean it's reproducible.
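To make that data prep point concrete, here's a minimal sketch of the idea in plain Python. It is not Kubeflow's own metadata tooling. The PrepLog helper, the file name, and the sample readings are all made up for illustration. The only point is that every prep step gets written down with a row count and a hash, so the next person can see where the numbers came from.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a short, stable hash of the rows so two runs can be compared."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class PrepLog:
    """Hypothetical helper: applies prep steps and records what each one did."""

    def __init__(self, source, rows):
        self.rows = rows
        self.steps = [
            {"step": "load", "source": source,
             "rows": len(rows), "hash": fingerprint(rows)}
        ]

    def apply(self, name, func):
        """Run one transformation and log its name, row count, and data hash."""
        self.rows = func(self.rows)
        self.steps.append(
            {"step": name, "rows": len(self.rows), "hash": fingerprint(self.rows)}
        )
        return self


# Made-up sample data standing in for whatever the data scientist loaded.
raw = [{"temp_f": 68.0}, {"temp_f": None}, {"temp_f": 212.0}]

log = (
    PrepLog("sensors.csv (made-up name)", raw)
    .apply("drop_missing",
           lambda rows: [r for r in rows if r["temp_f"] is not None])
    .apply("to_celsius",
           lambda rows: [{"temp_c": round((r["temp_f"] - 32) * 5 / 9, 1)}
                         for r in rows])
)

# The log is plain JSON, so it can be saved next to the model and its results.
print(json.dumps(log.steps, indent=2))
```

If the same source and the same steps produce the same hashes on a later run, the prep was reproduced. If they don't, you know which step to look at.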
And then model persistence and deployment. This is another thing that, again, your average data scientist isn't going to think about. But the value isn't that you came up with insights. The value is that we have something with a REST endpoint, and whatever the rest of the application is can call that and hit that model all day long. That's a whole other field of study, and there are all of these great products that already exist for you to do that, so you don't have to reinvent the wheel. Really, the whole point of this is like every good open source project. A lot of people are trying to do this, and they're hacking it together with shell scripts and Flask Docker containers. Instead of doing that, someone said, why don't we just do it once and do it really, really well? Then anybody who needs to do this doesn't have to roll their own solution.

More model serving. So how does it fit in the ecosystem? I kind of already touched on this. The idea is that research is great. If you've got a job where you're just doing research and you just have to come up with some insights for a board meeting once a week, and then they're going to do whatever, great. You've got an awesome job. You don't need Kubeflow. Just hold on to that job as long as it lasts, because it's not going to be very long. But there's no reason to leave the party early. For everyone else who has to put models in production, this solves a lot of problems in one easy step. There's a learning curve, but it's solving a lot of problems, and it's probably solving problems you hadn't even thought you were going to have. It's taking care of them for you ahead of time, because, again, that's what good software does. It gets you into production with fewer shell scripts.

So now that you've heard my 15-minute intro pitch on Kubeflow with five minutes of introductions, you still want to use this, huh? Let me talk you out of it. Oh, wait, maybe not.
OK, I'll talk you out of it later. These are the three lines it basically takes to set up your project. It's kfctl init, my awesome project, name your platform. Go in there, generate the platform, and apply. And that's it. If you're familiar with Kubernetes, it's a pretty similar API. No big deal.

What does that get you? Out of the box, you're going to get JupyterHub, TensorFlow Job, TensorFlow Serving, PyTorch, and Katib, which is a hyperparameter tuning thing. I don't know that Ambassador is still a thing, or if it's Istio now. I might have to double check on that one. Pipelines, which is Argo. So OK, OK. You're probably thinking, who is this guy? He doesn't even know what the hell's in the product. What am I listening for?

There was recently a big version bump from 0.5 to 0.6. How many people are familiar with ksonnet? No one? OK, well, the other guy did. Good. So there's this thing called ksonnet, which was like Kustomize or Helm. It's another thing like that, but it got deprecated. Thanks. It got deprecated like a year and a half ago. Everybody who worked on the project said, I'm not going to do this anymore, we're just walking away from this. So the Kubeflow folks got halfway through building this product and they're like, oh man, we've got to get out of there. The whole thing had to be refactored to work on another deployment manager. At any rate, along with that, there were some other things that got swapped out. I've been busy with my day job for the last two or three weeks, and I haven't seen what all the new stuff is.

OK, pipelines. You might have been thinking, what does that mean, pipelines? The short of it is that you have this directed acyclic graph. This is an example that just comes with Kubeflow. Imagine you flip a coin, it's heads or tails, you get a random number, and then it prints a thing. The output looks like this.
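As a rough sketch of that coin-flip graph, here it is in plain Python rather than the Kubeflow Pipelines SDK. The step names and the number ranges are assumptions for illustration, not taken from the Kubeflow sample, and graphlib needs Python 3.9 or later. What it shows is the shape of a pipeline: steps, the steps each one depends on, and a runner that executes them in dependency order.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def flip_coin(results, rng):
    """First step: no inputs, returns heads or tails."""
    return rng.choice(["heads", "tails"])


def random_number(results, rng):
    """Second step: the range depends on the coin (ranges are made up)."""
    low, high = (0, 9) if results["flip_coin"] == "heads" else (10, 19)
    return rng.randint(low, high)


def print_result(results, rng):
    """Last step: prints what the earlier steps produced."""
    message = f"{results['flip_coin']}: {results['random_number']}"
    print(message)
    return message


STEPS = {
    "flip_coin": flip_coin,
    "random_number": random_number,
    "print_result": print_result,
}

# The graph itself: each step mapped to the steps it depends on.
DEPENDS_ON = {
    "flip_coin": set(),
    "random_number": {"flip_coin"},
    "print_result": {"flip_coin", "random_number"},
}


def run_pipeline(seed=None):
    """Run every step after its dependencies and collect the outputs."""
    rng = random.Random(seed)
    results = {}
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        results[name] = STEPS[name](results, rng)
    return results


run_pipeline()
```

In a real pipeline each step would be its own container and the orchestrator would handle the ordering. Here a dictionary and a topological sort stand in for both.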
And the idea there is, if you're going to run a million experiments, you've got to know what hyperparameters you're going to tune. Think of scikit-learn, where you might be tuning just the alpha on one of your models. But if you're trying 12 different models, each with different hyperparameters, and you're also trying different combinations of input data, there are other things you could do. And so this is how you build it. I know, again, that's really high level. If you're interested in more, hit me up after this, and I can point you to some really good talks that dive into this specifically and give examples.

This is what a pipeline might look like for a job that you're actually trying to create and deploy, where you're going to validate your data coming in and preprocess it. You'll have a training step that puts out analysis and prediction, then deploys the model if it's good, and then it's looking at confusion matrices. It's just an example of maybe a more real-world problem than a coin flip, which has no practical application. You can also put it on schedules. If you want to retrain your model every so often, that's a thing you can do, because that's a thing people normally do want to do.

There's Boris back there. I stole the slide from him, or the screenshot. Go watch his video, because I think it was recorded. So you've got a graphic editor. When you deploy with that kfctl install, you get this web UI where you can build your pipelines. You can also build them with, oh, man, really? Did I not? You can also use code. I think I've got a screenshot of the code later. Hopefully I do.

Cross-cloud. I was sitting in another Kubeflow talk earlier, and someone had a question, and they said, can I just seamlessly move this between two clouds? And I laughed, and he said, yes. I said, no. That got broken in the 0.6 refactor. It will probably be fixed soon.
It may have been fixed in the last week or so, and I wasn't paying attention. But 0.6 will only deploy to GCP, as best I know. No? You can get on AWS too? Oh, God. Boris, who's a very good hacker, can also make it work on AWS and whatever. At any rate.

What else, though, on 0.6? We refactored out ksonnet. If you're an end user, you're probably not going to notice a lot of difference, because they'd already abstracted a lot of that away. Where it will come in handy for you, the user, is when the project keeps working in 12 months, or when they bump Kubernetes to a new version. So that's good. They're also, ah, the Python. They added some functionality for writing your pipelines in Python. This is just a screenshot of code, and I'm not going to go line by line through that.

Istio. It's an ingress manager. It also supports a lot of the multi-user stuff. Boris, this is another thing that they went into in a lot of detail in their talk. The point being, though, it's more resilient with Istio.

And the metadata, which is alpha. The idea is that you're running these pipelines and doing these experiments, and you want to be able to check in on them. Now there's an API that will serve data about, you know, we've run this pipeline, we've run 400 experiments, and here are the results. You can get that from the API and make your own pretty web UI to watch your results and see how things are coming along. Yeah. So it's in alpha, but that's another thing, too. The entire project's at like 0.6. It's moving fast. It's breaking things. There are a lot of breaking changes. But it's a cool thing to keep an eye on, which is a perfect segue to why you shouldn't use this.

If you're doing simple things, maybe you should just do them locally. This is a lot of overhead.
On most of the Hello World examples of this, you're usually going to be paying a few bucks' worth of Kubernetes time just to get this thing set up and running, because it's not built to do something you could do on your laptop. If you could do it on your laptop or your desktop, that's where to do it. But if you need to build a huge model that's running on a lot of data spread out over multiple nodes, this is what you want. And you pay a little bit for that in terms of the overhead.

Active development: the web UI changes colors. That isn't even a complete slide, but we talked about that. You can listen to three talks on Kubeflow, and people can talk about three totally different tool sets. Not me, because I didn't talk about any tool sets. But the point is that because there are so many tools wrapped up in this project, it can seem like a lot and very overwhelming. It's also not a Cloud Native Computing Foundation project. This is another thing that I always say: just be aware of proprietary open source that someone owns. It's my two cents, but there it is.

So if you're still interested in Kubeflow, there are some workshops we did back when you used to be able to go cross-cloud. Discussion docs, slides. I mentioned a book. Please sign up for it at Introduction to ML with Kubeflow.com. And questions, and there's not going to be a half-baked ML. Any questions? Can anybody tell me what Kubeflow is? All right, well, thanks for not leaving in the middle of my talk. I appreciate it.