I'm going to go ahead and get started right on time. Thanks, everybody, for coming. This talk is called Machine Learning Made Easy on Kubernetes. Basically, we're going to use Kubernetes and other open source projects to make machine learning easier. We could call that MLOps. That seems to be the term everyone's chosen. Apparently, if you put Ops on the end of anything, it clearly just makes it better, so now we're doing that with everything. But I support it; I get the idea behind it. So let's get into it. First, my name is Brian Redmond (yes, Redmond) and I work at Microsoft. I've been there about 18 years, coming up on 19. I'm part of the Azure Global Black Belt team. We basically help customers adopt open source solutions on our Azure cloud, and that can be anything and everything in that area. I'm based out of Pittsburgh. I spend my time traveling around, hopefully finding time to run and be outside, and things like that. So anyway, I actually want to start with the demo. We're going to talk about what I'm here to talk about, but it might be fun to do a demo first. I like to do as many demos as I can in these kinds of talks. Basically, Game of Thrones had ended recently. We're not going to debate the last season here and what it meant. But sometimes you're watching the show, and some character comes on the TV, and you're like, who is that guy? We don't even know what's going on. So I built an image classification model using TensorFlow that basically lets you put in a picture of anything and have it come back and tell you who it is. And you can try this yourselves: if you go to this link, aka.ms/thrones, you can take a selfie and it'll tell you which character you are. Can't promise what you're going to get back. But I'll show you how it works here. So let's take a look.
So this is what it looks like. Basically, it's just a web app, but it is talking to a trained TensorFlow model that's running in the cloud behind TensorFlow Serving. Apparently, I need to end the show. Thank you. So basically, what we do is we select a photo. In my case, if I were to pick that Benjen character, someone we didn't really know, it'll run that against the model, and it comes back and says, yep, that's Benjen Stark, I'm 98% sure. And we can run it again against something else. For example, I could pick a picture of myself here from a trip I took in Denmark, and it's 82% sure I'm Robert Baratheon, which is flattering. Also, spoiler alert, he's dead. So I've got that going for me. So feel free to try that out. The demos I'm going to do later, after I explain a few things, are all based on: how did I build this? How did I train the model? How did I incorporate this into a pipeline and do this on a consistent basis? Again, the predictions are a little bit bogus, just for demo purposes, but you get the idea. So that's what we're dealing with. So let's start by talking about what machine learning is. I would imagine most of you in the room know what machine learning is. I like to turn to a scholar in this field, Arthur Samuel, who defined it as the ability to learn without being explicitly programmed. When we do programming, we typically program the rules around a set of inputs. In machine learning, we give it a set of inputs and a set of outputs, and it tells us the rules. And that's actually the really interesting part: we get the rules back, and we may not even know what rules we're looking for. If we're doing predictive analytics on financials, we know we're trying to predict, say, the future price of a stock. But in some cases we don't even know what trends we're looking for in the data. Machine learning can come back and find the trends for us.
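To make that contrast concrete, here's a toy sketch in plain Python, with a made-up rule. In traditional programming we write the rule ourselves; with a learning approach we hand over inputs and outputs and recover the rule, in this case via a simple least-squares fit:

```python
# Traditional programming: we write the rule explicitly.
def double_plus_one(x):
    return 2 * x + 1

# Machine learning view: we only have inputs and outputs,
# and we want the rule back.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # outputs produced by the (hidden) rule

# Ordinary least squares for a line y = slope * x + intercept.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
print(slope, intercept)  # → 2.0 1.0
```

The fit recovers the rule y = 2x + 1 from the examples alone, which is exactly the inversion Samuel was describing, just at the smallest possible scale.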
So it's actually super powerful. There are some really interesting examples out there. These are four that I looked up. The first one: OpenAI basically created a machine learning program that can compose musical pieces, pretty complex songs with multiple instruments, different every time. Really good songs, by the way, especially if you like classical. You can go try this out on OpenAI's website. Really, really powerful that that's even possible. We've certainly seen Microsoft, as well as other companies, reach human parity on speech recognition, so computers can recognize what's said in speech just as well as humans. Certainly self-driving cars are happening all over the place. I imagine we have a good way to go on figuring this out; that's a pretty complex behavior the computer has to emulate. But people have made tremendous progress, and there are some interesting examples going all the way back to 2015 around self-driving cars. And certainly in the gaming world, we've seen Google's examples, as well as a lot of other interesting work using AI and machine learning to play games like a human would. Very powerful stuff; lots of things we can do with machine learning and AI. But what we've found is that it's fairly difficult. In general, training these models involves a pretty hefty amount of hardware. Some of these jobs may run for many, many hours, sometimes even days, to train a model and get a result out of it. We may have very large, complex data sets that we need to distribute across a bunch of machines to run the training across those machines. Imagine we're doing machine learning on a cluster of, say, 10 machines, which is a very common exercise: how do we make sure that data's available to all of them? And suppose we want to train in our data center and then we want to train in a cloud provider's data center. How do we move that data around?
How do we avoid paying a fortune for something like a GPU, which does cost a hefty amount of money? Managing all of those resources and coming up with a plan around that is fairly difficult. We can turn to things like Kubernetes to help us. We hear a lot about Kubernetes; I think we all know what Kubernetes is. It's a container orchestration platform, and it's certainly seen a huge amount of growth. These are some of the reasons that came back from a CNCF study on why we saw that growth: portability, scalability, and agility. And those things apply to machine learning quite a bit. We want to be able to train in our data center, but I work with customers whose data centers can't handle the training workloads they're bringing to them. They want to be able to burst that training out into the cloud and bring it back. Kubernetes can allow them to do that: they can run Kubernetes in their data center, they can run it in the cloud, and frankly you can run it anywhere. And from a scalability standpoint, if we have a lot of CPU cores in the cloud or in our data center and we want to use them as effectively as possible, it would be nice to have some sort of scheduler and orchestration platform to take care of that for us. And being able to do the kinds of things we do in DevOps, but for machine learning, I think ends up giving us a greater sense of agility. So we think about machine learning and what we need in a platform. A lot of people really think about building the model. If you look at TensorFlow and you start to learn TensorFlow, a lot of it is about building deep learning models, structuring a model, and maybe training that model. But anybody who's worked with this knows there's quite a bit more to it when you're doing machine learning. These are just some of the typical steps you'd go through to build and really use one of these models.
There's a whole lot of effort around the data. The data is probably the most important part. You're gonna get data from wherever your sources are. You've gotta bring it in, maybe even do some level of analysis and transformation on it. The data may be coming in in various formats; it may not be prepared for the type of model that you're building. You have to have validation data as well. This is actually one of the biggest mistakes I think people make with machine learning: their model looks really good based on the validation set that they had, but maybe the validation set looks a lot like the training set. Well, then of course it would work out. In this Game of Thrones demo I've done here, that's exactly what I've done. I've trained against a set of pictures, and I don't really have enough pictures to build something super interesting, so it tends to work well against the pictures that I started with. But we really have to have a better way to validate these models, so that the results we're trusting are actually accurate. Certainly we wanna do things like training at scale and then take that model and roll it out. The bottom part of this picture probably looks a little bit more like traditional CI/CD around web applications: we've got some model, and we now need to deploy it, maybe as a container, to a web platform of some kind and make it available to our web applications, our customers, some sort of public API, whatever it is that we're building. And so this is what I would consider a typical machine learning workflow. We need a better way to do this, to make it easier. Well, the Kubernetes community at KubeCon 2017, which at this point is at least a year and a half ago, introduced an open source project called Kubeflow. And Kubeflow's charter, effectively, was to make it easy for everyone to develop, deploy, and manage portable, distributed machine learning on Kubernetes.
So, entirely Kubernetes-based, you have a platform that adds that extra layer on top of Kubernetes orchestration, focused on machine learning. The Kubeflow name effectively started out around TensorFlow and Kubernetes; it's expanded well beyond that. The idea, if you think about it this way: you have all these different environments. You have a cloud environment. Maybe you have a training environment in your data center. You have some experimentation desktop-type environment. You can run Kubernetes across all of those, so I get this consistent layer of orchestration no matter where I'm running. And then for the machine learning workflow, I can use Kubeflow. Because I have Kubernetes everywhere, I can easily bring Kubeflow across those environments and do all those same things I wanna do around machine learning in a consistent way. I get these layers of abstraction: in the same way that Kubernetes helped me abstract my applications and programs, as containers, from the hardware, Kubeflow abstracts my machine learning activity from the Kubernetes infrastructure. And in the long run, we really want people like data scientists working on their code to not be thinking about pods and Kubernetes deployments. We want them to be focused on the machine learning, but still take advantage of what they can get out of something like Kubernetes. So effectively what they get, maybe we call it MLOps, maybe cloud native ML, I'm not too caught up in what we call it, but I certainly wanna see those kinds of benefits to make this easier. And I do think Kubeflow is a great platform for this. There are other open-source solutions out there for these kinds of things; I'll focus the rest of this talk on Kubeflow. What do you get out of Kubeflow? I won't go through all these features one by one. It, again, started as just something centered around TensorFlow.
They started layering in other kinds of platforms, like PyTorch and MXNet, and layered in tooling for things like serving. I mentioned TensorFlow Serving, but someone might wanna use Seldon for their model serving; that's a project you can use. They're basically trying to bring in various other projects and make it a complete package that's integrated and easily managed. They added a pipelines feature, which I'm gonna show you in a demo here, to basically give you the ability to create a pipeline that runs this entire workflow, that picture we saw earlier: let you orchestrate what that pipeline looks like, automate it, and allow all those steps to run in your Kubernetes cluster. The pipelines are the one thing I wanna spend a little time on before I go into some of the demos. You can imagine, if you have a few steps that run in some sort of order, we could write some big Python or Bash script that would execute all of these for us. But we don't really know that each of these steps runs in the same language; it may be really difficult to cram all of that into one place. And we also need to pass information between each step. It would be great if we had some kind of pipeline system that would help us coordinate all of that, a simple way to write or script this kind of workflow and let something execute it for us. So that's what Kubeflow Pipelines is about. To me, this is really one of the best solutions they've added into Kubeflow: this idea that I can create a complex workflow, with lots of different paths through how things occur, and I'm not writing a bunch of if-then logic in code. I'm really just saying, here's the order of the tasks that I'm working on, and then containerizing each of the steps as we go, so that Kubernetes can end up executing all of this in the background.
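As a rough sketch of that idea: this is not the real Kubeflow Pipelines SDK (which uses `kfp.dsl.ContainerOp` inside an `@dsl.pipeline`-decorated function), just a dependency-free stand-in with hypothetical step names. Each step names a container image and its arguments, and `.after()` records the ordering, so the steps can be resolved into an execution order instead of hand-written if-then logic:

```python
# Minimal stand-in for the Kubeflow Pipelines idea: each step is a
# container image plus arguments, and .after() declares dependencies.
class Step:
    def __init__(self, name, image, arguments=()):
        self.name, self.image, self.arguments = name, image, list(arguments)
        self.deps = []

    def after(self, *steps):
        self.deps.extend(steps)
        return self

def topo_order(steps):
    # Resolve dependencies so each step runs only after the ones it follows.
    ordered, seen = [], set()
    def visit(s):
        if s.name in seen:
            return
        for d in s.deps:
            visit(d)
        seen.add(s.name)
        ordered.append(s.name)
    for s in steps:
        visit(s)
    return ordered

# Hypothetical steps mirroring the workflow described in the talk.
preprocess = Step("preprocess", "got/preprocess:latest")
train = Step("train", "got/train:latest",
             ["--how_many_training_steps", "2500"]).after(preprocess)
score = Step("score", "got/score:latest").after(train)
convert = Step("convert-tflite", "got/convert:latest").after(score)

print(topo_order([convert]))  # → ['preprocess', 'train', 'score', 'convert-tflite']
```

The real SDK then compiles a definition like this into a workflow that Kubernetes executes, one container per step.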
And that's the advantage of Kubernetes that gets a little bit hidden in this picture: all of these different activities we're doing, and some of the things you'll see in the demo, all get submitted to Kubernetes and execute there. Kubernetes is handling all the scheduling. I don't really have to think about whether I have resources available. I could turn on autoscaling in my cluster and ensure I always have resources at the right time as I execute all of these steps. So again, that's the high-level picture. I actually think it's better to see some of this stuff in action, so we're gonna do a little bit of a demo using this Game of Thrones web app. Again, there's the link if you wanna try it out. Basically, we're gonna build the model using a TensorFlow application. We're gonna train it using this concept of a TFJob in Kubernetes. We're gonna serve that. We're gonna do something around what we call hyperparameter optimization. And then I'll show you the pipelines, and even something like Jupyter, assuming we have time. So, sound good? Any questions before I cut over to that? Okay, cool. Well, hopefully you can see everything here. What I'm gonna start with is actually the code. This project is out on GitHub, linked from that demo app, so you'll be able to go out to GitHub and find it. Basically, the training program is called Retrain. It's actually one of the TensorFlow samples. The reason it's called Retrain is that it takes the Inception image classification model and retrains it against a set of pictures that I have here. So you can see, for each of those characters I have a set of pictures that essentially labels them: I know these pictures are accurately these people. And this program basically churns through those and says, okay, if anything looks like these, I'll know it, and uses that to train the model.
The training program goes into a Docker container; I'm basically just using the TensorFlow base image (it can be done with a GPU or without), and I run this Retrain script with Python. I'm a big fan of containers, so if I were to do this locally I would more than likely run it as a container anyway. But certainly if I'm gonna bring this to Kubernetes, I need it as a container image, and I'm gonna store it in some container image repository. Now, I could just run that as a Kubernetes Job, but the place where Kubeflow comes in is that it adds these custom resource definitions. Kubernetes supports the ability to expand upon the default API and add your own custom resources, and in this case there's this concept of a TFJob. In my case I only have a single worker, so this is pretty straightforward, but TensorFlow jobs can look a lot different than this. I could have one parameter server and a series of workers, a distributed TensorFlow job that's running. It's nice to be able to define those here as one object, one TFJob, and let Kubernetes take over and execute that for me. It would be a lot more effort for me to spawn all these distributed jobs and manage them myself; I can just specify this as this resource. And effectively, here's the container image that I'm using (it's stored in this case in an Azure Container Registry), and I pass in a series of arguments and parameters. This is also where I would specify a GPU, if I wanted to do that. So let's actually go run one of these in my cluster. And by the way, down here in the cluster you can see everything that's running; I've just got a watch of the pods and services. I've run some of these things before. This is the web app that you guys are hitting when you're trying this out. If we take a look at TFJobs in my cluster, you can see I don't have any at the moment. So we're gonna go and deploy one of these. I've got a little cheat sheet here so I don't have to type these.
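The manifest on the cheat sheet looks roughly like this. This is a hedged sketch: the API version varies by Kubeflow release, and the image, file share, and secret names are placeholders; the training arguments, though, are the standard flags from the TensorFlow retrain sample:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: TFJob
metadata:
  name: got-retrain
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 1          # a distributed job would add PS replicas here
      template:
        spec:
          containers:
          - name: tensorflow
            image: myregistry.azurecr.io/got-retrain:1.0   # placeholder image
            args:
            - "--image_dir=/data/images"
            - "--how_many_training_steps=2500"
            - "--learning_rate=0.01"
            volumeMounts:
            - name: azure-files
              mountPath: /data
          restartPolicy: OnFailure
          volumes:
          - name: azure-files
            azureFile:                 # Azure Files share for data and output
              secretName: storage-secret
              shareName: data
```

Applying a manifest like this with `kubectl apply -f tfjob.yaml` is what creates the TFJob and its worker pod.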
So as soon as I add that TFJob, first of all you'll see that we now have a TFJob. If you noticed, down at the bottom it actually spawned a worker; that's this one here. And we can take a look at the logs. You can start to see the beginning of the activity: it goes and downloads the Inception model, and then it'll start to work against the data that I've passed in. Now, you might be looking at this and thinking: for data scientists, this looks a little tedious. Do I want my data scientists to learn YAML and Docker and sit in front of an interface like this? And effectively, we don't. That doesn't sound like it's gonna help them progress forward; it actually sounds like it's gonna set them back. Really, the goal is for them to write the Python code, push it to some sort of Git repository, and let this kind of stuff happen in some sort of CI/CD-style automation that kicks this thing off. That's the bigger picture of what's happening; we're just kind of walking through it here at this interface. What Kubeflow also gives you, as you might imagine, is a dashboard. So if we come to the Kubeflow dashboard that's running in this cluster and we click on TFJobs, we'll see all the TFJobs running in my cluster. In my case, I only have this one that we just spawned here. It tells me it's currently running, and I can take a look at the logs and see what's actually happening. So someone could interact with this and manage everything in probably a little more effective way, without learning Kubernetes. I could come here and manage the jobs I'm working with without learning any of the YAML or any of the kubectl commands and things like that that Kubernetes requires. And that's super helpful.
What ends up happening here is, once this is complete, in my case in Azure, I'm using Azure storage to store all the output from this so that I can use it later. So, for example, in this storage account I have a file share, and basically the output is the training summaries and the latest model. When I ran this earlier, I ended up with the training summaries, which are used in TensorBoard so I can see the logs and the accuracy from the training that I ran. And the latest model is where I've stored the model so it can be used with TensorFlow Serving. So effectively, when that training is done, I don't need that container anymore. The output I cared about is now stored in the cloud, and I can spawn other containers that mount and use it. And that's what we'll do later in this environment. You can see what one of these TensorBoards looks like. This is standard TensorFlow stuff: we get one of these outputs here that shows us the accuracy for training and validation. This is what they typically should look like, though in some ways this model is not very interesting; it sort of is a little too accurate, if you really think about it. So that's what TensorBoard looks like. The next stage, now that we have a trained model (I'm not going to wait for that sample we ran to finish), is that we want to serve this thing. What this looks like in our YAML for serving is a standard Kubernetes Service and Deployment. The Service handles the networking aspect, and the Deployment is the container itself. For this Deployment, I basically used the standard TensorFlow Serving container image, pulled from the TensorFlow repository. And notice, here in my volume mount, without knowing a ton about how we do Kubernetes volumes and storage and so forth, I'm mounting that latest model folder.
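A sketch of what that Service and Deployment might look like. The names, Azure file share, and secret here are placeholders, but `--model_name` and `--model_base_path` are standard TensorFlow Serving flags, and 8500/8501 are its usual gRPC and REST ports:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: got-serving
spec:
  replicas: 1
  selector:
    matchLabels: {app: got-serving}
  template:
    metadata:
      labels: {app: got-serving}
    spec:
      containers:
      - name: serving
        image: tensorflow/serving          # stock TF Serving image
        args: ["--model_name=got", "--model_base_path=/models/latest_model"]
        ports:
        - containerPort: 8500              # gRPC
        - containerPort: 8501              # REST
        volumeMounts:
        - name: model-store
          mountPath: /models/latest_model  # the trained model from storage
      volumes:
      - name: model-store
        azureFile:
          secretName: storage-secret
          shareName: latest-model
---
apiVersion: v1
kind: Service
metadata:
  name: got-serving
spec:
  type: LoadBalancer
  selector: {app: got-serving}
  ports:
  - {name: grpc, port: 8500}
  - {name: rest, port: 8501}
```

Because the model is mounted from cloud storage rather than baked into the image, a retrained model can be served just by updating the share.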
So that's going to take the model that we built in training, mount it into TensorFlow Serving, and provide it as an API endpoint. TensorFlow Serving basically provides that wrapper so that I now have an API I can talk to. Let's try that out and see what it looks like. In my cluster here, you can see I do have that serving running, and the service is running at this IP address on ports 8500 and 8501. TensorFlow Serving actually supports both HTTP and gRPC, for those playing along at home. So basically, I do a curl against that. I've got this picture of Daenerys Targaryen, encoded as a JSON object because the API requires that, but it's just a simple curl call against that endpoint, and I get back a payload that tells me the model's prediction. This is exactly the same endpoint the web app is hitting if you're trying out the web app. And notice the fourth one down: it's 0.94, essentially 94% sure that this is the character I've submitted. And I did submit Daenerys Targaryen; if we look, the fourth one down is the one it predicted, which of course is what we expect. So when we tie these things together with TensorFlow Serving and test it out, that's how we're able to deliver the web application we tried earlier. And we can pick another photo. I think it's kind of fun to pick Drogon, the dragon, if you're a Game of Thrones fan. If we do an image classification on Drogon, notice we get back: it doesn't know it's Drogon, but it picked Daenerys Targaryen. And as we know, most kids look like their parents, so in that case it's actually fairly accurate. So I like that. The next thing that's important to note is that when we ran these models, there's a series of parameters that drive the training exercise: the learning rate, the optimization model we use, the number of steps or iterations through the training that we run.
We call those hyperparameters in machine learning. And what you often have to do is pick the right hyperparameters to get the right result. When I first started working with machine learning, I thought there was a lot of science to picking these hyperparameters. I thought there were people who really understood the math, and they just knew 0.01 was the learning rate, and they did a bunch of scribbles. And what I found was, they're frankly just guessing. They really are. They're guessing at what these hyperparameters are and trying it out. It's a bit shocking to me, but that's actually how it works. So what ends up happening is, if we have a lot of different hyperparameters, we end up having to run a lot of different combinations. Imagine we have three different parameters, each with maybe 10 values to try; we could be running hundreds of training runs, using a lot of resources, to get the best result. So we end up needing better ways to arrive at the right result. Now, the way I did it here is I used something called Helm. Helm is a way to package up things we submit to Kubernetes and provide parameters for them, and it's a good fit for something like hyperparameter optimization. What I have here: my two hyperparameters for this training are learning rate and training batch size, and I've given three possibilities for each. What this will do, if we looked at the underlying Helm chart (which isn't worth going through here today), is go out and run all the combinations of these. For those doing the math at home, how many is that? How many do we end up with? Nine, yep. So we end up with nine different containers. And again, if I'm gonna submit this to Kubernetes and let it run these nine containers, it's sure great that I don't have to think about which nodes have which resources available, and where I can find resources for these.
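The grid itself is just a cross product. Assuming hypothetical values like these, three options for each of the two hyperparameters gives the nine combinations the Helm chart expands into:

```python
import itertools

# Three candidate values per hyperparameter (hypothetical numbers).
learning_rates = [0.001, 0.01, 0.05]
train_batch_sizes = [16, 32, 64]

# The chart effectively templates one training job per combination;
# this is the same cross product computed directly.
combos = list(itertools.product(learning_rates, train_batch_sizes))
print(len(combos))  # → 9
```

Add a third parameter with three values and you're at 27 jobs, which is why the scheduling quickly stops being something you want to do by hand.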
And potentially, I can see them all running at once and decide to kill a few of them because they're not leading to the kind of results we want. So this actually helps me do that. Underneath the covers, my TFJob looks like this; there's a bunch of extra templating code in here that will essentially create all nine of those jobs for me. The other thing I'm doing here that's kind of interesting is using this thing called Virtual Kubelet. In my cluster over here, you saw I had a number of different pods running, but if we take a look at the nodes, I actually have five nodes and this thing called a virtual node. The five nodes represent virtual machines, so I'm paying for them all the time; whenever they're running, I pay my cloud provider, in this case Microsoft Azure, to run them. But when I'm spawning these hyperparameter jobs, maybe I need nine nodes to run them all. Do I really want to scale out to nine nodes, pay for them, and then maybe forget and leave them running, things like that? So the virtual node is not a virtual machine at all. It's truly a virtual node; it's actually just software. If I submit things to the virtual node, in this case it'll spawn them as what we call container instances in the cloud. They'll just be containers running in the cloud, and I only pay for them while they run. So if we go over and take a look at my Azure subscription, you can see I don't have any container instances running, and we're gonna run this thing and see how it kicks everything off. The key is in this node selector. Again, if you're not too deeply familiar with Kubernetes, it doesn't matter too much, but we said we wanted the nodes labeled virtual-kubelet, and that's what represents the virtual node and what really drives and controls this.
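The scheduling stanza in the pod spec looks something like this. This is a sketch based on the conventions of the Virtual Kubelet ACI connector from around this time; the exact labels and taint keys depend on how the virtual node was installed in your cluster:

```yaml
# Pod spec fragment: steer these pods onto the virtual node.
nodeSelector:
  kubernetes.io/role: agent
  beta.kubernetes.io/os: linux
  type: virtual-kubelet            # label carried by the virtual node
tolerations:
- key: virtual-kubelet.io/provider
  operator: Exists                 # tolerate the virtual node's taint
```

The virtual node is tainted so ordinary workloads never land on it by accident; only pods that both select it and tolerate the taint get burst out to container instances.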
And so when I do this helm install here, I basically say: install this chart, use the values that I gave. I've got a crumb under my return key; if you have a Mac, you know how that feels. What this is gonna do is go spawn all of those pods into the cluster. It'll take a few seconds or so to kick off, but instead of assigning them to one of the nodes in my cluster, it's gonna put them on the virtual node, and they'll all run as container instances. And then I can still capture all that data into my cloud storage, look at things like TensorBoard, figure out which of the values actually gave the right result, and use those going forward. Actually, I'll show you the TensorBoard here while we're waiting for those to start, because it looks a bit crazy. I don't know if this is the best way to decide the result, because it looks a little too hard to follow, but you can see all of those different jobs that were run, and I can start to narrow it down and figure out which ones map to which values and so forth. So you can see them all starting here. You can see some of them kicking off, and if we go over to the container instances view, you can see a few of them have actually started. The beauty of this, again, is that I pay for these while they're running, and as soon as they stop, as soon as they're done, I stop paying for them. That seems perfect for machine learning, and that's the idea around this. If I have a big Kubernetes cluster and I'm willing to pay for those VMs, and that's a good fit, then great, do it that way; we have that choice and flexibility. The other thing that I think is interesting, and perhaps in the long run a better way to do this, is something called Katib. Again, these are projects that have been added to Kubeflow; this one actually came out of some work done by Google to do this exact thing.
Let's figure out a better way to optimize hyperparameters for a training exercise, and let's take the same one we've been looking at here. In this case, I've added an additional hyperparameter, the number of training steps, and basically we give it a feasible space, a range of options for each of those parameters. You can see for learning rate I've said it's between 0.001 and 1.0, and for training batch size and learning steps I've given it a big range. It's gonna go out and start to spawn all kinds of pods with combinations of these, but then it's gonna use its own learning to come back with some validation and try to figure out, on its own, the right output. That's actually super powerful, because we're effectively using machine learning to figure out the right set of hyperparameters and optimize them. This is pretty powerful technology. I don't entirely have this working for real, so that's what it would look like if you set it up, but the random example actually shows it pretty well if you take a look at the study. This is just kind of the hello world for Katib, but I think it's pretty interesting. It comes back with the bunch of workers that it spawned, and you can see accuracy and validation accuracy for each one. Each one of these workers was a Kubernetes pod with a container running in it, so it spawned all of those, analyzed them, and came back with the right result. And again, Kubernetes is taking care of where the resources are available. I could use the virtual node here if I wanted, and it spawns them, runs them, and orchestrates all that for me. As a machine learning data scientist, this is actually what I care about: I'll be able to pick, at the end, if validation accuracy is most important, then apparently this set of hyperparameters is the right choice. So that's pretty cool. All right, the next thing I wanna show you is pipelines.
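For reference, a Katib study like the one just described is itself declared as YAML. This is a rough sketch in the style of the early StudyJob resource; the field names and API version changed across Katib releases, and the parameter names and ranges here are placeholders:

```yaml
apiVersion: kubeflow.org/v1alpha1
kind: StudyJob
metadata:
  name: got-study
spec:
  studyName: got-study
  optimizationtype: maximize
  objectivevaluename: validation_accuracy   # the metric Katib optimizes
  suggestionSpec:
    suggestionAlgorithm: random             # the "hello world" strategy
  parameterconfigs:
  - name: --learning_rate
    parametertype: double
    feasible:
      min: "0.001"
      max: "1.0"
  - name: --train_batch_size
    parametertype: int
    feasible:
      min: "16"
      max: "128"
  - name: --how_many_training_steps
    parametertype: int
    feasible:
      min: "500"
      max: "4000"
```

The controller reads the feasible space, proposes trials, spawns a worker pod per trial, and records each trial's objective value so the best combination can be picked at the end.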
So again, this Game of Thrones training actually has a few more steps to it than just training the model and serving it, so it could look something like this. We pre-process the data, maybe adapting the image files into a format my model can understand. We run some training against it, like we ran. I can score the model. And then, in my case, just for the sake of the demo, I'm converting the model into a couple of formats that can be used for different purposes: I've converted it to TF Lite and ONNX, maybe for use in mobile or lightweight environments. And then I've got a step that copies it somewhere else. Whatever your pipeline steps would look like, you can go in and define them, and the way you define them is actually in code. This can look a bit daunting the first time, but basically Kubeflow Pipelines uses a Python-based definition language, and it's fairly simple. Effectively, for each of these steps you define a container image and a set of arguments, and you give it a name; this is essentially a container operation. And for the ordering, you can say things like: after the previous step, run this one. It effectively allows you to create whatever workflow you want. You can add logic to say, if this output happens, do this, but you do it in a language that's a little more discernible; it's just based on spawning these different steps. We can define the input parameters that we want and the storage characteristics we need for each of the container images. We basically upload this into Kubeflow Pipelines, and we end up with exactly this picture, and when we execute it, it ends up looking like this. You can see they're all green because they all ran. If we take a look at training, for example, and click on logs, we see all of the output from that training exercise. We see that it ran, in this case, 2,500 learning steps.
It came out with some level of accuracy, and maybe I have a scoring exercise that tells me the average accuracy for this model. Whatever those steps are for you, you can plug them in here and customize a Kubeflow Pipeline to get what you need. And this is fairly powerful; it's actually really simple to run one of these. We give it a run name. We pick which pipeline we're going to run. You can create different experiments, which are effectively groupings of the different exercises you're running. And this is where I provide the number of steps; maybe I want to do 5,000 this time with a learning rate of 0.01, and I click start and off it goes. Again, these will show up over in Kubernetes as containers, and if we really want to see them, they'll end up in the kubeflow namespace, where you'll see all of these steps running. Somewhere in there, the run we just kicked off will appear and start, and each of its steps will run. And again, Kubernetes is orchestrating all of that for us. Super powerful. The last thing I want to show you before we wrap up is notebooks. Not The Notebook, the romantic movie of course that we all know and love. Jupyter notebooks drive quite a bit of our work in machine learning. Before we get to the point where we're ready to orchestrate an end-to-end pipeline and process, we really need to experiment and iterate on what we're building, and a lot of that work is typically done in Jupyter notebooks, which is a pretty familiar term to most people. But actually creating environments for a lot of people to use Jupyter notebooks starts to become a bit cumbersome. Maybe they need a GPU, and they only need that notebook for a short period of time.
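What clicking "start" on that form amounts to is merging your per-run values over the pipeline's declared defaults and handing the result to the training container as arguments. A rough sketch of that parameterization, with hypothetical parameter names:

```python
# Defaults a pipeline might declare; a run can override any of them,
# just like typing 5,000 steps and a 0.01 learning rate into the
# "create run" form. Names here are illustrative.
PIPELINE_DEFAULTS = {"training_steps": 2500, "learning_rate": 0.01}

def build_run(run_name, experiment, overrides=None):
    """Combine pipeline defaults with per-run overrides and render
    the argument list the training container would receive."""
    params = {**PIPELINE_DEFAULTS, **(overrides or {})}
    args = [f"--{key}={value}" for key, value in params.items()]
    return {"run": run_name, "experiment": experiment, "args": args}

run = build_run("got-train-5k", "game-of-thrones", {"training_steps": 5000})
```

The experiment name is just the grouping label; every run filed under it shows up together in the Pipelines UI.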
So we provide this notebook platform in Kubeflow where someone can come in and say "I want a new server," pick whatever image they want (these can be custom images that the administrator defines), along with the memory and CPU, storage, and so forth, and spawn one. Then they'll get a notebook that looks just like this previous screen, but in the background, again, it's running as a container in Kubernetes. So then I can say connect, and as the data scientist I get this familiar interface, and I can go in and do some additional fun Game of Thrones analytics. This particular one I pulled off of an open source project somewhere; they're basically trying to analyze which family had the most success across all the different battles, because this is what people do with machine learning in all their free time. It's actually pretty fascinating, the amount of analytics people have done around Game of Thrones battles. Again, if someone needs a Jupyter notebook to do this kind of work, it's pretty simple to request one, make it available, and get to work. That's the idea behind Kubeflow's implementation of notebooks. So that's it from a demo standpoint. I'm going to take a couple of questions and wrap up with that. We have at least a few minutes, and I don't remember if it's lunch or not. I see a question in the back, ask away. So the question is: I did this in Azure, could you do this on any cloud? Absolutely. There are maybe a couple of places you have to customize, and the main one is storage. That's the area that varies from cloud to cloud, whether it's a private cloud or a public cloud like Azure, so you have to adapt that part of it. But in that pipeline, you saw a part where I defined that in code.
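Under the hood, each notebook server is just a Kubernetes workload with a resource request built from the form's inputs. Here's a hedged sketch of what that request looks like, with field names simplified from what Kubeflow actually generates:

```python
def notebook_spec(name, image, cpu="1", memory="2Gi", storage="10Gi", gpu=0):
    """Build a simplified pod-style spec for a Jupyter notebook server,
    roughly what Kubeflow creates when a data scientist clicks "new
    server." Field names are simplified for illustration."""
    resources = {"cpu": cpu, "memory": memory}
    if gpu:
        # In Kubernetes, GPUs are requested via an extended resource name.
        resources["nvidia.com/gpu"] = str(gpu)
    return {
        "name": name,
        "namespace": "kubeflow",
        "image": image,
        "resources": resources,
        "volume": {"size": storage},  # persistent storage for the notebook
    }

spec = notebook_spec("got-analysis", "jupyter/tensorflow-notebook", gpu=1)
```

Because it's an ordinary resource request, the cluster scheduler handles the "they need a GPU for a short period of time" problem: the notebook lands on a GPU node while it exists and frees the hardware when it's deleted.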
I can make those parameters of that program and make it pretty easy to say run this one here, run that one there. The other thing is that I stored my container images in the cloud, but that's probably not a big deal; there are container registries everywhere. As I said early on, one of the huge benefits of Kubeflow and Kubernetes is that portability from cloud to cloud. I wouldn't call it cut-and-paste magic where I just pick it up and run it anywhere, but the amount of time to adapt it from cloud to cloud is fairly small. Question here. You're not supposed to catch details like that. He asked, basically, why in the first training example the images were stored in the container and then later they weren't. Yeah, that's very observant. Some of these things are things you do in a demo when you're up in front of people and only have 25 or so minutes. If you copy the images across from remote storage every time, it's not very efficient, so it may actually make sense to store data super close to your container, and there are probably a lot of other ways to make that efficient. But it probably doesn't make sense to store all the data in the container; it makes the container super big. So that's not really the way you would do it, but it's certainly good for demo purposes. Good question. Back here. Yeah, the question is: can you do that parameter optimization in a notebook? The notebook gives you the ability to do pretty much anything you want. Now in this example, the way it works is that my Jupyter notebook is my container; it actually runs my notebook in a container. I can do a lot of different parameter optimizations within that, but if I actually want to spawn a bunch of them in parallel, I probably can't do that from my own single notebook. I probably don't have that power as a user.
And so that's where it might be interesting to try these other methods, and Katib is an interesting one, though someone would still need some knowledge of Katib. The reason my Game of Thrones model isn't quite ready for it yet is that I need to provide the output in the right format for Katib to pick it up and decide whether it was good or not. But yeah, people could certainly run a bunch of examples manually and come to the right idea, and that's probably what they do early on, before they go to the cloud and test out the final set of good guesses; the feasible space they might determine on their own. One more question, and then I think I have to get out. Sorry, I couldn't hear. So the question's about ETL and adding steps around bringing in data; I think that's a fair characterization. Yeah, absolutely. You can certainly do things manually, just importing data, but Kubeflow does provide ways to use the common tools people use for this kind of thing. Pachyderm, for example, is an open source project and a tool people use for data manipulation and ingestion, with all kinds of powerful capabilities, and Kubeflow helps you deploy Pachyderm in your Kubernetes cluster and take advantage of it for steps like that. There are probably a bunch of other ways you might do ETL around machine learning data, and you can incorporate those steps into the pipeline or use the tools you get. The idea is, if something's not there and you want to use an open source project, well, Kubeflow is open source, so bring it to the project. But so far they've done a pretty good job bringing in most of the common tools. It's at release 0.6 now; I was actually using version 0.5 in my demo here. It's not at version 1.0 yet, but 0.6 just came out, so they're getting there pretty quickly on Kubeflow. So again, I'll take questions as we wrap up. I think we might have another speaker or not, but I appreciate everybody's attention.
I hope it was a good, valuable talk. Thanks a lot, enjoy the rest of the day, and thank you.