Hey, everyone. Whoa, very loud. Is there a volume control on here? My mic is not... okay, this is not bad, I'm very, very loud. Is there any way I can turn this down? You know what they say: know yourself. I'll just hold it out here. How about this? Good. Everyone have a great lunch? Barbecue, fried chicken, I don't know, the lines are brutal. Anyhow, people on the livestream probably don't want to hear about that.

Hi, I'm David Aronchick. I'm a product manager at Google. Until recently I was a product manager on the Kubernetes project, and now we're working on something brand new which we're really excited about. This is Vish. Vish, would you like to introduce yourself?

How about now? Hi everyone, I'm Vish. I work at Google, on Kubernetes; I've been working with Kubernetes for the last three to four years. More recently I've been focusing on enabling high-performance applications, specifically machine learning, with Kubernetes in a portable fashion, and also looking at more of the application stack: empowering Kubernetes users, essentially.

Yep. So like I said, we're here to talk about something brand new, specifically Kubernetes and machine learning, which we think is one of the best opportunities to really extend and use the functionality that Kubernetes provides under the hood. Whenever you approach something new, you always want to start with a question: what is machine learning? We're not going to do one of those look-it-up-in-the-dictionary definitions. Nope.

The idea is that you first start with a question. In this case, and I don't know how many people have bought or sold a house recently, you might ask: how do you determine what your house is worth? You come up with some metrics, in this case square footage and overall house price, and you start drawing points on a graph. Here's a point, maybe I'll add some more, maybe I'll add a lot more, and all of a sudden it looks like a pattern, and you're thinking: this is magic. Then you answer your question. How much should you sell your house for? Well, my house is this big, so I go up to the chart, I go over, and I get a number. Presto, machine learning, right? Congrats, you're a machine learning expert. You can't get a certificate certifying that on the way out the door, but still.

Things can get complicated, though. You could have nonlinear groupings based on neighborhood, environment, crime rate, and so on. Things can be multi-dimensional; here I obviously had something incredibly boring, just two features in a linear regression. Or things could change over time. So, at a high level, what we think machine learning is: a way to solve problems without explicitly knowing how to create the solution.

Do we believe this at Google? It's one of our biggest passions in the world. I don't know if you know, but we have some large data centers, and they're pretty expensive to run; the number one cost ends up being power most of the time. We have some pretty smart data center engineers, and we said: hey, data center engineers, do you think you could wire up ML to the way we run our data centers and improve things? So they did. For those that don't know, the benchmark for data centers is PUE, which stands for power usage effectiveness. You ideally want it to be one: one watt goes in, one watt of goodness comes out. Nobody ever runs at one.
At Google, here's how our power usage efficiency was trending. We literally hooked up the data center to an ML system that controlled the water and the fans and so on, and it looks like this. Then we turned it off, and it looked like that. It literally was like flipping a switch. That's the power of machine learning, and because of that, we think machine learning should be for everyone.

That would be a great thing, except ML needs DevOps, and by that we mean DevOps needs composability, portability, and scalability. I'll get to these in just a second. (Is there any way we could turn this down a little? Even holding it away from my face it's feeding back. That's fine.)

By composability I mean the following. A lot of people look at ML and say: great, I'm going to download PyTorch or TensorFlow or CNTK or something like that, and I'm going to build a model. Who thinks they're done once they've finished building their model? No hands go up. Good, you're all very smart. The reality is that real ML looks a lot more like this: you have a whole bunch of steps, and each of those steps is an individual decision, a service, and so on, that you need in order to get an answer. That's what we mean by composability: being able to pick and choose those various pieces.

The second is a quote from Joe Beda from literally a few weeks ago: every time you move from one environment to another, you need portability, a system that understands how to move while making as few changes as possible, because every change, as he says here, is a likelihood of an outage. So go back to this pipeline and look at all those services. Each one may be on a different system, and every one of those systems is an opportunity for a change conflict, for something going wrong. In the ML space in particular, your containers on premises may run against CPUs, but when they move to the cloud they see GPUs of various sizes, or ASICs, or FPGAs, you name it. Even ML frameworks often behave differently in different environments, and you need that stability as well.

And then, finally, scalability. (I think you're going the opposite direction on the volume; I'm now about two feet from my face down here. Okay.) Same deal: you often want things to be highly scalable. You start your ML job, and one minute later you realize you're not going to converge for the next 450 years. Maybe you'd like to add some more machines. That's a perfect opportunity for scalability.

So what's really good at composability, portability, and scalability? Yes, it should be clear: it's Kubernetes. It's already doing all of this, and we're just going to talk about how it helps with machine learning specifically.

Here's a really high-level picture. It may be obvious to some of you, but I want to drill into it a little, because I want people to think of Kubernetes this way: Kubernetes is abstracting out a whole lot of things. It's abstracting out your cloud, whether it's a public cloud or a private cloud. You don't have to worry about that, because there are other people worrying about it, so you get separation of responsibility. Then you have a variety of hardware resources that you might want to consume. You may not want to consume all of them, you may want to consume some, or you may want to extend things to consume something else.
Kubernetes abstracts all of that. Then you have a whole bunch of operating systems, kernels, versions, and host daemons; that's another whole beast, and Kubernetes provides primitives so separate teams can manage all of that while Kubernetes continues to work for you. It shields your end users, in this case your data scientists and ML practitioners, from all the churn going on underneath. They don't want to know about all of that, and why should they? Then you have core Kubernetes itself, which I'm not going to get into in this talk, but hopefully people have some idea of what Kubernetes is.

In addition, you get a whole bunch of other primitives that are very important for managing infrastructure at scale: authentication and authorization with RBAC, built-in monitoring that's extensible to almost all of your applications, built-in logging, quota, namespaces. These are a common set of capabilities that apply to a whole bunch of workloads.

Now, given that you have this common infrastructure to deploy on, what are you going to do? The more interesting part is running the real applications; that's the reason all this infrastructure even exists. And Kubernetes doesn't restrict you to some specific set of applications. You can choose your own distributed storage, your own database or block store, your own MapReduce solution, your own ML training operator, your own serving service. The good part is that it's pretty easy, because all the rest of the problems are solved, so you can have individual groups of people focusing on just their piece without worrying about the rest of the infrastructure. That's pretty cool: you have a large organization, and now all of its teams can cooperate with each other and still add value to the end consumers. In this case the end consumers are most likely data scientists, and they focus on their workflows, say Jupyter or TensorFlow or Caffe, whatever it is. They only think about that; they don't think about the rest of the infrastructure. That's sort of the Google model, it's totally possible with Kubernetes, and we're starting that journey now.

Specifically, Kubernetes already supports accelerators in an extensible manner, so you don't have to wait for upstream to support everything. GPUs are already supported, more enhancements are coming in 2018, performance optimizations and all that fun, and there's also going to be support for FPGAs, high-performance NICs, and so on. Upstream is enabling all sorts of extensions that unblock you to do the awesome things you want to do. On top of that, a lot of application deployment primitives are already there, like Jobs and StatefulSets, which let you describe some of your common machine learning workflows. You don't have to rebuild all of this; you can become productive in a day or two with a pre-existing Kubernetes cluster, rather than buying a machine from the store nearby, spending a month installing drivers and everything, and only then starting on the cool thing. That's already a solved problem.
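As a concrete taste of that "Jobs plus accelerators" point, here is a minimal sketch of a batch Job requesting a GPU through the device-plugin resource name. The job name, image, and training script are placeholders; the nvidia.com/gpu resource name is the actual upstream mechanism.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: resnet-train                # hypothetical name
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: tensorflow/tensorflow:latest-gpu   # illustrative image tag
        command: ["python", "/opt/train.py"]      # hypothetical training script
        resources:
          limits:
            nvidia.com/gpu: 1       # GPU advertised by the NVIDIA device plugin
      restartPolicy: OnFailure
```

The same spec runs unchanged on a laptop cluster, an on-prem cluster, or a managed cloud cluster, which is exactly the portability argument above.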
And it scales to thousands of nodes, so if you get to the point where you're running many experiments at the same time, you can actually do it with Kubernetes. If you're in a public cloud, for example, you don't have to provision for your peak utilization: you use your single machine, and when you're ready to launch, say, a thousand experiments, the resources are there for you and Kubernetes still works. You don't have to shift to a different platform. And container packaging is already a standard, so again, you don't have to reinvent the wheel. All of these are common problems for machine learning as well, and Kubernetes solves them for you.

There's actually more. I hinted at autoscaling; autoscaling works in most managed Kubernetes services today. We also have support for priority and eviction. Say it's intern hiring season and you have a hundred interns in your company, each one running an ML experiment on a shared resource, and you have a researcher with a conference deadline who has to get their training done and their models built. Are you going to let the interns ruin their conference? No. Are you going to build a separate system for that? No. Kubernetes already provides the primitives: you can set priorities for individual users and make sure a low-priority user doesn't take away resources meant for more important people, or more important things happening in your organization. (There's a minimal sketch of this below.)

I'm just going to state the next one, but if you have questions I'm happy to explain after the talk: Kubernetes assumes adequate network bandwidth. This is a fundamental distributed-systems design assumption that Google has evolved toward over time, so our general recommendation is to overprovision your network and not try too hard to solve locality issues. That's the model upstream is moving towards, and it was already headed in that direction, but we still support data gravity for scenarios where you simply cannot overprovision the network. We also provide a whole bunch of primitives, like labels, software drivers, and device plugins, that let you manage the lifecycle of the hardware itself and of the software that powers that hardware, for example the NVIDIA drivers. You can manage all of that through Kubernetes, because it's one common API on top of which you can layer everything, so management becomes much simpler and you don't have to learn and operate new systems to do the same boring job. That's the main message here.
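To make the intern-versus-researcher example concrete, here is a minimal sketch using the PriorityClass API. Priority and preemption were alpha around the time of this talk; the sketch uses the later stable API group, and the class name and value are illustrative.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: conference-deadline    # hypothetical class for the researcher's jobs
value: 1000000                 # pods with higher values preempt lower-priority pods
globalDefault: false
description: "Training that must not be starved by intern experiments"
```

Intern experiments would run with a low-value class (or none), while the researcher's pods set priorityClassName: conference-deadline in their spec, so under resource pressure the scheduler evicts the interns' pods first.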
So, right: you want to use machine learning on Kubernetes. We've talked about all the awesome infrastructure Kubernetes can provide, but like I said earlier, data scientists and ML practitioners don't want to see that infrastructure. They don't want to learn the Kubernetes API; that's not their specialization, and it's not the thing that's most fun for them. Are you going to tell them: hey, go learn Kubernetes, go learn Docker, go learn containers, and then we can start doing whatever you want to do? Look at the sheer number of things you have to learn to become effective at using Kubernetes for machine learning. It's a fire hose. You'd probably have to take ten courses before you could even get to the point of doing some interesting machine learning in production. You don't want people to do that. So what do you do?

With that, we are proud to announce Kubeflow. We're not talking about this broadly yet, just to you here and those on the stream; we'll be talking a lot more shortly. The summary: we want to make it easy for everyone to learn, deploy, and manage portable, distributed ML on Kubernetes, everywhere. Just like Kubernetes, we very strongly believe this should work everywhere in the world: on premises, on your laptop, on your cloud. Provide a choice. We don't have a ton of time, but the idea is that we very much embrace the philosophy we laid out earlier around composability, portability, and scalability; Kubernetes and Kubeflow should provide these out of the box.

Speaking of the box, what's inside it at the start? And we are just getting started here. We give you a JupyterHub (thank you very much to the folks at Jupyter, I saw a wave back there); a TensorFlow training controller, which, to be clear, doesn't autoscale, it deploys and scales based on what you have available, so whether you have CPU, GPU, or multi-GPU you can configure it very easily; a TensorFlow Serving deployment; and the wiring and service endpoints between them, to make it very easy to get going. What's in the box is just the steps you saw earlier. Over time we'd love to expand it, make each step deeper, offer a lot more options, but that's up to the community and the people who want to take it there.

Using Kubeflow will ideally look like this. You do kubectl ("kube control", and that's how you pronounce it) apply on the components, and it creates everything on your laptop. You do the exact same command on your mini cluster; you do the exact same command on your larger cluster; and it works. We'll skip over the rest to get to the interesting part: the demo. (Yes, I'm going to have a hard time saying kubectl. Well, you could say it wrong, that's fine.)

All right, I'm going to try a live demo. I'm not sure how this is exactly going to turn out, but let's see. Hopefully the demo got to the screen. What I have here is a Minikube cluster, and let me make sure it's obvious that it's a Minikube cluster. I already have the Kubeflow repository cloned, so I can show it to you, just to give you an idea of what we mean by Kubeflow. Here we go: this is the repository, I've cloned it, I'm at the base directory, and I'm going to apply all the components in there. Okay, I need to change directories; there we go. So I just went ahead and deployed Kubeflow. The next step is to show which pods exist: we have a TensorFlow Serving service, a JupyterHub, and a TensorFlow operator.
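For reference, here is roughly what that boils down to at the command line. The repository URL is the one from the talk, but the components path and the service name are my assumptions about the layout, so check the repo README for the exact invocation.

```sh
minikube start                      # the local cluster used in the demo
git clone https://github.com/google/kubeflow
cd kubeflow
kubectl apply -f components/ -R     # assumed layout: apply every manifest in the repo
kubectl get pods                    # tf-serving, jupyterhub, and tf-operator pods appear
kubectl get svc                     # find the JupyterHub load balancer
minikube service tf-hub-lb          # assumed service name; opens JupyterHub in a browser
```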
I'll show how we use each of these in just a bit, but I wanted to show what's in the box as part of Kubeflow as of now. Next, I'm going to open up JupyterHub. (Oh god, this echoes.) Let me also list the services here, and then I'll use the Minikube command for opening up a service, which is minikube service, pointed at the load balancer. Here we go, I'm opening it up. Well, I have these awesome proxies that get in the way of demos; let me turn those off, and it should work now. So here we go: JupyterHub, just working out of the box for you. By default we ship a dummy authentication system, which you can replace with your own organization's favorite authentication system, be it Google OAuth or GitHub and so on.

I'm just going to log in and start my server. We have two options with TensorFlow now: a CPU image and a GPU image. Since this is Minikube, I don't have that much resource on my teeny tiny Mac, so I'll stick to the minimum amount of resources I need for this demo; I don't need GPUs. I'm going to go ahead and launch my notebook. The notebook has a few prebuilt models from the TensorFlow model garden, and it's just there to show that you can get going pretty easily and that you have all the tooling you need to start working with TensorFlow.

To demonstrate this, I'm going to run a benchmark. There's a specific reason I'm choosing a benchmark for this demo: if I tried to train a real model like Inception or ResNet, it would take quite a while, and it would probably be very boring for you. So instead I'm choosing a benchmark that shows some really nice numbers and also demonstrates that you can iterate locally on your laptop, and when you move to the cloud, whether it's a public cloud or a private cloud, the workflow remains the same; you just get much more powerful hardware and much more scalability, all within the confines of the same interface.

So I'll start the benchmark here, and it's very simple: all it's doing is running a ResNet training job. If you're not familiar with ResNet, you can totally forget about it; just consider it an image model that can predict what's in an image. I'm training with synthetic data, because pulling data down from the cloud in a demo environment is, again, pretty boring; the synthetic data is all part of the standard TensorFlow benchmarks. Behind the scenes it has generated some synthetic data and started to train, and you can see the training rate is literally 0.5 images per second. It's pretty slow, because I've got a laptop that isn't even connected to power, so all sorts of awesome things are happening behind the scenes to keep the demo going. The training rate is pretty low here, so this is not much fun, right?
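For reference, the notebook cell runs something of this shape, using tf_cnn_benchmarks from the tensorflow/benchmarks repository. The flags are illustrative rather than the demo's exact command line.

```sh
# CPU run on the laptop; with no --data_dir the script trains on synthetic data
python tf_cnn_benchmarks.py \
  --model=resnet50 \
  --batch_size=8 \
  --device=cpu \
  --data_format=NHWC   # channels-last layout, the usual choice on CPUs
```

Later in the demo the same script is rerun on the cloud cluster with the CPU-specific flags dropped and something like --num_gpus=1 added; nothing else about the workflow changes.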
But there is value here, because if you just want to iterate on your model and play with the basic architecture, you don't have to be in the cloud. You can be on an airplane, or in the comfort of your home on your couch, and still do your work. Then, when you get back to work the next day, which is what I'm going to show next, say you have a Kubernetes cluster already running in some cloud. In this case I'm conveniently using Google Cloud, because that's what I'm familiar with, and it's pretty awesome. So here we go: I have a cluster already set up in Google Cloud, with GPUs provisioned as part of it, and I'm going to use the same kubectl workflow, just to prove that it's portable. I have five nodes with GPUs in a GKE cluster, and the drivers and everything are pre-installed, so you don't have to worry about that. I do the same kubectl apply here, from literally the same directory on my laptop. Next I list the services, and I have the same TensorFlow and JupyterHub load balancer; as you can see, the external IP hasn't been allocated yet, so we have to wait for that.

While that happens, let me explain the operator, which is the other part of this. Jupyter gives you an interactive workflow where you can train interactively, pull down data, and do some data manipulation, but that's not great when you want to go to production. For production you want auditing, you want to checkpoint your workflows, you want reproducibility, and you probably want CI/CD pipelines, especially for ETL, batch, or online training, and all those kinds of interesting scenarios. You can use Kubernetes for that as well, and there's an interesting project for this that lives under TensorFlow; fittingly, it's named tensorflow/k8s. It's basically an operator, if you're familiar with the CoreOS operator terminology, and it gives you a Kubernetes-style declarative API where you can express what sort of training you want to do, whether it's training on your laptop with a single CPU or training in the cloud with many, many GPUs. It's the same declarative interface. All you do is make sure your TensorFlow runtime is available inside a Docker image (we have sample Docker images for that), bring your own TensorFlow code, the code you wrote in Jupyter, perhaps by loading it onto your object store and pulling it down, or baking it into your container, and then use the declarative API to tell this operator: go start the TensorFlow training, run it, and push the model somewhere else afterwards. That's all automated for you, and you get TensorBoard integrated with it as well.

So I'm going to deploy one of those jobs. I'm not sure you're all able to see this; it's probably pretty hard to read during the demo, so here's roughly what the manifest looks like.
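This is a sketch of the kind of manifest being applied, modeled on the early tensorflow/k8s TfJob API. The field names are from memory of the v1alpha1-era schema and the job name and image are placeholders, so treat it as illustrative and check the project repo for the exact format.

```yaml
apiVersion: "tensorflow.org/v1alpha1"
kind: TfJob
metadata:
  name: resnet-benchmark            # hypothetical job name
spec:
  replicaSpecs:
  - tfReplicaType: MASTER           # coordinates the job; no GPUs needed here
    replicas: 1
    template:
      spec:
        containers:
        - name: tensorflow
          image: gcr.io/example/tf-benchmarks:latest   # placeholder image
        restartPolicy: OnFailure
  - tfReplicaType: WORKER           # GPU-bearing replicas doing the heavy math
    replicas: 3
    template:
      spec:
        containers:
        - name: tensorflow
          image: gcr.io/example/tf-benchmarks:latest
          resources:
            limits:
              nvidia.com/gpu: 1
        restartPolicy: OnFailure
  - tfReplicaType: PS               # parameter server holding the shared weights
    replicas: 1
    template:
      spec:
        containers:
        - name: tensorflow
          image: gcr.io/example/tf-benchmarks:latest
        restartPolicy: OnFailure
```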
I'm going to run through this quickly, but please take a closer look after the demo. The thing I want to highlight is that I'm deploying distributed TensorFlow, which is considered to be kind of hard; that's one of the reasons people keep playing with larger and larger boxes instead. Here you have distributed TensorFlow where you're configuring a master that has no GPUs, three workers that do have GPUs, and a parameter server, and you're expressing all of this in a declarative API. I'm running the same benchmark, the one I ran on the laptop, and I'm going to deploy it into my GKE cluster: kubectl apply -f, and there we go.

Now I'll move back to my cloud cluster and look at the workloads view, and we see a bunch of pods being created. I applied a manifest; that manifest created a custom object, and the controller watching that object went ahead and created the TensorFlow entities that were expressed through the declarative API. So you have a master running here, a parameter server, and a bunch of workers. These workers will take some time: they'll warm themselves up, initialize TensorFlow, and start training. Until then, let's switch back to the Jupyter side, where we're doing interactive training.

Say I've come back to work: I did something at home, I've done some training, and now I want to take it to the next level, where I keep doing interactive training but start processing a lot more data. At this point I need more memory, more CPUs, more GPUs. Okay, here we go: I'm now in the cloud, and I went through the same process of setting up Jupyter, but Jupyter caches my identity, so it put me straight into a notebook, because the notebook already exists. I'm going to run the exact same benchmark, but this time with GPUs. I've changed the benchmark command line a little to make it work with GPUs; it's mostly about removing the options that made it work on CPUs. The point I want you to take away is that, as you'll see in a second, the training rate is so much faster. That's not something Kubernetes is doing; it's just the power of GPUs and the optimization between TensorFlow and GPUs. The point is that your stuff is portable: your workflow is portable, everything remains the same, and you move from 0.5 images per second to 200-plus. And when you move to the distributed setting, which we're seeing with the operator, your productivity improves even further. Let's see if the distributed job actually completed; if not, we'll have to wait for it. I'm going to drill into the pods view and look at the logs for the pod.
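The command-line equivalent of that console click-through would be something like the following; the pod name is whatever the operator generated, copied from the get pods output.

```sh
kubectl get pods                         # master, worker, and ps pods from the TfJob
kubectl logs resnet-benchmark-worker-0   # hypothetical pod name; shows the images/sec rate
```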
Okay, the command did run successfully, and it's spamming me that it ran successfully, and the training rate here is 240 images per second. So you can start scaling linearly, and you don't have to worry about NUMA or whether things sit on the same PCIe socket; forget all of that. Just assume you have some finite, usable amount of network bandwidth and keep scaling. Improve your developer productivity.

And that's not the only part. The other thing we want to show: say the whole point of this demo was that I want to know whether something is a hot dog or not, for the folks who know what "hot dog or not" means. Say I'm being very naive and I don't know any machine learning. All I do is find a picture (in this case it's clearly not a hot dog), and I've built a stupid system that just looks at the name of the file and declares hot dog or not. Clearly that's not a great system. We want to make it smarter, so let's apply some machine learning. In this case I'm using pre-existing models trained for this kind of task; if you're curious, it's an Inception model. So I'll do the same thing: deploy it and ask, with a real hot dog image, is this a hot dog? Let's see what it says. Hmm, the same thing. Maybe the service is still coming up, or is this a demo fail? Quite likely a demo fail. Do I have time to debug? I guess I have two more minutes, so let's see.

Let me show what's happening behind the scenes while the demo comes back up. There's a model service here. It stands up TensorFlow Serving, pre-set-up for you, and you can configure the model server to point at your own model. For this demo we're pointing it at a model hosted on GCS, a model stored in a GCS bucket. But imagine you have serving available alongside your Jupyter workflow: you built a model, and it's stored on some object store, or shared storage, or your local laptop. All you have to do is change this deployment, changing its command-line arguments to point at your new model, and you've got serving going. You don't have to think beyond that one small change. The relevant part of the deployment looks roughly like the sketch below.
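Here is that fragment of such a serving Deployment, assuming the standard TensorFlow ModelServer flags; the image and bucket path are placeholders rather than the demo's actual values.

```yaml
# Container section of the pre-wired TensorFlow Serving Deployment
containers:
- name: model-server
  image: gcr.io/example/tensorflow-model-server:latest  # placeholder image
  command: ["/usr/bin/tensorflow_model_server"]
  args:
  - --port=9000
  - --model_name=inception
  - --model_base_path=gs://example-bucket/inception     # point this at your own model
```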
Let's give this another try, just one more time. David, this did work; let's see if the model server is actually responding. It's saying it's a hot dog. We all know it's a hot dog, but it's saying so. Wait, I might be tricking you. Let's see: what if I throw something at it that's not a hot dog? And it says so. The point is, you can do awesome things without having to reinvent the wheel and without having to know everything from scratch.

Okay, we are incredibly out of time, but I'll be very quick. So that's it: you took something and you deployed it in three places. But honestly, the message is that if you're a data scientist, you didn't have to think about any of that. You didn't have to think about Kubernetes or learn it, and you didn't have to do any special deployments. We took care of it for you, we expanded it for you, and you begin to really decouple the infrastructure from what data scientists actually do.

So, yes, for now we are just getting started, and we would love for you all to come and help contribute. We have an open-source repo at github.com/google/kubeflow. Thank you. We really are just getting started; we want this to be a community where we come together. We know that almost everyone here who is doing ML has written some custom, bespoke solution for their team. We would love to help replace that so you can focus on the hard stuff: throw away the custom stuff you've written, or augment it and upstream it, whatever works. At minimum, just share your experience, what you do every day, what works and what doesn't. That is very valuable. So that's it; thank you so much for the time.

Oh, I guess, sorry, we do have two minutes for Q&A if you want. We have like five more minutes, I guess. We'll hang around and be here; we technically have three minutes for Q&A, the next one's at 2:45. Oh, I don't know where it is, I'm sorry, I think 4:30? I have no idea, sorry; look at the schedule, there's an ML salon.

First question: how would you set a node to be GPU-capable? On a laptop, well, maybe let's ignore the laptop; on a workstation there's one part pending, which is enabling installation of drivers with device plugins. We're working on that with the upstream community, so that part is pending. But if you move to any cloud provider, any managed Kubernetes, it's already there; you don't have to think about it. Send us a note if you're having problems with it and we'll find you the right person.

Next question: will we support TPUs? Yes, very soon. For those that don't know, TPUs are Google's custom ML chips, and the idea, again, is that we want to be cloud-neutral here. As you know, many clouds are investing in various custom ML chips, and we want to create a real abstraction between this and whatever cloud provider would like to participate in Kubeflow and help us surface those custom chips; we're more than happy to include them. And the TPU part is like five percent of all of this: you're literally changing one line of code for the TPU, and everything else is the same, everything else is generic.

Any other questions or comments? Okay, thank you very much. We'll wait around and be outside. Thank you.