Hello, and thank you very much for coming to my talk. This is Machine Learning with Kubernetes. I'm very excited to be the first presenter here at Kubernetes Day, and it's great to be here in Boston.

So, who am I? My name is Christopher Luciano. I'm part of the open-source technology team at IBM Digital Business Group, and I'm blessed with being able to work on Kubernetes full-time. I mostly concentrate on SIG Node and SIG Network. My GitHub handle is up there along with my Twitter ID. Feel free to tweet during the keynote, but please make sure to add the underscore: the cmluciano without the underscore on Twitter founded some service, he's a lot more successful than I am, and he doesn't need the PR. But I do, so make sure to keep the underscore on there. I have a very unsuccessful blog for some reason; I'm hoping to fix that pretty soon, so check back later on. I'll be posting some more stuff on there.

A lot of talks cover the more technical things: how to set things up, how to twist some of the knobs, how to tune things. But not a lot of talks cover the who and the why. So I've coined a series of talks myself that I'm fond of calling the what, where, and why series: why do I want to use these types of technologies, before actually using them?

When we talk about machine learning, it's very important to note what we're trying to accomplish: getting the most accurate results possible. If we have a traditional bell curve, that's garbage; we don't want anything like that. We're going to continually train our system to eke out the best possible accuracy that we can. So that's the goal. We start with some sort of base knowledge: points of analysis, a corpus of unstructured data. Then we feed that into our system, we notice the errors, we correct, we rinse and repeat. It's a cycle.
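That train, evaluate, and correct cycle can be sketched in a few lines of code. This is a minimal, hypothetical illustration: the feature names and the simple perceptron learner are assumptions for the sketch, not anything from the talk's actual system.

```python
# Minimal sketch of the train -> notice errors -> correct cycle described above.
# Features per example: [pointy_ears, fur, long_tail, round_face]; label 1 = cat.
# The data and the perceptron update rule are illustrative assumptions.
corpus = [
    ([1, 1, 1, 1], 1),  # a cat: all the cat features
    ([0, 0, 0, 1], 0),  # a penguin: round-ish face, nothing else matches
    ([0, 1, 0, 0], 0),  # a dog: fur, but wrong ears and face
    ([0, 1, 1, 1], 0),  # a sloth: confusingly cat-like
    ([1, 1, 1, 1], 1),  # another cat
]

def predict(weights, bias, features):
    """Classify one example with the current model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else 0

def train(corpus, epochs=20, lr=1.0):
    """Feed the corpus through repeatedly, correcting on each error."""
    weights, bias = [0.0] * 4, 0.0
    for _ in range(epochs):                 # rinse and repeat
        for features, label in corpus:
            error = label - predict(weights, bias, features)  # notice the error
            if error:                                         # correct it
                weights = [w + lr * error * x
                           for w, x in zip(weights, features)]
                bias += lr * error
    return weights, bias

weights, bias = train(corpus)
accuracy = sum(predict(weights, bias, f) == y for f, y in corpus) / len(corpus)
print(accuracy)  # → 1.0 on this tiny, linearly separable corpus
```

On this toy data the loop converges because the examples are separable; on real image data each pass over the corpus would instead be a training epoch of a much larger model, but the shape of the cycle is the same.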
So let's take a very simple example. This is my cat, Sprinkles. She has very distinct features: if you notice, her ears are very pointy, and her feet come down and connect, their shape very similar to that of her paws. You can kind of see her tail, though you can't really make it out. What we're highlighting here are some of the features that say this is a cat. We've figured out a few different data points, and this is some of the base knowledge we're going to be feeding in.

So we move on. Here's another picture of Sprinkles, a little darker, but we can still make out the pointy ears, the circular face, and her feet. Maybe we start to notice some more patterns in the system.

Then we get to here: myself, my fiancée, and a penguin. We ask the system: is this a cat? Well, I don't see any of the features I noticed before. I don't see the circular face, I don't see the tail, I don't see the fur, I don't see the small nose, I don't see the pointy ears. This isn't a cat.

Is this a cat? I have a short snout, I have a circular face, the eyes are closer together. I don't see a tail, though, and there's no fur. Not a cat.

Here's a dog. Let's assess its features. Do I see pointy ears? Not really. The face doesn't seem right, the fur doesn't seem right. The legs do come together, but the paw just seems a little too wide. Not a cat.

What is this? Here we have a sloth. Does our system know what that is? Maybe not. What it does know is that it has a smaller face, circular in nature; a smaller nose; a distinct mouth; there's fur; the feet seem to come together. Everything's kind of lining up.
It kind of looks like a cat. So in this instance our system might become confused, and this is where we'd see a jump in our error rate: some false negatives, or rather false positives. This is where we'd have to come in and retrain the system to distinguish some more advanced features of a cat, because in the end I just want to go online and find more cats for Sprinkles to have as friends.

So moving on: oh, we found another cat, potentially. We notice the pointy ears, we notice the circular face. It's a little harder to see, but this is Sprinkles again. We notice a long tail. This is a cat. But what if we have many cats? Did we tell our system that we might be expecting many cats? I don't know. Once again, another opportunity to come back around and train the system.

Now you might be thinking: why would I care about this? This isn't a very useful example. I already have a ton of cats, I don't need any more. Why don't you give me something I can work with?

So, IBM first started its foray into more public AI knowledge with Watson. Prior to IBM Digital Business Group, I worked on Watson for two years; that's why I'm here today. Watson started out on our ever-famous Power machines, and it went on to essentially beat Ken Jennings at his own game on Jeopardy; if you haven't watched it, I encourage you to do so. But this also isn't very useful for the common person, unless you're really trying to impress your friends by inviting them over for Jeopardy every day, only to dominate them with Watson.
That isn't a very effective way for you to utilize it. So we can see that Watson started off as a research project, there was the demonstration at Jeopardy, and then some of the more advanced features of Watson started to be separated out into smaller services. First we started in healthcare, moved on to financial services, and then it ballooned to the point where we had a ton of services.

The Watson Developer Cloud is something exposed through the IBM Cloud today. These are services you can hook into that use some of the key pieces of the older Watson application that are actually useful to you. What are some uses you can think of for these smaller services? Well, look at text-to-speech, tone analyzer, speech-to-text. Some of these can be used for very simple use cases: if you have a podcast and you want captions at the end, or you want to print all of that out in an easy fashion, you can have that information ready and provide it to your podcast consumers. Some of the more interesting uses people have found for the tone analyzer involve emotion analysis; there have been cases where people were experimenting with determining whether someone was lying, based on the tone of their voice or on some of the known lying indicators.

Another interesting example I found being used in IBM today is security. Intrusion detection systems also have a corpus of knowledge: they note the cases they know about, as far as whether something looks like a security breach or not, and classify them; and if you have an intrusion prevention system, it will try to actively block the attack. But attacks are getting so advanced today that it's necessary to potentially incorporate artificial intelligence in order to detect and note newer types of attacks. Zero-days come out every single day, and it's next to impossible to block every single one of them.

Now let's get into: how can I do this myself?
I'll provide the slides with these links; obviously you can't click on them now. We're going to start with GPUs. You're going to need a machine that exposes GPUs. GPUs are being leveraged because of the sheer number of cores, and these training jobs take a long time. You might think you'll have a short, iterative solution: spin up some GPU virtual machines or bare metal, do your training, then tear them down. But as you'll note from the examples we had with the cats and dogs, it often takes a lot of time to train your system, to notice these things, and to error-correct. It's not uncommon for these jobs to take weeks, even months, and Kubernetes is going to help you cut out some of the corner cases you'd have to deal with if you were deploying on bare metal directly.

TensorFlow is also an interesting project that came out of Google and lets you leverage some of these more advanced APIs. You need to build it yourself, and there are also examples of deploying TensorFlow atop Kubernetes. Now you're thinking: there's a lot going on here, stacks on stacks on stacks. If I'm going all out, I start with bare metal, then I put OpenStack on it, then a virtual machine, then I deploy a container runtime, Docker or rkt, then I put Kubernetes on it, then TensorFlow. There's a lot going on, and I can understand wanting to cut some of these layers out yourself. You want to cut out the OpenStack? Cut out the OpenStack. You want to cut out the TensorFlow and do it yourself? Go for it. But I want you to think of an Irish breakfast: I'm not going to eat the blood sausage without the pork sausage, and the toast perfectly complements the eggs.
I wouldn't want it any other way. So when you're thinking about these systems, think about how Kubernetes can help you better deliver these machine learning training systems.

The information in the following slides is hot off the press; some of these proposals were just discussed last week in SIG Node. There are some key characteristics of GPUs that distinguish them from other types of resources, and we'll go into each of them now.

Multiple video cards: one node, one blade, could have a ton of different video cards. You could even have different models of video cards in there: some faster ones, some slower ones, some purpose-built for a certain topology. Kubernetes is going to help you with this by allowing you to use a node selector, which I'll discuss in a couple more slides, to specifically target the exact GPU that you want. There's a lot of discussion going on in the community about exposing topology up through Kubernetes so that you can target the exact topology you want, but in the next few slides I'll show you ways to get around that and do the right thing the first time.

Driver installations: if you're using a video card, you'll install the drivers. These are proprietary drivers, and you're going to get them from NVIDIA's website. An important thing to note, though, is that the driver version on the host most often needs to match whatever you're deploying in your container, whatever the workload. If they don't match, they clash, you get a weird error message, and nothing works. So when you're matching driver versions, you want to be sure that version two of the driver matches up with version two in your container; a mismatch is bad. You can deploy these things with a Kubernetes DaemonSet, which will essentially deploy a target across all of your nodes and do whatever work is necessary. So when you spin up new nodes, it's automatically going to install these things for you.
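A driver-installing DaemonSet along the lines just described might look like the following. This is a rough sketch only: the image name and version tag are hypothetical placeholders, not an official NVIDIA installer image, and the `extensions/v1beta1` API group is the DaemonSet version current in Kubernetes 1.6.

```yaml
# Hypothetical DaemonSet that installs the NVIDIA driver on every node.
# The image name and driver version below are placeholders for illustration.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-driver-installer
spec:
  template:
    metadata:
      labels:
        name: nvidia-driver-installer
    spec:
      containers:
      - name: installer
        # Driver version here must match what the GPU workload containers expect
        image: example.com/nvidia-driver-installer:375.26
        securityContext:
          privileged: true        # required to load kernel modules on the host
        volumeMounts:
        - name: host-root
          mountPath: /host        # installer writes the driver onto the host
      volumes:
      - name: host-root
        hostPath:
          path: /
```

Because it's a DaemonSet, any node that joins the cluster later automatically gets a copy of this pod, which is exactly the "new nodes get the driver for free" behavior the talk describes.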
So when you split up new nodes, it's automatically going to install these things for you Now there is some confusion around Do I need to reboot the machine the official docs say that in the video that you should reboot to Grab all of the latest kernel modules prior to utilizing the machine I've seen mix and match sometimes you have to reboot sometimes you don't certainly my laptop It says I had to reboot when I installed the Nvidia drivers. So You might find discrepancies So this is before you have multiple video cards kubernetes allows you to have a node selector in order to target the specific video cards You want you have some Tesla 100 p 100s in there. Go ahead and target that right here and each winner Down here, you're saying how many GPUs that you want And this is a normal pod spec that you're going to be using to deploy out whatever system like tensor flow or your application specific Solution the link down here will show you this example and show you a little bit of base knowledge about that Resource fragmentation is also big thing. 
So as I said these jobs take a long time to run So you want these always to run on time as fast as possible with the most correct results If you are consistently working around slower systems In your data center, maybe you have some older nodes mix with the newer nodes You're always going to want to target the newer nodes that this is an important job You said these take weeks months So here's some experiments that our research team has gone through as far as testing the Intel quick path Interconnect versus some of the more advanced hardware features like NV link and the important thing to notice here is the huge bump in What you're able to achieve performance wise when you get down to using something like for GPUs with an envy link enabled And this is as I mentioned is a specific topology very specific to GPUs kubernetes is trying to figure out how to use topology in a way that you don't have to necessarily expose that to the end user because it increases complexity and People that just want to get a job done. We'll have to Figure out how to use that It could get a little messy GPUs fail in different ways and it's not always a hard failure Sometimes it just gets too hot in there if you're running a training job, and you're just grinding away at it The fan may start to overheat. You're starting. You're gonna start to get inconsistence builds Insufficient power Problems, what have you have an example of an air message You might see when it happens here in a normal note setup if you have one card In your blade that fails. What are you gonna do? Are you gonna go in there and hot swap it out? Or you're just gonna See if it's just fine in the end what kubernetes is gonna do it's gonna proactively mark this as unavailable and It's going to only target the working GPU. 
So at your leisure you can come back and fix it. Again, this type of performance and problem analysis is active in the node problem detector, and we're starting to add some pull requests to Kubernetes to actively do that blocking. And if a node is just completely dead, your job will get migrated somewhere else.

In Kubernetes 1.6, just released a few weeks ago, a lot of work went into trying to make the GPU experience a little better; it officially reached the alpha stage. You can have multiple pods on your nodes; a pod in Kubernetes is just your unit of work, really. Video card discovery is doing a little better: it's now using some fancy regex to figure out any active video cards it can expose. The basic failure recovery, as I mentioned, is in there. The only problem at the moment is that it only works with Docker, because of a very interesting handoff where Kubernetes tries to figure out which containers are actively using things. That's something that's going to come in a future release.

So where are we going with GPUs in Kubernetes? We're going to start with device recovery. It's very important to be able to segment off one card and allow the job to continue somewhere else; that's where the health-checking features come in. Topology, as I mentioned before: you could just full-stop allow the user to configure these things themselves. However, if you schedule it right the first time... Kubernetes has a degree of quality-of-service features. BestEffort means: I just want this to run; if it fails a few times, or I can't get my resources right now, that's fine. Burstable means starting out with these resources, this amount of CPU, this amount of RAM; however, it might creep up to this limit, in which case
I want you to cut it off. Or Guaranteed: that's where you know how this application works, and Kubernetes is going to do its best to almost always guarantee those resources for you. Now, if you bubble topology up to the Guaranteed level, and you say "I want this thing to be guaranteed to have these things," building topology into that is a lot more consumable for a user. Then they don't even need to know what the best topology for GPUs is; they're just assured that, because of the base knowledge about GPUs, they're going to get it every time. That's another thing we're going to work on.

Metrics, as always, are a good thing. Kubernetes utilizes cAdvisor for every container it spins up, so it's going to give you metrics per container that you launch, and it's also going to give you metrics node by node.

There are some cleanup features we want to do: we want to make this work with things that are not Docker. There was a significant effort in Kubernetes 1.5 to abstract a lot of these things out into the Container Runtime Interface, so that you can pass the same base information to Kubernetes and deploy a VM, deploy an rkt container, deploy Docker, whatever you want.

Last week we met with a lot of people from NVIDIA; NVIDIA is having their conference this week on the other side of the country. They're going to help us out with some of the libraries where they try to capture some of their best practices, like NVML and their newest addition, libnvidia-container. This will be something that the kubelet, the worker-node agent of Kubernetes, will call out to to gain this functionality.

So that's basically the end of my talk.
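The three QoS classes come straight from how requests and limits are set on a pod. As a hedged sketch (placeholder image and resource numbers), a Guaranteed pod is simply one where every container's requests equal its limits:

```yaml
# Requests and limits determine the QoS class:
#   requests == limits on every container -> Guaranteed
#   requests < limits                     -> Burstable (may creep up, then be cut off)
#   neither set                           -> BestEffort
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-trainer
spec:
  containers:
  - name: trainer
    image: example.com/trainer:latest    # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"      # equal to the request, so this pod is Guaranteed
        memory: 8Gi
```

The idea floated in the talk is that topology could ride along with this: a user asking for Guaranteed would implicitly get the best GPU topology, without ever having to spell it out.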
I apologize: I did put at the end that I was going to have a demo of deploying Kubernetes on top of OpenStack, but I noticed there are five to ten talks here targeting exactly that, and I didn't want to steal their material. So I'll put up a blog post on the IBM Code page with an example later on, if you really want to see my specific setup.

But come talk to me about these things. I'm very interested if anyone has high-performance computing use cases, especially using GPUs. I also work in the networking space on Kubernetes; tell me about any features you'd like to see in there. I'm especially interested in multiple interfaces within a container, and also multiple networks that you want to try. Also, any other Cloud Native Computing Foundation projects you want to talk about, great. And if you just want to talk about anything, if you're just looking for a friendly face, a couple of topics I know a bit about are cars, coffee, cooking, fishing, and world culture; come up and ask me about those if you want a non-technical topic.

Questions?

Audience: Yeah, thanks for the presentation.
I have a question. In my area, GPUs are less interesting, but the networking part is much more interesting, because I'm basically working with service providers, and they're looking into networking performance. Is there any plan to do something similar in the Kubernetes environment: something like the GPU support, but for networking cards?

Yes. Within IBM directly, we're working on that use case where it's node-to-node, so a lot of the performance depends on the network. One of the more advanced networking features you see in combination with GPUs is InfiniBand. This relates to some of the topology work; we're trying to get it right at the node level first, before we move on to inter-node communication. Scheduling on the network takes place in an upstream, potentially soon-to-be CNCF project: a lot of those networking things happen in the Container Network Interface, which is a separate project where a lot of these plugins come together in order to achieve some of those advanced things. So much of the hope is to place that knowledge in those plugins and allow you to chain them together to get what you're trying to achieve. This both supports a cleaner code base for Kubernetes and still potentially allows you to achieve those same results.

All right, thank you. Other questions? No? Well, thank you very much; I hope you have a great day. My information is up on the board again: cmluciano, with the underscore, unless you want the other guy. Find cmluciano on GitHub to see what I'm working on. I'll be at the CNCF booth and also floating around the IBM booth, and I look like this, if you're trying to find me. Thank you.