Okay, it's time to get started. Thank you all for coming. I'm Daniel Whitenack, and I work for a company called Pachyderm; you'll hear more about that project a little later. Today I'm going to be talking about building GPU-accelerated workflows with TensorFlow and Kubernetes, which is a really long title. I should have made it shorter and more exciting, but hopefully the talk itself will be exciting.

The last talk was a great setup: it covered some of the challenges around using GPUs on a data science team and how you can offload model training onto a GPU on Kubernetes. This talk is related, but it has a slightly different spin. The keyword here is workflows. Oftentimes, model training isn't the only piece of the puzzle; actually, it's a very small piece of a much larger puzzle that includes a lot of pre-processing of data, plus training, inference, post-processing, and visualization. What I'm going to try to get across today is how we run all of that together on Kubernetes while still being able to offload the important pieces to GPUs when we need to utilize them. We've worked with a bunch of different users and clients to do this, so what I'm going to describe is how we do that.

To that end, I'll start with my picture of this bigger data-pipeline scenario, where one piece is model training on a GPU but there's more to it, and as part of that we'll obviously talk about where the GPU comes into play. Then we'll talk about why Kubernetes is so good at managing this sort of thing, and what, if anything, we need to add on top of Kubernetes to support this kind of workflow. And since you're a technical crowd, I know you won't believe any of that, so then I'm going to go right into a live demo: we'll deploy a bunch of TensorFlow and data-processing stuff on Kubernetes and switch between CPU and GPU nodes, and fingers crossed that goes okay. Does that sound okay, everybody? It's the only talk I have, so you can exit now if you want to hear something different.

All right, let's go ahead and get started. In my mind, the typical workflow for a data science or analytics team is much broader than just model training. Training gets a lot of attention because it's the cool part, I guess, but there's a real struggle around managing the pipeline as a whole. And I think this problem is really crucial: if you can't handle all the pre-processing, get the data to the GPU when you need to, handle the post-processing, share those resources, and reproduce certain analyses and hand them off to other parts of the team, then you're really going to struggle to create value in a business. So I'm going to share an example.
This is very much just one example; the things we'll talk about today apply in many more situations. But one instance of this is processing images and doing some object detection. In this sort of workflow, we might start with a raw data set of images from somewhere; let's not worry about where they come from, but we have access to this raw data set of images. The first thing we probably need to do is pre-process them somehow. Maybe our model needs to take them in in a certain format or at a certain size, or maybe we need to pair them with other images, or label, format, or tag them in some way. There's a lot of different pre-processing we might want to do, and out of that pre-processing we get a set of nice images that we want to feed into our model training. I've represented this here as one stage, but oftentimes, when we work with people, this could be fifteen stages of pre-processing, developed by three different people on a team.

Then we have the cool model-training stage, which takes in that data and trains some type of model, maybe a neural net. Last night a lot of people raised their hands as TensorFlow users, which is why I'll be talking about TensorFlow, but again, this is an example: it would apply to any framework you want to use, whether that's TensorFlow or Caffe or whatever. So we're going to train our model using that framework on the pre-processed input, and then maybe serialize or export the model in some way such that we can use it for inference, because we're not going to retrain our model every time an image comes in that we need to do object detection on. We need to serve that model somehow, with something like TensorFlow Serving or other tools, so that's kind of a separate stage. And I haven't even added in post-processing after this. So here I've already got three distinct phases, I've left out post-processing, and each of these could be expanded into multiple other stages. You can start to see that this can be a bit of an orchestration nightmare. This is why, oftentimes, when I start talking to DevOps and infrastructure people and say I'm a data scientist, they kind of just start walking away; they don't like me so much anymore. But it is a challenge, especially when you're utilizing multiple frameworks, or weird frameworks that the rest of an engineering organization doesn't understand, and you have these multi-stage distributed things that need to be managed and updated over time.

Okay, so again, here we have these three distinct stages; let's talk about where GPUs come into that. Most of the time (I'm not making a fully general statement here, but most of the time) people utilize GPUs for model training, as was mentioned in the previous talk. So here we actually have two stages that will run just fine on CPU nodes; basically our whole workflow, however many stages of pre-processing and inference we have, let's say we can run on CPUs. And then we have this one stage that we want to run on a GPU node. This is pretty essential for a lot of teams that are building models at large scale.
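Just to make that split concrete, here is a rough sketch of what stringing these three stages together by hand with plain Docker might look like. Every image name, script, and host path here is hypothetical (the talk doesn't show this), and the GPU step assumes the nvidia-docker runtime is installed; the point is only how much manual glue sits between the stages:

```bash
# Hypothetical manual run of the three stages; image names, scripts, and
# host paths are illustrative, not from the talk.

# 1. Pre-process raw images on any CPU box.
docker run --rm -v /data/raw:/in -v /data/clean:/out \
  my-team/preprocess:v1 python preprocess.py --input /in --output /out

# 2. Train on a GPU machine (assumes the nvidia-docker runtime).
docker run --rm --runtime=nvidia -v /data/clean:/in -v /models:/out \
  my-team/train:v1 python train.py --data /in --model-dir /out

# 3. Serve the exported model for inference, back on a CPU box.
docker run --rm -v /models:/models -p 9000:9000 my-team/serve:v1 \
  tensorflow_model_server --port=9000 --model_name=pix2pix \
  --model_base_path=/models/pix2pix
```

Each stage is portable on its own, but someone, or something, still has to move the data between machines, run the steps in order, and pick the right hardware for each one. That is exactly the gap we'll look at next.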
Teams like that need to run this training on a GPU, but they also need to interface it with the other stages: pre-processing, post-processing, inference, and all of that. Okay, so that's the general picture I wanted to put in your mind. We know from previous talks, from more talks today, and from other things you've seen online that we can utilize GPUs in Kubernetes. But I don't think there's a lot of content and tooling out there around actually managing this sort of workflow on top of Kubernetes, outside of scheduling the individual pieces. So that's really what I want to focus on: enabling this workflow while still being able to get the stages that need some sort of acceleration onto GPUs when we need them.

Okay, so again, let's say we have these few stages. One of the things we need to do is make these stages portable. I'm kind of preaching to the choir here: you understand that I can Dockerize these different stages and run them in any sort of environment, and that's great. I like not having to convince you of that at this conference, because a lot of times I have to convince people of it at data science conferences. So this is a way we can package things up and get them running with reproducible behavior in another environment.

But we don't want to be SSHing into machines and deploying these things manually, right? Kubernetes (among other benefits I don't have time to go over, and again I'm preaching to the choir somewhat) gives us this great and awesome framework for taking these stages and deploying them not only on CPU nodes but on GPU nodes, in a very descriptive way: I can say I want these certain workloads, defined by these containers, to run on these types of nodes, and then, guess what, it happens. That's really great. So at this point in the picture, we've made our individual processing stages portable, and we've actually made deploying all of them together portable, because we can run Kubernetes anywhere: your data scientists can develop these things, you can deploy Kubernetes wherever you want on whatever infrastructure you want, and then deploy this set of things on Kubernetes.
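As a minimal sketch of that descriptive scheduling (the pod name, image, and command are hypothetical): on a cluster with the NVIDIA device plugin the resource is called nvidia.com/gpu, while older clusters from this era used alpha.kubernetes.io/nvidia-gpu. Either way, the resource limit alone is what steers the pod onto GPU hardware:

```bash
# Hypothetical pod spec: the GPU resource limit is the only thing that
# distinguishes this from a CPU workload; Kubernetes does the rest.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: train-model
spec:
  restartPolicy: Never
  containers:
  - name: train
    image: my-team/train:v1          # hypothetical training image
    command: ["python", "train.py"]
    resources:
      limits:
        nvidia.com/gpu: 1            # lands the pod on a GPU node
EOF
```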
But that's actually not the only key. So what am I missing? Someone tell me. Let's say the model inference piece is some type of serving thing. Outside of the functionality of each of these pieces, operationally, what am I missing to enable the workflow I described before? What's that? The linkage between stages, right. What else? Lifecycle management, updating, that sort of thing. What else? Data, yes. The first one that comes to my mind is data: I deploy these containers, and I'm a data scientist, so I want to process data. Where is the data? I have to somehow get the right data to the right code. Let's say it's stored in an object store, which is what was talked about in the last talk as well. Somehow I need to get the right pieces of data, which aren't everything stored in the object store, to the right pods to be processed.

And then there's the element of linkage that was mentioned. It's not enough to get the right data to the right code; I also need to run those steps of processing in a very specific, predefined sequence for things to go right. Kubernetes provides this really great framework and foundation, but, similar to how Borg inside Google is different from what Kubernetes is (Borg includes a bunch of pieces that fill gaps in the context of what Google is doing), Kubernetes out in industry doesn't offer us everything we need; we might want to use Vault for secret management or Istio for service mesh, for example. So somehow we need to fill this gap of getting the right data to the right code, in the right order, on the right nodes.

So really, what I'm saying is that Kubernetes is great for ML: we get portability, scalability, auto-scaling, all of these great things that are the reason you're here, and they're directly applicable to machine learning. But we need a little extra sugar, and this extra sugar is actually really important and not that trivial. We need to get the right data to the right code. We need to process the right data with the right code on the right nodes, whether that's a CPU workload or a GPU workload. And we need to trigger the right code at the right time with the right data on the right nodes. You get the idea. This is really what I believe, and what our team believes, we need.

As a bonus, it would also be nice to have some concept of maintaining this over time and doing it in a sustainable way, which means we need to somehow be tracking what's going on: versioning what data ran with what code on what nodes at what time. And, especially if you're working with healthcare or finance data, we need to stay compliant and be able to reproduce, with full provenance, what we did at what points in time.

All of this together is what we put together in the open source project Pachyderm. Pachyderm is an open source data pipelining and data management layer on top of Kubernetes. What I really mean by that is there are two parts: data pipelining and data management. We need to get the right data to the right code, which is related to the sequencing of things; that's data pipelining. We also need to somehow manage that data: shim the right data into the right code, collect the output data, and so on. So all of these pieces enable what we talked about before: getting the right data to the right code on the right nodes at the right time.
This is that layer for Kubernetes. The pieces of Pachyderm that enable this are, first, data versioning. All data processed in Pachyderm is version-controlled; think Git for data. You can set up collections of data, commit data into them, and make changes, and we'll track all of those changes, which both gives us reproducibility and lets us know when there's new data, so we can trigger the right things at the right time.

Second, since we're running on Kubernetes, we use containers for analyses, and this is actually really important. It might be lost on this crowd, because we might take for granted that containers provide this unified layer, but data scientists struggle so much (and I struggled so much in my past) with their diverse set of tooling and with stringing it all together: I use TensorFlow and need to connect that output to R to do some visualizations, and then someone else builds some weird thing with Julia, and so on. Containers give us a unified framework for saying that our basic units of data processing are containers, and we're unopinionated about what you run in them: run TensorFlow, run R, run Python, run Julia, run a Bash command; we don't care.

Next, we combine the containers for analyses with the data versioning to build up distributed pipelines, or DAGs of processing, where containerized processing stages subscribe to versioned collections of data declaratively: you say "I want to process this data with this image," and you build up this DAG of processing steps, which is also scalable and parallelizable.

And finally, because we're versioning all the data and we know what Docker images we're using for each stage, we actually have complete, quote-unquote, provenance for any data anywhere. What I mean by that is we can produce a result, and if we then want to know all of the other pieces of data, the states of those pieces of data, and the states of our Docker images when we produced that result, we can get all of that information very easily, which helps with compliance, maintainability, debugging, and all of that.

I know we're with a technical crowd here, so I definitely don't want to leave without giving you a few more details before we jump into the demo. Pachyderm, again, is a layer that runs on Kubernetes, so Kubernetes forms the base; this gives us most of what we need. Pachyderm runs as a pod on top of Kubernetes, and it talks to an object store, which is where all the data is backed. We talk to that Pachyderm pod and tell it we want to process this data with this code, and then Pachyderm talks to Kubernetes under the hood and spins up whatever pods are needed to do that processing. So if I have an inference stage or a training stage running TensorFlow, I can spin up however many pipeline workers under the hood to do the processing for that stage, and there will be other workers, which are just pods, doing the processing for the other stages.
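To make those Git-like semantics concrete, here is a minimal hypothetical pachctl session. The exact command syntax varies between Pachyderm versions, so treat this as illustrative rather than exact:

```bash
# Illustrative only; pachctl syntax differs across Pachyderm versions.
pachctl create-repo images                         # a versioned collection of data
pachctl put-file images master /cat.png -f cat.png # commit a file to the master branch
pachctl list-commit images                         # every change is a tracked commit
pachctl list-file images master                    # the files at the head of master
```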
Okay, enough of my blabbing; let's get to the good stuff. I've got a demo here. Sorry about the fuzzy text; I think the terminal will be a little better. This is a dashboard you can use to look at what's running as Pachyderm pipelines, and I have this pipeline running; I'll show you what that looks like on the back end in a second. Just to illustrate: I'm doing image-to-image translation with TensorFlow, which means an image comes in in one style and I want to transfer it to another style. In this particular case, I want to bring satellite images in and automatically transfer their style to Google Maps-style images.

Each of these blue dots represents one of those versioned collections of data; remember, that's the first piece of the puzzle. In this versioned collection I have a bunch of images I can use for training: I want to be able to translate images like the one on the left into images like the one on the right. Also, in my input here I have two input images, where I'm saying: okay, I want to take this image and translate it. This is the input that I want my trained model to transform. So my inputs are that training data and those input images.

Then over here on the left is the training stage. After that I do a model export (just by the way the scripts are set up, I just change the format of the model), and that model is used in a generate stage along with pre-processed images: I have a stage of pre-processing, and I feed those together into the generate stage, which generates the output images. Remember, each of these collections is a versioned collection of data, and each of my pipeline stages is a containerized analysis. If I click on one of these, this is my model-training stage, and down here I can see how this pipeline stage is defined: via a Docker image and a command that's run in that Docker image. I'm basically telling Pachyderm: hey, I want you to process this data using this Docker image, and when you do, run this command, which is just my Python script that's using TensorFlow. Each of these stages is defined in a similar way.
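Roughly, the spec behind a stage like that looks like the following. The image, command, repo name, and exact field names here are illustrative (Pachyderm's spec format has changed across versions), but the shape is the same: a name, a Docker image plus command, and the versioned input the stage subscribes to:

```bash
# Hypothetical pipeline spec; field names vary by Pachyderm version.
cat > checkpoint.json <<'EOF'
{
  "pipeline": { "name": "checkpoint" },
  "transform": {
    "image": "my-team/pix2pix:v1",
    "cmd": ["python", "train.py", "/pfs/training", "/pfs/out"]
  },
  "input": { "atom": { "repo": "training", "glob": "/" } },
  "parallelism_spec": { "constant": 1 }
}
EOF
pachctl create-pipeline -f checkpoint.json
```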
In this case, all of my pipeline stages except for this checkpoint stage, the model training, are just fine running on CPUs, and I can parallelize them very easily across CPU node instances; no worries about needing a GPU for those stages. So ideally, the pods running those processing stages would run on CPU nodes, and when I need to run checkpoint, I want it scheduled on a GPU node so that I can do my model training very quickly.

Okay, so let's move over and connect some of the dots. If I look at what's running in this cluster now, I can see pachd running, which again manages all of the pipelining and data-management stuff, and then I have all of these pipeline workers. In this case I have a single worker for each stage of my pipeline, although that's by no means the only option: you could spin up a hundred workers to process a stage in parallel, and I can talk about that in the Q&A if you're interested. So I have each of these pods scheduled, and I've already put some example data in. If I look at what jobs have run (a job just means one run of a stage; sorry for the wrapping there), I can see that checkpoint has run once, I've pre-processed a couple of times, I've run my model export, and I've generated images once, and that's reflected over here in the dashboard as well.

Actually, excuse me: the reason that one input ran twice, which illustrates the data versioning here, is that I put two images into that input-images collection in two consecutive commits. I could go back to the original commit of the data and see that I only had one image at that point in history, and then I added the other one. To illustrate how all of these things are automatically connected and linked, let me put one more image into that input-images repo. I'll put it into input images on the master branch (remember, Git-like semantics for data) and add this third image. If I list the jobs again, I'll see that, automatically, Pachyderm saw there was new data to be processed in the input and said: you've declaratively told me that this pod should be processing that data, so I'm going to hand that data to that pod, it's going to be processed, and it's going to go on down the line. That should run quickly; we can see that it pre-processed the image and ran the last stage as well, all automatically triggered.

If we look at the output, it's actually not super impressive: it kind of looks like a Google Maps drawing, but I only ran the training for one epoch which, if you're familiar with training, is not sufficient in general. It ran for about 11 minutes on a CPU, and I did that to show you something that would actually execute within the time of this talk, hopefully. And I can see that the output for the third image was generated automatically as well.
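From the CLI, what just happened looks roughly like this; again, the repo and file names are hypothetical and the syntax is version-dependent:

```bash
# Commit a third image; downstream jobs are triggered automatically.
pachctl put-file input_images master /3.png -f 3.png
pachctl list-commit input_images   # the two earlier commits, plus this one
pachctl list-job                   # new pre-process and generate jobs appear
```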
So if we take a step back: we now have all of our stages running as pods on Kubernetes, we've wired them together, we've connected the right data to the right code, and when new data comes in, we automatically trigger all of those things. The piece I haven't covered, which is really the punchline of this talk, is that that's not the complete story, because I still need to make sure that at least one of these stages runs on a GPU. I want to run my training on a GPU, so let me show you what's happening under the hood. The way I told Pachyderm to spin these stages up is with a specification like the one we saw: create a pipeline called checkpoint, use this Docker image and this command, and run it on this input data; here I'm just running one instance of that. But I didn't say anything about a GPU. So Pachyderm said: okay, that's all cool, whenever data comes into training, I'll train your model on that new data. But like I said, our original training run here took 11 minutes (I ran it earlier this morning), and while 11 minutes isn't that much in the training world, let's say we want to do better and make sure it runs on a GPU.

First, let's see whether it already ran on a GPU node. To show you how this cluster is set up: these are the instances in my cluster, running in AWS, although you can do the same thing in Google Cloud or Azure or some hybrid solution. Here I have one node that's a p2.xlarge, which is a GPU node. So maybe I did run on that GPU; let's check. I'll list jobs, look at this training job, get the logs for that job, and grep for CUDA... okay, crap: no matches. I didn't find any GPU, so this ran just as it normally would, on a CPU node.

Now, remember from the last talk what the standard workflow is these days for data scientists using GPUs? You do a bunch of stuff on your local machine, maybe some stuff in the cloud or on a cluster with CPU nodes, and then when you need a GPU you turn around and say: hey Frank, are you using the GPU node? He says no, but then maybe Susie is using it, so you schedule a job on it, and then everything goes to crap and everybody's angry and it's not harmonious at all. We want to strive for something more harmonious. I just want to make sure this stage runs on a GPU, and all I have to do is go in here, modify my pipeline spec, and set some resource limits that say: hey, I want to run with a GPU.

And here's where you can cross your fingers. Let me update that, and I'm going to reprocess, which means I want to reprocess with the updated spec. Oops; you didn't cross your fingers quite well enough: I forgot the GPU tag there. So let me delete this other job, because it would run another 11 minutes and I don't want to keep you from lunch. Okay, now let's see what's running. All right, we have this checkpoint running again; let's get the logs again... and there's my GPU. All I had to do was set that resource limit, and Pachyderm knew: you don't want me to run this on a CPU anymore, you want it on a GPU. It talked to Kubernetes under the hood and said this needs to run on a GPU; it was scheduled on a GPU node, it recognized the drivers, and boom, we're off to the races. The rest of the stages are still going to run on the CPU nodes, and everything is still wired together.
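Concretely, the whole change amounts to something like this; same caveats as before, the field names and flags are illustrative and version-dependent:

```bash
# Add a GPU resource limit to the checkpoint spec and reprocess.
cat > checkpoint.json <<'EOF'
{
  "pipeline": { "name": "checkpoint" },
  "transform": {
    "image": "my-team/pix2pix:v1",
    "cmd": ["python", "train.py", "/pfs/training", "/pfs/out"]
  },
  "input": { "atom": { "repo": "training", "glob": "/" } },
  "resource_limits": { "gpu": 1 }
}
EOF
pachctl update-pipeline -f checkpoint.json --reprocess
pachctl list-job                                  # find the new checkpoint job
pachctl get-logs --job=<job-id> | grep -i cuda    # the CUDA devices now show up
```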
So when I output this model from the GPU training, it's still going to be supplied to the other stages that are running on CPU nodes, and then I'm golden.

Okay, I'll just keep this up in case you're curious; I think it finishes in about one minute. One minute compared to 11 minutes is a pretty good improvement. So yeah, that's the demo I wanted to show. I'm going to be around the rest of today and tomorrow, and it's lunchtime now, so I'm happy to have lunch with you. There are stickers up here if you're into that sort of thing. I'll also post these slides as a PDF on the Sched schedule site, so don't feel like you have to grab a picture unless you want to; you can grab the PDF there and follow the links to all the docs. Let's just see if... oh, we got it. Okay, so we ran in about a minute. All right, thanks, everyone.