Okay, cool. And then should I stay on this side, or is over there better? Okay. I just want to deal with the mic interference thing. Okay. Cool. Thanks.

All right. Well, welcome. Today we're going to be talking about performance-tuned TensorFlow GPU containers with OpenShift. My name is Bill Rainford. I'm a software engineering manager and senior principal software engineer at Red Hat. And with me today is my partner on this project, Perul. Hi, everyone. I'm Perul Singh, and I work as a software engineer in the Office of the CTO.

All right. So here's a quick overview of what we're going to cover today. First, we're going to talk about the ChRIS Research Integration System, or ChRIS as we call it. The reason is that this is the platform where we're going to run machine learning workflows. After that, we'll talk a bit about the technology stack behind this demonstration, then about doing local development and running on your local CPU. Then we'll see how to build and deploy, and what configuration you need to run it on OpenShift. And we'll talk about things like hardware acceleration, making use of the GPU, performance tuning, and related best practices.

All right. So a little bit about Project ChRIS. Project ChRIS is a medical image processing platform, and it's an open source project, a collaboration between Red Hat, Boston University, Boston Children's Hospital, and the Massachusetts Open Cloud. What you can see here is a rendering of some of the UI that we've got; the Red Hat UI/UX engineers have helped with this. In this sort of medical image processing there are a lot of images, and a lot of computation has to happen in order to make meaningful insights on this sort of data. With a brain scan like this, you've normally got multiple slices of images, and that's something that's hard for the average human to digest in bulk if you've got lots of images that you want to do research on or compare. So part of it is being able to annotate these sorts of things, and to apply AI and machine learning to try to solve some of these medical problems. We use an OpenShift job framework to run the image processing algorithms. The real reason behind this is that right now medical image processing takes hours to days, and we want to reduce that time to minutes and seconds so that the results are clinically relevant.

Now for the technology stack involved in this effort. We are developing a machine learning application, which is a very naive example: we've taken the MNIST upstream open source example, which identifies handwritten digits, and we're using that repository. By the way, our code is available online and it's open source; at the end of the presentation we'll give out all the links. One of the technologies we use is TensorFlow, an open source project started by Google that gives us a common way of developing and sharing models; we'll talk more about that later. For GPU acceleration we're using NVIDIA CUDA, which is a model for running general-purpose computing on GPUs. And we'll talk about the OpenShift Container Platform, which orchestrates and manages the container environment we use to run a lot of this.
Just to let you know, we want to keep this demo very interactive, so feel free to stop us anywhere you have questions. Just raise your hand and the mic will come to you. Don't hesitate to stop us if you don't understand something.

I just want to take a moment to poll the room, to understand where folks are in terms of experience, that sort of thing. By a quick show of hands, how many folks in the room are familiar with or have used TensorFlow? Cool. What about people who have used NVIDIA CUDA, or tried to use it and then gave up? And how many folks have used the OpenShift Container Platform? And Kubernetes, which it's derived from? Okay. Cool. Thank you.

So for those of you who don't know what OpenShift is: OpenShift is a family of containerization products developed by Red Hat, and the flagship product is OpenShift Container Platform, which helps you build and deploy your applications inside containers. As Bill mentioned, it's backed by Kubernetes, so it orchestrates, monitors, and manages your containers using the Kubernetes API. As for why we're using it: not just because we're Red Hat employees, but because there's, you know, a wide community that supports it, and it has broad acceptance across the public clouds, whether Amazon, Google, or Azure.

How many of you are building containerized applications as part of your job, or a hobby, or anything? Okay. So can you tell me one of the problems that you face, or the most annoying thing about developing containerized applications? Life cycle. Life cycle? Sometimes the container images keep growing in size, you know. Right.

So the same for us: when Bill and I were developing this, we ran into a lot of problems, and one of them is speed. It takes a really long time to build your image. Every time you build an image, you have to install all the dependencies: for example, your base image, CentOS or RHEL or whatever, you install that, then you install all the dependencies related to your application. What we really wanted was for each iteration to rebuild only what changed. You're not changing anything in the RHEL or CentOS image, yet you install it every time. We wanted to remove that and increase the speed of building and deployment.

The other piece was patchability. Even in the case where we're not hit by the rebuild issues, hey, some security vulnerability came out; we want to be able to patch that image stack in a hopefully less painful sort of way. And other pieces of efficiency, right? What optimizations can we make in how we build, and how fast can we turn around those images and the related changes?

We also wanted a structure where the layers that are independent of each other are isolated. Say a vulnerability is found in, it would never be the case, but say something happens with RHEL, which is never going to be the case, we support RHEL. You're changing nothing in your application layer, right? But you have to build all the layers again. We wanted the layers isolated from each other, so that if you need to tweak or modify any layer, it's just that layer you're touching, not all of them. And we want our image to be reproducible.
And by that we mean that it should contain all the dependencies and libraries as inputs, so that anybody can take that image and reproduce any issues that I've been facing or Bill has been facing. It should be consistent for everyone.

So we wanted a tool that helps us deal with all these problems, and the tool we came across is Source-to-Image, or S2I as we call it. It's a way to build images easily: a tool that takes your application source as input, creates an image out of it, and runs your application inside a container. Here are the steps. The first step is you take a builder image, and the builder image has all the libraries and dependencies installed on it. Then you inject your application code; for us, our application source lives on GitHub, so we pull our source code from GitHub and assemble it into the builder container. An assemble script is run that installs all the dependencies related to the application code, and then it's committed as an application image. You can push it anywhere: you can push the image to Docker Hub; we're using the OpenShift internal registry. Once that's done, you have the application image, and all you need to do is run it inside a container.

The next technology stack piece we're using is TensorFlow, which is a Google-developed open source library for machine learning and for developing deep neural networks. What's shown here is the Fashion-MNIST data set, an open source data set of different items of clothing, those sorts of things. Public domain data sets like this help us train different models, explore techniques, and share things. TensorFlow really makes it a little easier for us to develop these sorts of models and share them with others. So if I've got a complex model and a massive data set, say something like this or something much bigger, I can train on that data set, save the model off, and then share it with somebody else. They can either use it as is, or take that trained model and train it further based on the problem space they're trying to solve. That's really a powerful abstraction mechanism for us.

We are also using NVIDIA CUDA, and everybody who has experience with it knows how big of a headache it can be. Anybody who tried and gave up? Like, they started, oh, I'm going to use CUDA, and then said, oh no, this is too much. Nobody? Okay, I was there, but then I was saved. So CUDA is a parallel computing platform and programming model created by NVIDIA, and it gives you the set of libraries that help you run your application on GPU nodes and basically harness the power of the GPU. Quick trivia: NVIDIA initially named CUDA as an acronym, Compute Unified Device Architecture, but they themselves said, oh my God, this is too much, and dropped it. So now it's just CUDA. And if you dig into the fine print of the CUDA licensing, you'll see it's not free for redistribution. What that means is you cannot give your containers that bundle NVIDIA CUDA to anybody else, because that's not allowed. Your developers will be happy, you can download the image, but you can't redistribute it. So that's the catch: either your developers are happy or your lawyers are happy, not both of them.
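(To make that train-and-share idea concrete, here's a minimal sketch, assuming the TensorFlow 2.x Keras API; the file names and the tiny architecture are illustrative, not the actual demo code.)

```python
import tensorflow as tf

# Load the public Fashion-MNIST data set and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small feed-forward classifier; a real ChRIS plugin would use its own architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Save the trained model; anyone can reload it with tf.keras.models.load_model()
# and either use it as is or keep training it on their own data.
model.save("saved_model/fashion_classifier")
```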
But don't worry, we have found a solution to make both of them happy, and we'll come to it later. So that's a quick recap of the tech stack we're using. With that technology, how do we make it easier to access and develop AI and machine learning solutions? Yes, open source is a key piece of that, and we'll talk about some of the other techniques to get around those issues with NVIDIA and other problems.

Okay. So we've talked about a lot of challenges. Has anybody faced challenges in developing AI/ML? I'm not talking about the kings of AI/ML; I'm talking about newcomers who recently started working in this field and said, oh no, this is again too much. Okay. When I started, at least, I was very overwhelmed. What are the right libraries to use? Which version is compatible with which version? How do I train my model, and where do I procure a data set? And then, once I figured all that out, there was the question of how to make an efficient application, how to ensure the performance is good. And once I had done all of that, I was hit by the whole problem of how to run this on a GPU, how to make CUDA available to me. So I was really overwhelmed when I got into this field of AI and ML.

Our suggestion, or how we approach this problem, is: first ensure that you have trained and optimized your model, so that it's giving the right inference. Like, if you ask it, is this an orange or is it an apple, it doesn't classify it as an elephant. That's the first step. And once you've done that, build up your solution to address all the problems. Or as Bill says: make it work, make it fast, make it pretty. This is a challenge in software engineering, deadlines and everything.

So the first demo we're going to talk about is using TensorFlow to do machine learning in your local environment, so look, running with your local CPU. In order to do that, how do we set it up? Basically what I do is grab a publicly available base image, in this case a CentOS 7 image that's got S2I and Python already installed on it, and then I install all my dependencies: a specific version of Python, TensorFlow, anything else I need to be the baseline for the work I'm going to do. This is all using a Dockerfile, a simple docker build. That becomes the new base image that I'm going to use for the project-specific work in the next step.

Then we use S2I, or Source-to-Image. Basically what that does, as the diagram there shows, is let us separate our code development from the image itself. With that base image, I can run the S2I assemble script, which installs all my app-specific source code. If the model needs training, I can train it there. And once I've got that in place, S2I will use the run script, which in this case basically just says: run the app's inference. So now I've got my application image. I can push it to my repository and then start instantiating it and running it. To run it, it's a simple docker run command on your local machine, and that gives me potentially multiple instances of a valid application container.

And in the context of the Project ChRIS work that we're doing, we're going to show a quick demo video of how we do that.
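(As a rough idea of what the S2I run script ends up invoking, here's a minimal inference sketch. It assumes the TensorFlow 2.x Keras API and a hypothetical layout: a model saved under saved_model/digit_classifier and a 28x28 grayscale MNIST-style test image at data/test_image.png. The real sample's paths and preprocessing may differ.)

```python
import numpy as np
import tensorflow as tf

# Load the model that the assemble step trained and saved into the image.
model = tf.keras.models.load_model("saved_model/digit_classifier")  # hypothetical path

# Read one 28x28 grayscale test image and scale it like the training data.
raw = tf.io.read_file("data/test_image.png")                        # hypothetical path
img = tf.image.decode_png(raw, channels=1)
img = tf.image.resize(img, (28, 28)) / 255.0
batch = tf.reshape(img, (1, 28, 28))

# Predict and report the most likely digit, mirroring the demo's
# "inference value of the test image" output.
probs = model.predict(batch)
print("Inference value of the test image:", int(np.argmax(probs, axis=1)[0]))
```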
So in this case, we start off by cloning the sample repo that we've got; we're doing the clone right there. Then we do a make build, which basically executes that Dockerfile, and that builds the base image we're going to use. That's what we've got there. We've got how-to guides and everything here; all the commands are in that README. So we grab the S2I build command that we've got here and execute it. That grabs the image we just created, copies the source code over, sets up anything I care about, and then trains the model. In this case we're using the MNIST numeral data set; it does the training there and saves off the model we care about, and now I've got that image ready to go. With that, we do a docker run to run the sample application. So now it should load everything up and do an inference, and there it goes: it launches everything, runs inference, and the inference value of the test image is 8. So now I'm able to use the model we just trained and get a meaningful result. We're hoping this gives folks who are experimenting a quick way to get up and running with both TensorFlow and, potentially, Project ChRIS: this is a valid ChRIS plugin that we've got here. Can't find my mouse. Got it.

So now we are going to see how to do the build and deployment on OpenShift. Again, like the local Docker build, the first thing you do is get the CentOS base image. You could also use UBI 7, or you could use RHEL. The reason we are not using RHEL is that you need a Red Hat subscription to manage the distribution, and since this is an open source initiative, you can use either CentOS or UBI; both are publicly available. Then we install cuDNN, which is the CUDA deep neural network library. What it essentially provides is an optimized implementation of various deep neural network operations, like activation layers or forward and backward propagation. You could totally write this on your own as well, but if you use this, it's a highly optimized and highly tuned implementation. Then we need the CUDA toolkit, which gives you the compiler, essentially the entire SDK, needed to develop GPU-accelerated applications. And then you need a runtime so that you can run your application, or distribute it as an executable, without necessarily giving away the CUDA binaries, which is not allowed.

With all these things you have your base container, which acts as the builder container for the next step, so that in the next step I don't redo all of this. The reason is you hardly ever want to change any of it. If you're optimizing your model, you wouldn't say, no, I don't want to use Python 3.6, my model will work better if I go to 2.7. That's not going to happen, right? So it's better to have all these layers built and the image ready, so that in the next step you just reuse that image and don't build everything from scratch. And you cannot push this into a public repository like Docker Hub because of the licensing around CUDA. So what you need to do is push it into your private registry, which in our case is the default registry that comes with OpenShift. You could also push it into your own private repo and pull it each time you use it, but that's not recommended. But if you don't have anything else, that is doable.
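(One sanity check that can help with a builder image like this, assuming a TensorFlow 2.x wheel since older 1.x wheels don't expose this API, is to ask TensorFlow which CUDA and cuDNN versions it was built against, so you can confirm they match what you installed in the base layer.)

```python
import tensorflow as tf

# Confirm this TensorFlow wheel was compiled with CUDA support at all.
print("Built with CUDA:", tf.test.is_built_with_cuda())

# TF 2.x exposes the CUDA/cuDNN versions the wheel expects; these need to be
# compatible with the toolkit and cuDNN baked into the builder image.
info = tf.sysconfig.get_build_info()
print("Expected CUDA version :", info.get("cuda_version"))
print("Expected cuDNN version:", info.get("cudnn_version"))
```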
The next step is that you take your base image, which has CentOS, Python, and CUDA, and just add TensorFlow on top of it. Now, at this point people might ask, why am I installing TensorFlow in this layer? TensorFlow is a dependency of the application source, right? Ideally people do a pip install -r requirements.txt and specify all the requirements there. The reason, again, for installing TensorFlow here, and specifically for this demo, is that we've installed a TensorFlow build optimized for CentOS, not the generic distribution. We timed it: it takes around five to six minutes to install. Imagine all you wanted to do in your application is add a print statement or change a debugging level; if you rebuild your application from that layer, you're adding five minutes every time you build it. That's why we've introduced an intermediate layer, the TensorFlow layer, where we install the build optimized for CentOS; there are also builds for RHEL and UBI. Once you've done that, you again push this image into your internal registry, and it acts as the builder image for the next step.

The final layer is the application layer, where you take the base image you developed in the previous step and inject your application code. This is the point where we introduce the Python code: we pull the Python code from GitHub and inject it into the builder container we created in the previous step. Then we assemble it by running the S2I assemble script, we train and save a model, and we commit this image to our internal registry. So now this application image has your application source, all the required dependencies, and the trained model baked into it. If anybody wants to do inference, all they have to do is run a container from this image and give it the input, the test image you want to classify, and it will give you the output.

In my scenario, and in the typical scenario, the platform where you're running your machine learning workflow is a different project from the one where the developer was creating the image. In my case, I don't have access to the radiology project where the ChRIS jobs are running. In such a scenario, you need to give access to the service account of the project that will be pulling the image from the other project. So this is one of the configurations you have to do. It's not mandatory, but in case somebody has the kind of setup I had, this is one step you need to add. You also need some configuration to let the OpenShift scheduler know that the container may ask for a GPU resource. So you make certain environment configurations: you have to make sure you give the NVIDIA driver the capabilities of both compute and utility, and you specify in the resources that this container, this pod, might ask for a GPU.
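(Related to that scheduling configuration: once the pod actually lands on a GPU node, it can be worth having the application confirm that TensorFlow sees the device. A minimal sketch, assuming the TensorFlow 2.x API; the CPU fallback here is our own choice for illustration, not necessarily what the demo code does.)

```python
import tensorflow as tf

# List the GPUs TensorFlow can see; this is empty if the pod was not scheduled
# with a GPU resource or the NVIDIA driver is not exposed to the container.
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    print("Running on %d GPU(s): %s" % (len(gpus), [g.name for g in gpus]))
    device = "/GPU:0"
else:
    print("No GPU visible, falling back to CPU")
    device = "/CPU:0"

# Pin a small computation to the chosen device explicitly.
with tf.device(device):
    x = tf.random.uniform((1024, 1024))
    y = tf.matmul(x, tf.transpose(x))
print("Result computed on:", y.device)
```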
Now I'm going to show you the demo. The first step is building. I have already built the CUDA-dependent builder image, because it takes a lot of time, literally a lot of time, so even showing a video would not have helped. So in this scenario, as you can see, my project already has an image with CUDA in it, and if I go to the image, you see it has cuDNN, the runtime, and everything.

Now all I need to do is pull this image, and I upload a template which will install TensorFlow. All I need to do is change, yeah, so I'm changing the S2I settings. I'm specifying which base image to take, which is the base image generated in my previous step; the Python version I'm using, which is 3.6; and, since we've used the optimized TensorFlow build, there's a link to that. Once that's done, it starts the build process, and you can see it has already initialized. I'll go to the log: it's downloading TensorFlow and updating the permissions of the container so that it doesn't need extra permissions to run, and it's going to take some time. Five minutes, yeah. So I can fast-forward this; it's pretty much just installing TensorFlow, which is a 90-megabyte binary, just TensorFlow. Once TensorFlow is installed, it pushes the image into my internal registry, and now you can see that I have a new image, the CUDA TensorFlow runtime.

Then what we do is an S2I build of our application code. I take the CUDA TensorFlow runtime I created in the previous step, and I upload my new template, which will do an S2I build of my application. You can see that I'm just renaming my application to TensorFlow sample, changing my S2I base image to use the one I created in the previous step, and sourcing it from my image stream tag, that is, my internal registry. That is the Dockerfile it will use for the S2I, and this is my GitHub repo; the branch is GPU S2I. Now, notice that this step was pretty fast. Imagine if you combined the TensorFlow install and this step: it would take all that time just to build your application code, which is not very optimized. You can see the build has already started, the S2I assemble script ran, it has started training the model, the model has been trained and saved at that location inside the image, and it has pushed this image to my image stream. So I've got the application image with the trained model saved into it.

Now all I need to do is run the workflow, and for that I'll be using the ChRIS platform. ChRIS picks up the input from a data store, so right now this input folder for the TensorFlow app has the image it has to identify, which is the number one. As you can see here, these are handwritten images, so they look a little bit messed up, but this is a one from the MNIST set. Then this is my way of firing the workflow: you see that I'm pulling the image I created, and I'm saying I need at least one GPU node to run this. All I need to do is hit enter, and the job has been kicked off on OpenShift. You can see that for us the GPU count is one, so it's scheduled over there, it has started the container, and it has done the inference as well. To verify, you can see that it says the inference value of the test image is one, and it has saved the result: if I just refresh, I see that the output has been created and has a file with the number one in it.
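(The workflow above is essentially a directory-in, directory-out job: ChRIS drops the test image into an input folder, the container runs inference, and the result lands in an output folder. A rough sketch of what such an entrypoint could look like, with hypothetical paths, environment variables, and file layout, not the actual plugin code:)

```python
import os
import numpy as np
import tensorflow as tf

# Hypothetical conventions: where the model was baked in, and where the
# input and output directories are mounted for this run.
MODEL_DIR = os.environ.get("MODEL_DIR", "/opt/app-root/model")
INPUT_DIR = os.environ.get("INPUT_DIR", "/incoming")
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "/outgoing")

model = tf.keras.models.load_model(MODEL_DIR)
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Classify every PNG in the input folder and write one result file per image.
for name in sorted(os.listdir(INPUT_DIR)):
    if not name.endswith(".png"):
        continue
    raw = tf.io.read_file(os.path.join(INPUT_DIR, name))
    img = tf.image.decode_png(raw, channels=1)
    img = tf.image.resize(img, (28, 28)) / 255.0
    probs = model.predict(tf.reshape(img, (1, 28, 28)))
    digit = int(np.argmax(probs, axis=1)[0])
    with open(os.path.join(OUTPUT_DIR, name + ".txt"), "w") as out:
        out.write(str(digit))
    print("Inference value of %s: %d" % (name, digit))
```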
That covers the basics of how we get things up and running on OpenShift. But now that your model is there, you want to start using things like hardware acceleration with the GPU, plus some other best practices, to get more out of the system. So we've divided the optimization and GPU acceleration into two parts, and the first one is the development and build phase.

As you noticed, we used an optimized TensorFlow image. By that we mean that the typically available image given by Google is designed so that it runs on the majority of systems, but that doesn't mean it's highly optimized. Most of the time, even on a capable x86 CPU, you'll get that warning which essentially says: you've got good hardware, but this binary wasn't compiled to use it. That's why we're using a wheel file that is optimized for the newer hardware.

The next is layer segregation. As we talked about earlier, you've probably seen the importance of how we segregated TensorFlow from the application: we're saving five minutes every time you build the image. And saving the model on the image is critical for being able to reuse it, especially if you have multiple people working on the teams; you still get that reuse and quick testing. Even, say, one year down the line, your data set has become obsolete and you want to train your model on a newer data set: all you need to do is reload the existing image and retrain it, not train it again from scratch. And people being able to reuse those artifacts is critical in terms of reproducing bugs.

The next is hardware acceleration. We recommend that if you don't have access to a GPU in your development environment, you can still create and optimize your model in the development step and then just rerun the workflow on GPU nodes in production. So you're not breaking your head over how to do CUDA and everything on a GPU on your development machine. For simple models like our sample, the difference between CPU and GPU acceleration is pretty negligible, but imagine, as the model gets more complex, having that extra hardware.

And here we've got links to the major projects that are part of this. We'd love to have more folks contribute to it, or if you know other folks in your circle who might be interested, let us know. Even from my own side: I have a child who's got leukemia, who's actually being treated at Boston Children's Hospital, and that's part of how I wound up in all of this. I work at Red Hat on the traditional engineering teams, but being touched by that, and seeing how the hospital and the researchers are trying to tackle these problems, the fact that I could work on something like this is a really good opportunity to learn new technology. We've got UI/UX folks working on the interfaces, and having been a graphics and multimedia guy in a prior life, the ability to apply that to some cutting-edge problems is an interesting space. I've been talking to some folks, and it's kind of branched out, so give it a try; we're easy to contact. And now we are happy to take questions.
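(Before the questions, one concrete illustration of the "reload and retrain" point above: a minimal sketch assuming a Keras SavedModel like the earlier examples, with hypothetical paths; the MNIST digits here just stand in for whatever newer data set replaces the obsolete one.)

```python
import tensorflow as tf

# Reload the model that was trained and baked into the application image earlier.
model = tf.keras.models.load_model("saved_model/digit_classifier")  # hypothetical path

# Re-compile in case the optimizer state did not survive the save/load round trip.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stand-in for the "newer data set": here we just reuse the MNIST digits.
(new_x, new_y), _ = tf.keras.datasets.mnist.load_data()
new_x = new_x / 255.0

# Continue training from the existing weights instead of starting from scratch,
# then save the refreshed model for the next image build.
model.fit(new_x, new_y, epochs=2)
model.save("saved_model/digit_classifier_v2")
```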
Thanks, guys, that was a great talk. I have a question, though, about the strategy used to accelerate building the container and optimize that process. What I do is map my source code into my container image, so each time I just run that container again and again, and there's no penalty, because I'm changing the source code while debugging and there's no need to rebuild the whole thing each time. I'm just wondering why you didn't do something like that?

One thing is, well, that is a good way, but what if you don't want to? Mapping a volume is a totally fine, totally legit way to do it, but what we're trying to do here is maximize the features that S2I gives us, and this is the more recommended way. Also, because we're taking the CUDA libraries from the source, we cannot just pull a prebuilt image; we have to build from the source each time, so that's one of the reasons we're doing it this way. And again, even if you have mapped a volume, how do you address the problem if there's a security issue in one of the layers? You'll still have to rebuild all of them, right? So we're not just solving the problem at the application layer, we're trying to solve the problem at all the layers. And one of the challenges we have is finding a meaningful sample: most of us are already doing the container work, but the thing is to get the production acceleration, to provide an example that's close, and to let people modify the important parts of this effort.

Hi, I was just looking at the demo. When you're doing a build and you're creating a model, are you then deploying that model to TensorFlow Serving, or do you just have that container? We're not using Serving here. What we do is save the model inside the container and then just run the container. But if you had another container running Serving, you could then share the model through that; in this demo we don't use TensorFlow Serving.

Why didn't you use TensorRT, provided by NVIDIA? They publish containers that already wrap around most of the tooling you presented. From what I understand, there's again the licensing issue there, and what we're trying to do is build a solution where you don't have to pay a dime. Part of the piece was around Project ChRIS, which we used, so that meant we didn't have it, and some of the work we did moved around that.

I feel like one of the most difficult parts of building that base image is creating the assemble script, because you need to define a standard for where you're going to receive the model, or where you're going to receive the data, how you're going to run it, how you're going to pick up that data and execute it. Do you have any guidance on how to build that assemble script, specifically for that image? I saw that you run, you train, you execute, and you save the model, so I'm guessing you need some standards to run against, like receiving things in a specific path, because you're not mounting volumes, right? You're just receiving those in a specific place.

In the sample code there are pieces where there are set locations for the libraries and for the code. One of the other challenges of running in an OpenShift environment, as opposed to running locally, is the user, so there's some glue that we've done in here to keep things working. What runs locally wouldn't work on OpenShift, so we had to go back and tweak it to run as, you know, user ID 1001 and things like that. Yeah, I'm trying to show the assemble script that we have. As we launch that container, we pass things on the command line, those sorts of things; we've got some fairly standard conventions we use to try to make sure that's consistent. So if somebody grabs this and just wants to swap in a different model, you can take the sample as it is, take a model you've developed, drop it in, and check out our assemble script. For us, the biggest challenge was how to modify it so that it can run in OpenShift.
Because OpenShift doesn't let you run a container as root, and when you build containers locally, as Bill mentioned, you typically do. So there was a small tweak we had to do, but it's there; that's the assemble script. A lot of the images that we provide on the registry do it the same way, with a fix-permissions script. That's all we have to present. Are there any other questions?

So, you mentioned there are licensing issues with CUDA, right? Are CUDA alternatives on the roadmap for you, like ROCm or something? That's less our roadmap and more TensorFlow's, right? Yeah, yeah, and it's open source. I think the reason we chose CUDA here is because, when we were kind of polling what people find most problematic, like myself, I had a really hard time using CUDA when I was at university, so that was the motivation behind it. And also from my side, we donated some of the hardware that's in this environment for other projects I was doing, specifically on the hardware side, so it was kind of cool. Thanks. Thank you, guys.