Thank you. So, welcome everyone. This is a talk about patterns for TensorFlow applications on OpenShift. I am Subin Muriel. I work on the AI CoE team at Red Hat, which comes under the CTO office. For the past year, my primary duty has been around TensorFlow, so I interact a lot with the Google TensorFlow build team, NVIDIA, Anaconda, IBM, and others: pretty much anyone who is building TensorFlow on their platforms or deploying and managing TensorFlow applications.

Just to get a feel for the audience: is anyone here using OpenShift, or has anyone had exposure to OpenShift? Can you please raise your hands? Have you used Kubernetes? How many of you are actually data scientists? No data scientists. Have you used TensorFlow, or have you at least heard of TensorFlow? Okay, some of you. That's good. Thank you.

So, today's agenda. This talk is about helping you run your machine learning workflow, especially TensorFlow applications, on OpenShift. I will talk about the machine learning workflow, about TensorFlow and some of its challenges, then about Kubeflow, and then jump right into using OpenShift and S2I. I will also talk about an interesting project I am part of, the Thoth project, and how Thoth can take care of TensorFlow dependency issues, along with some of the TensorFlow container images we provide. At the end, I will conclude with my opinionated recommendations on how you can use TensorFlow for machine learning workflows.

A typical machine learning workflow is shown in the diagram; if you are a data scientist, you are probably familiar with it. You start your machine learning task with the data, then you clean it up and do feature extraction. Only then do you start the actual machine learning coding, where you train a model. Once you have the model built, you validate it and check its quality, and then you create your inference endpoint, which is what you make money from. For example, the Uber app sends a request to an inference endpoint to get the shortest path between points A and B. That inference endpoint is what makes you the money; all the effort before it is the hard work of getting a well-performing model. And once the inference endpoint is deployed, you need to constantly monitor it and watch the quality of the model. I am not going to go into much more detail on the machine learning workflow; this is just context that helps with the rest of the talk.
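To make those stages concrete, here is a minimal sketch of the same loop in code, assuming TensorFlow 2.x; the data and model are toy placeholders, not the actual example from the talk.

```python
import numpy as np
import tensorflow as tf

# Data preparation / feature extraction (toy placeholder data).
x_train, y_train = np.random.rand(1000, 4), np.random.rand(1000, 1)
x_val, y_val = np.random.rand(200, 4), np.random.rand(200, 1)

# Training: the part people usually think of as "the machine learning code".
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=5, verbose=0)

# Validation: check model quality before publishing anything.
print("validation loss:", model.evaluate(x_val, y_val, verbose=0))

# Export: this SavedModel directory is what the inference endpoint serves.
tf.saved_model.save(model, "saved_model_dir")
```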
If you are a data scientist, for some of the problems you try to solve you might end up picking TensorFlow as the machine learning library for your training tasks. TensorFlow is an open source platform for machine learning, created by Google about two to three years back. It started out as a library for deep learning, but Google sees a much larger picture for TensorFlow. They do not expect TensorFlow to stay a library just for deep learning; they want it to be a project with a much larger footprint, one you can use not only for training but also for deploying and managing your models and everything else in the machine learning workflow. So it has a very good ecosystem of tools for end-to-end machine learning tasks.

Here is a basic TensorFlow machine learning workflow. You start with the training data, you use the TensorFlow library to do the training, and you produce a TensorFlow model. You create multiple iterations of that model, and obviously, if you have multiple iterations, you need to store them somewhere. So this simple TensorFlow workflow diagram shows how, as a data scientist, you might use TensorFlow: you start with the training data, you produce a TensorFlow model, and over six months or a year or two you end up with a collection of models. You pick the best model out of that model repository, and then you can use TensorFlow Serving, another project provided by Google, to create the inference endpoint.
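TensorFlow Serving has a simple convention for that model repository; a hedged sketch of exporting successive iterations into it looks like this (the repository path is a placeholder, and TensorFlow 2.x is assumed):

```python
import tensorflow as tf

MODEL_REPO = "models/mymodel"  # hypothetical repository path

def train_iteration():
    # Stand-in for a real training run.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer="adam", loss="mse")
    return model

# Each iteration goes into a numbered subdirectory; TensorFlow Serving
# pointed at MODEL_REPO loads the highest-numbered version by default.
for version in (1, 2, 3):
    tf.saved_model.save(train_iteration(), f"{MODEL_REPO}/{version}")
```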
But the landscape of TensorFlow is pretty huge. You can train TensorFlow models on CPUs, on GPUs, and in the cloud; if you use Google Cloud Platform, you can use TPUs. The deployment targets for TensorFlow models also cover many verticals: you can deploy your model on an iPhone or another edge device, or you can deploy it on a server. Depending on where you deploy it, you need to make adjustments to your TensorFlow model, and the ecosystem provides a whole bunch of tools to do that.

When it comes to Kubernetes and TensorFlow, you might have heard of the Kubeflow project. Kubeflow is again from Google, and it aims to make machine learning workflows portable and easier to deploy on Kubernetes and other container-based clouds. The perspective Kubeflow gives you is that in a machine learning workflow, the biggest task is not writing the machine learning code. You hire data scientists to write the machine learning code, but there are so many other aspects: data collection and refinement, managing the entire infrastructure, serving the model (which is where you make the money), and monitoring the models and the whole infrastructure. Kubeflow provides a set of tools that can do all of these things. If you have not explored Kubeflow, I have a small demo that shows how you can use TensorFlow in Kubeflow.

I have the latest Kubeflow deployed here on OpenShift; you can also deploy it on Kubernetes. This is the console where you, as a data scientist, come in, and the first thing you do is create a notebook. As you create the notebook (let me click on New), you need to identify which TensorFlow you want: you choose from the available versions, pick the CPU or GPU variant, and then define how much compute you need for your machine learning task. If you are not a Google or an Amazon, most of your machine learning models are pretty small, so you do not need a lot of CPU. But if the models you are going to train are huge, you might need to give them almost 32 GB or 50 GB of memory, or larger CPU counts. After you provide the CPU and memory resources, you specify your workspace volume. Once you have done that, you click the spawn button and you end up with a notebook; click connect and you get the notebook UI.

I have pre-created a couple of notebooks. One is a simple TensorFlow notebook that does a simple machine learning task. The first thing I do when I start a notebook is find out which TensorFlow version is available, mainly because I am familiar enough with TensorFlow by now to know that it keeps putting out new releases, and sometimes there are CVEs in TensorFlow. You do not want to pick up a TensorFlow library that has known issues or bugs. So I check which TensorFlow it is, and then I start coding. Once I do the training, I produce a machine learning model from my task. Here I have created a model directory, and this saved_model.pb file contains my final machine learning model, which does a very simple linear algebra task.

As a data scientist (let's say; I am not actually a data scientist), you might also want to test the inference endpoint. The way you check the inference endpoint is with a project called TensorFlow Serving, and unfortunately, inside Kubeflow you cannot use that piece from the notebook: if I try to execute the binary, it says it is not found. So the task of creating notebooks and writing your machine learning code is pretty easy, but when it comes to deploying your application from Kubeflow, you need to do a bit more work. You need to create a couple of deployment YAMLs and some customizations, use a tool called kustomize, which Kubeflow uses to configure your deployment YAML files, and then run a kustomize build of those YAML files and deploy them.

So I have Kubeflow deployed on my OpenShift instance here, and it deploys a whole bunch of software stacks. If I click on the available pods, it shows me the pod containing the notebook I used for development. I also have a sample piece of code you can try that will deploy a serving endpoint for you after you have done the training. But the point I am trying to make is that Kubeflow sometimes does not give you all the tools for end-to-end machine learning deployment, and the TensorFlow model server is the piece that is missing. The way it works is that you have a notebook pod, you do the training, and the training output is a model saved to a volume. That volume is then mounted into a different serving pod, and you define the input to the serving pod by passing in parameters: the location of the model file, the model name, and a few other details used to start the inference endpoint. But this is done manually; it cannot be done from the UI. So that pretty much gives an example of how you can build TensorFlow applications if you are using Kubeflow.
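Once that serving pod is up, you can hit its endpoint directly. A hedged sketch of what the request looks like, assuming the server was started with a model named mymodel and TensorFlow Serving's default REST port of 8501 (the host is a placeholder):

```python
import requests

# Placeholder host; "mymodel" is whatever model name the serving pod was
# started with, and 8501 is TensorFlow Serving's default REST port.
url = "http://serving-pod:8501/v1/models/mymodel:predict"
payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # one 4-feature input

response = requests.post(url, json=payload)
print(response.json()["predictions"])
```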
Everything is prepackaged, but there are certain limitations to developing end-to-end TensorFlow this way. One other thing to note is that the model produced by TensorFlow training is in the SavedModel format, and a SavedModel can be targeted at different deployments, edge or server. TensorFlow also provides tools to compress or optimize models so that they are smaller and run efficiently on edge devices.

So, this talk is about patterns for TensorFlow applications. What we are trying to share is an opinionated approach to using OpenShift for your TensorFlow application deployments, along with some reusable components you can take home and try out instead of using Kubeflow for TensorFlow application development.

OpenShift has a very good feature called the S2I build config. Let me take a quick poll of the audience again: how many of you have used OpenShift? Okay. And how many of those who have used OpenShift are not from Red Hat? Okay, good. Build configs and S2I (source-to-image) are an OpenShift feature where you take application source code and create an application image out of it, which can then be used for deployment. I am going to share how we can use this S2I pattern for TensorFlow application development.

Here is the S2I developer workflow. On the left-hand side, you start with the source code. You then use a builder image to compile the source code and create an application image containing the compiled executable, and that application image can be deployed to create an API endpoint. For example, if you have Java source code on the left-hand side, you use a Java or JDK builder image to produce the JAR artifact, and that JAR artifact is executed in your application image. S2I also has a concept of chained builds, where you extract the artifact from the intermediate application image into a final runtime image. This is useful because the application image may carry a lot of extra baggage from the builder image, and with a chained build you can make a very thin, optimized runtime image that contains just the final compiled artifact.

Why did I bring up S2I? My point is that you can actually map the machine learning workflow onto source-to-image. At the top, you see a very simplified machine learning workflow: you do the model training, repeat the process many times until you have a good model, and then publish the model as an inference endpoint. Below it is the S2I workflow. If I map the ML workflow onto S2I, the final stage is common to both: in ML you create a serving endpoint, and with S2I you create a runtime image. But S2I does a build step, so to map it onto the ML workflow, you need to think of the build step as a training step.

How do we do it? Let's say you develop in Jupyter notebooks and push your first iteration of machine learning code to GitHub. Then you start the S2I process: you do the training with the TensorFlow image, and after the training is done, it produces a saved, trained model that is ready to be built into an application image.
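In other words, the S2I "build" is really just running a training script. Here is a hedged sketch of such an entrypoint; the output path is an assumption about what a builder image might expect, not a fixed convention:

```python
# train.py -- the script the S2I build step runs instead of a compile.
import numpy as np
import tensorflow as tf

# Assumed output location; in practice it is whatever path the builder
# image's assemble script expects to package into the application image.
OUTPUT_DIR = "/opt/app-root/model"

x, y = np.random.rand(500, 4), np.random.rand(500, 1)  # placeholder data

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=3, verbose=0)

# The resulting SavedModel is the artifact a chained build later
# extracts into the thin runtime image.
tf.saved_model.save(model, OUTPUT_DIR)
```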
That application image can be deployed, or you can use chained S2I builds to extract the model into an even thinner runtime image that contains just the stack for serving the inference endpoint.

What about GPU training? You can do GPU training with S2I as well. If you are using OpenShift, the integration between OpenShift and NVIDIA GPUs is pretty good: you can use node labels to schedule your S2I workflow onto an NVIDIA node and do the training there. The important thing is that the image used for training must have the CUDA libraries in it. The workflow on the right-hand side continues the same way: you have the model, the yellow circle there, and you can extract that model into a final CUDA runtime image that can be deployed on a GPU node as an inference endpoint.

Here is another example: transfer learning. Suppose a data scientist developed a pretty good model a couple of months back and you want to extend it. Instead of training from scratch, you can use the S2I workflow to take the existing model, the dark green box there, extract it as part of the build, do the retraining to get a new model, and then deploy it. You can also get the model from a volume by doing a volume mount in your build config and continuing the build config steps to get the final inference pod.

Now assume you have a whole bunch of models created over the past six years. If you want to pick one of them and deploy it, S2I is also pretty good for that. You just trigger a build config that picks one particular model from the model repository and feeds it into an inference image, which might use whatever inference technology you like, TensorFlow Serving or a simple Flask server, and you end up with an inference server exposing an API endpoint.

Another example is image updates. It might happen that TensorFlow releases a new inference stack, and the model you created six or seven months back has to be redeployed on the new runtime image. If you have a pipeline like S2I, you can take the older runtime image, which has a perfectly good functioning model, trigger a rebuild to extract the model, and redeploy it in a new container image with the latest TensorFlow Serving binaries.

I keep talking about TensorFlow Serving. TensorFlow Serving is a project in the TensorFlow ecosystem: a flexible, high-performance serving system that can be used in production. Early on, as a data scientist, you might not need TensorFlow Serving; if you just want to test your model's API endpoint, you can host a Flask server that serves your model. But if you are using TensorFlow Serving, we also provide TensorFlow S2I templates that are pretty easy to use. Here is one example: say I have two different models and I want to do A/B testing, where I create two inference endpoints and use a UI to do the inference. The commands to do that using the templates are shown here. At the top there are two commands; each one consumes a template and feeds in the parameters. The only required input is the location of the model, and the template will host an inference endpoint for you.
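With both endpoints up, the A/B comparison is just the same request sent to two model servers. A sketch; the service URLs and the zero-filled input digit are placeholders:

```python
import requests

# Placeholder service URLs for the two serving endpoints; the input is a
# flattened 28x28 "digit" of zeros standing in for real pixel data.
payload = {"instances": [[0.0] * 784]}
endpoints = {
    "model 1 (convolutional)": "http://mnist-cnn:8501/v1/models/mnist:predict",
    "model 2 (regression)": "http://mnist-reg:8501/v1/models/mnist:predict",
}

for name, url in endpoints.items():
    probabilities = requests.post(url, json=payload).json()["predictions"][0]
    print(name, "-> highest probability:", max(probabilities))
```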
I have this deployed on my cluster. Here are two TensorFlow model server endpoints serving two MNIST models: one uses a convolutional network, the other a regression network. If I open the UI, I can do the A/B testing: I draw a digit, and it is sent to the two different inference endpoints hosting model 1 and model 2. You can see the response from model 1 is 0.996, but model 2 seems better, giving a probability of 1. And let's give it something weird; you can see this model did not do well on that particular input.

The next important tool I want to highlight is Thoth. I am part of the Thoth team. Thoth is an AI-based tool to analyze and recommend software stacks for AI applications, and we are available at thoth-station.ninja. Apart from generating recommendations for dependencies, we also provide optimized AI stacks. What that means is that we provide optimized container images for, say, TensorFlow or PyTorch, plus value-added recommendations for your machine learning stack. We are completely open source and available on github.com.

As for why Thoth is important, I need to back up a bit to when you start from scratch. The first thing you do if you want to use TensorFlow is pip install tensorflow. What that does is go to PyPI.org and download one of the TensorFlow artifacts that Google has compiled and hosted there; it just picks up the build for Python 3.5 or 3.6, whichever is available. But there is a problem with these artifacts: Google does not optimize them for different architectures. Say you have an AMD EPYC chip in the cluster or workstation where you do your machine learning, or an Intel Xeon, or something else. The TensorFlow you installed from Google's PyPI packages is not compiled with the instruction sets that would let you actually use everything EPYC or Xeon offers; you would need to recompile it from scratch. That is what Thoth does for you. We have a project called TensorFlow Build S2I, where we have compiled TensorFlow for different machine architectures; we also work with IBM and Google, and we host our own PyPI index. That index is available at the URL shown, along with instructions on how to use it. So if you are using TensorFlow on your workstations, please do check out this PyPI index; it will give you better performance. You can also find these wheels and much more in this particular repo. For example, for the TensorFlow model server, the serving inference technology, we provide RPMs, so if you are using any of the relevant container images, you can yum install that particular model server from the RPM available there. We also worked with Google to get all these TensorFlow wheel files listed on the TensorFlow GitHub page under the community-supported builds; you can see the Red Hat wheel files listed at the bottom, and if you click the link, it goes back to the Thoth PyPI index.
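If you want to see what an optimized wheel buys you, the quickest check is timing large matrix multiplications under each build. A rough illustrative sketch, not the actual benchmark:

```python
import time
import tensorflow as tf

# Run the same script under the upstream wheel and the optimized wheel,
# then compare the timings.
for n in (512, 1024, 2048, 4096):
    a = tf.random.uniform((n, n))
    b = tf.random.uniform((n, n))
    start = time.time()
    tf.matmul(a, b).numpy()  # .numpy() forces the computation to complete
    print(f"{n}x{n} matmul: {time.time() - start:.3f}s")
```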
We also tested TensorFlow performance, and you can see this matrix multiplication benchmark report: as the matrix sizes increase, the TensorFlow provided by Red Hat actually takes less time to do those computations.

Another problem, after you pip install and write your machine learning code, is that TensorFlow releases, I think, every one to one and a half months. The versions keep updating, and sometimes critical CVEs come up. As a machine learning engineer, you might be so busy with your training workflow that you miss that a new release is out or a new CVE exists. Thoth keeps track of these CVEs and of TensorFlow's dependency hell. If you are using S2I and your code is on GitHub, you can make your code S2I-compliant by adding the .s2i folders, and then you add one configuration file that connects your S2I build to the Thoth prediction endpoint. When your code is compiled or trained in the S2I workflow, right before the code is compiled, a request is sent to Thoth Station asking for the best optimized TensorFlow wheel files to use. Thoth recommends the optimized wheel file locations, and as part of the S2I workflow, the optimized TensorFlow is installed during the build process, the machine learning training runs, and the model is created. This is the workflow I am talking about: right at training time, you get the best TensorFlow artifacts. What this means is that the training time is much lower, and you do not have to worry about the dependency hell; Thoth takes care of it. I have another demo, but I am out of time. At this particular URL you can go and try out the example: it gives you simple sample code showing how, if you have a TensorFlow application, you can use it with S2I and with Thoth, how Thoth recommends the best TensorFlow stack, and how you deploy it.

Finally, the recommendations we give. Please use containers: some TensorFlow training takes more memory, and with containers all you do is give the container more memory. Another recommendation is to please use RHEL containers. Why? Because if you look at the TensorFlow stack, TensorFlow internally uses NumPy, then OpenBLAS underneath, then the instruction sets, and of course there is also Python in that stack. If you use RHEL containers, we optimize many of the layers there, so you get the benefit of a completely optimized TensorFlow stack. Also, please use S2I; it really takes care of a lot of your workflow problems. And suppose you are using Kubeflow and do not want to use S2I; that is fine too, but please use the AICoE images, the TensorFlow images that Thoth provides. They give you better performance and take care of the dependency problems in TensorFlow. And again, the recommendation is to use the Thoth TensorFlow wheels and RPMs from the PyPI index we host, and to use the TensorFlow Serving templates provided in the slides if you want to create inference endpoints.
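One aside on that dependency tracking: the kind of check such a pipeline can run on every update is a small smoke test against whatever TensorFlow got installed. A hedged sketch, where the assertion is just an example of exercising the API calls your own code relies on:

```python
import tensorflow as tf

def test_code_against_installed_tensorflow():
    # Exercise the API calls your training code depends on, so a broken
    # release or an API change fails loudly instead of at training time.
    print("checking against TensorFlow", tf.__version__)
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
    model.compile(optimizer="adam", loss="mse")
    assert model.count_params() == 3  # 2 weights + 1 bias

test_code_against_installed_tensorflow()
```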
On the last slide, I would like to thank the panel for selecting this talk, and also all my team members listed here, who have done much of the work creating the Thoth recommendation system. As a final note, I would say that OpenShift, S2I, and Thoth are an excellent set of tools if you are building TensorFlow applications. They can probably take care of a lot of the issues you will eventually face, and you can just focus on doing your machine learning task. Thank you. I have some time for a question, maybe. Yeah, go ahead.

Q: A quick question. At the beginning of your talk, when you had the training, you were mapping things into the S2I model, and I think you had developers working in Jupyter. I was wondering: are you expecting that people would actually be working with Jupyter notebooks, or that they take their code out of the notebooks into something more curated that then goes down the pipeline?

A: What I have noticed, at least in the last two years, is that any student taking a machine learning course at university interfaces with notebooks as their first tool. So all the data scientists coming out of universities have notebook experience, and companies have also started adopting notebooks as a tool for machine learning engineers. We really cannot escape notebooks for machine learning, because everyone is using them as their first interface. But once you are done with the machine learning code, you need to store it somewhere, so you put it on GitHub or an internal Git repo. From that point onwards, when the code is stable, you can use the S2I workflow to get to the inference endpoint. So the answer is that we really cannot escape notebooks, and we have to provide them because people expect them.

Q: My question was: does the notebook itself go through this whole pipeline?

A: Right now it does not, because when you save a notebook, it does not go as a Git commit to GitHub; that feature is not yet there in Jupyter notebooks. Once it is, this flow becomes pretty seamless. I agree, but today we need to extract the code from the notebook into a functioning Python application, which will be quite different from the notebook code; the notebook code is more of an exploration. Again, I am not a machine learning engineer or data scientist, but I have seen people do it this way: they code in notebooks (even I did my machine learning courses in notebooks), and the final code is extracted out of them.
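On that notebook-to-application step: one way to mechanize the extraction is nbconvert's Python API. A sketch, where the notebook filename is a placeholder:

```python
from nbconvert import PythonExporter

# "training.ipynb" is a placeholder notebook filename.
source, _ = PythonExporter().from_filename("training.ipynb")
with open("train.py", "w") as f:
    f.write(source)
```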
Q: So the S2I image that you use to build the application image is basically a Python S2I image with some libraries on it, which then runs off, makes the external call, and gets the best wheel to inject? Is that what it is? Because if you are not using the Jupyter notebook, it does not make sense to move that into a build process; you just extract the code into an actual Python file. So the base image you are going to use in the S2I is a real Python base image? Provided by Red Hat, and from which registry?

A: Yes, it is the base image provided by Red Hat; the Python S2I images are the ones from the Software Collections repo. They do not have TensorFlow in them. Even the source code on GitHub, when you dump your code there, does not have TensorFlow; all you might have is a requirements.txt with TensorFlow and the version. But what you can do with Thoth in S2I is keep checking whether that code works with the latest TensorFlow. For example, some code I wrote about two years back used TensorFlow 1.0, and I had no idea, when 1.6 or 1.8 came out, whether my code would still work. Had I used Thoth S2I, my requirements.txt would have been constantly updated, and I would have had a workflow to see whether I needed to change any API calls in my machine learning code. The workflow would constantly check whether the code works with the latest release or not. That is the kind of benefit I see in that flow. Honestly, I have seen people write code and not update the dependencies for two years; the code is really good, but when somebody wants to use it, the very first step, the pip install step, fails, because the latest pip and the APIs have changed. And this API churn is going to be more pronounced with the TensorFlow 2.0 release, which is coming out soon. The point of the automation is to manage the versions of TensorFlow; that is the whole idea.

Q: Okay, that makes more sense.

Thank you so much for showing up. Have a nice day.