Hello everybody and welcome to another OpenShift Commons briefing. This time again we're going to be talking about TensorFlow; it's one of our more popular topics these days. Subin Maudel from our OpenShift and radanalytics group will be giving a talk about containerizing TensorFlow. We'll toss a link into the chat so that, if you're interested in this topic and want to hear more about it, you can sign up for the machine learning special interest group. The way that we do this is you can ask questions in the chat. I will try and answer them, but we'll have live Q&A at the end too. So stay tuned and stay on afterwards and we'll do that. But right now we're going to let Subin take it away and introduce himself and this topic. So there you go. Thank you, Dan. My name is Subin Maudel and welcome to today's OpenShift Commons briefing. I'm going to talk about containerizing TensorFlow applications on OpenShift. I currently work at Red Hat and I work in a group. Okay, Subin, you're hard to hear. Okay, can you hear me now? It's better. Yeah. Okay, I'm adjusting the mic once more. Okay, testing my audio once more. Am I audible? That's better. That's much better. Okay, thank you. So I work in a group at Red Hat called radanalytics. This is a community effort trying to empower people to do application development on OpenShift with respect to data-driven applications, machine learning, and so on. For the past few months I've been working on TensorFlow and trying to build some TensorFlow applications on OpenShift, so I would like to share some of my learnings and notes in today's briefing. This is basically sharing my notes with the wider community and not really a tutorial as such. I would like to talk more about how I used OpenShift for TensorFlow application development and some of the workflows which I created for building these applications. You'll find quite a lot of notes from my side, so in some places it might be dense and in some places it might be light with respect to technicalities. But unfortunately this session cannot be a deep dive into TensorFlow, machine learning, or the OpenShift platform. I have tried to align my notes with building, creating, developing and deploying TensorFlow applications. So this is the agenda for today. We're going to review some concepts like OpenShift source-to-image and OpenShift templates, then talk briefly about TensorFlow, try to understand what training and inference are and where GPUs can help, and also talk about TensorFlow models. Then we will talk about building TensorFlow binaries from source, why we need to do it, and why it is sometimes complicated. Then we'll talk about creating Docker images for TensorFlow applications which can be used with source-to-image and which can finally be deployed with a TensorFlow model. Then I'll talk about how you can develop using Jupyter notebooks on OpenShift, where you can use the underlying GPUs if they are available. And finally, once you have done the training on the data and you have the model ready, you can deploy this TensorFlow model as a prediction endpoint, a service endpoint, in OpenShift. So just to review some of the concepts, since maybe some folks are not familiar with OpenShift and source-to-image: source-to-image is a feature in OpenShift and it's also a tool. What it gives you is that it lets you go from source code to a final application image which can be deployed on OpenShift.
It is currently integrated into OpenShift and there's also a command-line tool for it. It combines a builder image with source code in GitHub to create this final application image. This particular chart describes how source-to-image works. The link below has an excellent explanation of the internals of source-to-image, which is also from an OpenShift Commons briefing. Here in this flow chart you can see that a user writes the code using the IDE of his choice and then commits to GitHub. From the GitHub repo, you can build using the builder image: you create this application image from the source code and the builder image. The builder image is sometimes created by the same developer, or it can come from some other place. For example, Red Hat provides a Python builder image, and the user can just develop the Python application code. With this Python builder image, as an example, we can create the application image which can run the Python application created by the user. This final application image can be pushed into the Docker registry and then used by OpenShift to deploy that application. You can have a single instance or multiple instances of this final application image. Just to break down this particular flow, what exactly happens step by step? The first thing that happens is that you provide the GitHub URL and the code is pulled from GitHub. Then the S2I builder image is pulled; both of them are pulled to the node in OpenShift, and an assemble script, which is part of the S2I builder image, is run, and after that a new image is created. Finally, this application image is pushed to the internal OpenShift registry. And when you want to do a deployment using the deployment config, this particular final application image is loaded and the run script is executed. So just to look at the logs: if you have a build running and you look at the logs, you can actually see steps 1, 2 and 3, which are pulling the GitHub code. For example, here you can see there's a neural style GitHub source repository which is getting pulled, and certain steps in the assemble script are executed here. And finally, once the assemble script has run, there are steps 4 and 5, where you basically push the final application image back into the registry. Once the final application image is ready, you can deploy it as a running container. Now, we'll be using this S2I flow, basically exploiting it to cater to the different needs of developing applications with TensorFlow. Next, a light introduction to TensorFlow. TensorFlow, as you know, is an open source library from Google used for machine learning, and it's pretty popular for deep learning applications as well. In TensorFlow, the main constructs are expressed in terms of data flow graphs which define computations on the input data. Each node in the graph denotes an operation, and these operations are kind of like a blueprint of the execution on the data. This blueprint is executed in a session, and there's a separation between creating the graph and executing the graph. The main reason is that, for example, if you use NumPy and other tools, most of the calculation is done within the Python runtime. But in the case of TensorFlow, we design this graph of computations and then we give this graph to a session, and this session executes the operations in the graph outside the Python runtime.
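To make that separation concrete, here is a minimal sketch along the lines of the notebook cells described next (TensorFlow 1.x style; the values and names are purely illustrative):

```python
# Minimal sketch of the graph-vs-session separation described above (TensorFlow 1.x).
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.constant(3.0, name="a")
    b = tf.constant(4.0, name="b")
    c = tf.multiply(a, b, name="c")    # only adds a node to the graph, nothing is computed yet

print(c)                                # a Tensor handle, not the value 12.0
print([op.name for op in graph.get_operations()])   # the operations recorded in the graph

# The actual computation happens only when the graph is run inside a session.
with tf.Session(graph=graph) as sess:
    print(sess.run(c))                  # 12.0
```

The eager execution mode mentioned next lets you evaluate a tensor directly without a session; in the TensorFlow 1.4 timeframe it was an opt-in preview living under tf.contrib.eager.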
So let's look at the code and try to understand what I mean by graphs, sessions and operations. Here, in cell number 38, I take the default graph which comes as you load TensorFlow and I look at the operations available in the graph. I can see there is nothing there, and then I try to create a simple multiplication between two tensors. Tensors are like the multidimensional arrays of NumPy. If I look at the operations in the graph again, I can see that there are two operations, A and B, and then there's a multiplication operation between A and B. And if I try to find out what the value of C is, I notice that there's actually no value in C; C itself just points to a tensor. To find out the value of C, I need to actually create a session object and then pass that graph to the session object so that it can run the entire graph and calculate these values. So this is a difference compared to other frameworks with respect to the way the code is executed. In the latest release, 1.4, they have changed this a bit. They kept the session and graph concepts the same, but they have added a new feature called eager execution where you can actually find the value of C directly instead of running it within a session. For example, line 8 here will actually provide the same output as cell number 10. Moving on, another concept which you should know about is training and inference. If you take a standard supervised learning pipeline, we start with labeled data, we extract the features and we train the model. Training the model mathematically means defining a loss function, optimizing it and minimizing it using gradient descent, and finally evaluating the model to see whether the error rate is low enough. Then, when we are satisfied, we have this model which can be used for predicting new values. So you can see that after the model is trained on the data, it is passed to the prediction layer, where new data comes in and the trained model is used to predict labels for the new data. So with respect to TensorFlow, what is this model? We need to understand what this model is and how it works. When you train a neural network with TensorFlow, you can actually save that particular graph of operations, which is the neural network in TensorFlow, with all its weights, parameters and its network architecture, so that you can use it later in production or use it later for retraining an existing model. So what does that mean? Let's take a simple example. Again, I import TensorFlow, I reset the default graph, and I create a few placeholders a and b; I am basically trying to do the calculation (a plus b) multiplied by c. I do this multiplication, and as you know, although the values are passed, the calculation of this final (a plus b) times c is not executed unless you call session.run. So here what I do is create these variables and placeholders and then call this API called the saver, which is available in TensorFlow, and what it does is save this particular graph as a TensorFlow model. Excuse me. A TensorFlow model consists of four types of data: the MetaGraph, a data file, index files and checkpoint files. The MetaGraph is the actual TensorFlow graph with its operations, variables and so on. The data file contains the actual values of the variables in the graph.
The index files point to the different checkpoints which you create while iterating to minimize the loss function. So, in this example, let me just show this live. Here's an example where I do this operation in the graph, (a plus b) multiplied by c, and once I invoke the saver API and save that model from the session, I can see that a couple of files called meta, index and data are created in my file system. The meta file represents the graph, and the data file represents the values of the variables. For example, here I have stored the value 2 in a variable called bias, so this data file will contain the value 2. So the model contains a graph with the operation (a plus b) multiplied by c, and it also contains the saved value of a variable called bias with the value 2. Now, once these model files are created, I can transfer them to any other system. Assume that I am on a different system: I can consume these model files and use the API available in TensorFlow called import_meta_graph, which consumes the meta file and recreates the graph of the operations (a plus b) multiplied by c, and from that you can also extract any saved values. For example, the bias value was stored as 2 in the previous code, and I can actually print that value out; you can see that I can print it from the graph which I have extracted from the file. And then finally, once I have the graph, I can pass new values into it and compute the results. For example, here I passed 13 and 17 into that operation and I get the new value, 60. So with this simple example, what I'm trying to show you is that here is a training step where I train a graph and save it as model files, and in the next step I put that model into a different place, the production environment, and I use those model files to invoke that graph with the new data. So just to repeat this diagram briefly: this model is what I get after training, and I pass this model to a different system where the new data is fed to it; in TensorFlow this model is the graph. So I save the graph here, and in this green box I load that graph again and pass in the new data to get the output. So this is one way you can use TensorFlow: save the graph as these model files. There's one more way, which is called TensorFlow Serving. TensorFlow Serving is one of the sub-projects of TensorFlow; it's a flexible, high-performance serving system for machine learning models. With TensorFlow you can save the same graph into a format called SavedModel. The SavedModel is a higher-level abstraction built on top of the saver class to cater to this high-performance TensorFlow Serving server. I can show you some examples of how it might look. For example, in this particular project I have created a couple of SavedModel files, and these are very simple models, because MNIST is not really complicated. Here's how those SavedModel files look: they end with a .pb extension and there are some variables associated with the SavedModel. Again, there's a data file which contains all the values, and there's an index which basically points to the different checkpoints if you are saving the model at checkpoints.
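Before getting to TensorFlow Serving, here is a small sketch of the save-and-restore flow just described, along the lines of the (a + b) * c example. This is a hypothetical reconstruction rather than the exact notebook code; the file path, variable and tensor names are assumptions.

```python
# --- training side: build the graph and save it as model files (TF 1.x) ---
import tensorflow as tf

tf.reset_default_graph()
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
bias = tf.Variable(2.0, name="bias")             # the saved value "2" mentioned above
result = tf.multiply(tf.add(a, b), bias, name="result")

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(result, feed_dict={a: 1.0, b: 2.0}))   # (1 + 2) * 2 = 6
    saver.save(sess, "./my_model")               # writes .meta, .index, .data and checkpoint files

# --- serving side: restore the graph elsewhere and feed new data ---
tf.reset_default_graph()
with tf.Session() as sess:
    restorer = tf.train.import_meta_graph("./my_model.meta")   # recreates the graph
    restorer.restore(sess, "./my_model")                        # restores the variable values
    g = tf.get_default_graph()
    print(sess.run(g.get_tensor_by_name("bias:0")))             # 2.0, the restored value
    print(sess.run(g.get_tensor_by_name("result:0"),
                   feed_dict={g.get_tensor_by_name("a:0"): 13.0,
                              g.get_tensor_by_name("b:0"): 17.0}))   # (13 + 17) * 2 = 60
```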
So what is TensorFlow Serving and how can you use it? If you have played with TensorFlow, you have probably just used pip install tensorflow, but you would not have seen this TensorFlow model server; there's no way you can pip install the TensorFlow model server. For example, you can see here that this is a TensorFlow model server which I built for GPU, and it's around a 562 MB file. You need to actually build the TensorFlow model server from source, and it's a really complicated process; I will come back to how to build it later on. But once you have this model server, you can invoke it like this, where you give it the model name and then point it to the path where this particular file is. Let me just show you that file again. You just need to point the TensorFlow model server to the folder where the variables and the saved model are present, and what the TensorFlow model server will do is pick up the SavedModel-format file and start serving it. Sorry, the font is pretty small, but if you look at the slides later on, you can see how it starts. And there's a prediction endpoint created by the TensorFlow model server, here at port 6006. There are some other tools available from TensorFlow as well. One is called saved_model_cli. This particular utility is pretty useful if you want to look into what a SavedModel file contains. For example, I have used this particular command to explore the SavedModel file which I just showed you on GitHub. If you look at what is printed out, you can figure out that there are two methods available in the SavedModel: one method is called predict and another one is classify. The predict method consumes an input which is a float with the shape [-1, 784], meaning that each example is an array of 784 elements, and it takes one more value called dropout. These are the two inputs for this method called predict. Once these inputs are given, it gives you an output called scores, which is a float array of 10 values. This particular SavedModel is used for MNIST; MNIST is a dataset for identifying handwritten digits. Since there are 10 digits, you can see that the output scores are for 10 classes, from 0 to 9. So this is another useful tool from TensorFlow which you can use to look into a SavedModel and figure out how to consume it. Assume that you get a SavedModel from some online source and you want to write an API or something around it to do some prediction: you need to use these tools to figure out what the spec of the SavedModel is. What is the input? What's the output? What's the shape of the input, and what methods are available? What can I do with this particular SavedModel? More information about this is available here; you can go to the link about SavedModel and find everything about it there. There are a lot of options around SavedModel. This is something which I've been using frequently whenever I need to debug stuff or write clients against a SavedModel.
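The same signature information that saved_model_cli prints can also be read from Python. Here is a minimal sketch of a rough Python equivalent, using the TF 1.x SavedModel loader; the export directory path is an assumption for illustration.

```python
# A rough Python equivalent of "saved_model_cli show": load a SavedModel and print
# its signatures (TF 1.x APIs; the export directory path here is just an assumption).
import tensorflow as tf

export_dir = "./mnist_saved_model/1"   # folder containing saved_model.pb and variables/

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_dir)

    for sig_name, sig in meta_graph.signature_def.items():
        print("signature:", sig_name)              # e.g. "predict", "classify"
        for name, tensor_info in sig.inputs.items():
            print("  input :", name, tensor_info.name,
                  [d.size for d in tensor_info.tensor_shape.dim])
        for name, tensor_info in sig.outputs.items():
            print("  output:", name, tensor_info.name,
                  [d.size for d in tensor_info.tensor_shape.dim])
```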
So moving forward, what did we cover so far? We covered what a TensorFlow graph is, what a TensorFlow model is and what we mean by it, what a SavedModel is, what the TensorFlow Serving project and its model file are, and what source-to-image is in OpenShift. We're now going to use all of this to create TensorFlow applications and deploy them. Before we jump into writing TensorFlow applications, I need to cover the next part of my agenda. So we did the review, and the next part is about building, creating and setting up things so that we can actually begin our work. One thing I noticed is that if you use pip install tensorflow, the binary which you get is not optimized. If you just have CPUs, no GPU, and you're using the default TensorFlow binary, you might not get the performance you could get if you actually built it from source. There's a reason behind that: when you build TensorFlow from source, you can apply a lot of compile-time optimizations to the TensorFlow code so that you get better performance from the TensorFlow binary you create yourself. If you have tried building TensorFlow from source, you know it's really complicated. They use a tool called Bazel, which is really complicated to use. It's not complicated to set up, but it takes a really long time to run, and when the build fails you have no idea how to fix it. And getting the environment itself correct for the Bazel command to execute is kind of difficult. So I spent a lot of time trying to figure out how exactly to build TensorFlow. After a lot of effort, I found that having an S2I builder image which I can use to build TensorFlow binaries really eases the trouble of building TensorFlow from source. I can give you one example: a couple of days back I had to move from TensorFlow 1.3 to 1.4, I couldn't find a 1.4 binary, and I had to build it myself. When I used this particular S2I image, I was able to do it. Going to this particular GitHub repo, I want to talk about how exactly it works and how you can use it. If you have not used S2I before, it doesn't matter. If you want to use this particular project, the TensorFlow build S2I, all you need to do is go to this template. Every project which I'm going to show today has a template; you take this template, go to your OpenShift, create a project, and add this particular template, which I'm going to open right now. So I'm back in OpenShift, on the create page. I'm not going to create it, but I just want to talk about the different things this template provides. What this particular template does is download the TensorFlow source code from GitHub, from the master repo, and it gives you a whole bunch of options which are consumed by Bazel and TensorFlow to build the TensorFlow binaries. There are things like: if you're building for GPU it's different, if you're building for CPU it's different; you need to define the CUDA version and the cuDNN version, and a lot of information like which path to use, whether you want to use clang, whether you need GCP or HDFS support, stuff like that. So this particular template gives you all these parameters, and you can just fill them in. And above all, there's this field called Custom Build. In this Custom Build field you can actually pass in the build command to build TensorFlow. For example, you can see here in the dropdown that there are two commands which I use: the first command does a Bazel build for creating the TensorFlow pip wheel file, and the second one is the one which creates the TensorFlow pip wheel file for CUDA. I pass this in, and what happens is that — for example, you can see here — there's a build which happens, and once the build completes, a container pod starts, and in that container pod the particular Bazel build command is executed.
If I look at the logs — let me see — you can see all of this is basically the complicated build process which TensorFlow follows. And finally, if this build is successful, you can just click on this particular link, and when you go to the link you can actually see the binaries available from this particular build. Coming back to the page to see what it is that I have built: for example, this particular project I have named model server, so I'm trying to build a TensorFlow model server, and you can see that I have built it successfully here. This is a binary which is available for consumption, and this particular binary is for GPU, so I would have enabled a lot of parameters which are related to CUDA. For example, the CUDA version is set, the CUDA compute capabilities, and the most important variable, TF_NEED_CUDA, which is set to one. So this particular S2I build for the TensorFlow binaries builds the TensorFlow model server. And if I look at the other projects which I have, I have one more: this particular build successfully built the wheel file for TensorFlow 1.4 for GPU. Let me just close that, go back, and see if I have created any other builds. So I have also created a TensorFlow wheel file for CPU, and I have been able to build that successfully as well. So you can just play with the Custom Build field in the template and pass in the appropriate Bazel build command to create whichever binary you want. Again, going back to that particular repository, the README contains all the different options and values, how you can use them, and some examples of how you can create these binaries for CPU and GPU, stuff like that. And the reason why we need to do this is that if you pass in these optimizations, and your platform actually supports them, TensorFlow is actually faster than the default pip-installed TensorFlow binary which is available. So, coming back to the slides: this is the reason why I build TensorFlow from source, and this is the project which I use to build it. The next part is creating Dockerfiles. We need to have CUDA and the CUDA neural network library (cuDNN) in the pod so that the TensorFlow applications can work. Now, the thing about CUDA is that it is bound by the NVIDIA license, so you need to be very careful with this: you cannot simply download and distribute it, and you need to ensure that you extend from the right Docker Hub image. For example, this is one particular image which I created for NeuralStyle, an application which I'm going to talk about later, and here I am extending from the NVIDIA CUDA 9 and cuDNN 7 runtime image, which is actually allowed by NVIDIA. You can extend from the runtime Docker image and then create your application Dockerfile from that. So let me explore the Dockerfile for this application a bit more. NeuralStyle is one application which I'll explain later, and for this particular application's Dockerfile, I extend from nvidia/cuda and then I have to set a few environment variables so that TensorFlow can figure out where the CUDA home is, where the CUDA path is, and so on. After that, I need to install a few other CUDA binaries which are not available in the default image from NVIDIA, so I download the CUDA RPMs from the NVIDIA repository.
Once these particular binaries are installed, I can go ahead and install the TensorFlow binary which I built myself, and TensorFlow will then work. Otherwise it will complain that the CUDA libs are not found, and you basically cannot use TensorFlow on a GPU in a pod. Coming back to this: this particular setup of extending from the CUDA image might change with something called the NVIDIA container runtime, but as of now, if we need to use the CUDA images in the community, this is the way to go. Just to repeat: these are the runtime images, and you can extend them. And if you choose to publish those images on Docker Hub, you need to carry the NVIDIA license, and you should not distribute them for any purpose other than, probably, community efforts. The next thing which I want to talk about is setting up OpenShift with GPUs. Before you set up OpenShift to identify GPUs, you need to figure out what the NVIDIA driver is and what GPU is available on your system. So you go to your system, figure out which GPU is available, and then go to the NVIDIA website and download the NVIDIA driver. For example, in my cluster I was using a Tesla M-class M60 on 64-bit Linux, and I chose to use CUDA 9, so I had to download this particular driver to get OpenShift to work with the GPU. Now, for setting up OpenShift with GPUs, there are quite a lot of steps. I have documented these steps here, and there's also a post from the OpenShift folks on the OpenShift blog. You can refer to both of them to figure out how to set up the OpenShift node so that OpenShift can identify the GPUs. I don't want to go into them because they're nothing but files of all the commands which have to be executed. So once you have created your own TensorFlow binary, created a Dockerfile extending from CUDA, and set up OpenShift with GPUs, the next thing you need to do is create templates so that you can deploy them and the application pods can consume the GPU and actually do some training in that pod. So, regarding how the template changes with respect to OpenShift and GPUs, the major changes are here. One is that you need to add these node affinity parameters, where you feed in a key called alpha.kubernetes.io/nvidia-gpu-name, and you need to figure out what the GPU name is for the GPU on your system and put it in here. This is something which you pass to the deployment config. So this is the first parameter, and the next one is that you need to set the resource limits for how many GPUs your deployment config will consume. Since my systems only had a single GPU, I will only be using a GPU limit of one. These parameters are actually shown below. I will go and deploy this particular template on a GPU cluster, and I will talk about how having these templates actually helps you deploy them. Let me just go to that now. Okay, so I'm just importing that particular template. Once the template is created, if you want to deploy the application from it, you can see that you can define the GPU image which you need to give — these GPU images here are mine, and you can replace them with the GPU images you have — and you need to provide the GPU name. For example, mine is Tesla M60, but yours might be some other class. And finally, how many GPUs do you want this particular application to have?
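Once a pod lands on the GPU node with a GPU in its resource limits, a quick sanity check from inside the pod is to ask TensorFlow what it can see. A minimal sketch, assuming a GPU-enabled TensorFlow build is installed in the image:

```python
# Quick check from inside the pod that TensorFlow was built with CUDA and can see the GPU.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print("built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available :", tf.test.is_gpu_available())

# Lists devices such as /device:CPU:0 and /device:GPU:0 (e.g. the Tesla M60) if the
# NVIDIA driver, CUDA libraries and node setup are all wired up correctly.
for device in device_lib.list_local_devices():
    print(device.name, device.physical_device_desc)
```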
So having such templates for each of your applications is actually good, because people can consume these templates and deploy them on OpenShift to see whether your Docker images or your applications can actually be executed on an OpenShift cluster — basically repeatable research. If you have some project and someone wants to test it, having a template which works saves you a lot of time. So most of my projects have a template in them. All the projects which I share today as part of the slides will have a template.json file and a template-gpu.json file. The gpu.json file is something which you can deploy on a cluster which has a GPU, and template.json is for a cluster which doesn't have a GPU. So, coming back to the slides: we have done the review, and then we have talked about the Dockerfile, setting up OpenShift with GPUs, and the templates. Now we jump into developing applications. What I notice is that everyone is using Jupyter notebooks, and having a Jupyter notebook for TensorFlow is something which helps me develop sooner and iterate faster. This is one particular project which I created under radanalytics.io, which is a TensorFlow Jupyter notebook image. It is a Jupyter notebook in a CentOS 7 image. One of the good things about this particular image is that if you deploy it on an OpenShift cluster — for example, this particular notebook is deployed on an OpenShift cluster and I use it for my development — you can go ahead and install any of the libraries you want within the pod. For example, here I have a TensorFlow binary which was pre-installed in the pod with version 1.2.1 and I needed to upgrade it to 1.3.0, so I did a conda install of TensorFlow; it went ahead and installed the binary, and I found that I had upgraded to 1.3.0. I could also, if I wanted, go and download the binary which I built myself: I can do a wget here and then do a pip install of that wheel file, and I can upgrade from 1.3 to 1.4. This particular project, the TensorFlow-based notebook, is what I use for developing applications and iterating over them. Another thing about this notebook is that I have made the TensorFlow model server available in it, so that if anybody wants to play with the TensorFlow model server or the SavedModel format, they can go ahead and play with this binary from the TensorFlow notebook. Let me just talk about how I've been developing applications on TensorFlow. I usually write all my code here, and once I save the model into the model files, what I do is always check whether I can consume the same model files and invoke the model on new data using a REST API. For example, I have created a sample Flask app here where I create a prediction endpoint, and all the prediction endpoint does is consume the input field of a request; I consume the model file and I execute that particular graph by feeding in the new input which came in as part of the REST request. So this is a way you can actually test your model and practice application development: you do the training, and once you're happy with the training you save the model and then publish it to Dropbox or some place like Google's file store, and from there somebody else can consume that particular model, develop a Flask application or something else around it, and create a prediction endpoint which can be consumed by some other web application.
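As an illustration of that workflow, here is a minimal sketch of such a Flask prediction endpoint, assuming a graph saved with tf.train.Saver as in the earlier (a + b) * bias example; the route, JSON fields and tensor names are assumptions, not the actual app code.

```python
# Minimal sketch of a Flask prediction endpoint wrapping a saved TensorFlow graph.
# Paths, tensor names and the JSON fields are assumptions for illustration.
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the saved graph once at startup and keep the session open for serving.
sess = tf.Session()
saver = tf.train.import_meta_graph("./my_model.meta")
saver.restore(sess, "./my_model")
graph = tf.get_default_graph()
x_in = graph.get_tensor_by_name("a:0")
y_in = graph.get_tensor_by_name("b:0")
result = graph.get_tensor_by_name("result:0")

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    # Feed the new values from the REST request into the restored graph.
    value = sess.run(result, feed_dict={x_in: payload["a"], y_in: payload["b"]})
    return jsonify({"prediction": float(value)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

A client would then POST, for example, {"a": 13, "b": 17} to /predict and get back the computed value.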
For example, if you look at what Google has done, they have published a lot of models, and they have also published a website where you can go and input new data and it will do prediction on that new data. So this is the kind of workflow which I have been following, and having a TensorFlow notebook which has all these capabilities — installing new libraries with pip install or conda install, doing a wget, doing a git clone, playing with all this code, playing with other people's code, doing some testing, creating a Flask app, testing the Flask app — kind of helps me create TensorFlow applications sooner. So this is what I've been using regularly for development, and I have a demo where I can show you the TensorFlow notebook, which is used for both CPU and GPU. Here's one cluster with the TensorFlow notebook: if you use the template file which I've shown you in the GitHub repo and you deploy it on OpenShift, you can find a pod with a route, and if you click on the route you can go to the Jupyter notebook, you can create files, and finally you can save them and push them back to your GitHub. And there's one more cluster I have; this is a cluster which I have for the GPU, and in this demo namespace I have a Jupyter notebook here. As you notice, the pod count is set to 0, mainly because I have scaled it down for now. There's a single GPU here, and if I create multiple applications, all the applications will be fighting for that single GPU, so I basically disable all the pods and, for the demo, I just scale up a single pod at a time per application. So, coming back to the slides — actually, let me just do a demo of this. Let me just scale that up and hopefully it comes up. Yeah, great. Okay, I'm going to create some new code here so that I can show you that it works. What I'm going to do is look at the TensorFlow here, check what the version is, and try to upgrade it: import tensorflow as tf — so there's no TensorFlow there, that's fine. So I'm going to do a slight cleanup activity of removing any existing TensorFlow or NumPy, and after that, what I'm going to do is get the TensorFlow binary which I created myself using the TensorFlow build source-to-image project — this one is for GPU — and I'm going to install it in this particular pod. It's going to take some time, so I will just copy-paste the piece of code which I plan to execute on the GPU and wait for it to complete. It looks like it has completed — yes, it has; you can see that that particular cell has executed. So now what I will do is show you nvidia-smi: I'm just going to log into that particular machine which has the GPU to show the GPU activity. Okay, so right now you can see that I have a single GPU in my node, a Tesla M60, and what I'm going to do is execute some code from the Python notebook to show you that the GPU is actually being used. So I'm coming back to my notebook and I'm just going to execute this particular multiplication operation, where I have two tensors and I'm multiplying them using the session object. Let's just execute it — what happened? Okay, I need to restart the kernel sometimes. Restart. Let me test if TensorFlow exists — yes, TensorFlow exists — and I also want to show you which version of TensorFlow I have installed: okay, it's 1.4. Let me just execute this and come back to nvidia-smi. You can see that now a process name has shown up here, which is the Python from the conda environment in the notebook pod.
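Besides watching nvidia-smi, you can also confirm from inside the notebook that the operation is being placed on the GPU. A small sketch using standard TF 1.x device-placement logging; the tensor shapes are arbitrary:

```python
# Verify from inside the notebook that the matrix multiplication runs on the GPU
# by logging device placement (TF 1.x). The shapes here are arbitrary examples.
import tensorflow as tf

with tf.device("/gpu:0"):                      # request explicit placement on the first GPU
    a = tf.random_normal([2000, 2000])
    b = tf.random_normal([2000, 2000])
    c = tf.matmul(a, b)

config = tf.ConfigProto(log_device_placement=True)   # prints which device each op runs on
with tf.Session(config=config) as sess:
    sess.run(c)   # the log should show MatMul assigned to /device:GPU:0 (visible in nvidia-smi too)
```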
So this is basically this particular code: TensorFlow is executing these operations using the GPU which is accessible from the pod running the Jupyter notebook, and you can see the proof here, where you can see the PID and the process from the Jupyter notebook pod. Okay, so exiting this and coming back to the slides. So we understand how we can deploy it and what bits we need while doing the development; now we go to the next thing, called deployment, and I'm going to show some fun apps which I have created. One particular application is the MNIST app; many people consider MNIST the hello-world application of machine learning. If you click on this link — we actually have a blog post on radanalytics talking about this particular application, how we developed it, and how we used source-to-image. What this particular application contains is two services, prediction service one and prediction service two, and each of them is serving a TensorFlow model. Each model is different: one is using logistic regression and one is using convolutional neural networks. And there is a web UI which is connected to both these prediction services, and from the web UI we are actually sending new data to the models, and the models do the prediction and reply back with a response. I would like to give credit, of course, to Mr. Yoshihiro Sugi, who developed the UI, but the back end was developed by me. So you can go through these installation steps and figure out how to create this application. Again, I always have a template in every project, so you create a project and then you deploy the MNIST web application template, and then I have this TensorFlow serving endpoints template, and once these two templates are created, I create two prediction endpoints. Now, what these two prediction endpoints do is consume a TensorFlow model and then serve it. So here is one particular endpoint where I consume a TensorFlow model which has been published on GitHub, and here is another TensorFlow serving endpoint where I consume another TensorFlow model and create that prediction endpoint. Once I create these two prediction endpoints, called TF-reg and TF-CNN, I pass these service endpoint names to the web application. For example, you can see here that I create a new MNIST web application with prediction service one as the first service endpoint, which is here, and the second prediction service endpoint, called TF-CNN, which is the second prediction endpoint I created here. So, just to show how it works and what it is, I have created this particular application on my OpenShift cluster; it will create three pods. You can see here, at the bottom, the two prediction endpoints which consume two different TensorFlow models, and here is my web application. What this web application does is digit identification using the MNIST data. I can give it input, and this input is sent to both the prediction endpoints, to the different models, so you can do a kind of A/B testing with this. For example, for whatever input I give, it predicts the correct value of the number. Let me give it a different value here. Okay, again both of them seem to be predicting a similar value. Let me give a different value, slightly angled. So you can see here that I wrote the digit 5 at a slight angle, and you can see that model 1 kind of failed to predict what the value was, but model 2 is pretty good — it was able to predict the value.
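For reference, here is a rough sketch of how a client such as this web UI could call one of these TensorFlow Serving prediction endpoints over gRPC. This is a hypothetical reconstruction, not the actual app code: it assumes the tensorflow-serving-api Python package of that era, the MNIST "predict" signature shown earlier (a 784-element float input plus a dropout value, returning 10 scores), and made-up host, port, model and input names.

```python
# Hypothetical gRPC client for one of the MNIST prediction endpoints (TF Serving 1.x era).
# Host, port, model name, signature name and input tensor names are assumptions.
import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

channel = implementations.insecure_channel("tf-cnn", 6006)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"             # model name given to tensorflow_model_server
request.model_spec.signature_name = "predict"

digit = np.zeros((1, 784), dtype=np.float32)  # stand-in for the drawn 28x28 digit, flattened
request.inputs["images"].CopyFrom(tf.contrib.util.make_tensor_proto(digit, shape=[1, 784]))
request.inputs["dropout"].CopyFrom(tf.contrib.util.make_tensor_proto(1.0))

result = stub.Predict(request, 10.0)          # 10 second timeout
print(result.outputs["scores"])               # 10 class scores, one per digit 0-9
```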
I'll give you one more example, and then I'll go to the next application. So here you can see that I wanted to write a 7, but I just stopped partway, and you can see that model 2 thinks this is a 2 — I think this can pass as a 2, but definitely not as a 3, which is what model 1 thinks. Moving on to the next application. At a high level, the architecture looks like this. The way I developed this is from Jupyter notebooks: writing code in the Jupyter notebooks and then finally putting it into a Python script. Again, at a high level, here's how I did it: a notebook on OpenShift with GPUs. I did the training, I had the MNIST data, I created the model and published it to GitHub — all the models which I created got published to GitHub. So this is similar to the pipeline which I discussed earlier, and I use an S2I-build-driven prediction endpoint deployment, where the model which I created and published on GitHub goes into my S2I build config. The build config consumes the model from GitHub, it consumes the TensorFlow binary from some other source, and it uses a TensorFlow Serving builder image to create an application image which, once it runs, basically starts a prediction service endpoint. Just to show what that prediction endpoint looks like: for example, this one is a prediction endpoint pod. This prediction endpoint is hosted at port 6006, and if I look at the logs, you can see that this is a TensorFlow model server — you can see "TensorFlow model server" — and it's running the model server at port 6006. This particular web application is actually sending the input data to port 6006, and it is getting back the labeled output. Moving quickly to the next application, which is NeuralStyle — I have another five slides. Neural style transfer is a process of using convolutional neural networks to migrate the semantic content of one image into different styles. There are a couple of applications of this in the real world: some of you might be using the Prisma app on your iPhones, and this other website basically does the same thing. So in neural style transfer you have this particular input on the left-hand side — the diagram of... let me take a better example. Yeah, okay. So you take an input, for example here it's the Mona Lisa, and on the right-hand side is the style image — it could be a painting by a well-known painter — and you transfer the style from the right-hand side to the left-hand side to get this particular output, where it still looks like the Mona Lisa, I think, but it has the funky style of this particular artist. And here is another example where an image of a woman is transferred into what looks like a wood carving or something — the semantic style has been transferred to that particular image. Coming back to the slides, the way I have developed this application is slightly different from the MNIST app. If I go back to the MNIST app, the GitHub repository contained the model, but in this case the GitHub repository contains the source code — the source code for actually creating the model for the neural style transfer. I again use an S2I build, I consume that TensorFlow code from the source, and I create the application image, and once the application image starts, it does the training at start time. Since we have GPUs for this particular application, the training time is actually quite low. When I compared the training times between CPU and GPU, there was almost a 98% reduction in training time:
the training on CPUs for neural style transfer took me about an hour, sometimes an hour and a half, for a thousand iterations, and for the same number of iterations on a GPU it took me about two minutes, sometimes three. I just want to do a quick demo of it; these links are the source code of this particular application. Coming back to the web application: I am going back to the GPU cluster I have, and I am going to scale down my Jupyter notebook app so that I can deploy one more application which can consume the GPU. I have a NeuralStyle application deployed, but there is no pod running; all I need to do is scale up this particular pod and it will trigger a TensorFlow training run, and once the training is done it will consume a test image and convert it into a different image after the style transfer is done. Let me show you a worked example of that. Okay, so if you go to this particular project, again you will find a template; you can just deploy the template and it is going to create this particular pod, and once the build is done the pod will start, and when the pod starts the training will run, and once the training completes a TensorFlow model file is created. I host the result to show what exactly happened. For example, what I am trying to do with this particular code is give it this input content and convert that input image into the style of this famous painting, and this is the output I get — you can see the style has been transferred to this image. I also put some metrics here: for example, you can see the time here is 3,543 seconds, which is about an hour, but if I do the same thing on a GPU it is going to take around three minutes. It has not yet completed, but once it completes I will come back to it. Moving on to my last application, which is the Inception app. The Inception app is based on a dataset from Google called ImageNet; it is a large-scale image dataset introduced around 2010. You can go to this particular link and it will show you how it is deployed, and I have an example of that particular application running, just to show you what it does. If you follow that particular template, it is going to create, again, two pods: one is the web UI, and the other is the TensorFlow server serving the Inception model. The Inception app is a web UI which looks like this, where you can just give it an input. For example, I passed it a dog image and it gives you the top five results for that particular image. Just to show you what it was: this is the image which I passed, and it identifies that image as a Yorkshire Terrier dog — that's what it thinks — and it also gives you four other labels for this particular input image. One thing I want to show with respect to the Inception app is that there are about a thousand labels for this particular model. But what if you want to create a new model? For example, if I give it this input image of the Prime Minister of Japan, Mr.
Shinzo Abe, and I ask this particular app to predict what it is, it's going to tell you that it's a "groom, bridegroom" and not Shinzo Abe or some other well-known personality, because this particular label, Shinzo Abe, is not among the thousand labels this particular model can predict. So how can you add new labels like this? Jumping to the next slide: you can consume the published models and train them on new input data — for example, with Shinzo Abe as a new label — so you consume the existing models and train on top of them to create new models which know the new labels. Transfer learning is the way we do that. If you actually go on GitHub to this link, you can see some of the pre-trained models published by Google. There are also some pre-trained models published by Mozilla, Twitter and other companies, which have large-scale clusters, a lot of data and deep neural networks; they have tried to solve different problems in image recognition, speech recognition, NLP and so on, and they have published these models to the outside world. We can basically consume these models and apply transfer learning on them for new datasets. So this is my last demo: transfer learning, where we build on the Inception app, and after that I am done. Coming back to the neural style demo, just to show you that my training has completed: this is a neural style training run which happened on a GPU, and for the same thing I try to create this output image, and if I look at the time, it is 232 seconds; if I divide by 60, that's about 3.8 minutes, so you can see that training is faster on a GPU. I am going to scale down that pod so that I can show you the transfer learning demo, and I want to scale this one up. I am just going to wait for this image to start, and as it starts, I am going to talk about transfer learning. With TensorFlow you can actually do transfer learning, where you start with a model which has been published by Google, Twitter or some other company, which has already been trained on a set of problems and has a predefined set of labels, and you retrain it on a different problem to get a new set of labels for which the model can do prediction. Deep learning from scratch can take days: these companies do this training and create these models by deploying on large-scale clusters and taking something like 10 or 15 days to develop them, but once these models are available, people can use them to create their own models for predicting on new datasets. So how exactly should it work? We write our source code to do transfer learning and put it on GitHub, we use an S2I build to consume this TensorFlow code, we also consume the externally published model into the build, and finally we create the application container. When the application container starts, we provide new data for the TensorFlow application to train on, and once the training is done and the model files are saved, we can start the prediction service. I do not have this working end to end for transfer learning, but I want to show you the concept, how it works and how we can actually try it out.
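To make the retraining idea concrete before the demo, here is a rough sketch of the bottleneck-feature approach that TensorFlow's image-retraining example uses: run the new images through the frozen pre-trained network, and train only a small new softmax layer on top for the new labels. This is an illustrative sketch, not the code used in the demo; the graph path, tensor names, bottleneck size and data loading are all assumptions.

```python
# Rough sketch of transfer learning via bottleneck features (TF 1.x).
# The frozen-graph path and tensor names below are assumptions for illustration.
import numpy as np
import tensorflow as tf

NEW_LABELS = ["merkel", "modi", "shinzo_abe", "theresa_may"]

# 1. Load the frozen, pre-trained graph published by Google (e.g. MobileNet / Inception).
with tf.gfile.GFile("./pretrained_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")
    images_in = graph.get_tensor_by_name("input:0")        # assumed input tensor name
    bottleneck = graph.get_tensor_by_name("bottleneck:0")  # assumed penultimate-layer tensor

def compute_bottlenecks(image_batch):
    """Run new images through the frozen network; no weights are updated here."""
    with tf.Session(graph=graph) as sess:
        return sess.run(bottleneck, feed_dict={images_in: image_batch})

# 2. Train only a small new softmax layer on top of the bottleneck features.
bottleneck_dim = 1024                                       # depends on the chosen model
train_graph = tf.Graph()
with train_graph.as_default():
    x = tf.placeholder(tf.float32, [None, bottleneck_dim])
    y = tf.placeholder(tf.int64, [None])
    logits = tf.layers.dense(x, len(NEW_LABELS))
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    predictions = tf.argmax(logits, 1)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    # Placeholder data: in practice, features = compute_bottlenecks(batch_of_photos)
    features = np.random.rand(32, bottleneck_dim).astype(np.float32)
    labels = np.random.randint(0, len(NEW_LABELS), size=32)
    for step in range(100):
        _, l = sess.run([train_op, loss], feed_dict={x: features, y: labels})
    # The retrained classifier can now be saved and served as a new prediction endpoint.
```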
Again, I am going to open my Jupyter notebook pod on a GPU, and I am going to quickly install the latest TensorFlow for GPU so that I can complete this demo. What I am doing right now is some cleanup, and I am upgrading the TensorFlow in my notebook from the default 1.2 to 1.4 for GPU. It is quickly downloading and installing, and once it — it is still going — okay, it has completed now. What I will do next is download a new set of files, and I will show you what they are. No module named tensorflow — let me restart the kernel. So what I have done here is download new data, and with this new data what I am trying to do is take one of the existing models from Google and train it on a new set of data so that I can predict personalities like Merkel, Modi, Shinzo Abe and Theresa May. I have downloaded photos of these personalities — quite a lot of photos, around 40 to 50, in some cases 100 to 200 — and then I am going to execute some code here to show how I retrain an existing model, save the model, and then predict the new labels with these particular models. In this demo I am using a model called MobileNet, which is from Google, and once the model is running, I am doing the retraining steps here, where I take the existing model and retrain it on the new data. If you notice, the GPU consumption for this retraining in transfer learning is very low; as a matter of fact, if you are using transfer learning, you may not need to use a GPU at all. You can see here that when I did the neural style transfer, the volatile GPU utilization was around 100%, but for this particular application it's very low. So I will wait for the retraining to complete; it's going to take some time. Let me just go back to the slides and cover some material on transfer learning. What you can see right now is that I am just using a TensorFlow notebook to do application development, but I am not done with what you see here, this S2I-build-driven deployment; once I finish the application development, I can basically get this set up, create a template, and probably publish it for the community to explore. This is going to take some more time, and if I just look at the GPU utilization, it's not much, just one or two percent. I'm going to wait a bit more for this to complete; as you can see, the training can take a lot of time, and if you have a GPU you can actually save some training time. In the meantime, if anybody has any questions, I can definitely take them. I had one quick question for you, and we are really running towards the end of the time limit here: where are you actually running your OpenShift? It looks like it's either on bare metal or on AWS, but where are you getting these GPUs from? Okay, so I should explain that: this is OpenShift on AWS, so this is an EC2 instance and I am using a GPU provided by Amazon. We also do this on bare metal as well, yes, but the reason I wanted to do this is that GPUs are really expensive — for example, one GPU is probably four thousand to eight thousand dollars for the server — and if I use a GPU for one whole month I might be spending, I don't know, maybe three or four hundred dollars, maybe much less actually. So there's a cost-effectiveness to using GPUs in the cloud, and I think that's an approach which everyone will take for training machine learning applications. That's correct. We're having a machine learning panel at the OpenShift Commons gathering next week, and I think that's really the approach that we're looking at: using some of Google's tools to create the models and run them, or create the models locally in your Jupyter notebook,
and this whole approach with the TensorFlow server is really, I think, the way people will be going. Yes. So, for my transfer learning, I have completed my retraining process, and what I'm going to do is download a new image of Mr. Shinzo Abe, just to see what that particular image is. I'm going back — so this is a new image which I downloaded. That's a pretty huge image, just to prove that that guy is Shinzo Abe — yeah, that guy, sorry, too big. As you can see, I have done the training and I have an accuracy of 86.5%, and I have trained a new set of labels; just to show you what those labels are, see here, after retraining. What I've done as part of retraining is that I wanted to retrain the model which was published by Google and create a new prediction endpoint which can cater to these five labels. So, assuming that I have a web application where I publish a photo of Mr. Shinzo Abe or Merkel or Modi — these are the prime ministers of different countries — I wanted my model to accurately predict whether the image falls into any of these five labels. After the retraining I noticed that I have 86.5% accuracy, and I have downloaded a sample image of Mr. Shinzo Abe and I am going to ask my model whether it can predict Mr. Shinzo Abe accurately. The code is this, and it's going to use the underlying Tesla GPU for this particular prediction, which is not really required, but you can see that after evaluation it gives you the results here, and it says, with good confidence, that that particular image is of Shinzo Abe. So with this particular example, what I am trying to show you is that I have consumed a model published by Google, I have done a retraining for a new set of labels which I have highlighted here, and I have tested a sample image with the model to get a correct prediction — and I have done all of this on OpenShift. To be frank, having OpenShift and doing this application development on OpenShift has actually made it easier for me to create more applications and more templates. In that way, I wanted to say that as a community, if you can look at my examples and bring out more examples — and probably somebody coming forward to showcase the templates and applications which they have developed using some of the methods which I have highlighted in today's slides — that would be great. I'm hoping for more feedback from the community on these efforts. Thank you. Well, Subin, thank you very much. I have a thought: I know on the radanalytics page you have a number of these templates and we have blogged about this stuff, but you also mentioned Google has a repository of all of their models. Do you have plans for anything like that for this stuff? You have created a number of things here that are in your personal repo — plans to make some sort of repository of models that use the S2I approach available to people, or do I need to do something as a community manager for OpenShift? Well, some of the projects which I showcased today are from my personal repository, mainly because some of them are related to GPUs and NVIDIA and I had to wait for others to give me feedback, a green light, to actually publish them — and I actually got it yesterday. So I can transfer all of that code to radanalytics, and you will find most of what I have shown today available as proper write-ups, as proper blog entries with detailed explanations of how it happens and how it can be done, on the radanalytics blog soon, probably within two weeks. Awesome, so that's a great segue into saying thank you to you for this wonderful tutorial and
explanation. It went a little bit longer than we usually do so if you can share the slides with me so then afterwards I'll try and put some markers in the recording when we load this onto YouTube so people can find the different points of entry for the different pieces of the talk and we may even split this into pieces but there's a lot going on in the machine learning workflow space on OpenShift. Again if you're interested in this, if you go to commons.openshift.org you'll find a sign up for the machine learning special interest group and next week we'll be in Austin, Texas talking with folks from across the board at the OpenShift ecosystem from Google, from Anaconda from the Python community and elsewhere about what they're thinking are the next phases for machine learning and tooling that we need to get ready on OpenShift. So again, Modil, thank you very much for today and we look forward to many more of these talks. Thank you.