Hi. My name is Pani Raj. I'm a technical marketing engineer in the data center solutions team at Cisco. This presentation is about getting a quick, high-level understanding of how to operationalize a machine learning pipeline. I'll be talking about how machine learning problems are solved, how to write APIs for a machine learning service, and how to deploy it at edge locations. My colleague Haseeb will take us through running machine learning workloads with GPU support, and he will cover Cisco converged infrastructure solutions for AI and ML. Let's get started.

First, let's understand why we use Docker and containerize machine learning workflows. The biggest challenges in machine learning work are setup and reproducibility of experiments. If we take a working machine learning model from a developer laptop to a QA cluster or a production environment, it is hard to replicate the same output. One reason for this is the versions of the software libraries: a variation in library versions between two environments can result in different output altogether. Another important reason is the difference in hardware and operating system between where the model is developed and where it is deployed. Having the exact same environment is very important, because reproducibility of the environment leads to reproducibility of the output. Docker is built for reproducibility: it standardizes this process with the build once, deploy everywhere concept. Another challenge is complex software dependencies. Typically, any machine learning implementation will have multiple software libraries for data preprocessing, data visualization, model building, and so on. It's common that a version of a library is compatible with some libraries and conflicts with a few others. Docker uses containers to deploy applications in isolation; it makes sure different resources are isolated so that there is no clash in environments. If your coworker wants to run what you are running, rather than spending a day recreating your environment by installing all the dependencies, you just build a Docker image that can be run anywhere. Sharing your experiments becomes very simple. Docker enables the machine learning code to be packaged as images, and these images are self-contained so they run consistently across multiple categories of hardware. What is guaranteed is that irrespective of the environment, be it a local development environment, an on-premises data center, the cloud, or even integration with other applications, everything is standardized. You basically write your code once and deploy it everywhere without going through the hassle of version conflicts.

Now, let's take a very simple example to understand how model training is done. We use the iris data set, which is a built-in data set in scikit-learn. This data set has four features: sepal length, sepal width, petal length, and petal width. Using these four features, we need to predict the species of the iris flower, which falls into three categories. So: four input features and three labels. Let's see the training program. I'm using two classifiers to build the model, a random forest classifier and support vector classification, just to compare the results between two classifiers with the same input data. I'm importing the random forest classifier and SVM from scikit-learn, and importing the iris data set. The first step is to load the data: capital X is my data and y will be the target. Then we need to split the data into train and test data sets.
The split is 0.75, which means 75% of the data is used for training and 25% for testing. We also set a random state, which makes sure we get the same split every time we run the program. The next step is to build and train the models. First, I build a model using the support vector machine and fit it on the training data. On the trained model, I calculate the accuracy, which is the ratio of correctly predicted examples to total examples. We can also calculate precision, recall, and F1 scores depending on the use case. Similarly, I build and train a random forest classifier with 25 decision trees, which means it takes the four input features, uses the rules of 25 randomly created decision trees to predict the outcome, and takes the highest-voted prediction as the final prediction from the algorithm. At the end, I take these trained models and save them as binaries. For this, I use pickle, a Python library that takes any Python object and saves it as a binary so that we can load it in some other file and use all its properties. Here I am saving them as model_svc.pkl and model_rfc.pkl. Let's execute the script. I get the accuracy scores and also two binary files for the trained models (the full training script is sketched at the end of this part).

Now that we have built the models, let's understand how to expose them as APIs so they can be deployed at the edge or integrated with another application. Let's say I want to predict the species of a flower using our trained support vector machine model, and we want to expose this as an API. How do I do it? I use Flask, which is a web framework, to expose a set of functions as APIs. Here I am exposing the prediction logic for the SVM and the random forest classifier using both GET and POST methods. First I import Flask, and then I create a new web application instance by creating an instance of the Flask class, calling it ml_api. We define the endpoint using a Python decorator, which lets us register the endpoint handler as a function. This is basically saying: for the endpoint predict_svc, call the predict_svc business logic. We also specify the kind of HTTP methods the endpoint supports; this particular one handles GET. In the main function I run the Flask app on port 8888. That's it, Flask is a very simple framework. We also generate a user interface for all the API endpoints without writing any HTML or style sheets. The package that makes this happen is called Flasgger. Flasgger is a Flask extension that helps create Flask APIs with documentation powered by Swagger UI. All we do is import Swagger from flasgger and then provide the API specs directly in docstrings: the first line is a one-line description of what the endpoint does, then three dashes, followed by all the input parameters. Now if you look at the script again from the top: we do all the imports, load our pickle file, and start the Flask app. In the prediction logic for the SVM I have the API specs, I take the four required inputs using the Flask request module, run a prediction on the trained model, and return the predicted value. I repeat this for the random forest classifier, and I have two more endpoints that take the input from a file instead of the request URL. Let's run the script. It is running on localhost on port 8888. I copy the link and open it in a browser to get the output. Along with the link, we need to append the API docs endpoint after the slash (with Flasgger this is /apidocs) to form the complete URL.
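For reference, here is a minimal sketch of the training script described above. The scikit-learn calls match the walkthrough; the variable names and the random_state value are assumptions, since they are not shown on screen.

    # train_models.py -- sketch of the training script (names are assumptions)
    import pickle
    from sklearn import datasets, svm
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load the built-in iris data set: X holds the four features, y the species label
    X, y = datasets.load_iris(return_X_y=True)

    # 75/25 train/test split; the random_state value is just a placeholder
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=42)

    # Support vector classification
    model_svc = svm.SVC()
    model_svc.fit(X_train, y_train)
    print("SVC accuracy:", accuracy_score(y_test, model_svc.predict(X_test)))

    # Random forest with 25 decision trees
    model_rfc = RandomForestClassifier(n_estimators=25)
    model_rfc.fit(X_train, y_train)
    print("RFC accuracy:", accuracy_score(y_test, model_rfc.predict(X_test)))

    # Save both trained models as binaries so the API script can load them later
    with open("model_svc.pkl", "wb") as f:
        pickle.dump(model_svc, f)
    with open("model_rfc.pkl", "wb") as f:
        pickle.dump(model_rfc, f)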
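And here is a minimal sketch of the Flask service, assuming Flasgger's default setup. Only the SVC endpoint is shown; the query-parameter names, the module filename, and the response format are assumptions.

    # ml_api.py -- sketch of the prediction service (only the SVC endpoint shown)
    import pickle
    from flask import Flask, request
    from flasgger import Swagger

    ml_api = Flask(__name__)
    Swagger(ml_api)  # serves the generated Swagger UI at /apidocs

    # Load the pickled model saved by the training script
    with open("model_svc.pkl", "rb") as f:
        model_svc = pickle.load(f)

    @ml_api.route("/predict_svc", methods=["GET"])
    def predict_svc():
        """Predict the iris species with the trained SVC model.
        ---
        parameters:
          - name: sepal_length
            in: query
            type: number
            required: true
          - name: sepal_width
            in: query
            type: number
            required: true
          - name: petal_length
            in: query
            type: number
            required: true
          - name: petal_width
            in: query
            type: number
            required: true
        responses:
          200:
            description: The predicted species index
        """
        # Read the four features from the request URL and run the prediction
        features = [[float(request.args.get(name)) for name in
                     ("sepal_length", "sepal_width", "petal_length", "petal_width")]]
        return str(model_svc.predict(features)[0])

    if __name__ == "__main__":
        ml_api.run(host="0.0.0.0", port=8888)

With the service running, opening http://localhost:8888/apidocs shows the generated Swagger UI, and a request such as http://localhost:8888/predict_svc?sepal_length=5.1&sepal_width=3.5&petal_length=1.4&petal_width=0.2 should return the prediction.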
We will try one API endpoint, and it all works: we are getting the output.

Now that we have built the model successfully, let's switch sides and focus on dockerizing the model we built. The first thing we need to know is how to write a Dockerfile, which is what the Docker image is built from. There are a few instructions within a Dockerfile that help in building the image. Basically, we start with a base image which acts as the foundation, and from that base image we build our application image in layers. The way to specify this base image is the FROM instruction in the Dockerfile; in this example we are using Anaconda as the base image. You can optionally specify the maintainer information using LABEL. Next is the COPY instruction. Using COPY we specify the list of files or folders we need to copy from the host file system into the Docker image; here we copy the ml_trial directory, which has all the scripts and the pickle files, into the image. When we run a container, none of the ports are open by default; we need to expose a port to the external world using the EXPOSE instruction. This is for security reasons: you expose only the minimum ports, as and when you need them. I am exposing port 8888 here because that is the port used in the script, so when we start our container, Docker exposes port 8888 and an external entity trying to access a URL on port 8888 is able to reach it. WORKDIR specifies the working directory for the container when it starts. I have my scripts in the ml_trial directory, so I make this the working directory, so that every command we would like to execute runs there directly without having to change directories every time. The RUN instruction executes commands in a new layer on top of the current image and commits the result, and the resulting image is used for the next step in the Dockerfile. With this we take care of installing libraries on top of the base image: in our example we run pip install for Flask and Flasgger. The last instruction is CMD. The main purpose of CMD is to provide defaults for an executing container; we specify the default command to run in the container here, which in our example is running our Python script (a sketch of such a Dockerfile appears at the end of this part).

Next we will see how to build the Docker image. I have the Dockerfile alongside the ml_trial directory. Now I run docker image build -t ml_trial . The -t flag specifies the name, or tag, for this Docker image; I am calling it ml_trial. The dot says to build the image using the Dockerfile present in the current directory. I run this command and see that the image is successfully built. I run docker image ls, which lists all the Docker images present, and I can see the image we just built. Now let's run a container with this image: docker container run -d --rm -p 8888:8888 --name my_trial ml_trial. What we are doing here is running this container in detached mode using the -d flag; if you want to run it interactively on the terminal, use -it instead of -d. The --rm flag says to remove the container when it stops. The -p flag binds port 8888 on my host machine to port 8888 of the Docker container, and the --name flag gives the container a name. Now if I execute this command, it returns the container ID.
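Here is a minimal sketch of a Dockerfile along the lines described above. The instruction sequence matches the walkthrough; the exact base-image tag, the maintainer label, and the script filename are assumptions.

    # Dockerfile -- sketch following the instructions described above
    FROM continuumio/anaconda3
    LABEL maintainer="your-name@example.com"

    # Copy the scripts and the pickled models into the image
    COPY ml_trial /ml_trial

    # The Flask app in the script listens on port 8888
    EXPOSE 8888

    # Run subsequent commands (and the container) from the project directory
    WORKDIR /ml_trial

    # Install the web framework and the Swagger UI extension on top of the base image
    RUN pip install flask flasgger

    # Default command: start the prediction service
    CMD ["python", "ml_api.py"]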
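And, as a rough sketch, the build, run, and sharing commands from this walkthrough written out in one place; my-docker-id is a placeholder for an actual Docker Hub account.

    # Build the image from the Dockerfile in the current directory and tag it ml_trial
    docker image build -t ml_trial .

    # List local images to confirm the build
    docker image ls

    # Run it detached, publish port 8888, and clean up the container when it stops
    docker container run -d --rm -p 8888:8888 --name my_trial ml_trial

    # Sharing the image through Docker Hub (described next in the walkthrough)
    docker image tag ml_trial my-docker-id/ml_trial
    docker image push my-docker-id/ml_trial
    docker image pull my-docker-id/ml_trial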
I can confirm that the container is running using the command docker container ls. Let's also open the browser and see if we can access our UI: we open localhost:8888 and we are able to access our app running in the container. Now that the image is built and we have verified that it's working fine, how do we share it with other developers when we are doing distributed development, or with QA teams, or push it to production? Note that the image is still local. What we have to do is push this image to a registry so that others can pull from it. Let's push our image to Docker Hub, which is the default public registry for container images. For this we need to tag the image we would like to push with our Docker ID. I tag my image, then I push it using the command docker image push <my-docker-id>/ml_trial. Once the push is complete, I can see the image in Docker Hub. Now others in the team will be able to pull this image using the command docker image pull <my-docker-id>/ml_trial, and they are ready to run a container from this image. This is how Docker images are deployed across multiple environments.

Hello everyone, my name is Haseeb Niazi and I'm a technical marketing engineer with the Cisco UCS Solutions Group. My colleague Pani Raj has already covered containerizing machine learning workflows as well as developing a machine learning model and packaging it with Docker for AI and ML. Now let's take a look at GPU support for containers, specifically for AI/ML types of workloads. The Cisco UCS compute portfolio gives customers a lot of choice when it comes to platforms and GPUs. For dev/test types of environments, customers can choose the Cisco UCS C240 or HyperFlex 240 systems, and each of these systems can be equipped with two NVIDIA V100 GPUs. For deep learning and training types of workloads, the Cisco UCS C480 ML platform with eight NVIDIA V100 GPUs and NVIDIA NVLink technology can be utilized. Customers could also choose the non-ML Cisco UCS C480 platform, which can be equipped with six PCIe V100 GPUs, and for inferencing types of workloads the Cisco UCS C-Series or HX 220/240 platforms can be used.

I hope some of you are familiar with the Cisco converged infrastructure stacks. We work very closely with our storage vendor partners such as NetApp, Pure Storage, IBM, and others on the FlexPod, FlashStack, and VersaStack portfolios. These are Cisco Validated Designs: we take the compute and networking gear from Cisco, use the storage gear from our partner, verify and validate the converged infrastructure, and provide customers with the validated design. What we have done is taken the existing designs and integrated them with the Cisco UCS AI and ML platforms such as the C480 ML, C240, and C220, so that customers can easily extend their existing portfolio and get the enhanced AI and ML capabilities without learning a new tool, and they get it in a validated design. If you look at any validated design, the same picture carries over across all the stacks: we have the storage system at the bottom, it connects to the switching layer, and from the switching layer it connects to the Cisco UCS platforms. What we have tried to achieve is that you take any of our AI and ML platforms, such as the C480 ML, C240, or C220, you drop it into the stack, and then you can manage the server just like you would manage any other Cisco blade server or rack-mount server.
The biggest advantage here is that you get an integrated solution, the management is exactly the same as what you're used to, and you don't have to learn any new management tools. In our validated design we did not stop at the physical integration: we actually help and guide customers to install the AI and ML software, starting from the operating system installation and going all the way up to the point where customers can issue a single command, download an AI/ML container from the NVIDIA GPU Cloud (NGC), and execute the workload. So in our demo, which we are going to cover in a second, what we show you is a system set up following a Cisco Validated Design; you log into the system, issue a few commands, download a container such as TensorFlow from NGC, which has GPU support, and then execute your workloads in that container. A few words about the NGC containers for AI and ML: NVIDIA has done the heavy lifting and enabled most of the well-known AI and ML frameworks, such as TensorFlow, MXNet, and so on, with GPU support. So if customers have a GPU-equipped platform such as Cisco UCS, all they need to do is download the container from NGC; they don't have to do anything extra to get GPU support within the containers.

Now that we have covered the high-level concepts, let's look at a demo. In this demo we will verify the Cisco UCS C480 ML GPU support, download and run the TensorFlow container, execute a benchmark script, and verify the GPU usage as well as the server resource utilization, specifically the power utilization on the server. I'm currently logged into Cisco UCS Manager, and I have installed Red Hat Enterprise Linux, NVIDIA Docker, the NVIDIA CUDA drivers, and so on. We run the nvidia-smi command and, as you can see, 8 GPUs are reported. I have also logged into ngc.nvidia.com, and this is the catalog of all the different software NVIDIA hosts on NGC. We go to containers and search for TensorFlow; if you click on the container, you see a command line that tells you how to download this container onto your Linux machine. Back on the C480 ML, we issue the command to download the TensorFlow container; it will take some time depending on your connection speed, and the whole container will be downloaded. The next step is to run the container. Since the container is already downloaded, it shouldn't take too long to start. Once the TensorFlow container is up, the first thing we want to do is make sure that all the GPUs are available within the container as well, so we execute nvidia-smi and see that all 8 GPUs are available to the container. Next we need to install the TensorFlow performance benchmarks: we'll get a benchmark suite called tf_cnn_benchmarks, which contains implementations of several popular convolutional models such as ResNet-50, Inception v3, VGG16, etc. We download the scripts from GitHub and unzip them. We will now execute the script against the ImageNet dataset using all 8 GPUs. With the script running, we open up another window into the UCS C480 ML. You can see from the output that all the GPUs are utilized close to 100%. Something else we want to pay close attention to is the power usage on the server: with all the GPUs running at around 100% and consuming a lot of power, you can see that the power usage for the Cisco UCS C480 ML jumps up to about 3 kW.
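For reference, the host-side steps of the demo look roughly like this. The exact NGC image tag varies by release, and the GPU flag depends on the Docker and NVIDIA Docker versions installed, so treat this as a hedged sketch rather than the exact commands used in the demo.

    # Verify that the driver sees all eight V100 GPUs
    nvidia-smi

    # Pull the GPU-enabled TensorFlow container from NGC (the tag is an example)
    docker pull nvcr.io/nvidia/tensorflow:19.05-py3

    # Start the container with GPU access; on older nvidia-docker2 setups,
    # use --runtime=nvidia instead of --gpus all
    docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:19.05-py3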
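Once inside the container, getting and running the benchmarks looks roughly like this. The repository and script flags are from the public tensorflow/benchmarks project; the batch size shown is just an assumption, and the demo used the ImageNet dataset rather than the synthetic fallback.

    # Inside the TensorFlow container: fetch the benchmark scripts
    git clone https://github.com/tensorflow/benchmarks.git
    cd benchmarks/scripts/tf_cnn_benchmarks

    # Train ResNet-50 across all eight GPUs; point --data_dir at the ImageNet
    # dataset, or omit it to fall back to synthetic data
    python tf_cnn_benchmarks.py --num_gpus=8 --model=resnet50 --batch_size=64

    # In a second terminal on the host, watch GPU utilization climb toward 100%
    watch -n 1 nvidia-smi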
To wrap up, let's summarize what we covered in this session. We went over a machine learning problem and how to containerize it. We then talked about GPU support for AI and ML types of containers. We also covered the Cisco UCS platforms as well as the converged infrastructure. All the code we have gone over and shared in this session can be found in the GitHub repository shown on the screen, and for further information about the Cisco Validated Designs, specifically FlexPod and FlashStack for AI, please visit the URLs listed here. We would like to thank you for attending this session, and have a great day.