Hey, my name is Michael Bennett, and I'm here to talk to you about GPU-accelerated machine learning with the OpenShift Container Platform. I have Diane Feddema here, a principal software engineer at Red Hat, who will be presenting the performance results from our MLPerf benchmarks on this platform. We'll start with an introduction to the potential of AI and ML and how GPUs can accelerate it, then discuss OpenShift as an enterprise Kubernetes platform for AI workloads and how the Dell EMC Ready Stack provides a validated, flexible solution for deploying OpenShift in your data center environment, and then we'll go over the results of MLPerf running on this environment.

To start out talking about AI and machine learning: artificial intelligence is a big subject that encompasses many things, of which machine learning is only a subset. Artificial intelligence is also driven by automation, by being able to act on the predictions or classifications made by these machine learning algorithms. Inside of machine learning there's deep learning and neural networks, which are the most state-of-the-art implementation of artificial intelligence today, and these neural networks happen to be very fast to train and run inference on with GPUs, so we'll talk about how the OpenShift platform enables some of that with Kubernetes.

Focusing a little more on the impact of AI: it's been said that if you don't have an AI strategy, your business is going to fail. AIOps, AI marketing, and AI risk compliance are all the kinds of things that are going to be automated with artificial intelligence, which can be faster than a human and more accurate for certain types of use cases. The OpenShift platform with GPU acceleration is designed to speed up your adoption of these technologies, so you have the infrastructure you need to power your AI applications and enable your business to realize all of these benefits.

It's important to note, when we talk about building out artificial intelligence, that one of the reasons Kubernetes is a great platform is that the actual code that trains a neural network or performs inference is only a small part of the total number of components needed for a machine learning and AI infrastructure. You also need configuration management tools, ways to collect the data and verify it, a serving infrastructure once your model has been trained, and ways to monitor that model for drift, so that you can ensure your inference results stay accurate over time.

What we like to do, to break the process down into a journey of smaller steps, is to first discover the problem we're trying to solve with artificial intelligence. Usually that's a business problem we want to solve, not an infrastructure problem; the infrastructure will just be used to run the solution. After we have discovered our problem, we begin to explore what data we have that will help us train a neural network to assist in automating or supporting decisions for that business requirement, and we analyze the ROI. Then, once we have all the data we know will be useful for the model, we can start running the model, doing A/B tests, and trying different neural network architectures until we land on something that is accurate and performant, which we evaluate in our findings. Then finally we operationalize the model by deploying it on serving infrastructure and promoting user adoption of the AI-backed service.
Bringing all this back to Kubernetes, we enable it with this stack: NGC containers, which run on a Red Hat OpenShift environment, which in turn runs on the underlying Dell EMC data center infrastructure. That delivers your containers and your GPU acceleration, Red Hat OpenShift for enterprise Kubernetes, and the NGC software stack from NVIDIA, which is optimized for AI model development and training and includes the latest SDKs and frameworks, like TensorFlow, PyTorch, and Caffe, from the NGC software registry.

Now, to talk a little bit about GPUs accelerating AI. What we're getting across with this slide is that, referring back to the earlier point that ML code is only a small part of the problem, even inside the AI application only a small amount of the code, the convolutions in your convolutional neural network and things like that, benefits from being highly parallelized. Those operations often represent a very small amount of the code, but by moving them to the GPU you can see 10 to 15x speedups versus running the same part of the application on the CPU. This is why it's useful to have this GPU-accelerated OpenShift environment.
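To make that concrete, here is a minimal sketch of the kind of offload being described: the same convolution timed on the CPU and then on the GPU. It assumes PyTorch with CUDA available (for example, inside an NGC PyTorch container); the tensor sizes are arbitrary, and the exact speedup depends on the layer shape and the hardware.

```python
# A minimal sketch: time one convolution on the CPU, then on the GPU.
# Assumes PyTorch with CUDA available (e.g., inside an NGC PyTorch container).
import time
import torch

conv = torch.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
x = torch.randn(32, 64, 224, 224)   # a batch of feature maps; sizes are arbitrary

start = time.time()
with torch.no_grad():
    conv(x)                          # the convolution, on the CPU
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():
    conv, x = conv.cuda(), x.cuda()  # move the parallelizable work to the GPU
    with torch.no_grad():
        conv(x)                      # warm-up pass (CUDA init, kernel selection)
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        conv(x)
    torch.cuda.synchronize()         # GPU calls are asynchronous; wait before timing
    print(f"GPU: {time.time() - start:.3f}s")
```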
A little more on how the NGC software optimizes your usage of NVIDIA GPUs inside of Kubernetes: again, NVIDIA has those NGC containers that contain the optimized SDKs and deep learning frameworks, as well as Helm charts to deploy solutions like Clara, NVIDIA's healthcare AI toolkit, and the TensorRT Inference Server. Dell EMC offers a wide range of NGC-Ready systems that are validated to run these NGC containers in the most performant manner. NVIDIA goes through the effort of making sure these containers stay updated, that they're designed to scale to multi-GPU workloads out of the box, and that they achieve state-of-the-art accuracy or better for the models they host. This slide shows an example of all the different software that goes inside the NGC layer of the container architecture we showed earlier; you can see that NVIDIA packages optimized versions of a lot of the popular deep learning frameworks inside those containers for you.

So we've talked about artificial intelligence and why GPUs and containers are a good choice for developing and hosting your AI application; now I'm going to touch on the specific benefits of Red Hat OpenShift for an enterprise Kubernetes environment that's going to be running AI workloads. If we look again at that process of identifying the problem and gathering data, we want an entire platform to do this on. The ML code only represents a small portion of this stack, which is developing the ML model; we still have to gather data, deploy, and implement in apps. Of course, in order to do the ML model development and deployment, we'll need ML frameworks like TensorFlow, as well as DevOps tooling like Seldon to enable us to deploy those models into an environment where they can be reached as an API endpoint. We'll also need to tie in the data sources that the model trains on or will be running inference against, so we have to plumb in those data sources. Ideally we want a platform that is self-service, so that our developers can request new containers or pods on demand and move things between development environments and production with minimal need to interact with operations teams. Since artificial intelligence workloads benefit from GPU acceleration, we will want to make sure that our platform has support for GPUs and that our data science teams can get access to them. And finally there's the infrastructure that everything runs on, which in our case is a Dell EMC Ready Stack for OpenShift with GPUs inside the servers.

OpenShift provides all of these through a couple of different things we'll talk about here. One is Open Data Hub, which is an operator for deploying things like Kubeflow, Seldon, Jupyter Notebooks, and TensorFlow. There's the NVIDIA GPU operator, which is responsible for going out and finding the GPU resources in the systems. And then there are the different data sources you'll need to plumb in; the OpenShift OperatorHub has deployments for many of these, and, OpenShift being a Kubernetes infrastructure, you can also deploy data sources via Helm charts.

Talking about Open Data Hub: it is an operator that Red Hat has for deploying data science tools on top of the OpenShift environment. If you're familiar with Kubeflow, it provides services like that for data science workbench functionality, but it also enables Spark and TensorFlow deployments, plus logging data to Prometheus and Grafana. So it allows us to deploy all those different components we saw in the earlier slide, where the ML code was only a small part; Open Data Hub ensures that we can not only run the ML code but have all the supporting services around it.

OpenShift is a trusted enterprise Kubernetes. Red Hat puts a lot of effort into the upstream Kubernetes releases, but then also takes those releases, hardens them, and makes them production-ready for OpenShift, and Dell EMC works with Red Hat to validate these OpenShift releases on Dell EMC hardware to ensure customers have the best experience when running OpenShift on Dell. Red Hat has been a leading contributor to Kubernetes since day one. Some of the benefits of using OpenShift for your Kubernetes platform are that it has an automated full-stack installation, with Ansible scripts as well as installer tooling to enable deployment on infrastructure like vSphere or public cloud instances; it supports autoscaling of resources like worker nodes inside your Kubernetes environment; and it's very easy to do updates, with one click on the home page of your cluster manager. Red Hat also has certified operators, which are built with the Operator Framework and then certified through the OpenShift operator certification; at that point they can go into the OperatorHub that is integrated into OpenShift 4, which you can see on the right is a portal right inside the cluster manager that makes it easy to deploy these operators.
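As a concrete illustration of that self-service GPU access, here is a minimal sketch using the Python `kubernetes` client to request a GPU through the `nvidia.com/gpu` resource that the GPU operator exposes. The pod name, image tag, and namespace are illustrative choices, not details from the talk.

```python
# A minimal sketch: request one GPU for a pod (assumes the `kubernetes` client
# and a cluster where the NVIDIA GPU operator exposes nvidia.com/gpu).
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),  # illustrative name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/pytorch:20.03-py3",  # an NGC image tag; pick your own
                command=["nvidia-smi"],                    # just prove the GPU is visible
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}         # scheduler finds a GPU worker
                ),
            )
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Because the GPU operator handles the drivers and device plugin, that one resource line is all a data scientist needs to write; the scheduler does the rest.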
The operators all have different levels of automation; when you drill into them, they show you whether an operator supports, for example, automated lifecycle management, or just automated deployment and scaling, things like that. It's really cool; I enjoy knowing my operators are going to work when I install them.

Another benefit of Red Hat OpenShift for Kubernetes is Red Hat CoreOS, which is an immutable operating system. With CoreOS, all of the operating system images are managed by the machine config operator, so the systems and operating system configurations get handled with APIs and configuration definitions, and you don't have to worry about mismatched settings in the host operating systems that are running your containers.

All right, and right before we get into the MLPerf benchmark results, a little bit about our Dell EMC Ready Stack for OpenShift. As I mentioned before, we work very closely with Red Hat to validate the releases of OpenShift on Dell EMC hardware, and we also do scale testing with several racks of equipment to ensure that our reference architecture will scale to any size deployment that customers desire. The design is pre-validated in the lab, but we create a lot of collateral that enables our systems engineers to customize and right-size that Ready Stack for customer use, and on infohub.delltechnologies.com you can find a lot of information, for example the reference architecture design and deployment guides, some solution briefs, and some videos and blogs talking about the benefits of OpenShift.

This gives you a high-level overview of all the pieces that come together to make this GPU-accelerated, enterprise-grade Kubernetes environment for artificial intelligence. In the center we have the Dell EMC Ready Stack hardware, where we used three PowerEdge R640s as manager nodes, along with the different GPU-accelerated server options we have: the DSS 8440, a ten-GPU server for dense configurations, and the C4140, supporting four GPUs with NVLink. To identify the appropriate workers with GPU resources and install the NVIDIA driver stack, we have the NVIDIA GPU operator inside of OpenShift, and then finally, on the right, we have those NGC containers I mentioned, as well as Open Data Hub for deploying the different tooling required to build artificial intelligence applications.

All right, and now I'm going to hand it over to Diane to go over the MLPerf benchmark results we got when running on this cluster.

Michael gave us an introduction to the benefits of running AI and ML models in OpenShift using the Dell Ready Stack. Now we'll talk about the benchmarks we ran in the Dell AI Innovation Lab and how we monitored them. We chose the MLPerf benchmarks to validate our reference architecture because they are all open source, they're created and supported by more than 40 industry and university research organizations, and they're designed to help people choose the right ML infrastructure for their applications. The models in MLPerf are neural networks that train on publicly available datasets; you can go to mlperf.org, download the benchmarks yourself, and try them out.
Now we'll take a look at the lab setup we had in the Dell Technologies AI Innovation Lab in Austin. On the left you see the cluster system admin host, which serves as a single entry point into the OpenShift cluster. The OpenShift nodes on the right are in their own private subnet for security reasons; you use the cluster system admin host to gain SSH access to the OpenShift nodes and to administer the cluster. Next we have the bootstrap node, which does the initial setup of the cluster; after cluster setup is completed, it can be added as an additional worker node. Then we have three Dell R640 manager nodes, which are the control plane for the OpenShift cluster, and on the right, two worker nodes with GPU acceleration: the C4140 with four V100 GPUs was used for our training benchmarks, and below that, the Dell R740 with one T4 was used for the inference benchmarks.

Now we'll talk a bit about monitoring. This is how we monitored the benchmarks we were running in the lab. You can see the Dell cluster on the upper left running OpenShift. Next you see the NVIDIA DCGM exporter, which is installed by the NVIDIA GPU operator and continuously exports the metrics from the worker nodes that have GPUs. Prometheus scrapes those metrics and puts them into its time-series database, so you can roll back and look at those metrics later if you have a reason to, because they are stored in a database. Then we continuously display those metrics on Grafana dashboards that we created; in this case I created this dashboard and the queries that we see, but you can also download pre-built dashboards that people like Grafana have created from Grafana's site. This is another dashboard, showing PCIe sends and receives and the temperature of the GPUs. You can view metrics like this, and you can also see things that don't relate to the GPUs, such as I/O waits and CPU usage. These are some of the great things built into OpenShift that customers love, because they can use these graphical interfaces to see how all their resources across the cluster are being used, they can look at individual pods, and they can basically tailor this to report whatever they like. You can also report metrics from your own applications, export them, and have them scraped in the same manner.
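For a sense of how those scraped metrics can be pulled back out programmatically, here is a minimal sketch that queries Prometheus over its HTTP API for a DCGM GPU-utilization metric. The route URL is a placeholder, and the metric and label names can vary between DCGM exporter versions, so treat both as assumptions to verify against your own deployment.

```python
# A minimal sketch: read a DCGM GPU metric back out of Prometheus (assumes the
# `requests` package; the URL is a placeholder for your cluster's route).
import requests

PROM_URL = "https://prometheus-k8s-openshift-monitoring.apps.example.com"

resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": "avg by (gpu) (DCGM_FI_DEV_GPU_UTIL)"},  # utilization per GPU
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    gpu = series["metric"].get("gpu", "?")
    print(f"GPU {gpu}: {series['value'][1]}% utilized")
```

This is the same API the Grafana dashboards described above query under the hood.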
Now we'll look at the system details for the worker node that we ran the training benchmarks on. In the left column we have the Dell C4140, and this is the hardware and software stack we had for that worker node. On the right we have the NVIDIA DGX-1 comparator system, whose published results we used to measure how well we were doing on our OpenShift cluster. Both of these systems have V100 GPUs; the C4140 has four V100s, whereas the DGX-1 has eight, so we'll explain how that obviously affects our results, and we will account for it. Then you notice that for the software stack, the Dell system has an extra software layer that our comparator system did not have: it is running OpenShift, which gives you all these added benefits, and what we'll see in our results is that even though you've got this extra layer of software, there's virtually no performance penalty paid for it.

We ran a slightly different operating system: Red Hat CoreOS 4.3 in the Dell lab, while our comparator system was running Ubuntu. In both cases we used the PyTorch version that is optimized for NVIDIA GPUs, and on down the line you see slightly different CUDA and CUDA driver versions were used in the two systems. Everything else was the same, with the exception that we had our additional monitoring software running the entire time we were benchmarking, so that we could watch our performance and then go back and tune the applications.

So now we'll talk about the MLPerf training benchmark itself. In terms of the types of applications this benchmark focuses on, there are computer vision and natural language processing models. We trained four models: two object detection models and two natural language processing models. Mask R-CNN and SSD are the object detection models, and GNMT and Transformer are the natural language processing models. The public datasets we used to train the models are listed here as well. COCO is the Common Objects in Context dataset, created by Microsoft and Cornell, which contains more than 300,000 images with labeled objects. For the translation benchmarks, the public dataset we used was WMT, which is over 8 million sentence pairs in German and English that you use to train the model; these sentences were taken from newscasts and parliamentary proceedings. These public datasets are the way you train the neural networks. All of these benchmarks were donated by companies like Facebook and Google, and they are used to solve real-world problems in applications like Google Translate. Along the bottom we have the timings we got in the lab at Dell, and we're going to compare these timings to the published NVIDIA DGX-1 results.

So these are our results. Again, we need to remember that on the left we have the Dell system, which had half as many GPUs as the DGX-1 we're comparing to, so you would expect the training time to take a little bit longer. Lower is better for neural network training times, because you want to train as quickly as you can so that your data scientists get faster turnaround and can try different model settings. We did really well on Mask R-CNN, the object detection benchmark on the far left, and we also did extremely well on SSD: it took us about twice as long with half as many GPUs.
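To spell out why "about twice as long with half as many GPUs" works out to roughly equal per-GPU efficiency, here is the normalization arithmetic as a small sketch. The timings in it are hypothetical placeholders purely to illustrate the calculation, not the lab's measured numbers.

```python
# A minimal sketch of per-GPU normalization (timings below are hypothetical).
def gpu_minutes(train_minutes: float, num_gpus: int) -> float:
    """Total GPU-minutes consumed by a training run; lower is better."""
    return train_minutes * num_gpus

c4140 = gpu_minutes(train_minutes=240.0, num_gpus=4)  # hypothetical 4-GPU run
dgx1 = gpu_minutes(train_minutes=120.0, num_gpus=8)   # hypothetical 8-GPU run
print(c4140 / dgx1)  # 1.0 here means identical per-GPU efficiency
```

Real multi-GPU scaling is not perfectly linear, so this is only an approximation, but it's the idea behind the per-GPU comparison that follows.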
Then, going on to the translation benchmarks: they are highly variable benchmarks, and we didn't do quite as well on these. There's a random seed used to start these benchmarks, and it affects the number of epochs the model takes to train. If we had run these benchmarks more times, we feel we could have gotten our times down to about twice the DGX-1's training time.

So now we'll look at these same numbers, only normalized per GPU. Again, you see we did better on Mask R-CNN, equally well on SSD, and not quite as well on the GNMT and Transformer benchmarks when you look at it on a per-GPU basis.

Now, moving on, the next set of benchmarks are the inference benchmarks. This is when you put your trained model to work: you give it queries and it gives you results. You have a quality metric you're aiming for, and your benchmark fails if you don't meet that quality metric; all the benchmarks we ran did, of course, meet the quality metric and were successful. The MLPerf inference benchmarks have four possible scenarios: offline, single-stream, server, and multi-stream. We ran offline and server because those are intended for data center benchmarking; the single-stream scenario is intended for smartphones, and the multi-stream scenario is intended for embedded systems. What happens here is you have something called LoadGen, which generates an artificial load to run the benchmarks by sending queries to the model. In the case of the offline scenario, it sends all the queries in one big batch and you get the answers back in one batch; in the case of the server scenario, it uses a Poisson distribution to simulate the burstiness of queries coming in, which is more realistic to real-life situations.
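As an illustration of that server-scenario traffic pattern, here is a minimal sketch of a Poisson arrival process, the same idea LoadGen uses: inter-arrival gaps drawn from an exponential distribution, so queries arrive in bursts rather than at a fixed rate. This is a toy illustration, not LoadGen's actual implementation.

```python
# A minimal sketch: Poisson query arrivals (exponential gaps), a toy version of
# what MLPerf's LoadGen does in the server scenario.
import random

def poisson_arrivals(target_qps: float, num_queries: int, seed: int = 42):
    """Yield query issue times (seconds) with mean rate target_qps."""
    rng = random.Random(seed)
    t = 0.0
    for _ in range(num_queries):
        t += rng.expovariate(target_qps)  # mean gap = 1 / target_qps seconds
        yield t

for t in poisson_arrivals(target_qps=100.0, num_queries=5):
    print(f"send query at t={t:.4f}s")
```

A fixed-rate load would underestimate queueing effects; the bursty arrivals are what make the server scenario harder than offline.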
The MLPerf inference results we are comparing to are from the mlperf.org website; you see them off on the right here, and all the published results for these benchmarks had ECC, error correction, turned off. Turning ECC off does give you a performance improvement of 13.14 to 14.29 percent, depending on the model in this group, but the cons of turning off ECC are that single-bit memory errors would not be detected or corrected, and double-bit errors would not be reported. We decided to leave ECC on, so that is going to slow our results down a little bit, but for any of our customers in the financial industry or the scientific computing community, or in any situation where safety is involved, you would want this extra level of security for your data; it's basically just cheap insurance that you want to keep. So we decided to leave ECC on, and you want to keep that in mind when you're looking at the performance results we present, because we're comparing to a system where ECC was turned off.

So these are our results; these are the server results, and in this case higher is better. When we're doing inferencing, we want to answer as many queries as we can, as quickly as we can. We are comparing one T4 on the left with four T4s on the right. The other difference between the left and the right is that on the left we're running OpenShift, while the comparator numbers we got from the mlperf.org website were not running OpenShift or any other container orchestration platform, so we had an additional layer of software that they did not have. They should be able to handle about four times as many queries as we can, and that's about right. If you normalize the server benchmark per GPU, that's what you see here: it is about the same, but we are off by a little bit, and that can be accounted for by the fact that we turned ECC on, our error correction, just for a little more data safety.

That was the server results. Now if you look at the offline results we had, again, higher is better here, and we're comparing one T4 to four T4s. You see something similar: you get about four times as many inference answers with four T4s compared to our one T4 with OpenShift. Then if you normalize per GPU, you see we did about the same; the slight difference, between 3 and 14 percent, can be accounted for by the difference in our memory correction setting. So we got excellent results, and now, back to Michael to wrap up.

Thanks, Diane, for that. So, just wrapping up, we have a couple of links here: the GPU Accelerated Machine Learning with Red Hat OpenShift white paper, which goes into more detail about these MLPerf benchmarks and the Ready Stack we used to run the solution; our design and deployment guides for the Ready Stack; a link to the AI/ML on OpenShift page by Red Hat; and the NGC program run by NVIDIA, with those GPU-optimized containers we talked about. Thanks, everybody, for taking the time to listen to our presentation today on GPU-accelerated machine learning using Red Hat OpenShift.