I'm Diane Feddema, I work for Red Hat in the AI Center of Excellence, and I'm going to talk to you today about machine learning and AI on OpenShift. As Clayton and Derek were saying earlier, the first cluster is the most important cluster, and I just went through that experience helping two of our hardware partners install OpenShift on bare metal in their labs. They were gracious enough to offer us their hardware, and we installed OpenShift clusters there. We containerized the MLPerf training benchmark, a new industry-standard benchmark for machine learning. This is a large-scale benchmark; some of the runs go for over two hours. It's a whole-system test, not a toy, so it's a great way to validate a new reference architecture, which is what we did at both SuperMicro and Penguin. In both of their labs we ran OpenShift and the MLPerf training benchmark, and the worker nodes in their servers each had eight NVIDIA V100 GPUs for hardware acceleration. At that point you basically have a mini supercomputer; this is high-performance computing. We were able to nearly match NVIDIA's published results for this benchmark on a DGX-1, so this was a good first cluster for each of them.

These benchmarks cover two application areas: computer vision and natural language processing. The four benchmarks were Mask R-CNN, SSD, Transformer, and GNMT, and these are all deep neural networks that we're training here. These are the datasets, and the dataset sizes, that we used to train all of these models, and we used the PyTorch machine learning framework for all of them. This is a little example of the output Mask R-CNN produces: it takes an image and identifies everything in the image, and it even does segmentation, so it can break an image down into regions. That's really useful for things like medical imaging, where you can look at, say, a limb and then distinguish the bone from the tissue, so it has very practical applications.

So why do we want to do this on OpenShift? For the same reasons you want to run a lot of things on OpenShift: you can load-balance your workload and have it scheduled onto the GPU-enabled nodes, and there's a very nice user interface for submitting the job, which you'll see a little of in the video I'm going to show at the end. Another important point is that we ran this on bare metal with these partners, but you could take these same images and run them in private or public cloud, or virtualized.

This is an example of the actual Grafana dashboards we used at the first site. It shows that all of the GPUs along the top are running at almost 99% utilization, and that stays true for over two hours, so these workloads are really taking advantage of the GPUs. You also see the temperature of the GPUs and how much energy they're consuming. When a GPU reaches about 95 degrees Celsius it shuts itself down, because beyond that the heat starts to damage the GPU. This is another dashboard showing GPU utilization, GPU memory utilization, and how many times we went over that 95-degree threshold on the GPUs, which is none here, you can see none of them have. There are also various clocks within the GPUs whose speeds you can monitor in megahertz.
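To make that temperature threshold concrete on the monitoring side: if the GPU metrics are exported to Prometheus by NVIDIA's dcgm-exporter, an alerting rule along these lines would flag any GPU that crosses 95 degrees Celsius. This is only a minimal sketch under that assumption, not the actual dashboard or query configuration from the talk.

```yaml
# Hedged sketch: an alerting rule for the 95 C GPU temperature threshold.
# Assumes GPU metrics come from NVIDIA's dcgm-exporter (metric name
# DCGM_FI_DEV_GPU_TEMP); names below are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-temperature-alerts        # hypothetical name
  namespace: nvidia-gpu-monitoring    # hypothetical namespace
spec:
  groups:
  - name: gpu-thermal.rules
    rules:
    - alert: GPUOverTemperature
      # Fire when any GPU reports a core temperature above 95 degrees Celsius
      expr: DCGM_FI_DEV_GPU_TEMP > 95
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "GPU on {{ $labels.instance }} is above 95 C and may shut itself down"
```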
I won't go into a lot of detail here, but in order to containerize these large benchmarks we used a tool that Red Hat created called MLCC, which outputs a Dockerfile. At the end of that Dockerfile we added commands to build PyTorch from source, and then we added a little script that actually runs the benchmark. We pushed that image to Quay.io, and from Quay.io we were able to deploy it from YAML into the OpenShift cluster. This is just a tiny example of how you could do a CUDA vector add; it's a toy example. Our runs were large-scale, real-world examples, but the YAML for those doesn't fit on the screen here. You can see that for this tiny test case I'm asking for just one GPU, but in all the test cases I ran with MLPerf we used all eight GPUs.

I don't have time to go over all of this, but these are all the people on our team, my collaborators; I just wanted to give a shout-out to them. This is the SuperMicro team plus three of us from Red Hat, and this was our hardware setup: a ten-node OpenShift cluster at SuperMicro. On the right side you see the software we ran. We actually ran OpenShift 3.11 at SuperMicro, but we ran 4.2 at Penguin. This is the great outcome we had: we were actually faster on Mask R-CNN than NVIDIA's published results for that benchmark, and in the worst case, GNMT, we were only 6.13 percent slower. There's a random aspect to that benchmark in particular, because it takes a random seed as an initial condition, so you have to run it many times. We weren't prepared to run it thousands of times to beat the timing, but I think we could at least have matched NVIDIA's timing there.

This is the Penguin team. This is so fresh off the press that I didn't have time to get everyone's picture, but you can look at my slides later if you want to contact any of these people. This is the nine-node OpenShift 4.2 cluster we ran at Penguin, and these are all the details of the hardware and software stack. At Penguin we also did extremely well with our timings compared to the MLPerf published results, which you can find by searching for MLPerf v0.6, released in July; their timings are shown here in green, and our Penguin numbers are in yellow. Again, on Mask R-CNN we did better, and we were a little bit worse on the SSD (single-shot detector) benchmark, 12.5 percent slower.

Instead of taking questions, I think I'm going to go straight to this video. This was done at SuperMicro, and it shows how we ran one of these large benchmarks through the nice interface we have in OpenShift. I went to the NVIDIA project, and first I'm going to the Grafana dashboard to show that nothing is really happening: if you look at all these gauges, the GPUs are quiet, nothing is in memory right now, and the GPUs are cool and not using much energy. In a second here we'll launch the job. Then I import some YAML; this is the YAML we used for the GNMT benchmark. You can see that I'm pulling the image from Quay.io, I'm specifying that I need eight NVIDIA GPUs, and I'm using the hostPath volume option in OpenShift to read the data from the local SSD. Then we start up the pod, and you can see that the model is training, you can see "training" there on the screen. If you look at the events this pod has been through, you see that it pulled the image from Quay and created the container.
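For reference, the single-GPU CUDA vector-add pod mentioned above might look roughly like the following sketch; the image name is hypothetical, and the important part is the nvidia.com/gpu resource limit of one, which tells the scheduler to place the pod on a GPU-enabled node.

```yaml
# Hedged sketch of the single-GPU "CUDA vector add" toy pod; image name is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: quay.io/example/cuda-vector-add:latest   # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1          # one GPU for the toy test case
```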
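And a rough sketch of the kind of pod spec described for the GNMT run in the video, again with hypothetical image and path names: it pulls the benchmark image from Quay.io, requests all eight GPUs, and mounts the dataset from the node's local SSD via a hostPath volume.

```yaml
# Hedged sketch of a GNMT-style benchmark pod; image name and host path are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: mlperf-gnmt
spec:
  restartPolicy: Never
  containers:
  - name: gnmt-training
    image: quay.io/example/mlperf-gnmt:latest       # hypothetical private Quay.io image
    resources:
      limits:
        nvidia.com/gpu: 8                           # all eight V100s on the worker node
    volumeMounts:
    - name: dataset
      mountPath: /data                              # where the benchmark script reads its data
  volumes:
  - name: dataset
    hostPath:
      path: /mnt/local-ssd/mlperf-data              # hypothetical path on the node's local SSD
      type: Directory
```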
You can look at the logs from this console. And now if you look again, these GPUs are very busy, and for the next 25 minutes they're going to be pegged and working hard. And basically that's it. Does anyone have any questions?

Sure. [Audience question, partly inaudible: how are the results as good as they are?] Certainly the OpenShift results are comparable, right up there with some of the fastest systems that have ever been tested on MLPerf. Part of the reason is that in a container there just isn't that much added overhead; that's one of the things we wanted to prove, that even though this is running in a container it's basically just a process, and not much overhead is added. OpenShift itself isn't adding any noticeable overhead here either. Another reason it ran so fast is that we used NVIDIA's build of PyTorch, which is very well tuned to run on these NVIDIA GPUs.

Yes, it's a private repository; I have to use a private repository for my images. Each of these hardware partners can pull their own images that they create, and we did that too. It's just that I can't pull images for them that have NVIDIA software in them. No, I'm just talking about how we pull the images, sorry. The hardware partners have a relationship with NVIDIA for those CUDA bits, so they have to pull those bits themselves, not me. It's just a technicality, really; it's not a technical issue. Thank you.
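One practical note on that last answer: pulling images from a private Quay.io repository in OpenShift is typically handled with an image pull secret that the pod references. A minimal sketch, with hypothetical names and a placeholder for the credentials:

```yaml
# Hedged sketch, not from the talk: a pull secret for a private Quay.io repository,
# referenced by the benchmark pod. All names are hypothetical; the credentials
# value is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: quay-pull-secret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config with Quay.io credentials>
---
apiVersion: v1
kind: Pod
metadata:
  name: mlperf-gnmt
spec:
  imagePullSecrets:
  - name: quay-pull-secret            # lets the kubelet authenticate to the private repo
  containers:
  - name: gnmt-training
    image: quay.io/example/mlperf-gnmt:latest
```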