Hi, good afternoon. Today we'll talk about enabling GPU-accelerated on-premises edge AI computing. The workload here is an AI workload, the environment we want to emphasize is the edge, and we want to run it in containers. On the OpenStack side, we want to run it using the Zun project. That's basically the purpose of this presentation.

So this is the environment where we want to use the GPU. Before, if we focus on cloud computing, the data from the edge, from the IoT devices, is sent to the data center, and the delay is very long, because the data center is where the GPU, the computational power, lives. It processes the data and then sends the decision back to the edge, which introduces long latency. That's basically the cloud computing model. For edge computing, we move the computational power from the data center to the edge side; in this diagram, that's the far edge or the edge aggregation layer.

Why do we need a GPU in this environment? Basically, a lot of applications today require GPU power. For example, AR and VR involve video processing, and deep learning training of data models involves a lot of computation that the GPU can accelerate. So there are multiple applications we can help with using the GPU.

In this environment we also want to leverage the power of containers. The most important part is isolation: when we talk about many different applications, their software stacks are usually very different. To run these multiple software stacks on a common bare-metal host, containers give us an ideal environment that provides isolation, and portability as well. And quite recently, GPU support in the container environment has advanced, so it is now enabled in Docker containers.

That's why we want to integrate this work into the OpenStack Zun project. Before, Zun did not have GPU support, so this is ongoing work to schedule workloads onto GPU resources and to monitor the GPU resources. The work is not complete yet; our testing is based on a test environment with this special code enabled.

How do we deploy the GPU workload? If you look at the top of the slide, these are the components deployed at the data center; the edge side is different. We call this the deployment mode, and it is called DCN, Distributed Compute Node. In this mode, the networking between the data center and the edge side uses a spine-leaf L3 network instead of involving an L2 network. That's basically the difference from deploying the compute node inside the data center. At the edge side, each site is usually one or two servers: they have GPU power, and we have enabled some OpenStack components, including Zun, to provide the container capability.

The left side of this diagram is copied from NVIDIA; it gives the reference architecture for how containers are used in a GPU environment. I want to emphasize a few points here. On the host, the shared environment, we have to enable CUDA and the NVIDIA driver. Docker has much better support for GPUs since version 19.03, and it is now natively enabled: the new API supports GPUs, so you can specify one or more GPUs and assign them to a container. And again, each container has its own CUDA environment, its own software stack, which really helps to isolate the environments of different applications.
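To make that concrete, here is a minimal sketch of the GPU assignment, using the Docker SDK for Python rather than the `docker run --gpus` command line; the image tag and the `nvidia-smi` command are assumptions for illustration, not the exact ones from our setup.

```python
import docker
from docker.types import DeviceRequest

client = docker.from_env()

# Equivalent of `docker run --gpus all`: request every GPU on the host.
all_gpus = client.containers.run(
    "nvidia/cuda:10.1-base",  # assumed CUDA base image matching our CUDA 10.1 stack
    "nvidia-smi",
    device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])],
    remove=True,
)

# Equivalent of `docker run --gpus '"device=0"'`: pin the container to one GPU,
# which is the one-GPU-per-container assignment used in the first benchmark below.
one_gpu = client.containers.run(
    "nvidia/cuda:10.1-base",
    "nvidia-smi",
    device_requests=[DeviceRequest(device_ids=["0"], capabilities=[["gpu"]])],
    remove=True,
)

print(all_gpus.decode())
print(one_gpu.decode())
```

The `device_requests` field mirrors the new `--gpus` flag: `count=-1` asks for all GPUs, while `device_ids` pins a container to specific cards.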
We tested the solution on Lenovo hardware, an SR650, which is a general-purpose server. For the hardware, we had two kinds of GPU in one host to test: an NVIDIA V100, which is a newer card, and an NVIDIA K80, which is an older one, about five years old. We just wanted to compare the performance. The software versions are quite important here; we tried many different ones. Especially the kernel version: some kernel versions work for the GPU and some do not, so we had to do some trial-and-error research there. The final combination we used is CUDA 10.1 and Docker 19.03, running on Ubuntu.

Using this environment, we did some benchmarks. In the first benchmark, we assigned a single GPU to each container and ran different benchmark models to see the performance. The numbers here are how many images the model handles per second; this is TensorFlow benchmarking, which is quite standard, so there are lots of references out there. If you are interested, I can pass you the information later. ResNet, the leftmost one, is the most commonly used benchmark for this kind of activity. You can see the results are quite different between the two GPU cards in this setup.

The next test was to see how we can run multiple containers on one host, sharing one GPU. This scenario basically saves cost by having one GPU on a single edge server, because a single edge server usually has a small footprint and cannot hold multiple GPU cards. So it is quite important to see how many containers can share this one GPU without virtual GPU technology. The diagram shows five containers sharing one GPU; in our test, when we increased the number to six, we found that some containers could not even get the GPU resource. One other thing we want to show: if we treat a single container using one GPU as 100% utilization, then when multiple containers share, and compete for, the GPU, the utilization is about 85%, so there is definitely some penalty (a sketch of this shared-GPU setup follows after the talk).

To summarize this presentation: we discussed how to enable GPUs in the edge computing scenario, there is ongoing work in the OpenStack Zun project to support that, and we gave some performance data. Thank you very much. Questions?
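Below is a minimal sketch of the shared-GPU experiment described above, assuming the Docker SDK for Python and the `tf_cnn_benchmarks.py` script from the public tensorflow/benchmarks repository; the image tag, checkout path, flags, and log parsing are assumptions for illustration, not our exact test harness.

```python
import re

import docker
from docker.types import DeviceRequest

client = docker.from_env()

NUM_CONTAINERS = 5  # with six, some containers failed to get the GPU in our runs

# Assumed tf_cnn_benchmarks invocation for the ResNet-50 case.
BENCH_CMD = "python tf_cnn_benchmarks.py --model=resnet50 --num_gpus=1 --batch_size=32"

# Launch N containers, all pinned to the same physical GPU (device 0).
containers = [
    client.containers.run(
        "tensorflow/tensorflow:1.15.0-gpu",  # assumed image with the benchmarks repo baked in
        BENCH_CMD,
        working_dir="/benchmarks/scripts/tf_cnn_benchmarks",  # assumed checkout path
        device_requests=[DeviceRequest(device_ids=["0"], capabilities=[["gpu"]])],
        detach=True,
    )
    for _ in range(NUM_CONTAINERS)
]

# Wait for completion and scrape the reported throughput from each log.
throughputs = []
for c in containers:
    c.wait()
    match = re.search(r"total images/sec:\s*([\d.]+)", c.logs().decode())
    if match:
        throughputs.append(float(match.group(1)))
    c.remove()

# Relative utilization: aggregate shared throughput vs. a solo run taken as 100%.
solo_images_per_sec = 100.0  # placeholder; substitute the measured single-container number
print(f"aggregate: {sum(throughputs):.1f} img/s, "
      f"relative utilization: {sum(throughputs) / solo_images_per_sec:.0%}")
```

The point of the sketch is the `device_ids=["0"]` pinning: all five containers request the same physical GPU, and the aggregate images per second divided by the solo number is how the roughly 85% utilization figure above is computed.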