Good morning everyone. My name is Pei Zhang, and I'm a senior quality engineer at Red Hat. Welcome to this session; I'm very happy to give this talk: OpenStack in telecom, or how to achieve low-latency computing with OpenStack. This is today's agenda. I will give a general introduction to low latency and real-time KVM, then show how to deploy a real-time KVM guest in OpenStack, and last I will present the results from our test environment.

So let's go through the first part: what is low latency? I will start with some examples. The first one is telecom NFV. Some telecom companies are now using network function virtualization solutions, which replace dedicated appliances with standard hardware and software to implement their network functions. This means they have high performance requirements for the infrastructure. If there is high latency, the voice on a call may sound strange, or the phone call may be interrupted. Imagine you are talking with a friend on the phone and there is high latency: your voice is delayed or cut off. That's bad, and it feels wrong. The next example is vehicle control, especially self-driving cars. A self-driving car sometimes needs to send and receive data from a data center or a navigation system. If there is high latency when the car needs to turn left or right at a crossroads, it may miss that operation and fail to reach its destination, or even worse, it may cause an accident. The third example is stock trading. In the stock market, the price of each stock changes every second. If there is high latency, a trade may fail, or an outdated price may mislead and confuse people. These examples show that low latency is important, and as far as I know there are more industries that also need low latency.

So what is low latency? It means the maximum allowed response time must always stay within a certain value. Let's look at this chart. The x-axis is time, the y-axis is response time, and the horizontal red line is the latency bound. Each black dot is the response time of one request, so at any point in time the response time should stay below the latency bound. Look at the red dot: that is not allowed, because it exceeds the latency bound. So low latency means the maximum response time must always stay under the latency bound.

How do we achieve low latency? We can use real-time KVM. Real-time is not about speed; it is not about being fast, it is about determinism and predictable, consistent behavior. In other words, you can think of it as a deadline. For some applications the deadline may be a second, a millisecond, or even microseconds, and some applications have thousands of deadlines in one second. The system must respond before the deadline, otherwise there will be bad consequences. KVM is the Linux virtualization technology; it is open source and allows a host to run multiple virtual machines. Real-time KVM is an extension of KVM that allows a virtual machine to be a real-time system. Actually, the PREEMPT_RT patch enables the Linux kernel to be a real-time system, and real-time KVM allows a virtual machine to be a real-time system as well.
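As a side note (not from the slides), one quick way to check whether a host or a guest is actually running a PREEMPT_RT kernel is to look at the kernel release string and at the flag the RT patch exposes in sysfs. This is a minimal sketch, assuming a RHEL-style kernel-rt build:

    # Is the running kernel a real-time (PREEMPT_RT) kernel?
    uname -r                  # kernel-rt builds usually carry an ".rt" tag in the release string
    cat /sys/kernel/realtime  # present on PREEMPT_RT kernels; prints 1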
So how did we deploy real-time KVM in OpenStack? OpenStack is open source software for building private or public clouds. It's a cloud operating system that controls large pools of network, compute, and storage resources. It has many projects, and there are many deployment configurations; people can choose to deploy some of these projects to satisfy their business or other deployment requirements. For example, Keystone is responsible for the identity and authentication service, Neutron for the networking service, and Nova for the compute service. So let's get back to real-time KVM. Real-time KVM provides real-time virtual machines, and in OpenStack this is mainly related to the Nova project: OpenStack supports real-time KVM, and it is mainly configured in Nova. In this picture, the bottom layer is the compute node hardware; on top of that we install real-time KVM, and we define the real-time virtual machine using libvirt. Since real-time KVM has strict performance requirements, each of these parts needs special configuration, and I will go through them one by one.

Let's first see the prerequisites for the compute nodes. The first part is hardware. We need standard x86 servers, and a real-time deployment requires BIOS setup: we need to disable some options that may cause high latency, for example hyper-threading needs to be disabled. Some vendors provide a document on setting up a low-latency system, so we can follow those instructions. After the setup, we can use hwlatdetect to check whether the machine is ready; it has thresholds, and we can run it for maybe 24 hours to verify. That is the hardware part. Then the software part. We need to install the real-time packages, namely kernel-rt and tuned. We also need to partition the host cores into two parts: the real-time, or isolated, cores, and the housekeeping cores. Regular applications run on the housekeeping cores, and by default everything uses the housekeeping cores. The isolated cores are not used unless you explicitly pin your application to them. This isolation is part of the real-time setup.

We also need the hugepage part. On x86 servers the default memory page size is 4 kilobytes, and the hugepage sizes are 1 gigabyte or 2 megabytes. The advantages of hugepages are, first, that they are never swapped out, even when the system is under very heavy memory pressure. Also, because the page size is large, far fewer pages are needed to cover the same memory, and performance is better because the TLB miss ratio is much lower. So that is the software setup on the compute nodes.

After the compute node setup is ready, we can define the virtual machine using libvirt. I want to highlight two points. One is vCPU pinning. The cores used for a virtual machine need to be isolated cores, and they must not overlap: if we pin one host core to a virtual machine, that core should never be pinned to another virtual machine. Cores cannot be shared between virtual machines, otherwise performance will be bad. Besides the vCPU pinning, we also need to specify the emulator cores for the virtual machine, which are used for QEMU's emulator threads (for example, I/O handling). The other point is that we need to specify hugepage memory backing in the virtual machine definition.
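As a reference for the example that follows, here is a minimal sketch of the relevant libvirt domain XML for this kind of real-time guest. It is a reconstruction based on the values described next; the VM name and the emulator core numbers are illustrative, not the exact XML from our deployment:

    <domain type='kvm'>
      <name>rt-guest</name>
      <!-- 4 GiB of guest memory, backed by 1 GiB hugepages and locked in RAM -->
      <memory unit='GiB'>4</memory>
      <memoryBacking>
        <hugepages>
          <page size='1' unit='G'/>
        </hugepages>
        <locked/>
      </memoryBacking>
      <vcpu placement='static'>2</vcpu>
      <cputune>
        <!-- each vCPU pinned to its own isolated host core -->
        <vcpupin vcpu='0' cpuset='18'/>
        <vcpupin vcpu='1' cpuset='19'/>
        <!-- emulator threads run on separate cores (core numbers illustrative) -->
        <emulatorpin cpuset='0-1'/>
        <!-- vCPU threads get real-time FIFO scheduling priority -->
        <vcpusched vcpus='0-1' scheduler='fifo' priority='1'/>
      </cputune>
      <!-- os, devices, and the rest of the domain definition omitted -->
    </domain>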
Let me walk through the example. In this example, we assign four gigabytes of memory to the virtual machine, backed by one-gigabyte hugepages. We set two vCPUs, and each vCPU is pinned to an individual host core: vCPU 0 is pinned to host core 18, and vCPU 1 is pinned to host core 19. The emulator threads use different cores. Also, all of the vCPUs get FIFO scheduler priority. That is an example of the virtual machine definition.

After the compute node and the VM definition are ready, we still need to set things up inside the virtual machine. We need to install the real-time packages inside the guest, including the kernel-rt packages and the tuned packages, and we also need to partition the guest cores into two groups, isolated cores and housekeeping cores. So there is setup on the compute node, in the VM definition, and inside the guest.

So what does this look like in an OpenStack setup? Actually, deploying OpenStack is complicated, and doing it step by step by hand can be very tedious, so there are deployment tools. In Red Hat OpenStack we use director to deploy OpenStack: director is set up on one server, and it deploys the components onto all the other servers. The three parts of the configuration above can all be set through director's configuration files. In the community version, if you follow the official OpenStack documentation, you do the equivalent steps yourself. The main related configuration is on the Nova side: we need to specify in the Nova config file which host cores can be pinned to virtual machines, and we also need to use a flavor that sets the real-time attributes, including the dedicated CPU pinning policy, the memory page size, and so on; a sketch of these settings is shown near the end of this section. So that is the configuration.

Let's go to the final part, our testing performance. We tested three standard scenarios. The first is a single VM: we run only one virtual machine on the host, with one real-time core and one housekeeping core. The second is multiple vCPUs: we run one virtual machine with ten vCPUs, eight real-time cores and two housekeeping cores. The third scenario is multiple VMs: we run four VMs, each with a single real-time core and a single housekeeping core. Real-time performance can depend on the hardware, so these results apply to our particular server and CPU type.

We use cyclictest to do the testing; cyclictest is a tool for measuring real-time latency. These are the results for the three scenarios: we record the minimum latency, the average latency, and the maximum latency. Actually, we don't care much about the minimum or the average latency; what we care most about is the maximum latency, because the maximum latency should never exceed the latency bound, and our threshold is 40 microseconds. In our testing we ran each of the three cases for 24 hours, and over those 24 hours the latency must always stay within 40 microseconds, otherwise there is a problem. Below is the kind of command we use to test the real-time cores.
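The exact command line from the slide is not captured in this transcript; as a reference, a typical cyclictest invocation for this kind of measurement, with illustrative parameters and assuming the rt-tests package, looks like this:

    # One measurement thread pinned to an isolated guest core (core 1 here, illustrative),
    # running at real-time priority 95 for 24 hours; the histogram reports latency in microseconds.
    cyclictest -m -n -q -p95 -i 200 -t 1 -a 1 -D 24h -h 60

The maximum latency reported at the end of the run is the number that gets compared against the 40 microsecond bound.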
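And for the Nova-side settings mentioned earlier (the pinnable host cores in the Nova config file and the real-time flavor), here is a hedged sketch of what they can look like. The option and property names are the standard upstream Nova ones and the core list is illustrative, not copied from our deployment:

    # /etc/nova/nova.conf on the compute node: host cores that guest vCPUs may be pinned to
    # (newer Nova releases use cpu_dedicated_set under [compute] instead)
    [DEFAULT]
    vcpu_pin_set = 2-19

    # Flavor with real-time attributes: dedicated pinning, 1 GB hugepages,
    # and a real-time mask that keeps vCPU 0 as a housekeeping core
    openstack flavor create rt-flavor --ram 4096 --disk 20 --vcpus 2
    openstack flavor set rt-flavor \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=1GB \
      --property hw:cpu_realtime=yes \
      --property hw:cpu_realtime_mask=^0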
One more point about the test conditions: during the testing we also put heavy stress on the housekeeping cores. We compile the kernel on both the host housekeeping cores and the guest housekeeping cores, which consumes almost 100% of those resources. So we are doing our testing on a system under heavy pressure, and this means that even under heavy pressure, the real-time latency must always stay within 40 microseconds. So these are our testing results. And that's all. Thank you. Any questions?