Hello, let's get started. Good afternoon. I'm very honored to share with you our topic: case studies and experience with Kata Containers in a cloud container service.

A quick introduction of myself. I'm from Huawei; I joined Huawei in 2011. Before that I was a network developer and I also did interaction design. At Huawei I worked on the development of the SPN product, and this year I joined Huawei Cloud, responsible for part of the Huawei Cloud products. I am currently the chief architect of Huawei's serverless container service, Cloud Container Instance (CCI). Speaking of serverless, this is partly a response to the previous speaker: as you can see from my background, I only moved into the container field in the last few years, so I'm still an elementary student in this area, and I hope to learn from the seniors here.

The contents have three parts: first, an introduction to Kata Containers; second, why Kata, that is, why did Huawei Cloud's serverless service choose Kata; and third, the key design points of CCI and some of the decision-making behind them.

First, the Kata Containers introduction. I think we can go through it quickly; if you attended our morning session, you should already be well informed about Kata Containers. Kata Containers is a runtime for secure containers. The idea is to run workloads in lightweight virtual machines, so you get the same experience as ordinary containers while the VM provides workload isolation to guarantee security. At the end of 2017, Intel's Clear Containers and Hyper's runV were merged, and the result is Kata Containers. I think the previous speakers already introduced all of this. On the right you can see the basic architecture diagram taken from the official website: the standard interfaces, the VM implementation, and the communication pieces in the middle layer. Pretty basic stuff.

So why did Huawei choose Kata? We first have to talk about Huawei CCI, the Cloud Container Instance service. An earlier speaker mentioned that serverless container services are a trend across many clouds: we don't expose the resource pool or the nodes to users, yet they can still run containers on the cloud. In terms of ecosystem we did not build a separate system; the Kubernetes interface is directly available to users. So the service is maintenance-free, instances come up faster, and the user experience is better. At the beginning of 2018, Huawei was the first to propose this idea of a serverless container service.

In terms of design, as I was just discussing with a few colleagues, our implementation is not based on Virtual Kubelet; we let Kubernetes directly manage the underlying resource pool. My understanding of Virtual Kubelet is that people have some concerns: VK only defines an interface, and each vendor has to provide its own implementation behind it. Huawei also has an implementation on the vendor side, but our concern is that a lot of repetitive implementation at the base layer becomes too costly. So to build this serverless platform, we let Kubernetes directly manage the underlying resource pool. For a container platform, network and storage are unavoidable topics as well. And since this is a multi-tenant platform, we of course need strong isolation, and not only at the runtime layer.
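Before going through those aspects one by one, a small illustration of what "Kata as the runtime" means from the user's point of view. This is a minimal sketch, not CCI's actual API: on a generic Kubernetes cluster with Kata Containers installed, a pod opts into the secure runtime with nothing more than a RuntimeClass reference. The class name "kata", the image, and the resource sizes are assumptions for illustration; on CCI itself users never pick the runtime, every pod is already Kata-isolated.

# Illustrative only: request the Kata (secure) runtime via a RuntimeClass
# on a generic Kubernetes cluster. The class name "kata" is an assumption;
# it depends on how the cluster registered the Kata runtime handler.
import json

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "kata-demo"},
    "spec": {
        "runtimeClassName": "kata",          # selects the Kata runtime handler
        "containers": [
            {
                "name": "app",
                "image": "nginx:alpine",     # an ordinary container image works unchanged
                "resources": {
                    "requests": {"cpu": "500m", "memory": "512Mi"},
                    "limits": {"cpu": "500m", "memory": "512Mi"},
                },
            }
        ],
    },
}

# Print as JSON; this is what you would feed to `kubectl apply -f -`
# or POST to the Kubernetes API.
print(json.dumps(pod_manifest, indent=2))

The workload itself does not change at all; only the runtime underneath it does, which is exactly the "same experience as ordinary containers" point.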
For storage, we connect to the different cloud storage services. For Kubernetes compatibility, we support over 95% of the API, not 100%, because on a multi-tenant platform the node-level interfaces cannot be opened up to users. Our serverless platform is built on Kubernetes with Huawei's own optimizations.

A serverless container platform needs strong isolation, and for us there are three aspects of guarantee. One is the runtime guarantee, which is the core of what I want to talk about: the secure container. The next is the network level, which must also be strongly isolated; we have a VPC-based network solution. And at the tenant level, users also need a sufficient isolation model, so we have tenant-level authentication and isolation. All of these are necessary.

As for scenarios and solutions: it has been about one and a half years since we launched the service, and across different industries and scenarios there are quite a few real cases. Some scenarios are an extremely good fit, for example AI deep learning; in another session I also shared some deep learning cases. There is also genomics computing, CI/CD, and, since it is serverless, function-style platforms, for which it is very suitable. We also have users with strong security requirements, for example government and financial sectors, and it suits them as well. So across scenarios, it fits wherever you need maintenance-free operation, rapid deployment, ease of use, and strong security isolation. There are actually a lot of scenarios that fit, so serverless could be the mainstream direction for cloud containers in the future.

Now to another topic: Kata Containers or gVisor, which one to choose? From the history, we know gVisor was open-sourced by Google in 2018, and it was quite an innovative idea. During our development we also found gVisor's ideas very innovative, and we did testing and benchmarking of Kata Containers and gVisor; in the morning the speakers also mentioned benchmarking gVisor against Kata. With gVisor you get an extra layer of isolation, so it is definitely better than the original plain runC containers. On performance benchmarks such as CPU performance, Kata and gVisor are both actually pretty good. But one issue with gVisor is compatibility: it is a new user-space kernel, and does that kernel support all kinds of applications? It only supports a limited set, so compatibility across scenarios is an issue. System calls also go through that layer and are slower than on the host, which is one of our concerns. For networking, gVisor uses a user-space protocol stack, which I think still needs to be improved. And support for heterogeneous hardware such as GPUs is relatively weak in gVisor. So considering general usage and our scenarios, for now we choose Kata. We are still following gVisor's progress and look forward to using it in the right scenarios in the future.
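To give a sense of how that kind of comparison is done, here is a minimal sketch of a syscall micro-benchmark: run the same script in a container under runC, under Kata, and under gVisor, and compare the per-call latency. This is only an illustrative script with arbitrary iteration counts, not our actual benchmark suite, and the absolute numbers depend entirely on the host.

# Minimal sketch of a syscall micro-benchmark for comparing runtimes
# (runC vs. Kata vs. gVisor). Run the same script under each runtime.
import os
import time

N = 200_000

def bench(label, fn):
    start = time.perf_counter()
    for _ in range(N):
        fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed / N * 1e6:.2f} us per call")

# stat() and getppid() enter the kernel (or gVisor's Sentry) on every call,
# so they expose the per-syscall overhead added by each sandbox.
bench("stat('/')", lambda: os.stat("/"))
bench("getppid()", lambda: os.getppid())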
Now let's look at some of the key design points for the scenarios we need.

First, networking. Because this is a cloud service, it requires good networking, both from the application perspective and functionally. In this diagram, the blue lines are the management plane: in the middle is our network plug-in, which talks to Neutron on the cloud; we use the plug-in to create the network port and then add the interface into the Kata guest. The red line is the data plane. For the data plane we optimize the common networking path through OVS plus DPDK, and for Kata we added vhost-user socket support, which also requires shared hugepage memory in QEMU; QoS and the other networking functions are supported as well. In our lab we reach 5 million PPS and 5 million connections, with 17.4 Gbps of bandwidth.

The other networking topic is RDMA, mainly for AI training. To take care of the data transmission between GPUs across multiple servers, we use RDMA, which lets us scale out the training while keeping the network performant. There are two options for RDMA: RoCE or InfiniBand. An IB network is the most difficult to build, but once built its performance is the best. RoCE is easier to deploy, but its performance is not as good as IB; for RoCE we recently launched a next-generation solution called AI Fabric, and we will follow up on that in the future. Here I use IB as the example: we support connecting IB into the containers on the serverless platform. We attach the RDMA NIC to Kata through SR-IOV, so Kata has to support VF passthrough; a rough host-side sketch of the VF setup follows below. Second, IB has its own upper-layer protocol stack and RDMA depends on the host side, so the guest in the runtime also needs the corresponding IB driver support. More generally, IB is a heterogeneous network that has to be interfaced with the container, and the same goes for GPUs and other heterogeneous components: they all need to interface with Kata, which means the Kata guest must contain the corresponding components, and the management layer has to handle them too. We also need isolation on this network, for example VLAN-style isolation between tenants. The IB network supports 100G bandwidth, as I just mentioned.
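Here is the rough host-side sketch of the SR-IOV step I just mentioned, assuming an RDMA-capable NIC whose physical function shows up as "ib0" (the interface name, VF count, and paths are assumptions for illustration; the real flow is driven by our device plugin and management layer). The virtual functions are created through the standard sysfs interface, and one VF's PCI address is then handed to the runtime so it can be passed through into the Kata guest.

# Sketch: create SR-IOV virtual functions on the host and list their PCI
# addresses, which are what get assigned (via VFIO) to the Kata VM.
import os
import glob

PF_NETDEV = "ib0"   # hypothetical physical-function netdev name
NUM_VFS = 4

sysfs_dev = f"/sys/class/net/{PF_NETDEV}/device"

# Ask the PF driver to spawn the virtual functions (requires root).
with open(os.path.join(sysfs_dev, "sriov_numvfs"), "w") as f:
    f.write(str(NUM_VFS))

# Each VF appears as a "virtfnN" symlink pointing at its PCI address.
for link in sorted(glob.glob(os.path.join(sysfs_dev, "virtfn*"))):
    pci_addr = os.path.basename(os.readlink(link))
    print(f"{os.path.basename(link)} -> {pci_addr}")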
Especially for the deep learning scenarios we see, it is important to have a good GPU solution. We support GPU passthrough into the container and also NVLink management. Let me mention one thing about NVLink: it exists to provide high-bandwidth transmission directly between GPUs. For example, one of our servers has eight GPUs, and each GPU has six NVLink links.

So when we schedule, we don't want to allocate GPUs to a container that are not connected by NVLink. The scheduler therefore has to be aware of the NVLink topology and try to pick GPUs that can work together over NVLink; otherwise GPU-to-GPU traffic falls back to slower paths. The next point is GPUDirect RDMA: between different servers, the RDMA NIC can access GPU memory directly, which is very useful. There are several key points here. If you want GPUDirect RDMA, the server design has to support it: the placement of the RDMA NIC and the GPUs on the host matters, which is a bit tricky, and the GPU and NIC topology that the container sees should be kept consistent with the host.

Then, about GPU sharing. I have talked to customers and to friends in the industry, and in certain scenarios there may be a requirement for GPU sharing. The reason we don't offer it is that the current technologies for GPU sharing are not really appropriate. For example, mdev-based GPU sharing does not support multi-tenancy, and if you use NVIDIA's hardware technology, GRID vGPU, to share GPUs, then NVIDIA licensing becomes problematic. So if a scenario seems to require GPU sharing, we can usually offer a different class of GPU instead, for example a T4 or P4, or, at Huawei, our own Ascend 310 chip to match the scenario. So in most cases you don't actually need GPU sharing, and even in the scenarios where you do, it can be replaced. It is useful, but not necessary.

Another example is some test data from the end of last year. ModelArts, which is Huawei's AI training platform, ranked first on the deep learning leaderboard of Stanford's DAWNBench. With 120 GPUs we get a scaling efficiency of about 80% in common scenarios, and in scenarios such as 32K batches and typical image training we can even reach 96-97%. We also support AI computing on Huawei's own Ascend chips, for reasons well known to all of us.

Now the storage part, which is also something I'd like to mention. The cloud storage services EVS, OBS, and SFS are all Huawei cloud storage, and all of them are supported; we also support local storage. There are two optimization points to mention. One is that on the same server node, even with serverless access, there is still I/O contention between different containers, so we do per-container I/O throttling. This throttling is not visible to users; it is managed through the upper management layer, which also requires interface definitions down to the runtime layer, for example the upper layer defines the rate limit and the runtime enforces it; a rough sketch of the idea follows below. The other point is SPDK acceleration, similar to the DPDK acceleration on the network side: for storage we also support SPDK acceleration, both for cloud storage and for local storage. With that we can do about one million IOPS for cloud storage and 0.6 million for local storage.
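Here is the simplified sketch of the per-container I/O throttling idea I mentioned: the management layer decides a bandwidth cap, and a node-side agent enforces it. In this sketch the cap is enforced with the cgroup v1 blkio throttle knobs; that is only one possible enforcement point, and in the Kata case the limit can equally be applied on the VM's virtual block device. The cgroup path and device numbers are made up for illustration.

# Sketch: enforce a read/write bandwidth cap for one container's backing disk
# via cgroup v1 blkio throttling (requires root and a blkio cgroup mount).
import os

CGROUP = "/sys/fs/cgroup/blkio/cci/container-1234"   # hypothetical cgroup path
DEVICE = "8:0"                                        # major:minor of the backing disk
READ_BPS = 100 * 1024 * 1024                          # 100 MiB/s, decided by the upper layer
WRITE_BPS = 50 * 1024 * 1024

os.makedirs(CGROUP, exist_ok=True)

def set_limit(knob: str, value: int) -> None:
    # blkio throttle files take lines of the form "<major>:<minor> <bytes/s>".
    with open(os.path.join(CGROUP, knob), "w") as f:
        f.write(f"{DEVICE} {value}\n")

set_limit("blkio.throttle.read_bps_device", READ_BPS)
set_limit("blkio.throttle.write_bps_device", WRITE_BPS)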
Finally, live migration. I don't think the previous speakers talked about live migration, but it is actually something we have been practicing for quite a long time. There are a lot of scenarios where we have to take a server out of service or do an upgrade, and we cannot simply do that, because we don't want to interrupt the users' workloads. Take deep learning: in many scenarios a training job has to run for many days, and if we interrupt it, it has to be rerun for another few days, so the cost of an interruption is huge. So we want to achieve container live migration, so that users' business can continue without being interrupted.

The illustration looks very complicated, but the idea is simple. From the Docker daemon at the top down to the runtime, Kata, at the bottom, every level has to implement the corresponding live-migration interfaces. You have to do checks and preparation, and you have to do the copies: at the Kata level we copy the memory and the container state, and restore them at the destination node. So the logic is quite involved. In a DPDK-based network environment, the downtime we can achieve now is less than one second; the network break is shorter than one second. We are doing this, but it is not yet fully deployed in user environments, and that is something we want to push further: we want a wider deployment of live migration so we can increase reliability for our users. For this solution, the implementation of the interfaces at the different levels is also something we want to discuss with the community and with companies in the same industry, to promote live migration together. One concern is that there are still a lot of restrictions on live migration; for example, with certain devices and with local storage there can be issues, and heterogeneous passthrough devices can be a problem.

Also, everything so far is x86. For reasons known to all of you, we will soon support ARM-based containers, I think within the coming several months. For ARM, the test results meet our expectations: the single-core performance is very close to that of x86, and it is cheaper. Kata does support ARM, but it actually requires extra work, like the live migration I mentioned: we want to support both x86 and ARM, and that means extra work to make everything compatible.

Thank you. That's all for my presentation. Thank you again.