Hi everyone. My name is Rantao, and I'm from Huawei's core network product line. I hope you are all enjoying this summit. My session is about Huawei's practice in migrating to a cloud-native 5G telco cloud.

In IT, cloud native is clearly defined: microservices, plus container support, plus system automation. Some of these three parts are already supported in the telco cloud. Here I'm talking about the 4G EPC-based core network. We started to support NFV five years ago, and we already support microservices. Before that, the telco system was purpose-built hardware running embedded, monolithic software. That architecture delivers carrier-grade high performance and high reliability. But when we introduced cloud technology, we needed to decompose the software modules into small microservices. And considering the differences between telco software and IT software, we did this decomposition in two dimensions.

The first dimension is vertical decomposition. Telco software is a stateful design: a very complicated state-machine mechanism controls all the user context during a session. So when decomposing the software modules, the first thing to do is separate the state out into a database. This makes the service-processing part stateless, and a stateless service-processing unit can support flexible scale-in and scale-out. Of course, it is not a single service-processing unit: in the horizontal dimension, we also decompose the software modules into different functional services.

So the microservice part is already done. But at this moment, all the microservices, all the service modules, are running in virtual machine mode. The container part is not yet supported in today's 4G telco network. And here comes 5G, which again brings the requirement to support containers. This is the major topic I want to talk about today.
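The vertical decomposition just described can be sketched in a few lines of Python. This is only an illustration, not Huawei's implementation: all class, function, and event names here are hypothetical, and a real session store would be a replicated database rather than an in-process dictionary.

```python
# Hedged sketch of vertical decomposition: the user-session state is moved
# out of the service logic into a shared store, so any replica of the
# stateless processing unit can handle any request. All names are invented.

class SessionStore:
    """Stands in for the separated state database."""
    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Unknown sessions start in an idle state.
        return self._sessions.get(session_id, {"state": "IDLE"})

    def save(self, session_id, ctx):
        self._sessions[session_id] = ctx

# A toy state machine; a real core network has far more states and events.
TRANSITIONS = {("IDLE", "attach"): "ATTACHED", ("ATTACHED", "detach"): "IDLE"}

def handle_event(store, session_id, event):
    """Stateless processing: load context, transition, write context back."""
    ctx = store.load(session_id)
    ctx["state"] = TRANSITIONS.get((ctx["state"], event), ctx["state"])
    store.save(session_id, ctx)
    return ctx["state"]

store = SessionStore()
print(handle_event(store, "user-1", "attach"))  # one replica handles attach
print(handle_event(store, "user-1", "detach"))  # another replica could handle detach
```

Because `handle_event` keeps no state of its own, replicas can be added or removed freely, which is what enables the flexible scale-in and scale-out mentioned above.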
And the third part is system automation. We have started to do this in operators' networks. But operators' operation and maintenance is very complicated: it needs to preserve the inheritance of existing functions, and by introducing the separation of the hardware and software layers, the current OAM procedures become even more complicated. So automation is mandatory. But it is a very complicated topic that cannot be covered in a short session, so I will not talk about it today. I will focus on the container part.

As I mentioned, the arrival of 5G brings a new requirement to support containers. Why is that? There are several reasons. On the left side you see the very famous triangle of the 5G definition. 4G is an MBB, mobile broadband, system. 5G is not only about enhanced MBB; it is also about massive machine-type communication and URLLC, ultra-reliable low-latency communication. These need some adaptation of the cloud technology. Some things are easy to support; some are not that simple. Here I list four major things to consider.

The first, obviously, is that 5G will bring much higher traffic. The forecast is at least ten times that of 4G, so it needs to support even higher performance. This kind of high-traffic performance is actually the strength of CT, not IT. IT is good at computing, but here we are talking about throughput, about data transmission. How to support this, especially in a container-based environment? That is the first requirement.

Second is low latency. To support low latency, we need to put the system closer to the access side, which means we need a distributed cloud instead of a centralized cloud. Based on our discussions with operators of some big networks, the forecast is perhaps a hundred times more sites compared with 4G.
So this is also a very big challenge, a new requirement for container systems. The third and fourth items are easier to support. 3GPP defines the SBA, Service-Based Architecture, for 5G, and this is quite suitable for microservice and container systems. The last is network slicing, which needs not only network-function automation but whole-network-level automation. So this also brings new requirements.

Given these requirements, how do we support containers, and which mode is suitable for the 5G telco cloud? I have a simple comparison here, listing three modes. The left one is the virtual machine. This is where we are right now: we all have virtual-machine-based NFV 4G systems. The advantages and drawbacks are very clear. It is not that agile, but it is agile enough for 4G. The performance has some limitations because the hypervisor brings some performance loss. In the beginning, the loss was very high: we could reach only 10% of the throughput of the embedded systems. But after five years of hard work, we can say this problem is solved, so the performance of the virtual-machine-based system is good enough for 4G. Whether it is enough for 5G is still a question mark. The virtual-machine architecture is fine for supporting the SBA: different microservices can run in a single virtual machine, but the resource efficiency is very low. In a centralized data center that is acceptable, because we have enough hosts and enough virtual machines. But when we move to a distributed cloud, the edge cloud does not have enough resources, and this becomes a problem. So the virtual machine mode has some drawbacks for supporting 5G.

Here comes the container mode, with two options. Option one is to run containers inside VMs; option two is to run bare-metal containers directly. The technical comparison between these two modes is quite clear.
For container-in-VM, the good thing is that we do not have to change the infrastructure layer, and we do not have to change the already-built NFV-based management systems. Operators are quite happy with this mode because they have spent years building their NFV data centers. But the drawback of container-in-VM is also very clear: because multiple containers run inside a single virtual machine, it cannot take full advantage of the agility and flexibility of containers. It is almost equivalent to a plain virtual-machine-based deployment.

The bare-metal container mode is a totally different environment. It is very agile and flexible, and system restart and boot are very quick. From some points of view the performance is also very good, because we eliminate the overhead of the virtualization layer. But it has a very clear weakness: security and isolation. We know the isolation between containers is not as good as between virtual machines, and operators consider this the number-one issue with introducing containers. Of course, from a technical point of view there is Kata Containers, but it is not mature enough yet, so we say maybe we can consider it in a few years.

So from a technical point of view, the difference between container-in-VM and bare-metal containers is very clear. But which one is suitable for 5G systems? It is not a black-or-white question; it also depends on some enhancements and on the deployment environment. So let us see what should be considered additionally. First, I want to explain the mandatory enhancements to containers from a telco point of view; by containers here I mean Kubernetes. These are the top two issues we are talking about. Top one is the telco networking requirement; top two is 5G latency. The networking issue is also one major difference between IT applications and telco applications.
IT applications mainly run independently, in parallel. But in telco systems, different VNFs, the virtualized network functions, must be networked with each other in a very well-designed topology. And even within a VNF, how the different microservices talk to each other is strictly defined. This is the telco networking requirement.

But how do we do this kind of enhancement? We say we cannot modify the upstream Kubernetes releases directly. Otherwise it becomes very difficult to merge updates from the mainstream, unless the community decides to accept the change, so it is very difficult to maintain. Our suggestion is to keep upstream Kubernetes as-is and to deliver the non-intrusive telco enhancements as plug-ins. Operators can then deploy vanilla Kubernetes together with the plug-ins to get these enhancements. And according to our understanding, all the telco vendors are doing it the same way.

So what, in detail, does telco networking require from Kubernetes? Here is a very clear requirement: the separation of three network planes. What are these three planes? They are the management plane, the control plane, and the service plane. When we build microservices, different microservices do not talk to each other over a single communication plane; in telco systems we need three. Why three? The management plane is mainly responsible for communicating operation and maintenance information. It has external interfaces, and operators do their OAM over this plane. The second plane is the control plane. It is an internal plane between the microservices, used for service governance, signaling control, and these kinds of functions.
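To make the three-plane idea concrete, here is a hedged sketch of how per-plane network attachments could be declared for pods. The talk describes Huawei's own CNI plug-in for this; the sketch below instead uses the community Multus-style `NetworkAttachmentDefinition` shape purely as an analogy, and the VLAN IDs and plug-in choices are invented.

```python
# Hedged sketch: one NetworkAttachmentDefinition-style manifest per plane,
# each isolated on its own VLAN. This mimics the Multus CNI convention as
# an illustration only; it is not the plug-in described in the talk.
import json

def plane_attachment(name, vlan, cni_type):
    """Build one secondary-network manifest for a given plane."""
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        "spec": {"config": json.dumps({
            "cniVersion": "0.3.1",
            "type": cni_type,   # e.g. a macvlan-style plug-in, or SR-IOV
            "vlan": vlan,       # VLAN isolation between the planes
        })},
    }

planes = [
    plane_attachment("mgmt-plane", 100, "macvlan"),  # OAM, external interface
    plane_attachment("ctrl-plane", 200, "macvlan"),  # internal control, high security
    plane_attachment("svc-plane", 300, "sriov"),     # high-throughput user traffic
]
for p in planes:
    cfg = json.loads(p["spec"]["config"])
    print(p["metadata"]["name"], "-> VLAN", cfg["vlan"])
```

A pod would then reference all three attachments and get three isolated interfaces, so a traffic burst on the service plane stays off the management and control VLANs.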
One very clear behavior of the control plane is that it does not need to support high traffic, but it needs very high security because of the control functions it carries, and it has no external interfaces.

The third plane is the service plane. All the high traffic in a 5G or 4G system runs between microservices on the service plane, so obviously it needs to support high data throughput. And all the external data-plane interfaces of the network are carried by this plane, like the Gi interface, the Gn interface, and so on.

The requirement for separating the three planes is also very clear. In the normal situation a shared plane may seem okay, but consider an abnormal situation, for example a traffic burst on the service plane: it must not impact the normal running of the control plane and the management plane. And of course the separation is also a security consideration. When we check the native Kubernetes releases at this moment, they support only a single network plane, and they do not support SR-IOV and DPDK for high data throughput. So here come the requirements: first, one pod needs to support multiple planes; second, the different planes should be isolated by different VLANs; third, DPDK and SR-IOV should be supported to accelerate data forwarding. What we did, as I mentioned, is not to modify the upstream Kubernetes releases but to provide our own plug-in. We call it iCAN, intelligent container networking, based on CNI, to support the separation of the three planes. Due to time limitations, I will not go into the technical implementation of this plug-in.

The next thing to enhance on top of Kubernetes is support for low latency and performance. I list three key features here: CPU isolation and pinning, NUMA affinity, and huge pages. Actually, these three features are already there for virtual machines.
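On the Kubernetes side, these guarantees are requested through the pod's resource spec. The sketch below assumes the kubelet's static CPU manager and Topology Manager are enabled (cluster-level settings, not shown), under which a pod requesting whole CPUs with requests equal to limits gets exclusive, NUMA-aligned cores; the image and pod names are invented.

```python
# Hedged sketch: a Guaranteed-QoS pod spec requesting pinned CPUs and
# huge pages. Assumes static CPU manager / Topology Manager policies are
# configured on the node; all names here are hypothetical.

def guaranteed_pod(name, cpus, hugepages_gi):
    res = {
        "cpu": str(cpus),                      # whole CPUs -> eligible for pinning
        "memory": "4Gi",
        "hugepages-1Gi": f"{hugepages_gi}Gi",  # pre-allocated huge pages
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [{
            "name": "user-plane",
            "image": "example/user-plane:latest",  # hypothetical image
            # requests == limits => Guaranteed QoS class
            "resources": {"requests": dict(res), "limits": dict(res)},
        }]},
    }

pod = guaranteed_pod("upf-0", cpus=4, hugepages_gi=2)
res = pod["spec"]["containers"][0]["resources"]
assert res["requests"] == res["limits"]  # Guaranteed QoS condition holds
print(res["limits"]["cpu"], "CPUs,", res["limits"]["hugepages-1Gi"], "huge pages")
```

The key design point is that integer CPU counts with requests equal to limits are what make exclusive core allocation possible; fractional CPUs fall back to the shared pool.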
OpenStack already supports these in its current releases. But when we check their availability for containers, they are not there. So clearly we need to migrate all these functions from the virtual-machine-based OpenStack releases to the container-based Kubernetes releases. This is not only about supporting high performance but also about supporting low latency, because latency is very important in 5G systems. In 4G, latency is okay; we are talking about around 50 to 100 milliseconds. But for 5G, 3GPP defines an end-to-end service latency of 1 millisecond. In a normal deployment we do not need latency that strict, but we still need to support, let us say, around 10 milliseconds. And when we allocate those 10 milliseconds end-to-end across the different network functions, the core network part still gets only about 1 millisecond. So these features are very important. According to tests in our lab, the performance and the latency can be improved quite a lot by using these enhancements on top of Kubernetes.

Okay, so I have explained the enhancements required for Kubernetes. But how do we support this from an end-to-end network point of view, not only in a single network function? Let us look at this picture. It is a very typical three-layer architecture in an end-to-end network: a central layer, an edge layer, and a far-edge layer. In some small networks we may have only two layers, central and edge, with the far edge deployed on demand. The central DC is already there for 4G NFV. As I mentioned, most operators have deployed the central DC in virtual machine mode, and they are not willing to move it to a container base quickly, because they have invested quite a lot in it. And since the central DC has quite enough resources, the requirement for containers there is not that strong. Also, because we will not modify the software for the 4G IMS and EPC, those remain virtual-machine based.
So our conclusion, or our suggestion, for the central DC is to keep the virtual-machine mode on top of OpenStack, and when we introduce the 5G functions that are suitable for containers, to support the container-in-VM mode there. The infrastructure in the central DC should not be changed, and a hybrid mode should be supported. That is the central DC.

For the edge DC, it is actually a dilemma which mode to select. It depends on the real environment of the edge machine room. Is it big enough, with enough servers, enough resources? If so, then the virtual machine mode is fine, just as in the central DC. But if the deployment environment at the edge is quite limited, with only several servers, then maybe we also need to support the container mode at this moment.

Things are totally different at the far edge. There, most of the sites we are considering are deployed on-premise, in an industrial environment, let us say, or even in the operator's network in a very small machine room with fewer than 10 servers. The resources are very limited, so we recommend considering the bare-metal container mode at the far edge to improve the efficiency and density of the system.

Then, from a single operator's point of view, it needs to maintain, as we discussed, these three modes: in the central DC, at the edge, and at the far edge. But it is quite difficult, or even impossible, for an operator to maintain three totally different operating systems. So here comes a requirement on the cloud operating system: it must be heterogeneous, to support a fully distributed cloud. At least it must be a dual stack, supporting both virtual machine mode and container mode, and both the central deployment mode and the edge mode. We believe this is the suitable way to think about container deployment. As I just mentioned, it is not a black-or-white question of selecting containers over virtual machines, or bare-metal containers.
It is a mix, and this mix is mandatory. Here is another angle on the issue, from a migration point of view. As I mentioned, the virtual machine mode is already there, and operators cannot, and are not willing to, migrate their systems to a totally container-based one in a day or even a year. This period may last three or even five years. That means that over this five-year time frame, we must keep the dual-stack platform and the management systems for both virtual machines and containers for a long period.

The last thing I will not discuss in detail, but just highlight here. Introducing the telco cloud makes OAM in the operator's system very complex. So we need mature automation tools, CI/CD or DevOps procedures, or even AI-engine support, to improve OAM efficiency. And this becomes even more mandatory when we introduce containers, because containers make the normal OAM procedures even more complex. It is very difficult to do, and today we do not have time to discuss it, so I just give the conclusion: we need automation.

Okay, a simple takeaway from my session. We need to support a heterogeneous platform; that is the trend during the 5G network migration. The most important thing is to support a dual-stack, location-dependent mode: both virtual machines and containers, supported in both the central DC and the edge DC. At the current phase, I would say the container-in-VM mode is more suitable, or more recommended, at this moment. And system automation is very important; there is still a long way to go. Okay, that is all for my session. Thank you all for your kind attention.