Hello everyone. Welcome to my session. This is Yafeng Wu from Huawei Cloud. Knative is the most popular serverless project in the cloud native world today, because Knative has some terrific features, for example, portability when compared with other serverless platforms. At Huawei Cloud, we built a serverless platform based on Knative. There are tens of thousands of workloads running on it now. While building this platform, we found that improving performance and minimizing operational overhead are the key challenges. In this sharing, we will go over, first, how to minimize memory overhead when you use Knative, and second, how to improve the performance of the Knative data plane.

So, what is serverless? For many people, serverless means AWS Lambda, or function as a service, which is called FaaS, but let's make this concept a little more abstract and general. Gartner says serverless is a method that enables resources to be used as needed, seemingly unlimited, in a shared pool that is continuously available without up-front provisioning, and priced in units of the consumed IT service. FaaS is a part of the serverless world, but it is not the whole world. Gartner also says the evolution of cloud container services is toward distributed cloud and serverless. As you can see from this slide, some vendors have released serverless container services, for example, AWS Fargate and Google Cloud Run. Our main concern about serverless is vendor lock-in, and Knative is designed to eliminate it.

So, what is Knative? Let's have a look. Knative is the most popular serverless platform according to the CNCF survey 2019. It is more popular than the second one on the list of installable serverless platforms in use: Knative is the tool of choice, followed by OpenFaaS and Kubeless. Knative has two components, Serving and Eventing. They work together to automate and manage tasks and applications.
The Serving component helps us run serverless containers in Kubernetes with ease. Knative takes care of the details of networking, autoscaling (it can even scale to zero), and revision tracking, so teams can focus only on core logic, using any programming language. The Eventing component provides universal subscription, delivery, and management of events. It builds modern apps by attaching compute to a data stream, with declarative event connectivity and a developer-friendly object model.

At Huawei Cloud, we are continuously innovating in cloud native. We are the only founding member and the first platinum member of the Cloud Native Computing Foundation from Asia, top one in Asia by committed code, and top one in Asia by number of project maintainers. Huawei Cloud also open sources its cloud native capabilities to a diverse range of industries, with projects such as KubeEdge, Volcano, and Karmada. Now, we are building a serverless platform based on Knative. We have met many challenges, for example, cold start and so on. In this sharing, we will focus on only two of these challenges: memory overhead and performance loss.

So, now let's look at these two challenges. The architecture of Knative Serving is shown in the figure. The user sends a request to the ingress gateway, and the gateway forwards the request to the corresponding pod according to the routing rules. Knative Serving provides automatic scaling for applications to match incoming demand. The autoscaler collects specific metrics from pods to make scale decisions. In Knative, every pod has a proxy sidecar, which is called queue-proxy. It has the following functionalities: it records the concurrent requests for autoscaling, buffers requests to enforce the user container's concurrency limit, and records some metrics such as latency. We did a simple test and found that as the concurrent requests increased, the memory required by queue-proxy also increased.
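The bookkeeping that queue-proxy does for autoscaling can be sketched very simply. Below is a minimal illustration, not Knative's actual code (the names `Breaker` and `InFlight` are our own): it counts in-flight requests, which is the concurrency metric the autoscaler consumes, and rejects work beyond a configured limit instead of overwhelming the user container. Knative's real sidecar also queues requests rather than simply rejecting them.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Breaker is a rough sketch of the queue-proxy sidecar's two jobs:
// tracking concurrent requests (the metric scraped for autoscaling)
// and capping concurrency toward the user container.
type Breaker struct {
	inFlight int64 // current concurrent requests
	limit    int64 // container concurrency limit
}

// Do runs fn if the concurrency limit allows, and reports whether it ran.
func (b *Breaker) Do(fn func()) bool {
	if atomic.AddInt64(&b.inFlight, 1) > b.limit {
		atomic.AddInt64(&b.inFlight, -1)
		return false // over the limit; the real queue-proxy would buffer this request
	}
	defer atomic.AddInt64(&b.inFlight, -1)
	fn()
	return true
}

// InFlight is the concurrency value an autoscaler would scrape.
func (b *Breaker) InFlight() int64 { return atomic.LoadInt64(&b.inFlight) }

func main() {
	b := &Breaker{limit: 2}
	ok := b.Do(func() {
		// While one request is being handled, reported concurrency is 1.
		fmt.Println("in flight:", b.InFlight())
	})
	fmt.Println("accepted:", ok) // prints "accepted: true"
}
```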
Moreover, even in the absence of any requests, queue-proxy still requires about 20 megabytes of memory. Imagine that in a cluster of 10,000 instances, queue-proxy will take up more than 200 gigabytes of memory, and this is the actual memory overhead brought by queue-proxy. Knative Serving provides two ways to configure the resource requests of queue-proxy. The first is fixed configuration: every queue-proxy gets the same resource request. The second is proportional configuration: the resource request of queue-proxy is determined by that of the user container. For example, if we set it to 50% and the user container requests, say, 100 megabytes of memory, then queue-proxy will request 50 megabytes. It is worth noting, however, that the ability of a user container to handle requests is not determined only by resource allocation. Even if the same amount of resources is allocated, the number of requests the queue-proxies of different applications handle may vary greatly. This means that, on the one hand, some queue-proxies don't have enough resources to handle incoming requests, resulting in low memory utilization of the user container; on the other hand, some queue-proxies may have too many resources, resulting in low memory utilization of queue-proxy itself. So either way, there is a resource mismatch problem.

Then how do we maximize resource utilization in order to fix the problem above? We run queue-proxy at the node level instead of as a sidecar. All user container pods on the same node share this node-level queue-proxy, as in the left figure. The node queue-proxy asks the autoscaler to scale it based on its resource usage. This method has the following advantages: the user container's resources can be fully used, the node queue-proxy's resources are almost fully used, and there is much less resource cost when queue-proxy is idle, even when there are many idle instances on the node (idle meaning pods that don't process any requests).
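The sizing and overhead figures above amount to simple arithmetic. Here is a small sketch (the function names are ours, not Knative's) of the proportional configuration and of the aggregate idle cost of per-pod sidecars:

```go
package main

import "fmt"

// proportionalRequest computes the queue-proxy memory request under
// proportional configuration: a percentage of the user container's
// request. With 50% and a 100 MB user container, the sidecar gets 50 MB.
func proportionalRequest(userRequestMB, percent int) int {
	return userRequestMB * percent / 100
}

// sidecarOverheadGB is the aggregate idle overhead of per-pod sidecars:
// 10,000 instances at about 20 MB each is roughly 200 GB.
func sidecarOverheadGB(instances, perInstanceMB int) int {
	return instances * perInstanceMB / 1000
}

func main() {
	fmt.Println(proportionalRequest(100, 50)) // 50 (MB)
	fmt.Println(sidecarOverheadGB(10000, 20)) // 200 (GB)
}
```

A node-level queue-proxy replaces those 10,000 sidecars with one instance per node, which is where the savings in the next section come from.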
There will only be one node queue-proxy instance on the node, which can save a lot of resources.

Now let's look at the other challenge: performance loss. As shown in the figure, the user sends a request to the gateway, which forwards the request to the queue-proxy in the corresponding pod, and then the queue-proxy forwards the request to the user container in the same pod. Compared with direct access to the user container, the traffic forwarding path in Knative is longer, which brings performance loss. The latency of the user's request reaching Envoy is affected greatly by the Internet, so we focus only on the following two performance loss points: one is the latency from the gateway to the queue-proxy, and the other is the latency from the queue-proxy to the user container. We optimized the performance at both points.

In order to optimize the latency from Envoy to queue-proxy, we tried many methods, including CPU binding, turning off some unnecessary additional functions, and adjusting the load balancing of Envoy. The optimized Envoy shows obvious improvements in QPS and latency. With fixed QPS, for example 4,000 QPS, the optimized Envoy has significantly lower latency than the native Envoy; the P90 latency can be reduced by up to 40%. When QPS is unlimited, compared with the native Envoy, the QPS of the optimized Envoy is also significantly improved, and the QPS increase can reach 100%. In addition, we use eBPF to accelerate the data transmission from queue-proxy to the user container. As we can see from the left figure, with queue-proxy running as a sidecar in a traditional network, the path a packet has to take to reach the user container is particularly tortuous.
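Before tracing that path in detail: the eBPF acceleration amounts to keeping a map of established sockets keyed by their connection 4-tuple and, on send, looking up the peer socket so data can be delivered to it directly. The following is only a userspace Go illustration of that bookkeeping, with names of our own; the real mechanism is a BPF sockops/sk_msg program and a kernel sockmap, not Go code.

```go
package main

import "fmt"

// ConnKey identifies a TCP connection, mirroring the key a BPF sockops
// program would store in a sockmap/sockhash.
type ConnKey struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
}

// SockMap is a userspace stand-in for the kernel map of established
// sockets keyed by 4-tuple.
type SockMap map[ConnKey]string // value: an identifier for the socket

// RecordSocket mimics the sockops hook: when a connection is
// established, remember its socket under its 4-tuple.
func (m SockMap) RecordSocket(k ConnKey, sock string) { m[k] = sock }

// Redirect mimics the sk_msg hook: for an outgoing message, look up the
// peer socket by the reversed 4-tuple so the data can be handed to it
// directly, skipping the loopback TCP/IP stack. An empty result means
// falling back to the normal network path.
func (m SockMap) Redirect(k ConnKey) string {
	peer := ConnKey{SrcIP: k.DstIP, DstIP: k.SrcIP, SrcPort: k.DstPort, DstPort: k.SrcPort}
	return m[peer]
}

func main() {
	m := SockMap{}
	// The user container (port 8080) is connected to queue-proxy (port 8012)
	// over loopback; its socket is recorded when the connection is set up.
	m.RecordSocket(ConnKey{"127.0.0.1", "127.0.0.1", 8080, 8012}, "user-container-sock")
	// A message sent by queue-proxy is redirected straight to that socket.
	fmt.Println(m.Redirect(ConnKey{"127.0.0.1", "127.0.0.1", 8012, 8080}))
}
```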
An inbound packet has to traverse the host TCP/IP stack to reach the pod's network namespace via a virtual ethernet (veth) connection, then go through the network stack of the pod to reach the queue-proxy sidecar, which forwards the packet through the loopback interface to reach the user container. With eBPF, we hook our program onto socket operations in the kernel, record sockets in a hash map, and redirect packets according to that map. When a packet arrives on the host, our program dispatches it straight to its destination. This much more direct route results in lower latency.

Now that's all, and we have some work to do in the future. We have submitted some pull requests to the Knative community before, to fix bugs or to implement features, and we would like to contribute more in the future. In addition, we are exploring software-hardware synergy in our scenario to provide the best cost efficiency to all users. Making more users benefit from serverless is our constant goal. Thank you for listening.