Hello everyone, welcome to our session on application auto-scaling through elastic Kubernetes pods. My name is Kathy Zhang. I'm a senior principal engineer at Intel. With me is Teresa Shan, who is a senior cloud software engineer at Intel. Let me go to the next slide. This is what we are going to cover in today's talk. First, we will talk about what an elastic Kubernetes pod is. Then we will go through some use cases and why we need it. After that, Teresa will take over and talk about how we support it, walk through a POC demo, and finally show some test results.

So what do we mean by an elastic Kubernetes pod? With elastic pods, when an application needs to scale out, instead of creating multiple application pods you create multiple application containers inside the existing running pod. For example, as shown in this diagram, the user first creates a pod with one application container and one sidecar container. The user can then specify a replica number, for example a replica number of 10, and the system automatically creates 10 replicated application instances inside the same running pod.

So why do we need it? The first use case is function as a service (FaaS). In FaaS, the system needs to scale out and scale in automatically based on the incoming requests. For example, at the very beginning the system is running just one function container. When the user updates the replica value, the system automatically scales out more application containers, and it scales them out inside the same running pod.

The second scenario is container network functions (CNFs). As this diagram shows, at the beginning the user creates an application container along with a firewall CNF and a load-balancer CNF. Then, when a network security attack is detected, a DPI container needs to be added as a sidecar to that application container. Instead of deleting the existing pod, which has two sidecars, and creating a new pod with three sidecars, with an elastic pod we can reuse the existing pod and create the new DPI CNF sidecar inside it. This reduces the creation latency and also improves resource utilization.
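For the function-as-a-service case in particular, scaling out comes down to patching the replica value of the function container in its already-running pod. The following is only a rough client-go sketch of what such a request could look like: the per-container replica field is the extension described later in this talk, so the request only succeeds against our modified API server, and the pod and container names are made up for the example.

```go
// Illustrative only: scale the containers inside an already-running pod by
// patching a per-container replica value. The "replica" field is the extension
// proposed in this talk, so a stock API server would reject it; pod and
// container names here are placeholders.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Ask for 10 instances of the "app" container inside the existing pod
	// "my-function" -- no new pods are created.
	patch := []byte(`{"spec":{"containers":[{"name":"app","replica":10}]}}`)
	_, err = client.CoreV1().Pods("default").Patch(context.TODO(), "my-function",
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("scale-out requested")
}
```

Sending the same kind of patch with a smaller value scales the containers back in.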
The third use case is confidential computing. In confidential computing, creating an application pod involves three steps. The first step is to pull the encrypted application container image from the remote image registry into a shared, pod-level enclave. The image management service then decrypts and unpacks the image, and an attestation agent running inside that same enclave performs the attestation and retrieves the decryption key. Finally, the image management service launches the application container in a separate application enclave. An enclave here means a secure execution environment for whatever container or module runs inside it. With the existing mechanism, if we need to create two application containers, we need to create two pods, and each pod creation involves all three steps, so we end up repeating the three steps twice. With the elastic pod mechanism, the two application containers can share the same pod-level enclave, so we only need to perform steps one and two once, and then repeat step three to launch each application container. This removes the overhead of doing steps one and two multiple times when we need to create multiple application containers.

The fourth use case is performance optimization. Without elastic pods, if an application container spec changes, multiple pod spec change requests are sent to the kubelet, and the kubelet sends one message to each pod. Suppose we have 10 application containers that map to 10 pods: there will be 10 pod spec change messages sent to the kubelet, the kubelet will dispatch 10 messages to the 10 pod workers, and each pod worker will send a pod status update to the API server, so in total there will be 10 pod status updates. With an elastic pod, all 10 containers run inside one pod, so there is only one pod worker: only one pod spec change message is sent to the kubelet, the kubelet only needs to notify one pod worker, and that pod worker sends only one pod status update to the API server. This reduces the message and communication overhead between the different components of the Kubernetes framework. Now I'm going to hand it over to Teresa to go through how we support this.

Hello, everyone. My name is Teresa. I'm a cloud software engineer at Intel, and welcome to our presentation. Today I'm going to walk you through the design details of our elastic pod method, and I will also elaborate on what we've changed in the current Kubernetes code base. After that, we will go through the lifecycle management of the replicated containers, and finally we will quickly go through a demo.

First, let's take a look at the high-level architecture of Kubernetes. At a high level, a Kubernetes environment consists of a control plane, also known as the master node, a distributed storage system, which is the etcd database, and a number of worker nodes, which run the kubelet agent. The control plane is the system that continuously manages object status; it works to make the actual state of each object match its desired state. As you can see from the illustration on the left, the control plane is made up of three major components: the API server, the kube-controller-manager, and the kube-scheduler. The API server provides the APIs that support lifecycle orchestration, such as scaling and updates, for different types of applications. The scheduler is responsible for scheduling pods across the worker nodes in the cluster. The kube-controller-manager is a daemon that embeds the control loops shipped with Kubernetes. On the right-hand side, the worker nodes are the machines that actually run containers through container runtimes and maintain the lifecycle of the pods and the containers on them. The bullets below list the major changes we made to Kubernetes.
First of all, in order to specify the replica value, we added a replica field to the pod spec; I will discuss it in more detail on the next page. Additionally, we modified the API server to validate the replica value and set the replication state. We also modified the kube-scheduler to calculate whether the node's resources fit the replication, and we added logic in the kubelet to create or delete containers by reading the replica value from the pod spec. Finally, we extended the pod status with a new data struct to manage the lifecycle of the replicated containers; I will talk more about it in our POC demo.

Here are the changes we've made to the pod spec. As you can see in the upper-left corner of the slide, a field called replica has been added to the pod spec under the container struct. The replica value specifies the total number of running instances of the container, including the original container and its copies. The value ranges from zero to a predefined positive integer, which can be configured through the API server's start-up command. Zero is permitted here because in a serverless environment users are usually billed by their usage; in our POC, we allow users to delete all the containers in a pod, keeping only the pod sandbox running. In that case, function instances are created based on the volume of requests. In addition, as you can see at the bottom right of this page, the container statuses are also extended to track the status of the replicated containers. The kubelet also watches the replicated containers' lifecycle changes through the PLEG component in the kubelet daemon.

Along with the pod spec changes, we also defined a set of states to manage the lifecycle of the replicated containers. There are five states. The first is the unset state: unset is the ready state before replication, and the replication state is also set back to unset when the replication is completed. The next is the proposed state, which is set by the API server when the pod spec in the patch request is valid. The in-progress state indicates that the node's allocatable resources fit the replication; it is set by the kube-scheduler. The fourth state is the infeasible state, which is set when the resource request is beyond the node's capacity, meaning that it is impossible to complete the replication on that node at all. The last state is the timeout state. A timeout interval is set when the kubelet starts; the timeout state indicates that it has taken longer than the timeout interval to complete the replication, in which case the kubelet stops trying and sets the replication state to timeout.
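To make the spec, status, and state changes above concrete, here is a minimal Go sketch of the shapes involved. These are stand-alone, illustrative types written for this write-up; the real change patches the Kubernetes pod spec and status types, and the exact field names in the POC may differ from these guesses.

```go
// Illustrative, stand-alone Go types approximating the additions described above.
package elasticpod

import "time"

// ReplicationState tracks where a container's replication request is in its lifecycle.
type ReplicationState string

const (
	StateUnset      ReplicationState = "Unset"      // no replication pending, or the last one completed
	StateProposed   ReplicationState = "Proposed"   // API server validated the new replica value
	StateInProgress ReplicationState = "InProgress" // scheduler confirmed the node resources fit
	StateInfeasible ReplicationState = "Infeasible" // request exceeds the node's total capacity
	StateTimeout    ReplicationState = "Timeout"    // kubelet could not finish within the timeout interval
)

// Container mirrors the pod spec change: one added replica field per container.
type Container struct {
	Name    string
	Image   string
	Replica int32 // total desired instances of this container, 0..max, default 1
}

// ReplicaStatus mirrors the struct added under the pod status in the POC demo.
type ReplicaStatus struct {
	ContainerNames  []string         // the original container plus its replicated copies
	Replica         int32            // actual number of running instances
	ReservedReplica int32            // desired number of instances
	ReplicaTime     time.Time        // when the kubelet started replicating
	State           ReplicationState // current replication state
}
```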
Now let's move over to the left to take a look at the state flow. Containers follow a defined lifecycle: they start in the unset state and move to the proposed state when the replica value is valid. From there they move to either the infeasible state or the in-progress state, or stay in the proposed state, depending on whether the node's resources fit the request. In our elastic pod method, the replicated containers are scheduled onto the node that hosts the primary container, since a pod can only be scheduled once in its lifecycle. The scheduler therefore continuously watches for pods in the proposed state and checks whether the node's resources can fit the replication request. The state is set to infeasible if the node's total capacity cannot meet the request, stays in proposed if the currently free resources cannot fit the request but the node's total capacity can, and transitions to in-progress when there are sufficient resources for the replication. From the in-progress state, the container moves to the timeout state or back to the unset state, or stays in the in-progress state, depending on whether the replication is achieved. It goes back to unset when the actual replica count equals the desired replica count, meaning that the replication is complete; it goes to timeout when the replication time exceeds the timeout interval; otherwise it stays in progress while the replication is not yet complete.

Let's flip over to take a look at the system flow chart. Starting at the left, a user sends a patch request to the API server to update the replica value in the pod spec. The API server validates the replica value and sets the replication state to proposed. In the meantime, the scheduler watches the pod's replication state. When it is in the proposed state, the scheduler re-evaluates whether the node's resources fit: it sets the replication state to infeasible when the resource request is beyond the node's capacity, leaves it in proposed when the node's resources do not fit for the time being, and sets it to in-progress when the replication is good to go. After that, both the scheduler cache and the etcd database are updated accordingly. When the container is in the in-progress state, the kubelet is triggered to create or delete containers by reading the replica value. The kubelet sets the replication state back to unset if the actual replica count equals the desired one, keeps it in progress if the node is not available for the time being, and sets it to timeout if it takes longer than the timeout interval to complete the replication.

All right, after this overview of our detailed design, I'm going to show you how it really works with a demo. In the demo, we will demonstrate the following scenarios. First, we will demonstrate scaling a container up and down in a pod that holds only one container, including scaling the container down to zero replicas. Second, we will demonstrate a pod that holds two containers and scaling both of them up and down concurrently.

Let's quickly go through the demo. First, we create a pod that holds only one container, where the container has two replicas. Let's take a look at the pod spec. As you can see, a replica value is specified here and the value is two, meaning that two instances of the container will run in the pod. Now let's create the pod and take a look at the pod status: there are two instances up and running. Taking a closer look at the pod status, you can see that we added a replica struct under the pod status. There are a couple of fields in this replica object. The container names field lists the names of the containers, including the original container name and the replicated container names. The replica field is the actual number of running instances. The replica time field records the time when the kubelet started making replications. The reserved replica field is the desired number of instances to be started. As you can see here, the replica value equals the reserved replica value, and the replication state is unset, which means the replication was successful.
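The kubelet-side step described in the flow chart above is what produces these status fields. The following is only a condensed, hypothetical sketch of that step, reusing the illustrative types from the earlier sketch; the POC hooks into the kubelet's existing pod workers rather than a standalone loop like this, and the ContainerRuntime interface is a stand-in for the real runtime calls.

```go
// Condensed, hypothetical sketch of the kubelet-side step (same illustrative
// package as above): compare desired and actual replicas, create or delete
// container instances, and update the replication state.
func reconcileReplicas(rt ContainerRuntime, spec Container, status *ReplicaStatus, timeout time.Duration) {
	for status.Replica != spec.Replica {
		if time.Since(status.ReplicaTime) > timeout {
			status.State = StateTimeout // took longer than the timeout interval
			return
		}
		if status.Replica < spec.Replica {
			name := rt.StartCopy(spec) // launch one more copy of the container
			status.ContainerNames = append(status.ContainerNames, name)
			status.Replica++
		} else {
			last := status.ContainerNames[len(status.ContainerNames)-1]
			rt.Stop(last) // remove the most recently added copy
			status.ContainerNames = status.ContainerNames[:len(status.ContainerNames)-1]
			status.Replica--
		}
	}
	status.State = StateUnset // actual == desired: replication completed
}

// ContainerRuntime is a stand-in for the real container runtime calls.
type ContainerRuntime interface {
	StartCopy(spec Container) string // start a copy and return its container name
	Stop(name string)
}
```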
Next, we are going to scale the stress container down to zero replicas. To do that, we send a patch request to the API server specifying a replica value of zero. Let's check the status. As you can see, both the actual replica value and the reserved replica value have become zero, meaning that there are zero instances running in this pod. Next, we scale the stress container up to five instances by sending another patch request to the API server. Five instances are now up and running. Taking a closer look at the pod status, you can see that three additional containers have been created for the replication, and the actual replica value and the reserved replica value are both five.

Next, we create a pod that hosts two containers. Let's take a look at the pod spec. There are two containers in this pod: the container named stress is the primary container, and the container named debug is a sidecar container for troubleshooting. When the replica value is not set, the default replica is one. We create this pod, check the status, and verify that the default replica value is indeed one: as you can see, the actual replica value is one for both containers. Then we scale both of the containers to two replicas, send the patch request to the API server, and check the status. Four instances are now up and running. Out of the four, two are for the debug container and two are for the stress container. As you can see here, the stress container now has two replicas and the debug container has two replicas as well, meaning that the replication was successful.

Based on the implementation demonstrated above, we also performed a test to compare the time needed to create replicas using a Kubernetes ReplicaSet with the time needed using our elastic pod method. The test was performed on an Intel NUC with 16 gigabytes of memory and an eight-core CPU, in an all-in-one Kubernetes cluster. Our code is implemented on top of the Kubernetes 1.23 code base, and the container image used for the test is Debian 11, which was pre-downloaded onto the host. On the left-hand side, the column chart illustrates the time needed to create the replicated containers. The x-axis represents the replica value, starting from one replica, and the y-axis represents the time consumed in seconds. The blue columns represent the time consumed by our elastic pod method, and the red columns represent the time consumed by the ReplicaSet method. As you can see from the column chart, the start-up latency goes up only slightly as the replica value rises with our elastic pod method. The data fluctuates around one second because Kubernetes tracks the container status every second; the default relist interval strongly impacts the start-up latency here. In contrast, with the ReplicaSet method the time consumed increases rapidly as the replica count goes up. In comparison, the elastic pod method dramatically cuts down the time to replicate containers, especially when the replica value is large. Based on the test data, we can conclude that the higher the scaling concurrency, the greater the performance gain from our elastic pod method, and the replica value itself does not have much impact on the scaling latency. Our elastic pod method reduces the start-up latency by up to 87% when adding 10 replicas. All right, that's all we have for today, and thanks for watching.
If you have any interest in our project, don't hesitate to contact us; we look forward to your feedback. Thank you. And if you have any questions, feel free to reach out to us by email; our email addresses are listed at the beginning of the slides. Thank you.