Hi everyone, welcome and thank you so much for joining us. I'm going to talk about a Kubernetes-based workload allocation optimizer, which is designed to minimize the power consumption of computer systems. We are Ying-Feng, Kazuhiro, and Morito from Osaka University, and we are delighted to be here at KubeCon + CloudNativeCon North America this year. Here is the outline of my talk today. First, I'm going to tell you a little bit about ourselves. Then I will give a brief introduction and the background of this proposal, which is the challenge of increasing power consumption in data centers. After that, I will introduce the details of our proposed approach: a Kubernetes-based Workload Allocation Optimizer, which we name WAO. WAO includes the WAO scheduler and the WAO load balancer. They work together to perform optimal task allocation, so that eventually the goal of data center power consumption reduction is achieved. You might be curious about WAO's performance and how much data center power reduction can be obtained by using it. We will show all of this based on an experiment using 200 servers in our private data center. Finally, the conclusion will be provided at the end of this talk. So, who are we? We are members of the Matsuoka Laboratory at Osaka University. Our research mainly focuses on the field of data center energy savings. Our work always starts from building real testbed data centers and operating them on a 24/7 basis. Actually, this proposal is based on the infrastructure of two data center sites in Osaka City. After the data centers are set up, we install various sensors inside them, and then we collect detailed power-consumption-related data from those sensors, as well as data from the packaged air conditioners and from all the servers.
By using this data, we apply technologies such as building machine learning power consumption models for each server, and CFD simulation, to reduce the data center's power consumption. We have developed several data center power reduction techniques in our laboratory. One example, shown in this figure, was using liquid immersion cooling technology to reduce system temperature. We have also used machine learning techniques such as deep learning to predict data center power consumption. Based on accurate power consumption prediction, an optimal task allocation plan can be generated, and because user requests are dispatched according to this plan, the power consumption of the data center can eventually be controlled and reduced. The method proposed in this talk extends the concept of optimal workload allocation in this slide to the Kubernetes architecture. Due to the rapid increase in network services, cloud-edge computing system management has become more complex than ever. Besides, the increase in the total power consumption of cloud-edge computing systems is another critical issue. In this session, I'd like to talk a little about the background of this proposal, which is the challenge of increasing power consumption in data centers. With the massive increase in IoT devices and the demand for 5G networking, the amount of computing resources required has also increased dramatically. Edge computing using 5G networks may reduce communication time, but meanwhile, the management tasks are also becoming more and more complex. Also, the significant increase in the usage of cloud-edge computing systems has led to an increase in power consumption. Kubernetes provides container orchestration with functions for container deployment, monitoring, and scale management. Many Kubernetes extensions have focused on high-performance container orchestration in large-scale networking environments.
However, the current Kubernetes does not provide container orchestration from the perspective of power consumption reduction, and there are relatively few discussions in the community on how to operate containers at runtime while considering both energy saving and service performance. For example, it would be great if we could consider the various requirements of applications, as well as the different capacities and performance of computing resources, before microservice deployment. Okay, now let me talk about our proposed approach: the Kubernetes-based Workload Allocation Optimizer, WAO, which consists of the WAO scheduler and the WAO load balancer. In this proposal, we extend our WAO workload allocation optimizer to a Kubernetes-based platform for power consumption reduction in the data center. We have proposed several versions of WAO in the past. The main concept of WAO is to use machine learning to predict server power consumption and perform optimal task allocation. In other words, we create a power consumption model using machine learning, and then, with the prediction results from that model, WAO allocates tasks to the server with the least amount of increased power consumption. In the next section of this talk, we will show how WAO achieves excellent power consumption reduction in the data center. This talk presents the WAO workload allocation optimizer, which manages microservice containers based on the Kubernetes architecture. In Kubernetes, a pod is the smallest deployable unit and consists of one or more containers, and a node can be either a virtual or a physical machine. For task allocation on Kubernetes, first the kube-scheduler distributes pods to the nodes, and after that, the pods process user requests. The kube-scheduler considers only the resource status of each node, and MetalLB provides a basic load-balancing solution for Kubernetes-based clusters.
In other words, there is no concept of power consumption taken into account in this process, as we mentioned earlier. To realize power consumption reduction on a Kubernetes base, our approach consists of two major steps. First, we develop a WAO-based scheduler, which we name the WAO scheduler, for pod allocation, and a WAO-based load balancer, which we name WAO-LB, for task allocation. After that, tasks are allocated to pods depending on requests from the client. Regarding WAO's power-saving operation, as we just mentioned, the kube-scheduler allocates pods by considering the resources of the system, such as the CPU and memory of each node. MetalLB is a young project which acts as a simple load balancer using standard routing protocols. As for our proposed WAO, the WAO scheduler provides power-consumption-based priority control, and WAO-LB can manipulate the task allocation priority between power consumption and response time. Now I will explain the architecture of the WAO scheduler, especially the PCS. After taking the list of available nodes from the filter phase, the PCS first collects information on each node: for example, resource usage such as CPU usage in the Kubernetes cluster, which is aggregated by the Metrics Server, and the temperature around the node, which is obtained from each node. After collecting this information, the PCS predicts the increase in power consumption of each node using a TensorFlow Serving server. Finally, the PCS scores each node using the predicted power consumption increase and determines the optimal node for pod allocation. Through this whole process, the WAO scheduler is able to perform pod allocation considering power consumption. As for the WAO load balancer, WAO-LB, it gets the pod allocation periodically using the kube-apiserver. When receiving a request from a client, WAO-LB first collects information about each pod, such as CPU, memory, and network status, from cAdvisor; then WAO-LB predicts the increase in power consumption with the PC model and the response time with the RT model.
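As a rough sketch of the scoring step just described, the PCS logic can be imagined as follows. All function and variable names here are hypothetical, and a toy linear formula stands in for the trained model served by TensorFlow Serving; the real WAO scheduler is implemented against the Kubernetes scheduling framework and is not reproduced here.

```python
# Hypothetical sketch of the PCS (power consumption scoring) step:
# for each node that survived the filter phase, predict the increase
# in power consumption and prefer the node with the smallest increase.

def predict_power_increase(cpu_usage: float, temperature: float) -> float:
    """Stand-in for the ML model served by TensorFlow Serving.

    A made-up linear formula replaces the trained neural network;
    the real system sends (CPU usage, ambient temperature) features
    to a prediction server instead.
    """
    return 0.8 * cpu_usage + 0.3 * max(temperature - 24.0, 0.0)

def score_nodes(nodes: dict) -> str:
    """Return the node name with the least predicted power increase.

    `nodes` maps node name -> (cpu_usage_percent, temperature_celsius),
    mimicking the metrics aggregated from the Metrics Server and the
    per-node temperature sensors.
    """
    return min(nodes, key=lambda n: predict_power_increase(*nodes[n]))

nodes = {
    "node-a": (35.0, 26.0),  # busy and warm
    "node-b": (10.0, 24.0),  # mostly idle
    "node-c": (20.0, 25.0),
}
print(score_nodes(nodes))  # -> node-b, the lowest predicted increase
```

The key design point is that the score is a *delta* (predicted increase caused by the new pod), not the node's absolute power draw, so a node that is already warm and busy is penalized even if its current wattage happens to be low.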
Both models are served from the TensorFlow Serving server. By utilizing this information, WAO-LB is able to perform task allocation based on the tradeoff between power consumption and response time. As I just mentioned, WAO uses two machine learning models, the power consumption model and the response time model, to determine task allocations. I'd like to briefly explain the design of these two models here. First, both models are based on neural networks. We tested other types of machine learning models, such as SVM and random forest, and we found that the neural network had the highest accuracy among them. After hyperparameter tuning, we chose Adam as the optimizer for both models, and the final design of the power consumption model has one hidden layer, while the response time model has three hidden layers. So now you might ask what WAO's actual performance is, or what level of data center power reduction we are talking about. In this evaluation section, I'm going to give you a demo example based on operating a 200-server cluster in our private data center. We have just introduced WAO for the purpose of data center power consumption reduction, but it wouldn't make any sense without considering performance, which is the response time of the microservices. So how does it work, and how is WAO's performance? To verify this, we set up an experiment in our private data center, using about 200 servers. We chose an object detection application to evaluate the performance of WAO. Object detection is widely used today, in areas such as security cameras, self-driving cars, and smartphone applications, so we believe this use case is suitable to represent our work. In this application, upon receiving an image, the AI model run by the microservice searches the image and adds annotations to it, something like the photo in the slide.
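To make the two model shapes concrete, here is a minimal NumPy sketch of the architectures just described: one hidden layer for the power consumption model and three hidden layers for the response time model. The layer widths, input features, and random weights are placeholders of my own choosing; in the actual system the models are trained with the Adam optimizer and served via TensorFlow Serving.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_forward(x, layers):
    """Forward pass through a stack of (W, b) dense layers with ReLU
    activations, followed by a final linear output layer."""
    *hidden, (W_out, b_out) = layers
    for W, b in hidden:
        x = relu(x @ W + b)
    return x @ W_out + b_out

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    # Untrained placeholder weights; real weights come from training.
    return rng.normal(size=(n_in, n_out)) * 0.1, np.zeros(n_out)

# Power consumption (PC) model: 2 inputs (CPU usage, temperature),
# ONE hidden layer, scalar output (predicted power increase).
pc_model = [dense(2, 16), dense(16, 1)]

# Response time (RT) model: hypothetical pod resource features as input,
# THREE hidden layers, scalar output (predicted response time).
rt_model = [dense(4, 32), dense(32, 32), dense(32, 16), dense(16, 1)]

x_pc = np.array([[20.0, 25.0]])          # CPU %, ambient temperature
x_rt = np.array([[0.2, 0.5, 0.1, 0.3]])  # hypothetical pod features
print(mlp_forward(x_pc, pc_model).shape)  # (1, 1): one scalar prediction
print(mlp_forward(x_rt, rt_model).shape)  # (1, 1): one scalar prediction
```

The asymmetry in depth reflects the tuning result from the talk: power draw was well captured by a shallow network, while response time needed a deeper one.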
Air conditioners usually account for the most power consumption in a data center, and the temperature of the data center also dominates its total power consumption. As the first step of our experiment, we investigated the relationship between power saving and the preset temperature of the air conditioner in the data center. In this figure, the horizontal axis shows the preset temperature of the air conditioner, and the vertical axis is the difference in power consumption increase between the WAO scheduler and the default kube-scheduler. We found that at a preset temperature of 24°C, the WAO scheduler reduced more power consumption than at the other temperature settings. In fact, server fans also have a high impact on power consumption: above 24°C, the server fans start to rotate, and this spin-up causes an increase in power consumption. Therefore, we adopted a temperature of 24°C for the experiments with the WAO scheduler and WAO-LB. This 3D figure shows the result of the server power consumption model in this experiment. The horizontal axes indicate the temperature around the server and the CPU usage, and the vertical axis indicates power consumption. The blue lines are the predicted values from the power consumption model, and the green points are the real values from the test data. The most remarkable finding here is that the server's power consumption increases significantly when the CPU usage is between 10% and 30%. In our proposed WAO, besides using the power prediction model to predict the increased power consumption of servers, we also created a response time model to evaluate the response time of applications. In order to meet the operating requirements of applications, WAO-LB uses an evaluation function to determine the priority of task allocation. The parameters alpha and beta in the evaluation function are the weights of each index, and their values can be changed depending on the application requirements. For example, for applications related to self-driving, low response time is critical.
On the other hand, for non-real-time applications, such as sensors that observe the weather over a long period, lowering power consumption can be prioritized. In this demo experiment, we chose an object detection application, and we observed a correlation of -0.569 between the increase in power consumption and the response time. Okay, let me start by talking about our evaluation plan for the Kubernetes-based WAO for data center power savings. As I mentioned earlier, WAO consists of the WAO scheduler for pod allocation and WAO-LB for task allocation. We use the default kube-scheduler and MetalLB as the baseline for comparison on several metrics. Then the combined effect of both the WAO scheduler and WAO-LB can be derived. In other words, we would like to see how much power saving can be obtained from WAO by comparing it with the default Kubernetes-based baseline. The first evaluation is about finding the power saving from using the WAO scheduler with MetalLB; in other words, we would like to find out how much power saving can be obtained by replacing the kube-scheduler with the proposed WAO scheduler. The left figure shows the dynamics of the total power consumption of all 200 servers in this experiment when the WAO scheduler or the default kube-scheduler is allocating the pods. This result is based on pod allocations of 10, 30, and 50 pods with different levels of CPU usage. Obviously, the total power consumption increases with the number of pods and the CPU usage. Here we can summarize three observations. First, the WAO scheduler achieves power savings regardless of the number of allocated pods. Second, the highest power reduction rate happens when operating with 10 pods: as you can see from the right figure, compared to the default kube-scheduler, about 8% power reduction was achieved at a total CPU usage of 20%.
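Going back to the weighted evaluation function described a moment ago, the alpha/beta tradeoff can be sketched like this. The scoring form (a weighted sum of the two predictions), the pod names, and the numeric predictions are all illustrative assumptions; fixed numbers stand in for the outputs of the PC and RT models.

```python
# Hypothetical sketch of WAO-LB's weighted evaluation function:
#   score = alpha * predicted_power_increase + beta * predicted_response_time
# and the pod with the lowest score receives the task.

def evaluate(candidates, alpha, beta):
    """candidates maps pod name -> (predicted_power_increase_watts,
    predicted_response_time_ms); returns the best pod for this request."""
    return min(
        candidates,
        key=lambda p: alpha * candidates[p][0] + beta * candidates[p][1],
    )

pods = {
    "pod-1": (5.0, 120.0),   # cheap on power, slow
    "pod-2": (12.0, 40.0),   # expensive on power, fast
    "pod-3": (8.0, 80.0),
}

print(evaluate(pods, alpha=1.0, beta=0.0))  # power-only priority -> pod-1
print(evaluate(pods, alpha=0.0, beta=1.0))  # response-time-only -> pod-2
print(evaluate(pods, alpha=0.5, beta=0.5))  # balanced -> pod-2 here, since
# raw watts and milliseconds are on different scales; a real system would
# normalize the two terms before weighting them.
```

This is exactly the mechanism behind the three priority patterns compared in the next part of the evaluation: power-only, response-time-only, and equal weighting.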
As the third observation, we find that when there are more available resources, such as idle CPU, the WAO scheduler achieves a higher degree of power savings. Next, let's see how much power saving can be obtained by replacing the default MetalLB with the proposed WAO-LB; in other words, we would like to see the power reduction performance of the default kube-scheduler plus the proposed WAO-LB. This figure shows the power consumption when WAO-LB and MetalLB allocate tasks. The horizontal axis shows the total CPU usage, and the vertical axis indicates the total power consumption. For WAO-LB, we examined three different priority patterns between power consumption and response time: the first prioritizes only the increase in power consumption, the second prioritizes only response time, and the last prioritizes the increase in power consumption and the response time equally. When power consumption is prioritized, as shown by the yellow line here, WAO-LB achieves more power saving than MetalLB, the green line; precisely, it is about 9.9% power reduction at a CPU usage of 20%. Lastly, let me show you the full WAO solution, which is the combined effect of both the WAO scheduler and WAO-LB. This figure shows the power consumption results for the proposed WAO, which is the combination of the WAO scheduler plus WAO-LB, versus the default Kubernetes solution, which is based on the kube-scheduler plus MetalLB. Compared with the default kube-scheduler and MetalLB, the complete WAO solution reaches a maximum of about 13% power consumption reduction at a total CPU usage of 27%. And in this figure, we summarize the power-saving behavior of WAO. We can see that the WAO scheduler and WAO-LB together reach over 12% power consumption reduction at a total CPU usage between 15% and 39%. In general, server utilization in a typical data center is often between 20% and 40%.
So this result tells us that the proposed Kubernetes-based WAO can achieve most of its power consumption reduction under common data center CPU usage scenarios. We also evaluated WAO in terms of response time; otherwise, it wouldn't make any sense for a cloud-edge computing system to talk only about its power reduction. Each box plot shows the response time when applying WAO-LB with the three different priority settings. The vertical axis in the figure is the response time, and the horizontal axis shows the CPU usage from 10% to 50%. As we can see here, in most of the comparison scenarios WAO-LB has a lower response time for task allocation than MetalLB. When looking at its balanced mode, it reduces both the power consumption and the response time compared to the default MetalLB task allocation. Most importantly, WAO-LB supports priority adjustment between power consumption reduction and application response time requirements. Here is my conclusion. This talk was about a workload allocation optimizer named WAO that extends the Kubernetes platform to realize power consumption reduction. WAO uses the WAO scheduler for pod allocation, and WAO-LB is designed for task allocation. Both can be used to reduce data center power consumption, and when using both together, WAO can reach about 13% power consumption reduction compared to the default kube-scheduler and MetalLB. In addition, WAO-LB provides an application priority control mechanism, so it can meet the response time requirement of each application and obtain power consumption reduction under those priorities. That is all for my talk today, and thank you for joining us again. I'd like to take any questions from here. Thank you.