We are from Osaka University, and we are so excited to be here to present our work at this KubeDay event. My name is Ying-Feng Hsu, Sogawa is the co-speaker, and our team member Mr. Matsuda and our team leader, Professor Matsuoka, are also here with us today. We are very excited about presenting this work, which is about an energy-aware data center operating system dedicated to the environmental sustainability of cloud native computing systems.

Here is the outline of my talk today. First, I will talk a little bit about ourselves, then about the background of this talk, which is the challenge of environmental sustainability and of power consumption in cloud and edge computing systems, and then about the details of the proposed solution, the energy-aware data center operating system. We name it EADOS. At this point you might be wondering what kind of data center we are talking about, and how large a power consumption reduction we are talking about. All of this will be demonstrated through a testbed data center of about 200 physical servers.

This project is initiated and supported by NEDO, and it is expected to develop energy-saving technology from Osaka University and several other joint research partners. Based on this open consortium alignment, the success of this project will not only reduce the carbon footprint of cloud native computing but also bring cost reductions and business benefits for the research partners.

I would like to share a little bit about our milestones. Our research group has been working on technology related to power consumption reduction in data centers. In 2019 we created two data centers at the Osaka site. Since then, various technologies have been developed in these data centers, such as immersion cooling, CFD, and machine learning for power consumption prediction and data center power reduction. Recently we moved our work toward workload allocation based on Kubernetes; this work was presented at KubeCon North America last year and was later covered in a technical blog.

As we know, the environmental sustainability of cloud computing is indeed a crucial concept. In the third quarter of this year we started our NEDO project, and at almost the same time the Environmental Sustainability Technical Advisory Group (TAG) was created inside the CNCF. This TAG was created to support and assist cloud native projects related to environmental sustainability. We have joined this group and will continue to work closely with this TAG inside the CNCF.

Let me first explain the background of this proposal. As of today, the scale of cloud native computing continues to grow. Data centers currently account for about two percent of worldwide power consumption, and this is expected to grow roughly 15-fold over the next ten years. That is why we are thinking about building an energy-aware data center operating system to reduce the power consumption of data centers. This slide shows research on how the carbon footprint of cloud computing differs depending on several factors, for example the type of application, the type of infrastructure, and the type of cloud.
In general, business-related applications consume much more energy than content-management applications, and physical machines consume much more than virtual machines. As for the type of cloud, public clouds, because they have more control and are managed more efficiently, are more reliable and consume less power than private clouds. Why is this important? This huge difference tells us there is always an opportunity to close the gap in power consumption, for example through better management such as container orchestration or load balancing. And this is what we are here for today.

Let me first explain the overall image of this proposal from a technology-development point of view. The data center operating system to be developed in this proposal is the orange block in the middle, between the application layer, the gray block at the top, and the hardware layer, the blue block at the bottom. The hardware layer consists of cloud, edge, and MEC computing environments.

Now I will explain the technical view of this architecture, using a video processing service as an example. It starts from the gray box on the upper left; the proposed EADOS resides in the middle, and the cloud computing resources, the hardware part, are in the blue block. Inside the orange layer there are three control servers, A, B, and C, which control all the operations, and four technologies (1 to 4) that operate together to achieve the data center energy reduction for this video processing service.

In process 1, when the video processing service starts, it sends its service definition to server A, while at the same time server B collects the power consumption and other information. All of this information is sent to server C; the microservice arrangement and the air conditioning control are also included in server C. Server C then evaluates the air conditioner power, the server power, and the response time using an evaluation function, and sends new control parameters back to the cloud environment. Eventually everything is optimized, and the total power consumption in the cloud environment can be reduced. By repeating these five processes sequentially for each workload, energy savings are realized. In addition, the balance between power consumption and response time can be flexibly controlled based on the service requirements. This kind of flexible control mechanism will lead to a high return for the business and also to environmental sustainability.

Now let's talk about the actual implementation of the proposed EADOS and how it works toward the goal of data center energy reduction. I will start from data collection, then how we built the machine learning models, how we use those models for workload allocation, and how we eventually reach the goal of data center energy reduction. We built our testbed data centers with different structures because we want to ensure that EADOS works in different environments. As you can see from the diagram, these two testbed data centers are different and have different kinds of cooling airflow. We focus a lot on air conditioner optimization because it is the largest single consumer, accounting for about 25 to 40 percent of the total power consumption in a data center.
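To make the evaluation step above a little more concrete, here is a minimal sketch of how an evaluation function like the one in server C could combine predicted air conditioner power, server power, and response time to choose new control parameters. The function names, weights, and candidate format are assumptions for illustration only, not the actual EADOS implementation.

```python
# Hypothetical sketch of the kind of evaluation server C performs.
# All names, weights, and model interfaces are illustrative assumptions.

def evaluate(candidate, predict_ac_power, predict_server_power,
             predict_response_time, w_power=1.0, w_latency=0.5):
    """Return a cost for one candidate control setting (lower is better)."""
    ac_power = predict_ac_power(candidate["ac_settings"])        # air conditioner model
    server_power = predict_server_power(candidate["placement"])  # server power model
    latency = predict_response_time(candidate["placement"])      # response-time model
    # The weights trade power against response time according to the
    # service requirements.
    return w_power * (ac_power + server_power) + w_latency * latency


def choose_control_parameters(candidates, *models):
    """Pick the candidate setting that the evaluation function scores best.

    Conceptually, server C would send the chosen air-conditioner settings and
    task placement back to the cloud environment as new control parameters.
    """
    return min(candidates, key=lambda c: evaluate(c, *models))
```

Repeating this kind of evaluation for each workload corresponds to the five-step loop described above.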
This is why we would like to focus on reducing the air conditioner power consumption.

Here is how we collect the data. This is an example from one of our data centers, which contains about 200 servers. We collect all kinds of operating parameters, around 4,000 per minute, using Prometheus and Zabbix. Because this big data is collected at a minute scale, there are sometimes duplicates and data-description issues that we need to fix. All of this data can be categorized into three categories. One is the temperature sensors, which tell us about the different temperatures inside the data center. We also collect the detailed parameters of the air conditioners, because, as I mentioned, they are the most important devices in the data center. Most of the data actually comes from the servers: for example, there are 200 servers and each server has 8 to 16 fans, so 200 times 16; a large number of the parameters come from the servers. All together, one year of data amounts to about 2.5 to 3 billion data points.

As you can imagine, using this kind of big data to build different prediction models is very time-consuming, because the original data is collected in real time and needs to be pre-processed before it can be used. Recently we started moving to the Kafka platform; by using Kafka Streams processing and ksqlDB, everything can run in real time. For example, we create many different Kafka topics, and each topic can be considered the input of one model. In this way we can greatly reduce the data volume: everything is processed in real time, we select only what we need, and the data can be replayed to build different models.

After collecting the data, we put a lot of effort into the machine learning techniques, because an accurate power prediction model leads to optimal task allocation. With an accurate and reliable power prediction model, task allocation can be realized and the power consumption of the data center can be reduced.
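As a hedged illustration of the Kafka pipeline described above, a consumer that treats one topic as the input of one prediction model might look like the following. The topic name, broker address, and field names are assumptions made for illustration; the actual pipeline also relies on Kafka Streams and ksqlDB.

```python
# Hypothetical sketch: consuming one Kafka topic as the input of one model.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "server-power-features",            # hypothetical topic: one topic per model input
    bootstrap_servers="kafka:9092",     # assumed broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for record in consumer:
    metrics = record.value
    # Keep only the fields this model needs; the raw data stays in Kafka
    # and can be replayed later to build a different model.
    features = {
        "cpu_utilization": metrics["cpu_utilization"],
        "ambient_temperature": metrics["ambient_temperature"],
        "static_pressure_diff": metrics["static_pressure_diff"],
    }
    # ...pass `features` to the online prediction or training pipeline here.
```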
Next, I would like to have Sogawa share his experience of how he used PyCaret and other software to create the power prediction models.

Thank you, Professor Hsu. I would like to talk about the power consumption of a data center and its prediction models. Air conditioning equipment and server equipment account for most of the power consumption in a data center, so to optimize power consumption in a data center it is most important to reduce the power consumption of these devices. The basic operations that can be performed on the air conditioning equipment are the temperature and airflow settings of each air conditioner. Another operation, which can be performed on the server equipment, is to determine which servers to place the workloads on. To optimize power consumption in a data center, it is necessary to know how power consumption changes when operations such as controlling the air conditioning and placing server tasks are performed. If we can quickly predict the power consumption for each operation, we can optimize power consumption by performing the operation with the smallest increase in power consumption among the possible patterns.

Therefore we build a model to predict the power consumption of each device, server and air conditioning, using only information available from the standpoint of each device's administrator. Since the air conditioning administrator cannot know detailed information such as the CPU utilization of each server, they can only use the power consumed by each server and the rear-side temperature. In addition, since the power consumption of the air conditioning is greatly affected by the outside temperature and humidity, this information is also used. Since detailed information on the air conditioning equipment is not available to the server administrator, the server CPU utilization, the server ambient temperature, and the static pressure difference between the front and back planes are used; the static pressure difference is information equivalent to the amount of air flowing through the server. Since power consumption is a characteristic that varies from device to device, a power consumption model must be created for each device. However, once a model is built for one device it can be used universally for that device, so the model is built using only information that can be easily obtained.

I will now discuss how we build the model I just described. On Kubernetes we have deployed MLflow, which allows us to easily compare the results of our experiments. Using MLflow, the results of the experiments can be checked and shared from a browser, and the results can be aggregated from a Python script. PyCaret is equipped with a standard function to record the results of experiments in MLflow, so it is possible to conduct sufficient experiments with MLflow using a very small amount of code. Files shared by each node, such as experimental data and training data, are placed on a file server and downloaded to each pod using FTP when model training is started. The file server is also used as the storage for the experimental data that MLflow refers to. By establishing such an environment in our laboratory, we are able to easily conduct the various experiments of the research process in a unified manner. It is also very useful for sharing the results of experiments.

Comparing models using PyCaret can be done with just a few lines of code. This graph shows the scatter of errors for each model when trained on each server's CPU utilization, ambient temperature, static pressure difference, and power consumption. In this experiment, models are created for each server's data, so the number of models that must be created is large and requires the use of multiple computers. Therefore it is necessary to consider things that need not be considered when the experiment is done on a single computer, such as the aggregation of results. Even in such cases, Kubernetes, PyCaret, and MLflow can be used to conduct experiments without having to think about these complexities.
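As a hedged sketch of what such a comparison could look like with PyCaret's regression module and its built-in MLflow logging, the following is illustrative only; the file name, tracking URI, target column, and experiment name are assumptions, not the actual training scripts.

```python
# Illustrative sketch: compare regression models for one server's power
# consumption with PyCaret and log the runs to MLflow. File name, tracking
# URI, and column names are assumed for illustration.
import os
import pandas as pd
from pycaret.regression import setup, compare_models

# Point PyCaret's MLflow logging at the MLflow deployment (address assumed).
os.environ["MLFLOW_TRACKING_URI"] = "http://mlflow.example.local:5000"

# Training data for one server: CPU utilization, ambient temperature,
# static pressure difference, and the measured power consumption.
data = pd.read_csv("server-42-training.csv")

setup(
    data=data,
    target="power_consumption",
    log_experiment=True,                 # record every run in MLflow
    experiment_name="server-42-power-model",
    session_id=123,                      # for reproducibility
)

# Train and cross-validate PyCaret's standard set of regressors,
# ranked by cross-validated error.
best_model = compare_models()
```

A single compare_models() call covers the whole comparison, and with log_experiment enabled every run is recorded in MLflow, so results can be inspected from a browser or aggregated from a Python script.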
Next, the professor will talk about how we use the machine learning models to reduce energy consumption in data centers.

Okay, thank you, Sogawa-kun. I think we are running a little short on time, but I will take another couple of minutes from the Q&A time to go through a few more slides. I want to jump to how we built the workload allocation. We extended the Kubernetes scheduler into our WAO scheduler, and we extended the load balancer into our WAO load balancer. In this way, power consumption can be controlled with priority, and the response time can also be controlled in advance, so we can balance between power consumption and response time.

This is how we actually use the scoring function of the kube-scheduler to extend it into our WAO scheduler, and how we use MetalLB to extend it into our WAO load balancer. This is how the system collects the data and applies the machine learning models at runtime. We set up an experiment as a demo. The experiment is based on the data center with 200 servers and uses an object detection application. We combined our WAO scheduler and WAO load balancer, and compared them with the default kube-scheduler and MetalLB. The result shows that our scheduler reduced power consumption by about 13 percent compared to the default kube-scheduler.
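As a conceptual illustration of the scoring idea only: the actual WAO scheduler is built on the kube-scheduler's scoring extension point, but the essence, preferring the node with the smallest predicted increase in power consumption, can be sketched as follows. The model interface, field names, and CPU-demand input are assumptions for illustration.

```python
# Conceptual Python sketch of power-aware node scoring; not the actual
# kube-scheduler extension, only the idea behind it.

def power_increase(node, pod_cpu_demand, predict_power):
    """Predicted increase in server power if the pod is placed on this node."""
    current = predict_power(node["cpu_utilization"], node["ambient_temperature"])
    after = predict_power(node["cpu_utilization"] + pod_cpu_demand,
                          node["ambient_temperature"])
    return after - current


def rank_nodes(nodes, pod_cpu_demand, predict_power):
    """Map predicted increases to 0-100 scores (smallest increase scores highest)."""
    increases = {n["name"]: power_increase(n, pod_cpu_demand, predict_power)
                 for n in nodes}
    worst = max(increases.values()) or 1.0
    return {name: max(0, min(100, round(100 * (1 - inc / worst))))
            for name, inc in increases.items()}
```

Presumably the load balancer side applies the same kind of scoring with traffic distribution, rather than pod placement, as the control knob.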
In this talk we discussed the challenge of the increasing power consumption of cloud and edge computing systems, and we proposed the energy-aware data center operating system. The best result from our system, using the object detection example, is that we successfully reduced power consumption by 13 percent. Last but not least, environmental sustainability is just starting to gain a lot of attention inside the CNCF community. Reducing the carbon footprint is a social responsibility that everyone needs to be involved in. That is what we are here for, and we will continue to work closely with the CNCF community to achieve the final product of this EADOS. Finally, I would like to give a special thank you to our project sponsor, NEDO, for the great support of this research project. Thank you for joining us today. I would like to take any questions from here. Thank you very much. Any questions?

Thank you for your presentation. I have a question about this research. I understand this prediction model is based on two data centers, the Saitama data center and the other one, sorry, I forgot the name, but how do you think your prediction model, or this kind of WAO, can be carried over to another data center?

Excuse me, you are asking how WAO can be applied elsewhere? Okay. As you know, we focused on two data centers with different structures. This is the first step, and even in a different data center, the power consumption all comes from the servers, the air conditioners, and the surrounding environment. We focus on both the server part and the air conditioning part, so that we can ensure our power consumption reduction works in different environments. And, as you may be thinking, most of the time we cannot control the air conditioners in most data centers; that is the facility provider's responsibility. But this project is designed for different cases: even if you are using housing or co-location, you can still control your own servers. Even though you cannot control the air conditioner, you can still measure the ambient temperature around the servers, and that is how you can optimize your own services. And if you are the service provider and you also run the applications, like Google or Amazon or that kind of big player, then you can reduce even more power consumption. That is why the concept says that public cloud services can reduce power consumption much more than private data centers. Okay, thank you. Thank you for answering my question.

Thank you very much. Any questions? I have a question. Yes. If you optimize the workload, you need to know the structure of the workload, so how do you know the structure of the workload? Okay, this is actually a very good question.

The workload is actually the application from the user's request, and in the past we started from the very beginning: we used the stress command, which is a heavy-duty task that consumes all the CPU. So you are right, we needed to know the application first, and we needed to measure the different applications in advance. But I think some applications, for example business-oriented or content-management ones, have similar behavior. That is why we focus on building many different machine learning models. In the future we want a more advanced way to build machine learning models based on different categories of applications and different profiles of the software applications. Even though there are a lot of different applications, we believe they can still be categorized, and that is why we can build different categories of machine learning models; our focus will be to build machine learning models as quickly and as accurately as possible. Okay, thank you. Thank you.

Okay, any other questions? Okay, thank you very much. Thank you very much.