Thank you for the introduction. It's my pleasure to meet you all, and let's spend this afternoon celebrating some of the good ideas and achievements in sustainability for the research community. I want to put down a few words and criteria for what we do and what we see in cloud native sustainability, using a few characters and some colors to paint the picture. I call it MC-cubed, not to be confused with E = mc².

M stands for metrics. As many of you have noted, in cloud native it is observability that gives us a lot of power for scheduling, for scaling, and for much of our management of cost and availability. Having enough metrics gives you insight into what you can do. The metrics we build are for energy conservation: specifically, we want to know how much energy is consumed by a workload, including in Kubernetes at the pod level, that is, how much energy is used by the node and by the pods.

The second part is what we call correlation. Having these metrics makes sense to the end user, but by itself it is not useful for doing energy conservation. When you schedule workloads, how do you correlate the metrics with the hardware characteristics so that the scheduling of the workloads is optimized for energy consumption?

After scheduling, the workloads land on the nodes and keep running for days, weeks, even months. If the original assumptions about resource consumption and resource assignments go wrong, how do you correct this runtime behavior? We call this the correction phase, and in this phase we use the autoscaling capabilities in the Kubernetes community.

The very last one: once you go beyond a single cluster, you want to take into account the carbon deltas between different clusters around the world.
That is where the multi-cluster use case comes in: we want to account for the carbon intensity differences between multiple clusters and find the best destination for your workloads to land on.

Let's come down to the projects. The problems our projects are trying to address in cloud native power management are, first of all, accuracy. Accuracy means that the metrics we present to you, about the pods and about the containers, are as accurate as we can make them, so that everybody will agree on the methodology and everybody will agree on the end results. That is what we want to achieve.

The second level is fairness. As you all know, accounting for something on a shared operating system is not so easy. It is like living in a condominium: you pay the management fee, and usage is accounted per square foot for each tenant, but there are shared areas, the lobby, the swimming pool. How do you account for those shared areas? So fairness is something we want to build into the methodology, so that when we present the numbers they reflect both the fairness and the accuracy we want to give the end customer.

The very last one is multiple granularities. You want to present the information differently for different audiences. Developers interested in building efficient applications focus on the process: the running process tells a lot of the story about how efficient the application is. For containers, we are talking about container-level and pod-level power, with multiple layers of aggregation up to the deployment level. And end users, say the tenants of a SaaS system, care about the tenant or namespace level. How do you get the information from all these levels to all these different personas?
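The carbon-aware multi-cluster placement idea boils down to picking the cluster whose grid is currently cleanest. A minimal illustrative sketch, assuming static intensity values (cluster names and numbers are invented; a real system would pull live grid data from a provider):

```python
# Hypothetical carbon intensities (gCO2eq per kWh) for each cluster's region.
carbon_intensity = {
    "us-east": 420.0,
    "eu-north": 45.0,
    "ap-south": 630.0,
}

def greenest_cluster(intensities: dict) -> str:
    """Return the cluster whose grid currently emits the least CO2 per kWh."""
    return min(intensities, key=intensities.get)

print(greenest_cluster(carbon_intensity))  # -> eu-north
```

A production placement engine would of course weigh carbon intensity against latency, data gravity, and capacity, but the carbon signal enters the decision in exactly this way.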
We want to take all of these things into account when we build the projects. For the Kepler project, if you have already seen it, the name stands for Kubernetes-based Efficient Power Level Exporter. The name makes sense to us: just as NASA's Kepler telescope gives you insights into the universe, we want to have insights into the universe of the Kubernetes community.

Kepler has three goals in mind. One is reporting: as I said, we want the capability to tell customers, users, and developers how much energy is used by the pods, broken down by component (CPU, DRAM, GPU), and eventually to go beyond that level to FPGAs and whatever accelerator comes next.

We also want the capability of running Kepler across hybrid cloud environments. You may want to run it on bare metal machines or on private clouds in your data center, and we also want Kepler to run on the hyperscalers (EC2, Azure, GCP), where the runtime compute platform is mostly virtual machines. So we want the model to be portable across these different environments. We also support Prometheus; if you went to the keynotes, you saw that Prometheus gets a lot of love from the community, so we export Kepler's metrics in the Prometheus format.

If you are collecting all these resources and all these details, it sounds like a very heavy task, so we want to reduce the footprint of the Kepler project itself. We do not want it to consume too much CPU or too much DRAM. As a matter of fact, in our observation it usually stays at around 2% of a CPU and 14 megabytes of memory, so it is very lightweight.

We also do not want to introduce our own methodology; that is, we do not want to invent something that customers have no
visibility into, so we take into account the scientific research of the last two decades. I put two QR codes here, so if you are interested you can look into the research papers and see what the formulas are about. I will just give you one example of one of the linear regression formulas, showing how to take the CPU usage and the performance counters into account and estimate the power consumption of the CPUs. But this is not literally the formula we are using; as a matter of fact, let's go to the next slide.

This is our architecture. It has three layers, or actually three columns. Starting from the very left, we have eBPF, which collects information from the kernel level. This information includes CPU instructions, cycles, and CPU time, plus the process-to-container correlation. We pump this information up to user space, where we gather yet another layer of information, such as CPU power consumption: if you are running on an x86 platform, you have RAPL (Running Average Power Limit), which tells you for the different domains how much power is consumed by each CPU package or DRAM at different times. We also collect cgroup stats from the cgroups, and we collect information from the kubelet. We put all this information together and use machine learning models to predict how much power is used by each process, container, and pod. That is our methodology.
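To build intuition for the attribution problem this pipeline solves, here is a deliberately simplified sketch: split a node-level RAPL power reading across pods in proportion to their CPU-cycle counts. This is an illustration only, not Kepler's actual model (which uses trained regressors), and the numbers are invented:

```python
def attribute_power(node_watts: float, cycles_by_pod: dict) -> dict:
    """Split a node-level power reading across pods by CPU-cycle share."""
    total_cycles = sum(cycles_by_pod.values())
    return {pod: node_watts * cycles / total_cycles
            for pod, cycles in cycles_by_pod.items()}

# An 80 W package reading; pod-a ran three times the cycles of pod-b.
share = attribute_power(80.0, {"pod-a": 3_000_000_000, "pod-b": 1_000_000_000})
print(share)  # pod-a is attributed ~60 W, pod-b ~20 W
```

The hard parts that a fixed ratio cannot capture, such as idle power, shared caches, and non-linear frequency effects, are exactly why a learned model sits at the end of the real pipeline.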
We are not dictating the eventual model; we let customers build a model for their own sites, so that it is a true representation of how much power is used by your environment and your workloads.

This is one of the dashboards we are using, thanks to all the community contributors. It tries to give you some intuition about how much carbon footprint your workload generates, the variation over time of your workload across the different components (CPU, DRAM, GPU), and certain rankings, for example at the namespace level. With these metrics there is unlimited potential for how much information you can build into your final visualization.

I also want to introduce yet another project, called CLEVER, which Chen has built on top of this one using these metrics. Before she comes on, I want to give the community a shout-out. Over time this project has been contributed to by many people representing different companies, and we really appreciate their help in shaping the project, giving us feedback, and even contributing code.

So, because we have Kepler in place, we can have all types of observability in terms of energy-related metrics, including the energy consumption of the different components of your workload (CPU, memory, and so on). You can also see the current frequency of your CPU, and whether you have tuned down the frequency to save energy.
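The namespace-level ranking shown on the dashboard comes down to aggregating per-pod energy readings by namespace. A minimal sketch with invented numbers, assuming readings of the kind Kepler exports per pod:

```python
from collections import defaultdict

# (namespace, pod, joules) tuples; the values here are made up for illustration.
pod_energy = [
    ("default", "web-1", 120.0),
    ("default", "web-2", 95.0),
    ("batch", "train-1", 480.0),
]

def rank_namespaces(readings):
    """Sum pod-level joules per namespace, biggest consumers first."""
    totals = defaultdict(float)
    for namespace, _pod, joules in readings:
        totals[namespace] += joules
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(rank_namespaces(pod_energy))  # -> [('batch', 480.0), ('default', 215.0)]
```

In practice the dashboard does this aggregation with a Prometheus query rather than application code, but the shape of the computation is the same.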
Here I want to give an example of how we can leverage Kepler to do some optimization in your cluster, so that you can reduce your energy consumption while also guaranteeing the performance of your workload. The question we try to answer is this: there are a lot of frequency tuning knobs available in the Kubernetes community, so when you increase or decrease CPU frequencies, how can you guarantee the performance of the workload?

Before that, let me introduce a little background on the Vertical Pod Autoscaler (VPA) in Kubernetes. The VPA includes three different controllers, called the recommender, the updater, and the admission controller. What it does is collect usage data for your workload, do some analysis on that data, and resize your pod according to its actual usage. All our work is built upon this framework. Early last year we contributed a feature to the VPA in Kubernetes that allows you to plug in any algorithm, by replacing the default recommender with your own recommender. Based on this feature, what we built is the CLEVER recommender. It tries to guarantee the performance of the workload when you reduce the energy consumption of your whole cluster by lowering CPU frequencies. How do we do that?
Let's assume you have something like a frequency tuner that updates the CPU or GPU frequencies according to some target metric or energy consumption budget, and we lower the frequencies. Intuitively, the performance of the workload will decrease. What we can do is obtain the cluster state of the workload, including the CPU frequencies of the nodes where the frequencies changed, from the metrics Kepler exports. In this way we know something has changed in the cluster, and that your workload on that node will be impacted.

Then, how can we guarantee performance when the frequency is turned down? Especially for a compute-intensive workload, the total instructions per second (IPS) you can get will drop. So we developed a recommender called CLEVER that triggers a resize of your pod, increasing or decreasing its resources according to the frequency change, to guarantee you get similar performance for your workload. When the CLEVER recommender works together with the other controllers in the Vertical Pod Autoscaler, the VPA updater and admission controller go ahead and resize your pod so that you keep the same performance.

Take a compute-intensive workload as the example. Usually the most important metric you watch is instructions per second, and if you look at the IPS formula, it is related to the frequency of the CPU, your allocation of CPU resources, and the average number of cycles per instruction (CPI). If you target a particular IPS, you have a default request computed assuming the CPU is running at its maximum frequency at full capacity. When you tune down the frequency, you derive a current IPS, and what the CLEVER project
does is force the target IPS to equal the current IPS. If you work it out, the result turns out not to depend on the average number of cycles per instruction: the request you should tune to is related only to the maximum frequency, the current frequency, and the default request. This is how CLEVER derives the targeted request value for your pod to guarantee the same performance.

Next I will show a simple demo: how we deploy Kepler in a cluster, the Kepler dashboard, and then how we deploy the default Kubernetes Vertical Pod Autoscaler, deploy the CLEVER recommender, and run a test workload using the default VPA together with the CLEVER recommender to ensure the performance of the pod. You can scan these two QR codes to get more detailed information about Kepler and CLEVER.

Now we are in a Kubernetes cluster with one node, with Prometheus and Grafana support already pre-installed. What we do first is go to the Kepler project and clone the repo; there is a lot of documentation on how to use it, and basically it is one command to install. Next we check out a stable release of the Kepler project, let's pick release 0.3, and at the same time we port-forward the Prometheus UI and the Grafana UI to localhost. Now we are in the Prometheus UI; we just try one of the default metrics to see that it is working. The way we deploy Kepler is very simple.
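The derivation above reduces to a one-line rule: because CPI cancels when you equate the target and current IPS, the new request is just the default request scaled by the frequency ratio. A sketch of that rule (a simplification for intuition, not CLEVER's actual code):

```python
def clever_request(default_request_mcores: int,
                   max_freq_ghz: float,
                   current_freq_ghz: float) -> int:
    """Scale the CPU request to hold IPS constant under a frequency change.

    IPS ~ frequency * cpu_request / CPI; setting the target IPS (at max
    frequency with the default request) equal to the current IPS makes
    CPI cancel, leaving only the frequency ratio.
    """
    return round(default_request_mcores * max_freq_ghz / current_freq_ghz)

# The demo's numbers: a 250m request, node tuned down from 4 GHz to 2 GHz.
print(clever_request(250, 4.0, 2.0))  # -> 500
```

Halving the frequency doubles the request, which is exactly the 250m-to-500m resize that shows up later in the demo.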
We have a YAML already available under the manifests, for both Kubernetes and OpenShift. The YAML includes all the necessary RBAC roles and the service account to create for the Kepler deployment, and it includes the DaemonSet for the Kepler exporter. So we go ahead and create Kepler, deploying the Kepler DaemonSet on all the nodes in the cluster. You can also check whether the Kepler DaemonSet is running. Okay, now it is running.

To export all the metrics to Prometheus, we go ahead and create a custom resource called a ServiceMonitor, which exposes the Kepler endpoint so that Prometheus can scrape it for all the data. That YAML also includes the RBAC roles we need for Prometheus access, and you can configure the scrape interval in the ServiceMonitor CR as well. We can double-check that the ServiceMonitor resource was successfully created under the Kepler namespace. You can also check from the Prometheus UI whether the Kepler ServiceMonitor endpoint is available; it takes some time for it to come up. Alternatively, you can check the Kepler service endpoint directly to see whether it is up; the port number is 9102. Now we can see that Prometheus has started scraping data from the Kepler endpoint.

Let's go and load the Kepler exporter dashboard. The dashboard is also available in the Git repo, in a folder called grafana-dashboard, and you can import it into Grafana. You can see the equivalent CO2 amount and the energy consumption for a particular pod, the energy consumption of the different components over time, and the accumulated energy consumption per namespace for your workload.

The next step is to install the default Kubernetes Vertical Pod Autoscaler. It is just two steps.
The first step is to clone the autoscaler repo. Make sure you cd into the vertical-pod-autoscaler folder; then all it takes is their default script, hack/vpa-up.sh, which installs all the controllers for the VPA. The support for alternative recommenders for the VPA is already upstream, so anyone using the default VPA controllers can develop their own recommender and use it together with the existing VPA. Okay, now all three VPA controllers are running.

Then we want to clone the CLEVER project. CLEVER is just a simple custom VPA recommender written in Python; it is less than 200 lines of code, so you can take a look and write your own recommender as well. The way to deploy it is very simple, as we have already prepared the manifest file. Let's go ahead and clone the repo. The CLEVER YAML includes all the necessary pieces: the deployment, cluster role, cluster role binding, and service account. The only thing you need to configure is the environment variable for the Prometheus host; let's use the default endpoint here, so it gets all the necessary metrics from Prometheus, and through it from the Kepler exporter. Okay, now the CLEVER recommender is running. Let's double-check that it is running in the same namespace as the VPA controllers.
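The alternative-recommender support mentioned above is wired up on the VPA object itself. A minimal sketch of such a manifest, using the upstream `recommenders` field (the object and workload names are hypothetical):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sysbench-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sysbench-workload     # hypothetical workload
  recommenders:
    - name: clever              # select the CLEVER recommender instead of the default
  updatePolicy:
    updateMode: "Auto"          # let the updater resize pods automatically
```

With `recommenders` set, the default recommender ignores this VPA object and the named recommender supplies the resource recommendations.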
For debugging purposes, we open another window here to show all the log messages from CLEVER.

The next step is to create the test workload, using the VPA with CLEVER as its customized recommender. If we go to the CLEVER dashboard, which is also available in the CLEVER project repo, it shows things like the current, maximum, and minimum frequencies of the node. Currently we have one node: the maximum frequency is four gigahertz, the current frequency is almost at four, and the minimum frequency you can tune the CPU to is one gigahertz.

Let's take a look at the sample sysbench workload we want to test. It just creates a CPU-intensive workload that tries to take the whole CPU, but we set the CPU request and limit to 250 millicores, so it can only use one quarter of a CPU. Then we go ahead and create the workload. By the way, in order to use the VPA you probably want more than two replicas running at the same time, so you do not lose the high availability of your workload when a pod is resized. If we refresh, the data will pop up in a minute. The second window shows the current CPU request of the workload, and the third one shows the instructions-per-second performance metric for the workload. By the way, the IPS metric is also available in Kepler: if you install Kepler, you can see not only the energy consumption metrics but all kinds of performance metrics for your workload. It uses eBPF, so it is very lightweight.

What we are going to do next is run a simple script to tune the frequency on the node down to two gigahertz; right now it is at almost four gigahertz.
That way we save something like 30 to 40 percent of the energy for the node. Okay, then let's watch the recommended value for the VPA we created. By the way, when you want to use an alternative recommender for your workload, you specify the name of your recommender in the VPA object, using the recommenders keyword in your VPA YAML.

The default value is 250 millicores. Let's wait a moment for Kepler to pick up the actual frequencies. Now the actual frequency has dropped: you see in the top window that the frequency drops from almost four gigahertz to two, and at the same time the IPS for the workload drops. It takes a while for the VPA to kick in; the last log message from the recommender still shows four gigahertz. Okay, now the VPA recommends 500 millicores for the workload, and we see the request value change. Actually, I cut this a little, because it takes a long time, around a minute, for the pod to be resized. Let's go back a little to see it.

So when we tuned the frequency down, the IPS dropped as well, but then the VPA and CLEVER controllers kicked in to increase the CPU allocation, and equivalently we keep the same IPS over time. This is how CLEVER utilizes the Kepler metrics to guarantee you the same performance while saving some energy.

Looking forward, we will try to enhance the Kepler project with carbon footprint output, and to make the model available for both physical machines and VMs. Now I will hand over for the other projects that are going on. Thank you.

So this is my observation: we have seen lots of scaling mechanisms for performance and for resources, but we have never seen something that scales for the purpose of energy conservation without disrupting your service. I think this is one of the greatest achievements of this integration.
Thank you for this huge contribution. Looking forward, we believe that with Kepler integrated with other ecosystems, and with these metrics, we can make bigger things happen. Going back to what we said earlier, we want to address certain things in sustainability: carbon, energy, water, and waste. These are the topics where we believe we can make contributions as a team, as a community. I am not going to dictate what the most efficient ways or the best solutions are; I want to keep the questions open, so that as a whole community we can build up better solutions for a better tomorrow.

I also want to give a shout-out: during the process of preparing for KubeCon and other community-based development, we have had a lot of input from communities like KEDA. We are discussing bootstrapping mechanisms for autoscaling based on carbon intensity, and we can potentially use the same mechanisms to build autoscaling for serverless workloads, taking both energy and carbon into account, so that we can reduce the energy and carbon footprint of high-impact workloads. That is all we have for this presentation. Thank you for attending; we would like to open for questions.

Thank you for the presentation. We do have time for two questions. Please just raise your hand and I will give you the mic.

Thanks. This talk was great, and I really love to see where this work is going. Just a question on how you're thinking about energy beyond just the CPU. Obviously, if you're using a cloud provider, they have to power everything in the box, right? In some ways it can make more sense to bin-pack denser nodes, because you amortize the cost of things like NICs over more containers. So how do you juggle that?
Yeah, so if I understand correctly, there are two questions: one is about the components beyond the CPU, and the other is about the type of workload. Let me answer the first, and then you can help me with the second.

You are definitely right that we talk about the CPU a lot in this presentation. One reason is that the CPU actually accounts for around 60% of the whole energy consumption of the server. Relatively speaking, of course: if you have a huge amount of RAM the number can vary, but the CPU is around that range. The other component is the GPU. We also have GPU capability in the metrics we can report. The challenge with GPUs is that the frequency, and eventually the QoS, are hard to define: whereas for a CPU you can define instructions per cycle, for a GPU you probably need different metrics, and they are very hard to define. If we knew what metrics to use, we could build that into yet another exporter: Kepler could provide the energy information, you could provide the utilization metrics, we could correlate the two, and you could build yet another VPA based on the new metrics. As for the type of workload, she knows much better about scheduling and scaling than I do.

Yes, definitely we were thinking about why not just pack all the workload onto a smaller number of nodes, so we can turn the other nodes off and save energy overall. But that does not work for all types of workload. For certain workloads, especially telco workloads (someone in the community might answer this better), you need to keep the machine on all the time, but the load fluctuates a lot, and you cannot really move it, because you have to have a server at that location doing the job. In that case, if we can tune down the frequency, we can save a lot of energy overall while still guaranteeing the performance of the workload.
That helps a lot. So not all workloads can be packed or migrated: in the data center case yes, but in a lot of other cases no.

Any other question? One more? Okay, this gentleman.

Thank you for the awesome shirt. I have a question regarding the CLEVER project: after you change the CPU frequency, how long does it take to update the resources?

That's a very good question. It takes a while in our current setting, because we check for frequency changes only every minute, to reduce the resource consumption of CLEVER itself. If you want, you can increase it a bit and tune the check interval to ten seconds, five seconds, whatever. We didn't need that, because in our case the frequency doesn't change that often, so a one-minute interval is fine, but it is configurable.

Okay, thank you. Thank you, everyone, and a big round of applause for the first session in the research and academic track. We will stay here for the second presentation, starting in 20 minutes. Thank you.