Hello, I'm about to start; we are right on time. I'm Marcelo Amaral from IBM Research Tokyo, and today I'm going to talk about measuring the energy consumption of applications in the cloud. I'm going to introduce our open source solution, which has recently become a CNCF Sandbox project. A very quick overview of the agenda: I'll start with the motivation, go into the details of Kepler, the open source project, describe the architecture and how everything works, and then show an example of running an AI workload and the energy consumption we measured for it.

So, the motivation. As many people have probably already seen, climate change has become an important topic nowadays. There is a whole chain of impacts: human activity uses more energy, and all the CO2 produced by industry and users increases greenhouse gas emissions, which drives global warming. In the end that impacts the environment, but it also impacts the economy, so it has become something important to discuss and to act on. There are ways to mitigate this, for example reducing greenhouse gas emissions by minimizing production, but also taking actions to minimize energy consumption and emissions. It is also important for companies to prepare and to minimize their risk around this.

IBM has been running research surveys over the years to understand what clients need, and there is now demand coming both from governments and from clients. From the surveys of the previous years, sustainability has become even more important: in the 2023 survey, 40% of the companies in the Global 2000 group said that sustainability is important, and 6% of the companies in that group said they want to implement sustainability actions in their company.

Still as part of the motivation: it is well known that the amount of data transferred over the internet keeps growing, and the computation power to handle it is growing too, for end users and for companies. Nowadays we also see a big increase in energy-hungry, computation-hungry applications such as AI workloads, which are very intensive and therefore have a big impact on energy consumption. And something important to know is that Dennard scaling is reaching its limit. For those not too familiar with it, Dennard scaling says that smaller transistors consume less energy; but we are reaching a plateau, so energy consumption will not keep decreasing in the coming years with the architectures we have today. Energy is becoming not only a bottleneck to scaling but also something important to analyze.

Given that, some governments, especially in the European Union, are pushing for more energy-efficient applications. There are requirements, again especially in the European Union, saying that companies using AI workloads should report their energy consumption. So it is important to measure it: how can we expose, how can we create observability of, the energy consumption of applications? This talk is about that. So here is the problem: how do we measure energy consumption per container in the cloud?
How much are applications really consuming there, especially when we have a mix of applications running on the same node? How do we partition the node's power consumption across the applications? And can we aggregate the energy consumption of applications all the way up to the users? This is where Kepler steps in.

Kepler is an open source project created to measure the energy consumption of processes, which we can later aggregate to other levels: container level, pod level, jobs, and then the user. I'm going to detail more about that. Kepler recently became a CNCF Sandbox project, so it is now one of the official CNCF projects for measuring energy consumption in Kubernetes, and there are many companies involved: Red Hat, IBM, Intel, and others as well. It provides power models for bare metal nodes and power models for VMs; I will explain later what I mean by that. It reports the process, container, and pod energy consumption for the entire node and also, when possible, breaks it down per resource: CPU, DRAM, and GPU when available. We support different architectures, x86 and s390x, and since it is an open source project we envision supporting more architectures with contributions from other companies as well.

It has low overhead; it is written to have as low an overhead as possible. We use a special tool called eBPF. For those not familiar with eBPF, it basically lets us extend the kernel without needing to recompile it, so it is a very flexible and very lightweight way to collect metrics, and that is what we use. And for the VM power models, where no real-time power metrics are available from the node, we need to use estimation, so we use regression to create the power models.

Here is the Kepler architecture. As I mentioned, we use eBPF to collect hardware counters, CPU time, and also network throughput through softirq metrics. We create power models for both bare metal and VMs. At the bottom you can see that when Kepler runs on bare metal, it can read the node's energy consumption directly, using the available APIs for real-time power metrics. When that is not available, we need to use a power model that has been pre-trained on data, where we use regression to estimate the node's power curve.

So we collect the metrics per process, together with the total energy consumption of the node or of a given resource when we can break it down, and then we calculate the energy consumption of containers using a power model that we call the ratio model. What I mean by the ratio is just that: for example, the CPU utilization of a process divided by the utilization of the whole system. This ratio, multiplied by the total energy consumption of the resource we are analyzing on that node, gives the energy consumption of the process. It is a very simple idea: if a process accounts for 10% of the CPU utilization, then 10% of the energy consumption of the CPU is attributed to that process, as simple as that; there is a small sketch of this below. And then we have another component, called the model server, for when we train the models with regression for environments that don't have access to real-time power metrics.
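To make the ratio model concrete, here is a minimal sketch in Python. It is only an illustration of the idea described above, not Kepler's actual implementation (Kepler itself is written in Go); the function name and the sample numbers are made up for the example.

```python
def attribute_dynamic_power(process_usage: dict[str, float],
                            resource_power_watts: float) -> dict[str, float]:
    """Split a resource's measured dynamic power across processes.

    process_usage maps a process id to its utilization of one resource
    (e.g., CPU time deltas collected via eBPF). A process's share of the
    total usage decides its share of the resource's power.
    """
    total = sum(process_usage.values())
    if total == 0:
        return {pid: 0.0 for pid in process_usage}
    return {pid: (usage / total) * resource_power_watts
            for pid, usage in process_usage.items()}

# Example: the CPU draws 50 W of dynamic power, and process "a"
# accounts for 10% of the CPU time, so it is charged 5 W.
cpu_time = {"a": 100.0, "b": 400.0, "c": 500.0}  # e.g., ms of CPU time
print(attribute_dynamic_power(cpu_time, 50.0))
# {'a': 5.0, 'b': 20.0, 'c': 25.0}
```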
Now let me show examples of the different deployment architectures I was mentioning. The first one is when we are running on bare metal. That is the best scenario: we have access to the power metrics, and then we can accurately calculate the energy consumption of all the processes and containers and so on. The second scenario is what is currently available on public clouds, where people are using VMs and no power metrics are exposed to the VM. In this case we need to use a pre-trained power model, which of course has limitations, because we are estimating. And there is a third scenario, which we envision for the future: especially now that governments and companies are requiring more accurate energy measurements from the infrastructure, maybe in the future cloud providers can use Kepler, or something similar, to expose this, and it will be very good once we reach that point. It means we can run Kepler on the bare metal nodes in the cloud control plane, calculate the energy consumption per VM, and expose those metrics to each VM. It does not expose node-level metrics, which could raise security issues; it exposes the energy consumption already divided per VM, so a VM can only know its own energy consumption, and no information about other users leaks. By doing that, we can then take these metrics inside a Kubernetes deployment on top of VMs and calculate everything accurately.

Just to explain why this is important: what are the limitations of power models? There are several, and I cannot cover everything here; we can talk offline, and we are also writing a blog post on the CNCF blog describing all the limitations, because everything needs to be very transparent and people need to understand what they are using. But one quick comment here is about idle power. Consider these two definitions. The idle power is the energy consumption of the machine without any load; even if nothing is running, the machine draws some constant idle power. The dynamic power is the increase in energy consumption caused by load: CPU load, memory, any resource. For a fair division, and in particular as defined by the Greenhouse Gas Protocol, the idle power should be divided across all the VMs or all the processes based on their size, although some related works divide the idle power just evenly across the VMs regardless of size. Either way, the idle power is not related to resource utilization; it is a constant that needs to be divided somehow. And if we are on a public cloud, inside one VM, there is no way to know how many VMs are running on the node, so it is impossible to divide the idle power. We can expose the dynamic power, but the idle power on the public cloud we cannot report, because we don't have the information; it would be misinformation. In the future, if the cloud provider itself exposes that information, then we can accurately report everything. That is one of the motivations. And again, I keep saying public cloud, but in a private cloud, a private deployment, we can already go with the third architecture and deploy it that way. There is a small sketch of this idle-power division below.
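As a minimal illustration of that division, here is a sketch in Python. The proportional-to-size rule follows the Greenhouse Gas Protocol guidance mentioned above; the helper name and the sample sizes are invented for the example, and a real implementation would need the hypervisor's view of all VMs on the node, which is exactly the information that is missing inside a public cloud VM.

```python
def divide_idle_power(idle_power_watts: float,
                      vm_sizes: dict[str, float]) -> dict[str, float]:
    """Divide a node's constant idle power across its VMs.

    vm_sizes maps a VM name to a size metric (e.g., vCPUs or GB of
    memory). Each VM gets a share proportional to its size,
    independent of its utilization.
    """
    total_size = sum(vm_sizes.values())
    return {vm: (size / total_size) * idle_power_watts
            for vm, size in vm_sizes.items()}

# A node idling at 80 W, hosting a 2-vCPU VM and a 6-vCPU VM:
print(divide_idle_power(80.0, {"vm-small": 2, "vm-large": 6}))
# {'vm-small': 20.0, 'vm-large': 60.0}
```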
On the public cloud, again, as I mentioned, we don't have access to real-time power metrics for the CPU or DRAM and things like that, but typically the GPUs are completely passed through, so they are exposed: for the GPU we can get the energy consumption from the driver, and for the estimation per process we have power models for GPUs, using the GPU's energy consumption and the GPU utilization per process. For training use cases a GPU is normally not shared, but there are some use cases, especially for inference, where, since inference workloads do not utilize 100% of the GPU, it is possible to partition the GPU and run different processes on it. By checking the utilization of the different processes on the GPU, Kepler can also split the GPU's power consumption across those processes. This is becoming common when we use MIG or MPS tools to partition and share the GPU, and that is where we want to use power estimation. Here is a small sketch of that split.
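This sketch applies the same ratio idea to a shared GPU, splitting the board power by per-process utilization. It is illustrative only: the numbers are invented, and in a real deployment the per-process GPU utilization and the board power would come from the GPU driver (for example through NVML/DCGM), not be hard-coded.

```python
def split_gpu_power(board_power_watts: float,
                    proc_gpu_util: dict[int, float]) -> dict[int, float]:
    """Split the measured GPU board power across processes sharing it.

    proc_gpu_util maps a PID to that process's GPU utilization (in %),
    as reported by the driver. Each process is charged in proportion
    to its share of the total observed utilization.
    """
    total_util = sum(proc_gpu_util.values())
    if total_util == 0:
        return {pid: 0.0 for pid in proc_gpu_util}
    return {pid: (util / total_util) * board_power_watts
            for pid, util in proc_gpu_util.items()}

# Two inference servers sharing one GPU (e.g., via MPS or MIG),
# with the board drawing 180 W:
print(split_gpu_power(180.0, {1001: 30.0, 1002: 60.0}))
# {1001: 60.0, 1002: 120.0}
```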
This is just to show, at a high level, the breakdown of the energy consumption: what the CPU, the memory, and the GPU consume, and the GPU is normally the most power-hungry resource. We also cover the other components: on x86 we can get the energy consumption of the CPU and DRAM, and the energy consumption of the entire node from the sensors on the motherboard. By doing that we get the energy consumption of the entire node and also per component; that is what we are doing in Kepler.

Okay, for those not familiar with MLPerf: MLPerf is a benchmark suite created especially for analyzing machine learning workloads, with different benchmarks for training and for inference. In this presentation I'm going to show more about inference. Industry and research entities are using MLPerf as well; it has become a de facto benchmark for making comparisons. Interestingly, MLPerf has different load-generation scenarios, so it can generate the load in different patterns. Offline is just a batch: send everything to the GPU and let it process. SingleStream is more of a queue model, one query at a time. Server just changes how the data is sent, for example using a Poisson distribution of arrivals. And the MultiStream scenario is more of a multiplexing pattern, with requests coming from different sources, like the cameras in the driving example I put here. If we think about, for example, ChatGPT or IBM watsonx nowadays, a system receiving inference requests, that is more like the Server use case here.

So, running Kepler with MLPerf. For everyone who wants to try it, you can deploy Kepler using the Helm charts; it is publicly available in our documentation, and for anyone familiar with Helm it is as easy as adding the Kepler Helm chart and running helm install. Optionally, it is also possible to just git clone our repository, create the manifests with a make command, and then deploy them; both approaches are fine. We also have an operator to help deploy Kepler, so there are three different approaches. Kepler then exposes the metrics. The container power consumption metric is the lowest level, and it is the aggregate of everything, but as I mentioned we can break it down per component, so you can get just the GPU, DRAM, or package energy consumption. With that we enable observability for users, so users can be aware of their energy consumption.

There are also some side projects around Kepler to apply optimizations from a scheduling perspective. For example, when we do consolidation, we try to bin-pack all the workloads onto the same node, which is more energy efficient, because the power curve changes with utilization: the energy consumption is not completely linear, and with more load on the node you pay a bit less energy per unit of work. For illustration, if a node draws, say, 200 W at 50% utilization but 300 W at 100%, doubling the load costs only 1.5 times the power. So the more we maximize the load on the nodes, the more energy efficient we become, and there is room for improvement through scheduling. Other approaches change the GPU frequency; we have other projects for that, but Kepler itself, the main Kepler project, is about observability.

Now, a very high-level way to deploy MLPerf. I didn't see many tutorials about this, which is why I'm putting it here; it might be interesting for people. In Kubernetes, you create a persistent volume claim, so that we can deploy MLPerf onto it, and using this PVC we can later just run the workloads. Why do you need that? Just for efficiency, because MLPerf is very heavy, especially when it downloads the data; I don't remember exactly, but it is many gigabytes, so if you want to run it many times you just keep a cache. Here is an example of filling the volume: we can just run a simple job that downloads MLPerf. If you go, for example, to the inference BERT example, MLPerf has documentation for this; you just do make setup and it downloads everything. It is all very automated and easy to do. Once you have this, it fills the volume, and then we can use it to run the workload.

So here is an example of actually running MLPerf. We have the persistent volume that we filled before, as I showed, and here we create a job requesting a GPU; when we run MLPerf with a GPU, we need to set the environment variable USE_GPU to yes, and then we can just run it. (A sketch of such a job, using the Kubernetes Python client, appears at the very end of this section.) MLPerf has different backends, for example PyTorch, TensorFlow, and ONNX, and depending on the workload they have different performance; we have done tests, and the performance of the different runtimes is very workload dependent, so people need to test it themselves, and that is why it is good to do benchmarking. There are more parameters in MLPerf, so whoever is interested and not completely aware of them, please check the MLPerf documentation.

Here is an example: in Kepler we have a Grafana dashboard. This is an experiment where we were changing the CPU frequency, changing the batch size, things like that, and you can see the impact on the energy consumption. I'm not going to go into a lot of detail on it, since that is not the goal of this presentation; it is just to show at a high level how things go. Kepler, as I mentioned, has a Grafana dashboard exporting the energy consumption, especially the GPU energy consumption that we are showing here, for the different executions. So users can be more aware of which parameters they changed and what the impact was on the energy consumption, not only on performance, which is normally the goal.

Just a very quick note about the accuracy of Kepler's measurements. If we take the measurements from NVIDIA DCGM and compare them with the power consumption reported by Kepler for different systems, running on bare metal and running on top of VMs, we see that the exported values are very similar, very close, just to show that we are reporting the right metrics. We have also done other kinds of tests to check the accuracy of the power models using different input metrics, for example when we only have CPU time versus when we have access to hardware counters in the VMs. Again, this is just a very high-level overview: when we create the power models, we also use different regression algorithms and check which one has the best accuracy, and that is the power model that is actually exposed to the users and used. This is part of our research paper published this year at IEEE CLOUD, so if someone wants more information about this analysis, you can check the details in the paper.
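As an illustration of that training step, here is a minimal sketch of fitting such a regression power model in Python with scikit-learn. The feature choice and all the numbers are invented for the example; Kepler's real model server trains on data profiled on bare metal machines and selects among several regression algorithms, as described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data, invented for the sketch: each row holds resource
# utilization features sampled on a bare metal node (CPU time share,
# instructions retired, cache misses), and y is the node power in
# watts measured at the same moment via the node's power-metrics API.
X = np.array([
    [0.10, 1.2e9, 3.0e6],
    [0.35, 4.0e9, 9.5e6],
    [0.60, 7.1e9, 1.8e7],
    [0.90, 9.8e9, 2.6e7],
])
y = np.array([35.0, 60.0, 88.0, 115.0])

model = LinearRegression().fit(X, y)

# Inside a VM with no power interface, the same features can still be
# collected, and the pre-trained model estimates the power instead:
vm_sample = np.array([[0.50, 6.0e9, 1.5e7]])
print(f"estimated power: {model.predict(vm_sample)[0]:.1f} W")
```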
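And here is the promised sketch of an MLPerf-style job using the official Kubernetes Python client. The image name, command, and PVC name are placeholders I made up; the parts taken from the talk are the pattern itself: mount the pre-filled PVC, request one GPU, and set USE_GPU=yes.

```python
from kubernetes import client, config

# Assumes a reachable cluster; inside a pod use config.load_incluster_config().
config.load_kube_config()

# Placeholder image and command: substitute a real MLPerf inference image.
container = client.V1Container(
    name="mlperf-inference",
    image="example.com/mlperf-inference:latest",
    command=["make", "run"],
    env=[client.V1EnvVar(name="USE_GPU", value="yes")],  # as in the talk
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    volume_mounts=[client.V1VolumeMount(name="mlperf-data", mount_path="/data")],
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="mlperf-bert"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[container],
            # The PVC pre-filled with the MLPerf code and data, as described above.
            volumes=[client.V1Volume(
                name="mlperf-data",
                persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                    claim_name="mlperf-cache"))],
        ))),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```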
So we actually have about five minutes for questions now; I think I finished on time. Any questions?

Audience question: Is there any activity to expose this kind of capability within the public cloud right now?

Right now, as I mentioned, on the public cloud we need to use pre-trained power models. For example, if you go to Amazon, Amazon has the VMs there, but it is also possible to allocate bare metal, and we need to allocate the bare metal to create the power model. We are discussing with the community how to do this for more machines, to make more power models available, and once we have the pre-trained power models, people can use them inside a VM. In the VM it is possible to see the CPU model, it is exposed, so once we know the CPU model we can just download the power model created for that kind of machine and use it. But as I mentioned, a power model has some limitations, for example the idle power. We expect that in the future, especially if clients put more pressure on cloud providers for this information, cloud providers may expose it. Any other questions? Okay, thank you very much for your attention.