Thank you everyone for being here very late on a Friday at KubeCon. My name is William Caban, and this is Federico Rossi. We are telco guys working at Red Hat. So why are two telco guys working on sustainability? Well, it happens that the telco industry represents about 2% of humanity's total energy consumption. So yes, we should be doing something about it. This talk is about an integration we created, and what's in the name is the following. SMART, because we want to do far more than traditional automation: we want to be able to use machine learning models and apply AI to achieve certain goals. In this particular case, we are talking about the goal of green computing, following cloud native principles. And it should work for experimentation, and the same stack should work for operations. That's really what the name means.

Now, some concepts. Goal-driven workload management, let's start with that. The idea here is that we don't want to just build another specialized orchestrator. We want to be able to define goals. Today we're talking about, for example, the goal of staying under a maximum amount of CO2 per cluster or per node, given the mix of sources that a particular node or cluster is powered by. You see there, for example, coal, petroleum, and the others; each of them carries a certain penalty in CO2 emissions. With traditional orchestration, everything will pretty much be spread across all the nodes, and the CO2 goal will be missed. But with goal-driven workload management, when we set up a goal, the system will really focus on achieving it, even when there are compute resources available. Say a future workload has five units of penalty: even when there is compute capacity, there is no way to place it or make it run without breaking the goal, so it will not be scheduled. That's the idea behind goal-driven workloads. And the same applies to any other goal, which could be: I want to run this on more power-efficient nodes, or run this to reduce my energy consumption, or to have higher availability or lower latency for our customers. It applies the same way at the node or cluster level.

Another concept is domain-aware workload management. These two need to go together, because not all workloads can be treated the same. Understanding the goal, we can also map it to the realities of that type of workload. And again, being from the telco world, we can relate to, for example, what happens during KubeCon, a big conference. At any other time of year, every night, there would not be much activity here, so you can scale down or remove workloads that help, for example, with caching and things like that on the telco side. Then, once the system detects people arriving, it can scale those back up. All of that knowledge of how to do it is very specific to the domain. The same applies to training or other specialized workloads. So with that: let's make sure we can build an integration that is composable and that works for both experimentation and the operations team. These are problems that have been solved already.
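Before we get into the stack: to make the goal-driven idea concrete, a declarative goal could look something like the sketch below. This is purely illustrative; the GreenGoal kind, its API group, and all its fields are names we are inventing here for the example, not an API from this project or any existing one.

```yaml
# Hypothetical sketch of a declarative, goal-driven constraint.
# "GreenGoal", its API group, and every field name are invented
# for illustration; this is not an existing CRD.
apiVersion: smartops.example.com/v1alpha1
kind: GreenGoal
metadata:
  name: max-co2-per-node
spec:
  scope: node               # apply the goal per node (could also be per cluster)
  metric: co2_emission_units
  target:
    operator: LessThan      # keep each node's CO2 metric below the target
    value: 220              # matches the 220-unit goal used in the demo later
```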
So we start very basic, adopting the GitOps model, because that gives us a declarative and versioned way of rebuilding anything. That means that if my infrastructure or platform, highlighted there by the letter E, is destroyed, I still have a way to recover it and rebuild it to exactly the point right before it was destroyed, without having to worry about other mechanisms or traditional ways of doing this. The other part is that each of these blocks can scale independently. So even though they have certain relationships, each one can scale as needed by the organization. That means I can do this from a centralized location, or I can have it highly distributed; again, it's based on what the organization needs. The same happens, for example, in F, where we are collecting all the metrics and logs. In some organizations we can centralize those, and for certain metrics we can do that as well. But, again, in our world, the telco world, we have situations where a single device, for example an antenna, can generate around 10 gigabits per second of metrics. Imagine now having 100 of those in a small city like this one. How much bandwidth would you need to move all of that outside the area? That's why those blocks need to be able to scale and be distributed wherever needed.

The other part is this: we have the metrics we can gather from the infrastructure, the platform, and the application. But if we want to pursue, in this case, the goal of CO2 emissions, something that happens with CO2 emissions is that they vary during the day depending on your power provider. During the day, if the provider is using renewable energy, they can be using more solar or more wind, but that changes every single day of the week and every hour. So that means we need to bring those in as additional metadata and metrics, to enrich the values and metrics we already have in the system. That's where the calculated or derived metrics come in. There's another notion: outside there is equipment, switches and routers. We also want to account for their power consumption and how it impacts things, for example when two services in the same location communicate. Then we can count that as part of the CO2 emissions that the workload or service is generating.

From there we had to evolve, because GitOps is great, you have a declarative model, but at scale, maintaining those declarations becomes quite painful. Think of the declaration of all those YAML files. We need an easier way for, let's say, the service owner or the business owner to declare what they really want. And that's where the intent-driven definition of those constraints comes into the picture. Everything else, we know how to do; that part, for now, we integrate here, when we bring in the MLOps stack, which also gives us governance and the ability to do experimentation with actual data. But remember, since we have to design this so it works for centralized or highly distributed environments, that part is also something that can be centralized or highly distributed. So that, for example, going back to my earlier example of everyone being here at KubeCon: those models, if I want to be really accurate, I have to run in this area.
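As a sketch of what such a derived metric could look like, here is a hypothetical Prometheus recording rule that turns a per-node power metric into a CO2 rate by multiplying it by the grid's time-varying carbon intensity. Both input metric names and the rule itself are our own illustrative assumptions, not names from the talk or from Kepler.

```yaml
# Hypothetical recording rule for a derived CO2 metric.
# Both input metric names are assumptions for illustration:
#   node_power_watts                   - per-node power draw
#   grid_carbon_intensity_gco2_per_kwh - time-varying value ingested
#                                        from the power provider's API
groups:
- name: smartops-derived
  rules:
  - record: node_co2_grams_per_hour
    # watts -> kilowatts, then multiply by the region's gCO2 per kWh;
    # group_left matches many nodes to one carbon-intensity series per region
    expr: >
      (node_power_watts / 1000)
      * on(region) group_left()
      grid_carbon_intensity_gco2_per_kwh
```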
I cannot try to centralize that, because it's just too much data to move, and by the time I finish moving it, it's useless: these are time-bound metric values. So in this case, the intent-driven constraints work at that layer. And now we have a goal-driven, versioned system where we can do all these operations. So at the end, that's the full stack.

Now, there are some initial use cases we were looking to tackle with this. Some are completely on the telco side, but some are applicable to everyone. For example, the maximum CO2 emission per node or per cluster is one that everyone can relate to. That's the one we're going to be talking about today; Federico will be doing that part. But let me explain the others. You see the auto-homing of services there. A challenge for any organization doing user-facing services at scale, which can be streaming services or any of those, is that when you start and you have very few subscribers, it makes a ton of sense to run on a public cloud. Now, if you're doing a streaming service and you start having a lot of subscribers, just the egress cost will hurt your financial model. That's when, and only under those circumstances, you want to bring that on premise. Now think of having this stack, where we can correlate and identify all these patterns. Once it identifies the patterns, based on the financial model, so that I can maintain my revenue or my profit, I can do this auto-homing of services: bring services in as needed and move them back out again, based on what makes sense at that moment for the financial model.

The last one I want to highlight here is smart maintenance. What happens is that, up to now, a lot of the industry still thinks in terms of maintenance windows. That might work in a lot of commercial industries, but it does not work for the world I have to work with, which is telco. A regular deployment of 5G antennas can range from 50,000 to 100,000 of them, each with one to ten or more servers directly tied to it, plus everything else. Four-hour maintenance windows, to serve 100,000 locations and keep up with the Kubernetes release schedule, will not cut it, no matter how many engineers I have. We have to be more intelligent, and that's where smart maintenance operations come in. If we can detect, for example, that there are a lot of people here this week, I will not touch anything. I see there's less usage next week, so I can go and increase the coverage by software on some antennas, take one out of service, do the maintenance, and continue. I don't need a human doing that process, and that way we can be really progressive in adopting all these new versions. So for the PoC, I will hand it over to Federico. Thank you.

All right, so we built this proof of concept to demonstrate goal-driven, aware scheduling. We have three nodes, each with a different power source. Of course, we're just simulating them: one for coal, one for petroleum, and one for natural gas. And as William was explaining, every single power source comes with a penalty. There are APIs you can access to find out, when you have a server in a data center or even at home, depending on the area where you are, whether the electricity you receive is generated from coal, petroleum, or natural gas.
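For the simulation, one simple way to represent the power sources is to label the nodes, as in the hypothetical snippet below. The label key, node names, and values are our own illustration of the setup described, not something quoted from the talk.

```yaml
# Hypothetical node labels to simulate each node's power source in the PoC.
# The "smartops.example.com/power-source" key is invented for illustration.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    smartops.example.com/power-source: coal          # highest CO2 penalty
---
apiVersion: v1
kind: Node
metadata:
  name: worker-2
  labels:
    smartops.example.com/power-source: petroleum
---
apiVersion: v1
kind: Node
metadata:
  name: worker-3
  labels:
    smartops.example.com/power-source: natural-gas   # lowest CO2 penalty
```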
So the goal of the proof of concept is to demonstrate that we can move a workload across the nodes of the cluster while keeping CO2 emissions under a certain amount. How do we achieve that? Well, this is the high-level solution architecture. There are quite a lot of components involved; this is the high level, and we're going to drill down into every single one. At the top we have governance, which uses a policy engine. The reason for this governance is that we want to force every workload that runs on the cluster to use this CO2 emissions optimization. That means that, as a cluster administrator, I can enforce on everybody who deploys on my cluster all the energy-efficiency policies that will be applied. Now, you have your manifest with the deployment, pod, or whatever. It goes to the Kube API, and when it gets to the scheduler, that scheduling part is where the intelligence lives. We will drill down into all the specific components, but the magic happens thanks to another policy system: a policy that controls how the scheduler decides where the workload gets placed on the nodes. And next to that, a metrics pipeline. As William was mentioning, we need to feed the system with metrics; without the data, we have no visibility into what's happening. So let's move to the next slide.

All right, let's start with the metrics pipeline. There is quite a lot going on. At a high level, we're using the following components, okay? Kepler, the Kubernetes-based Efficient Power Level Exporter: it uses eBPF to get power and energy consumption metrics for the individual pods that run on the cluster, and all the metrics are exported to Prometheus. Then we have Telegraf. The reason we have Telegraf is that if you have a BMC, with iLO or iDRAC or whatever, there is an SNMP interface, and you can also get measurements about the power usage of the host from the BMC. In addition, you could have collectd, which collects other metrics as well. All those metrics are exposed using exporters, and since we're talking about Kubernetes here, we have the Prometheus Operator, where you can use a ServiceMonitor. All the ServiceMonitor does is create a job on Prometheus that scrapes the metrics. And so we start filling up our Prometheus with all this data.

Now, the next component is the analytics, AI/ML engine. We take all those metrics, we combine them, we add the penalty we discussed before, and we feed them to a model; in this case we use XGBoost as the machine learning model. It spits out an output, and those values, for us, are new metrics, which we call SmartOps metrics. Then we export those metrics to Prometheus as well: we have another ServiceMonitor, and the metrics go back into Prometheus. So you see the flow: we start from the node, we collect metrics, they go into Prometheus, we pull them from Prometheus, we process them, we run the machine learning model, it spits out values, and we get them back into Prometheus. Once they're in Prometheus, at this point, we somehow need to expose those metrics for consumption by the scheduler, to do this intelligent scheduling we're talking about. To achieve that, we have a component called the metrics proxy, which is just a simple Python Flask app that queries the backend. It's exposed with an APIService: if you've never used the APIService CR, remember that in Kubernetes everything has an API, and with an APIService you can expose custom metrics.
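To illustrate the two Kubernetes objects just mentioned, here is a minimal sketch of a ServiceMonitor and of a custom-metrics APIService registration. The object kinds and fields are the standard Prometheus Operator and Kubernetes aggregation-layer APIs, but all the names (smartops-metrics, metrics-proxy, the namespaces, the port name) are placeholders we made up for the example.

```yaml
# Minimal ServiceMonitor sketch: tells the Prometheus Operator to scrape
# any Service labeled app=smartops-metrics. All names are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: smartops-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: smartops-metrics
  endpoints:
  - port: metrics          # named port on the target Service
    interval: 30s
---
# APIService sketch: registers the metrics proxy under the standard
# custom.metrics.k8s.io group so the scheduler extender can query it.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-proxy    # the Flask proxy described above (placeholder name)
    namespace: monitoring
  insecureSkipTLSVerify: true   # acceptable for a PoC, not for production
  groupPriorityMinimum: 100
  versionPriority: 100
```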
So this is the metrics pipeline. All of this just to expose back to the scheduler something called a Kubernetes node metric. Why a node metric? Because the scheduler makes decisions about where to schedule based on nodes; it filters on nodes. So the metrics need to be presented to the scheduler as Kubernetes node metrics.

All right, now we get to the interesting part: the governance and scheduling I was talking about before. We've drilled down into our metrics pipeline, which you can see here in yellow. Now let's see in detail what happens with the scheduling and the policy. So, you deploy a manifest on the cluster, and first there is Kyverno. If you don't know what it is, look into it, it's awesome; there was a talk about it yesterday as well. Using Kyverno, we have a lot of granularity in how we can manipulate a CR: admission, generation, mutation, and so on. We use Kyverno so that pretty much every pod, deployment, or stateful set that comes into the cluster gets injected with the spec to use the custom scheduler. Okay.

Now let's get into the scheduling a little, because, you know, we have our Kubernetes scheduler, right? But you can see right here something called a secondary scheduler. The reason is that we did this proof of concept on OpenShift, and let's say you don't have much flexibility there: if you want to make changes to the scheduler on OpenShift, you're not going to be able to do it. We provide some profiles, but you cannot make changes to the kube-scheduler configuration. So there is an operator, called the Secondary Scheduler Operator, whose only job is to run another Kubernetes scheduler, just the standard Kubernetes scheduler. The component that has the intelligence and actually makes all the decisions and does the policy processing is called Intel TAS, the Telemetry Aware Scheduler. It gives us policy-driven decisions for scheduling and descheduling on nodes.

So let's look at this flow. We deploy the manifest. It goes to the secondary scheduler, because we injected the scheduler information, and Intel TAS is implemented as a scheduler extender. So when the scheduler gets the message, you can see there is the extender configuration in the kube-scheduler configuration, and it triggers a filter request to the TAS. The TAS is always reconciling policies. Those policies are based on strategies, and you can define several strategies. For example, a strategy that says: if the CO2 emission is less than 100 units, scheduling on the node is allowed; otherwise, if it's greater than 100, deschedule. Let me cover the deschedule part, because the scheduler, of course, is not capable of descheduling; there is another component here called the descheduler. The way it works, very quickly, is that when the policy is violated, TAS adds a label to the node. That label breaks the pod's node affinity, so the descheduler sees the affinity is no longer satisfied and evicts the pod, which then gets rescheduled, right? And the other strategy is dontschedule: don't schedule if the CO2 emission is greater than 100 units, okay? So as you can see, you have a lot of flexibility and control over what you can do with this. And look, we've been talking about CO2 emissions, but with this architecture, think about any kind of metric. It doesn't need to be CO2; it could be CPU or memory, it could be querying an API for the weather in the area, I don't know.
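Two sketches of the pieces just described. First, the Kyverno mutation: a minimal ClusterPolicy that injects a schedulerName into every incoming pod. The policy and scheduler names are placeholders we chose; the mutate/patchStrategicMerge mechanism is standard Kyverno.

```yaml
# Sketch of a Kyverno mutation that points every pod at the secondary
# scheduler. "inject-secondary-scheduler" and "secondary-scheduler" are
# placeholder names for this example.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-secondary-scheduler
spec:
  rules:
  - name: set-scheduler-name
    match:
      any:
      - resources:
          kinds:
          - Pod            # catches pods from deployments, stateful sets, etc.
    mutate:
      patchStrategicMerge:
        spec:
          schedulerName: secondary-scheduler
```

Second, the TAS policy with the three strategies described. The shape follows the examples in Intel's Telemetry Aware Scheduling repository as we recall them; the metric name is a placeholder, and the 100-unit thresholds match the example given in the talk.

```yaml
# Sketch of a TASPolicy: don't place new pods on nodes above 100 units,
# evict pods from nodes that cross 100 units, and prefer the node with
# the lowest value otherwise. Metric name is a placeholder.
apiVersion: telemetry.intel.com/v1alpha1
kind: TASPolicy
metadata:
  name: co2-policy
  namespace: default
spec:
  strategies:
    dontschedule:
      rules:
      - metricname: co2_emission_units
        operator: GreaterThan
        target: 100
    deschedule:
      rules:
      - metricname: co2_emission_units
        operator: GreaterThan
        target: 100
    scheduleonmetric:
      rules:
      - metricname: co2_emission_units
        operator: LessThan
```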
Say you have a multi-cluster setup and you want to distribute the workloads based on the weather. In some areas the temperature is higher, which means the power plants are working harder to provide electricity, or it could even be too cold, actually, and you want to do that kind of thing. But not only that: now that the Kubernetes scheduler has the intelligence to make decisions based on metrics, this opens a world of opportunities. Your application, your workload, your web application, whatever you're running on the cluster, can become aware of conditions external to the cluster. You can get a holistic view of the network, your application knows what's happening around it and in the network, and you can make scheduling decisions based on that. If you ask me, as a telco guy, I see this as extremely powerful, because in telco we talk a lot about network slicing: guaranteeing SLAs or some throughput or bandwidth and all of that. Right now, you cannot do that in Kubernetes. How do you guarantee the SLA? Yes, on the pod, with requests and limits and so on, you can do some QoS, but once we get into the networking stuff and other things, you don't have those capabilities. So this stack opens up a world of things you can do.

Now, let's look a little more at the policy here. There is another strategy for the policy, called labeling. If something is in violation, you can configure custom labels for the node, and then you can decide what to do with the affinity through different rules. So let's go through the flow again. You deploy the manifest; it uses the secondary scheduler; the secondary scheduler sends the filter request to the TAS. The TAS is already reconciling this policy, which uses specific metrics. And by the way, here you just see a simple "if metric greater or less than"; you have much more flexibility. You can do AND and OR, you can do combinations, so it doesn't have to be only one metric; you can combine metrics, right? Then, using that metric information, the TAS sends an API request to the custom metrics API that we expose with that service called the metrics proxy, and it gets the data back. At this point we know that, for example, node four has 37.75 units, so it's not violating the policy, and we tell the scheduler: yes, you're good, you can deploy that workload on node four. Simple, right?

Now, we created a little dashboard just to demonstrate this. We have the three nodes with the different power sources, and then we have a meter bar for the CO2 emissions. What you see below, those are nodes two, three, and four, and those boxes show the pod placement: when a box is green, that means the pod has been placed there. So we generate a bunch of load to simulate more usage and more energy consumption, and our goal is to make sure each node stays below 220 units, right? As you can see in the top screenshot, node w3, in this case node three, is in violation, so it has no workload, nothing running on it. But as soon as node one, where our workload is running, gets into violation of the policy, the CO2 emission goes over the value we defined, and at that point the workload gets rescheduled, in this case onto node four, which is not in violation of the policy. So this is for CO2. But again, the solution architecture you saw, with all the components put together, opens up a lot of possibilities.
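For completeness, the eviction half of that loop is handled by the upstream Kubernetes descheduler watching for violated node affinities, which is how TAS's documentation pairs the two. A minimal policy sketch, assuming the descheduler is deployed alongside TAS:

```yaml
# Sketch of a Kubernetes descheduler policy (v1alpha1 format) that evicts
# pods whose required node affinity is no longer satisfied - which is what
# happens once TAS labels a node as violating the policy.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeAffinity":
    enabled: true
    params:
      nodeAffinityType:
      - "requiredDuringSchedulingIgnoredDuringExecution"
```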
And this is thanks in large part to the Intel TAS, which makes the scheduler intelligent and metrics-aware in its scheduling and descheduling decisions. So this is it. Thank you, everyone. Let's go ahead and take some questions, but first let me do a quick finish. Some of the projects that were used here: Kepler, that's the URL, as Federico just mentioned, a really cool project that lets us measure per-pod energy in millijoules. There's a PoC we also did with KEDA, with the KEDA team upstream, Microsoft, and the Red Hat CTO office, which used CO2 to drive scaling, so it's CO2-aware scaling. Intel TAS, that's the link. The data science hub there is based on Open Data Hub. So as you can see, all of these are elements that already exist; we are just using them in an integrated fashion. And as the next evolution of this, there is the project that Intel announced last week and has been showing this week; that's part of what we intend to work on in a future iteration. So that's it. Any questions, or feedback you want to provide? Questions? Yes, please, go ahead. Oh, the microphone is coming.

Thank you, it was a very nice talk. I have a question about the policies. If you set them to deschedule, you do evictions of the pods for the node that is in violation, right? So have you considered priorities for which pod to deschedule, or do you treat every pod the same? Which pod are you going to deschedule?

Yeah. Okay, so right now we are treating every single pod the same. What we did is this: for all the workloads where we were testing, because this is actually a multi-tenant cluster, we were only doing the calculation based on a single namespace and the pods there. So we did not account for the total across the whole node; otherwise we would have had to deschedule far more than our pods in that cluster. And the TAS policies are namespace-scoped, so we would need a Kyverno policy so that every time a developer creates a project on the cluster, it injects the TAS policy into that namespace, and at that point it applies the scheduling and descheduling decisions based on that policy. Okay, thank you. Does that answer your question? Yeah. All right. Any other questions?

Yes, I can tell you, for example, that I know AWS has a tool for that, and I know GCP and Azure are also providing something. Outside those three, I don't know. And the level of granularity of each one depends a lot on the cloud provider. Some are really about what happened three months ago, not what's happening now, and others are far more accurate. Actually, that lady over there, Niki, is far more familiar with the cloud service providers' APIs for this.

We've got two minutes left. Any other questions? And if anyone is really interested in anything related to green computing, I don't know if you're aware that the CNCF just created and approved the Environmental Sustainability TAG, and there will be several working groups there. Hopefully one of the working groups will be on the technical side, and some of them will be on how we fix these APIs once and for all. Because we need accurate data, and that is really, really hard; the places where it is available are usually behind a paywall. So we need a better way to do this. So that's it. Thank you very much. Thank you, everyone.