Hello, and thanks for joining. So this talk is about, well, serverless and power. We were curious to see: is serverless actually powerless? If you scale it down to zero, does it use no energy? We're going to find out in this talk exactly how that works, and how we came to some interesting conclusions.

My name is Kevin Dubois. I'm a developer advocate at Red Hat. I'm based in Belgium, so I didn't have to travel too far today. I speak English, Dutch, French, and Italian, so if you have questions afterwards, I can talk to you in any of those languages. And I'm a really big fan of open source, of course.

Thank you, Kevin. And I'm Jose Gomez-Selles. I work as a product manager at Red Hat, mostly in observability, on projects like OpenTelemetry, distributed tracing, and Kepler; we will be discussing some of them. I'm based in Madrid, in Spain. And despite what the White House is saying these days, I love programming in C++. I've also been playing heavy metal for quite a long time (I should be better at it, actually), and I do sim racing. So if you want to talk about any of those things afterwards, I'd be very happy to.

I did a PhD in materials engineering, in which we were trying to find new semiconductor materials that perform better, so we can make better, more efficient transistors. This was like ten years ago. And while looking for material for this talk, I found that not only had I failed miserably, everybody is failing, including very clever people at IBM, MIT, and Google. You might think this is a talk you can check later, and we will upload these slides, but whether you think this is important or not doesn't really matter, because lately the rate at which we demand more energy is growing faster than the capacity we have to supply it. So this is happening, guys.

The good thing, if there is a good thing, is that we are not alone. If you care about the planet, if you care about anything related to sustainability, there is someone who can be an ally: someone who doesn't want to spend more money.

Yeah, exactly. And that's what we're seeing. On the one hand we care about the planet; I think most of us do, hopefully. And our CFO might also care about the planet, but he also likes to look at the financial factors in all of this. The CNCF ran an interesting survey in December asking what factors lead you to overspend, and quite a few of them relate to power usage, to the actual resource usage of your applications. You can see that 70% report over-provisioning, using more resources than necessary. And 43% say that resources are not deactivated after use, so they're just sitting there wasting energy. Then there are fluctuating consumption demands, and poor planning and prediction of cloud consumption. All of these factors affect not only your organization's pocketbook; those resources are also sitting there consuming energy.

So if we take a look at how our traditional deployments work: we over-provision, because we want to make sure that when load comes in, we can handle it. And so most of the time we actually have too many resources. We're spending more money and more energy than we should.
And still we run the risk that at some point we get, ideally, a lot of requests to our systems and happy users, but so many requests that we end up in an under-provisioned state, even though we tried to be safe and over-provision. That's not ideal either. So wouldn't it be better if we could scale based on the demand of the moment and use only what we need? You can do that more or less with regular Kubernetes, with horizontal pod autoscaling and things like that. But with serverless you can actually scale all the way down to zero. So if there are no requests coming in and you have no pods running, you use no... well, we would assume you're not using any power, but that's what we wanted to find out.

A project in the CNCF landscape that does this with Kubernetes is Knative. Knative enables serverless on Kubernetes. It provides out-of-the-box autoscaling to zero: it basically watches for requests coming in. If there are no requests, it scales your pods to zero, and if a lot of requests come in, it scales them up. And it can be very flexible about the scaling, which is pretty cool. Knative is not just about autoscaling, either; it has a bunch of other features, like eventing and so on. It also has a built-in load balancer to make sure all your pods use resources efficiently. And the idea with serverless, beyond the scaling itself, is that the server complexity gets abstracted away. That's where the name serverless came from; it's not that there are no servers. We always have to mention that when we talk about serverless.

So you would think: autoscale to zero means no power usage, right? That makes sense. If there are no pods, how are they using energy? Well, of course, there's a little more to it. We have the pods themselves, but they need to be scaled up and down based on the demand coming in, and like I said, that happens based on requests. Your usual Kubernetes autoscaling works on CPU and memory, but that means a pod needs to be running so you can collect its CPU and memory usage. With Knative it's kind of the inverse: it doesn't necessarily look at CPU and memory, it looks at requests. But then you need some way to know whether requests are coming in or not.

The way this happens is that Knative has a control plane. Traffic comes in, goes through an HTTP router, and by default gets sent to an activator, which notices: oh, there's traffic, so we need to scale the application up or down. If a lot of requests come in, there's also a burst feature, where traffic is sent directly to the pods, skipping the activator, to improve performance. And there's a queue-proxy sidecar container inside each pod that reports back to the autoscaler, so it can determine whether it needs to scale or not. So of course there's a little overhead to all this. It's not just the pods we need to look at; it's the whole Knative control plane. That's what we wanted to measure, because we can guess that the pods aren't using anything. But what is the control plane using? Yeah, and if only we had a tool to do that.
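(As an aside, to make this concrete: a minimal sketch of what a Knative Service with these scaling knobs could look like. The autoscaling annotations are Knative's own; the service name and image are hypothetical.)

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                                    # hypothetical name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "0"   # allow scaling all the way to zero
        autoscaling.knative.dev/max-scale: "6"   # cap the scale-out
        autoscaling.knative.dev/target: "100"    # concurrent requests each pod should handle
    spec:
      containers:
        - image: ghcr.io/example/hello:latest    # hypothetical image
```

With min-scale at "0", the first request after an idle period goes through the activator while a pod is brought up; that path, plus the queue-proxy sidecar, is exactly the control-plane overhead being discussed here.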
How many of you know Kepler? Let me try. OK. I would say, like, 10%? Yeah, we need to fix that. There is actually a booth for the community here; please pass by, there are very nice people. Kepler is a great acronym, actually: it stands for Kubernetes-based Efficient Power Level Exporter. Let me give you some introduction to it. The cool thing about it is that, for the first time, it gives you granular power consumption metrics. It can tell you how much energy and power your pods, namespaces, and nodes are spending, so you can take a look and say: aha, this guy is spending a lot. It uses eBPF and RAPL; I'll get into those in a second. And just so you know, it was accepted as a CNCF sandbox project just a year ago, so we will be celebrating next month.

So how does it work under the hood? Just a few notes; it's much more complex than this, but I'll try to make it easy. You know, I am a product manager. Every time you want your CPU to do something, you end up calling the kernel, which ends up calling the hardware itself. What Kepler does is install a program that hooks into the scheduler and says: hey, every time you finish a task, let me know. It does this asynchronously, and it takes note. So now it can say: aha, it was you, with this ID, cool. And you spent this many CPU cycles, OK, and you had these cache misses, not that good. OK, noted. Please store that in your brain for a second, and let's set it aside.

So we have eBPF doing that. Now let's talk about RAPL. RAPL enters the chat. What it does is expose the energy readings of the hardware, and we are in Linux, so yeah, it ends up in a file. That may not sound fancy, but the technology itself is fancy: you get the amount of energy coming from the different components, so CPU, RAM, uncore components, and so forth.

So we have these two things. And clever people thought: hey, let's make a ratio, because I know this guy spent this many cycles, and I know all the energy that was spent, so I can do a ratio: 20% of this energy is because of you. And then we can point fingers, which we love. And since we love modern observability, we put it all in Prometheus. At the end of the day, what you do is install Kepler; Kepler installs this eBPF program into the kernel; and then every time you call the kernel to do your fancy serverless and Kubernetes stuff, Kepler will know. Kepler will be watching you. And then you can see it all in a cool dashboard. OK, that was a lot of theory, and there won't be more.
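(An aside on the mechanics just described: a minimal sketch of the RAPL-plus-ratio idea, in plain Java. It reads the cumulative package energy counter that RAPL exposes through Linux's powercap sysfs and attributes a share of the measured power to one pod. The sysfs path is the usual one for the first Intel CPU package, but it varies per machine and may need root to read; the cycle counts stand in for what Kepler's eBPF program would collect and are made up here.)

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class RaplSketch {
    // Cumulative energy in microjoules, exposed by RAPL via the powercap sysfs.
    // Note: the counter wraps around periodically; a real reader handles that.
    static final Path ENERGY = Path.of("/sys/class/powercap/intel-rapl:0/energy_uj");

    public static void main(String[] args) throws Exception {
        long before = Long.parseLong(Files.readString(ENERGY).trim());
        Thread.sleep(1000);
        long after = Long.parseLong(Files.readString(ENERGY).trim());

        // Microjoules consumed over one second, converted to watts.
        double packageWatts = (after - before) / 1_000_000.0;

        // Kepler's idea, roughly: split that power across workloads in
        // proportion to the CPU cycles each one burned (counted via eBPF).
        long podCycles  = 2_000_000_000L;   // hypothetical: cycles spent by our pod
        long nodeCycles = 10_000_000_000L;  // hypothetical: all cycles on the node
        double podWatts = packageWatts * podCycles / (double) nodeCycles;

        System.out.printf("package: %.2f W, attributed to pod: %.2f W%n",
                packageWatts, podWatts);
    }
}
```

Kepler does this continuously, per container, and also breaks the energy down by component (CPU, DRAM, and so on) before shipping it all to Prometheus; this sketch only shows the shape of the ratio.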
So, we were wondering: is serverless powerfully powerless or not? Yeah, exactly. It's funny, we were rehearsing this talk before, and this "powerfully powerless" is quite the tongue twister. I like that you were the first one to trip over it.

So, we created different scenarios to test what the power usage of applications is, both serverless and non-serverless. And we were thinking: how can we do this in a fair way? Because Knative also has a load balancer. So it seemed fair that the serverful application, the non-serverless one, should use a similar system, and we figured Istio would be comparable. So that's our first deployment: a traffic generator sends traffic in over a certain amount of time, then we stop sending traffic for a while, then send it again, to mimic fluctuating traffic. You can see deployment one here, with our pods and the Istio proxy inside as a sidecar container, and we collect the metrics from Kepler to see how it behaves. Our second deployment, of course, is the serverless one, with a similar setup: we generate some traffic and send it to our pods, they scale up when the traffic comes in thanks to Knative, and when we stop the traffic, they scale back down to zero.

So let's see what happened. We had this pair of deployments, and we just needed a way to set it all up, so we created a couple of Kubernetes CronJobs. Just remember: in namespace one, we send 2,000 requests per second from the traffic generator to six instances of a very thin mock server. Then we rest for 10 minutes, then send traffic again for another 20 minutes. On the Knative side, Knative scales our deployment up to six instances; there is also a concurrency factor that is not very relevant here, but it's there. So we just press play, and of course Kepler is watching. I told you, Kepler is always watching.

We built a dashboard from the data coming out of Kepler, and at first I was very happy, because I saw a great difference in energy. I will explain. I know I'm throwing a lot of colors and things at you here, but they look really fancy; I love Grafana. What I first found was this overlap. But if you look at the labels, the one consuming more energy, the one I'm showing in the red box at the top, real-time power consumption in watts, the big one is serverless. So I was very sad. The other one is the one we call serverful. And at the right we have the numbers for energy consumption over time.

We also observed, you see there is a small box inside the big box, a small overlap that happens when we are idle, when we are not sending traffic. We wanted to understand what's there. So there is also a panel with a breakdown of the namespaces contributing to the serverless experiment. The big one, the red one, is the total, which is the sum of the serverless namespace with our experiment plus the knative-serving namespace. So we have these two contributions, and knative-serving is the one to blame here: there's something in the background doing stuff. What happens on the Istio side, the serverful namespace, is that the Istio namespace itself does not contribute to the overall energy consumption; it contributes inside our serverful deployment, because that's where the Istio proxy sidecars live, and they're the ones doing the load.

Then we repeated this for three hours and for six hours, and it's more or less the same. Actually, not more or less: it is the same. So just so you know, it's not that we looked at one hour and that was it. It's also worth noting that we can check the contributions from the different components, CPU, RAM, and so forth, and we didn't find any noticeable difference there. So, as a summary: we saw more energy coming from serverless in this case, and there is this idle power that got us thinking.
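(For reference: panels like the ones described are typically driven by queries over the metrics Kepler exports to Prometheus. Here is a sketch of the two query shapes involved, assuming Kepler's kepler_container_joules_total counter; exact metric and label names vary between Kepler versions.)

```promql
# Real-time power in watts, per namespace: the per-second
# rate of a joules counter is watts.
sum(rate(kepler_container_joules_total[1m])) by (container_namespace)

# Energy consumed over an experiment window, in joules.
sum(increase(kepler_container_joules_total[1h])) by (container_namespace)
```

The first shape would drive the real-time power panel; the second, the energy-over-time totals. Grouping by namespace is what makes a breakdown like "our experiment versus knative-serving" possible.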
So we continued, and jumped into experiment two. We added a third deployment, which we called plain: let's forget about load balancing and about spreading requests across six instances; let's just deploy a server and a client together and see what happens. And no surprises: now we have created a staircase. If we are plain, no load balancing, no anything, we don't spend a lot of power. If we put Istio in, power goes up by about 2.5 times, and practically all of it is inside the pods, with the Istio proxy sidecar making that contribution. Which makes sense, because we have our traffic generator plus the Istio sidecar doing more or less the same work, and the servers themselves are doing nothing; they just say: hey, hello world, 200, back. Cool.

OK, so I was starting to get nervous, because our friends from Knative were helping us, and we had to tell them what we were finding. So yeah, I got frustrated, like that guy on the slide, and I started to add more and more load: 4,000 requests per second. I did more experiments: what if we change the burst limit so we don't bother Knative so much? What if we play with this or that? Just so you know: those things didn't change anything. I'm collapsing the results here, but the trends are the same. 4,000 requests per second: the same. Going back to 2,000 requests per second and changing those parameters: no. So again, I was a bit frustrated, and I called Kevin. Not crying, but yeah, we needed to talk. And then Kevin had this great idea.

Right, yeah. So Jose was telling me we were sending more and more load and getting the same results, so what could we do to change the scenario? And I was thinking: well, the application we're using is a really efficient application. I was using a Quarkus application, basically a Java implementation, but compiled down to a native binary, so it was really fast and used very few resources. For our use case, that didn't quite make sense, because we were trying to get some real measurements. So we were like: come on, do something. Do something real. And so, you can laugh at this, but I basically created a little endpoint where we pass in a parameter saying how long it needs to run, and then it generates some load: it does increasingly heavy calculations and appends to a string, so the memory builds up as well. And it worked surprisingly well. All of a sudden we actually created a whole bunch of load on our system.

Yeah, and this was really doing a lot of stuff; this is what happens with inefficient code that fills up a lot of strings. Where before we couldn't load the system with 2,000 or even 4,000 requests per second, because the pods were barely doing anything, in this example we went down to six requests per second, but now our servers are doing work. And what we found is that when we compare the serverful and serverless use cases, that is, Istio versus Knative (which, by the way, you can combine, but we wanted to understand what's going on), we get more or less the same numbers.
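(You really can laugh at it. The actual code isn't in the slides, so here is a reconstruction of what such an endpoint could look like: a minimal JAX-RS resource, assuming a recent Quarkus with jakarta.ws.rs. It burns CPU and grows a string for the requested number of milliseconds.)

```java
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/work")
public class WorkResource {

    // Do real work for roughly `millis` milliseconds, so each request costs
    // something measurable instead of answering instantly.
    @GET
    public String work(@QueryParam("millis") long millis) {
        long deadline = System.currentTimeMillis() + millis;
        StringBuilder sb = new StringBuilder();
        double x = 0.0001;
        while (System.currentTimeMillis() < deadline) {
            x = Math.sqrt(x) + Math.log(x + 1);  // pointless but real CPU work
            sb.append(x);                        // and keep the heap growing too
        }
        return "did " + sb.length() + " characters of work";
    }
}
```

Hitting it with something like /work?millis=500 makes every request cost real CPU and memory, which is why six requests per second were suddenly enough to load the servers.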
And this was exciting, because then we got to understand that it really depends on what you are measuring: what your workloads are doing when they are running, and what they are doing when they are idle. And with that information you can take action. Also, in this use case, the main contribution for the serverless case came only from our own serverless namespace, the one with our experiment. It was not coming from the other namespaces like knative-serving, which before was the one contributing most of the energy, or power in this case, to our experiment. So we actually found situations in which serverless can help, and we will tell you why.

So, coming back to our question, and we are close to wrapping up: is serverless powerfully powerless or not?

Right. And the answer, as always in IT, is: it depends. On the one hand, we saw that if you have very small workloads, and a limited number of them, you're not going to gain much from scaling down to zero, especially if the pods that are running aren't doing much. If they're not getting any requests, just like with serverless scaling down to zero, we found that those pods really don't consume much energy. That's one interesting outcome of our experimentation. But on the other hand, we also saw that in certain use cases serverless definitely did save some power.

And then there's what we didn't measure: what if we have different kinds of density on our nodes? If we don't use serverless, we have to provision a certain amount up front; even if we can autoscale with Kubernetes, we still need to over-provision. So say these are representations of nodes. On the left you see a non-serverless node, where we need to over-provision. On the right you see that we can use our resources more efficiently on that one node: when requests are not happening, we unschedule some workloads, and when requests come in, it fluctuates a lot faster and better. At the end of the day we can use our nodes more efficiently and potentially use fewer nodes. And I don't think we really need Kepler to measure the difference between a node that's up and running and a node that's shut off. Even in the cloud, you could say the cloud providers still have those nodes running for other workloads or something, but the less power we use, the less power they need to provision. So we can definitely save energy by using this kind of serverless scheduling.

Yeah, and actually that's the whole point of this talk: the important thing is to measure. We can now measure these things. For the challenges Kevin shared at the beginning, about over-provisioning, you can now measure: do I need that or not? You can measure the overhead of serverless, the overhead of Istio, of whatever you have in the background, and understand the price it has. Every time you add something that does stuff, it has a price. There are also constant contributions to this energy that are important, so don't only measure when you are running traffic under load. Take those idle contributions into account too, because they don't scale linearly.
We found that the energy contribution from our namespace was more or less the same at 6 requests per second as at 4,000 requests per second of traffic. So really, it depends on what you are seeing and what your workload is doing, and there is no one size fits all. If I had to give you takeaways from this talk: measure, and provision accordingly. And maybe a third one, because this is something I keep thinking about and miss from my programming days: now you have a superpower. Whenever you review a pull request, it used to be: hey, I'm more of a spaces person, you're more of a tabs person, and your pull request gets rejected. Now you can actually measure whether the code is efficient or not, and tell your colleague: how dare you, right?

So, coming to future work. Yeah, we're still experimenting. It's a really interesting project, so I think we're going to continue: trying different services, maybe trying different topologies, maybe adding scaling of the nodes themselves with machine autoscaling and seeing what happens. Again, we can assume that turning a node off is going to save energy, but maybe we should measure that too, I don't know. And then seeing what the actual relationship is between the requests we send in and the work the application actually does. So I think we can continue with this work, and we will.

Yeah, that's very important. And we will continue with the backing of these great people, who we really want to shout out. Vibu was helping me a lot, so Vibu, thank you very much. Sunil, Vimal, Kai, Huamin, Parul, Marcelo, everyone who is contributing to this great project. I didn't see a lot of hands here knowing about Kepler, so please pass by their booth; they are very nice people, and this is a great project. And also Knative: they were helping us a lot. Yeah, and they have a booth here too. And thanks to Roland, who has a talk at the same time as this one; we asked him to come to this talk, but he couldn't. Jerk. No, I'm just kidding.

Anyway, thank you for joining. We would love your feedback: if you go to the Sched page for this talk, you can leave feedback, and you can also find the slides there. And the QR code is right there too. I'll give you a second to scan it, because I hate it when somebody shows a QR code and then skips to the next slide; it's like, wait! But if you registered for this talk, you can also just access the slides there.

And a quick shout-out to our friendly employer, Red Hat, which sponsors some of our books and makes them available for free to download. So if you're interested, we have a whole bunch of different books you can download, and again, we will share the slides, so the links will be in there as well. And one particular note of interest for me personally: we're working on this serverless Java in action book. So if you're a Java fan and you like serverless, keep an eye out for it; it's coming soon. And I think I saw Daniel, my co-author, here in the room.

And with that, thank you. If you have any questions, let us know through any of these channels or here in the room, because I think we still have a minute or two. We have five minutes, yeah. Thank you very much. If you have any questions, you can walk up to one of the microphones, I think, so don't be shy.
Or you can also be shy, and then we can talk afterwards; we'll be at the Red Hat booth, I think, for questions. And if you want to see how Kepler works, I think you can probably get a demo at the Red Hat booth, right? Quick. Yeah, I think so, an off-the-cuff demo. Yeah, yeah, I can do that.

OK, we have a question. Thank you. My question is: did you ever check out C-states? We all know that a lot of hardware just has deep C-states disabled, for dubious reasons. Did you ever check how much of an impact enabling them actually makes across a whole fleet of thousands and billions and trillions of nodes?

We didn't check that, not at all, but noted. We want to take a look at more stuff, so that's good feedback. Thank you. Yeah, you can add it as an issue to the repository. Yeah, that's a good one. Thank you.

Come on, don't be shy. OK. OK, going once. Going twice. Thank you. Thank you very much.