Hi everyone, and thank you for joining me today. We're going to be talking about energy monitoring and energy benchmark tests across various GitOps architectures, looking at Flux and Argo CD. I'm Niki Manoledaki. I'm a software engineer at Weaveworks, a maintainer of OpenGitOps, and a contributor to the Environmental Sustainability TAG of the CNCF. My co-presenter, Al Hussein, wasn't able to make it today, unfortunately, but we'll hear from him in the video in the presentation.

So, the cloud's carbon footprint. There are a lot of numbers out there trying to estimate it, but it's difficult to find a reliable source that will give you a definitive figure; any estimate is a moving target, and it's going to keep growing in the coming years. The International Energy Agency, the IEA, estimates that data centers and data transmission networks each account for one to one and a half percent of global electricity usage, so a combined total of around three percent. Other estimates put the ICT industry as a whole at around four percent. In the diagram you see here, there's also a larger picture of global net-zero targets and what reaching them has to look like: net zero means a reduction of 90% or more in greenhouse gas and other carbon emissions, where only 10% can be offset. So the 90% reduction cannot come through offsets.

Let's go a little deeper into this conversation on energy and carbon monitoring and optimization in the cloud native ecosystem. There are several drivers in the market, such as net zero: a lot of companies have net-zero and other sustainability targets in place, which means reducing the energy usage and carbon emissions of their infrastructure. There are also performance optimizations to be gained from energy efficiency, as well as cost savings.
This is also encompassed by the term FinOps: if you reduce your resource utilization and cloud usage through cost optimizations, cost can serve as a proxy metric for sustainability metrics. And there are also regulatory drivers. In the US, the Securities and Exchange Commission, the SEC, has an upcoming rule called the Climate Disclosure Rule, which will require publicly traded companies to report on their carbon emissions: scope one, scope two, and scope three, that is, direct, indirect, and value-chain or embedded emissions. And in the EU, there's already a law that has passed and will take 18 months to be implemented: the Corporate Sustainability Reporting Directive, or CSRD. That's also going to require companies, especially large, publicly traded companies, to report on their carbon emissions. If you want to hear more about this, the Green Software Foundation, which is part of the Linux Foundation, has a great podcast called Environment Variables (amazing name), and one episode is on the legislation and how it will impact the tech industry, the cloud industry, and cloud users as well.

There's a lot of community momentum at the moment. In the CNCF, there's the Environmental Sustainability TAG, which meets twice monthly, and you're very welcome to come contribute and learn. In the GitOps Working Group, we have the Environmental Sustainability chapter, or subgroup, which is actually where this talk came out of, and you're also very welcome to join us if you would like to do more of the tests we're going to be talking about here. And last but not least, there's the Green Software Foundation, which I'm going to talk about more in a minute, and Project Sylva, a new project by Linux Foundation Europe focused on telcos and how to measure and reduce the energy consumption of 5G architectures in particular using cloud native software.
So, about the Green Software Foundation: let's zoom in a little deeper before we get into why energy benchmark tests matter. The Green Software Foundation has been working on the Software Carbon Intensity specification, the SCI, which they're in the process of making into an ISO standard for measuring the energy consumption and carbon emissions of a software component as a rate. There's a formula they've put together with variables such as energy consumption, carbon emissions, and embedded carbon, all per rate of something: this could be an API request, or it could be a reconciliation, like what we're going to see in a minute. That's really important for comparing the carbon intensity of software.

There are many open source tools available for measuring energy consumption, and the hyperscalers have some offerings as well for measuring carbon emissions through carbon dashboards. These offerings are very limited. There's definitely an issue with data granularity: on AWS, for example, you can't set a tag and view your carbon emissions by tag; you see your entire infrastructure. You can see by region, which is helpful, but you can't really deduce what a single EC2 instance consumes, for example, which is what we need. And there's a three-month lag, so whatever you're running today, you're going to have its carbon emissions three months from now. It is helpful for long-term reporting, though. On the open source side of things, there's still a lot of guesstimating, but Kepler is what we're going to look at today. It's an eBPF-based tool: Kepler stands for Kubernetes-based Efficient Power Level Exporter, which is a great acronym too. It's a Prometheus exporter, and it looks at your kernel syscalls; that's the eBPF part.
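Going back to the SCI for a moment, its published form is roughly SCI = ((E * I) + M) per R. Here is a minimal sketch of that shape in Python; all the numbers are made up purely for illustration, and the real specification defines each variable much more precisely:

```python
# Sketch of the Software Carbon Intensity formula, SCI = ((E * I) + M) per R:
#   E = energy consumed by the software (kWh)
#   I = carbon intensity of the electricity grid (gCO2eq/kWh)
#   M = embodied carbon attributed to the hardware (gCO2eq)
#   R = the functional unit, e.g. one reconciliation or one API request
# All numbers below are illustrative assumptions, not measurements.

def sci(energy_kwh: float, grid_intensity: float,
        embodied_gco2: float, functional_units: int) -> float:
    """Carbon per functional unit, in gCO2eq."""
    return (energy_kwh * grid_intensity + embodied_gco2) / functional_units

# e.g. 0.2 kWh at 400 gCO2eq/kWh plus 5 g embodied, over 1000 reconciliations:
print(round(sci(0.2, 400.0, 5.0, 1000), 3))  # 0.085 gCO2eq per reconciliation
```

Dividing by a functional unit is what makes the number comparable across tools: carbon per reconciliation can be compared even when two tools run for different lengths of time.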
Kepler looks at various components: it uses RAPL, the Running Average Power Limit interface, for energy data, along with other components in the kernel, to aggregate all of that and provide energy consumption in joules. Joules are a unit of energy, and joules per second are watts. So if you take the joules consumed over, for example, two hours and divide by the elapsed time in seconds, you get the average power in watts, which you can then multiply by a carbon coefficient for a given electricity grid to get the carbon emissions of your software. eBPF for monitoring is really cool, and it's a growing topic as well. And so in Prometheus, what you end up having is this kepler_container_joules_total metric, which is amazing. It's magic.

The test environment for these tests included Ubuntu and Arch Linux. So, very painful: I was using Arch for this, and it hurts a little sometimes. Trying to do this on a Mac was very difficult; even a VM doesn't help. It's challenging. But on Linux, it's much easier. We used minikube to create the cluster environment with a CPU count of four. Three is also possible, but it did run out of compute power sometimes and crashed my machine multiple times. We then used Prometheus, Grafana, and the magic, the secret sauce, Kepler, to measure Flux and Argo CD operations.

So, idle GitOps, dashboard number one, over two hours: we can see the flux-system namespace and the Argo CD namespace with nothing to reconcile, just idle. We can see that Argo CD consumes slightly more energy at a base level, and we can also see that Flux has some spikes. Any guesses as to what those may be caused by? The Kustomize controller? Yes, very close. So you see those spikes, and there's already a slightly different pattern that we can see. Now my co-speaker, Al Hussein, will present a little more about the benchmark tests and the architectures.

Thank you, Niki.
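To make the joules-to-watts-to-carbon arithmetic concrete, here is a small sketch. In practice the joule values would come from the kepler_container_joules_total counter in Prometheus (for example via increase() over the dashboard window); the 400 gCO2eq/kWh grid coefficient below is an illustrative assumption, not a real grid value:

```python
# Converting Kepler's joule readings into average power and carbon.
# Watts are joules per second, so dividing the joules consumed in a
# window by the window length in seconds gives the average power draw.

JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 megajoules

def avg_watts(joules: float, seconds: float) -> float:
    """Average power over the window, in watts."""
    return joules / seconds

def grams_co2eq(joules: float, grid_gco2_per_kwh: float) -> float:
    """Carbon emissions for the energy consumed, in gCO2eq."""
    return joules / JOULES_PER_KWH * grid_gco2_per_kwh

# e.g. 720 J consumed by a namespace over a 2-hour dashboard window:
print(avg_watts(720, 2 * 3600))           # 0.1 W average draw
print(round(grams_co2eq(720, 400.0), 4))  # 0.08 gCO2eq at 400 gCO2eq/kWh
```

The same two steps, joules to average watts and joules to grams of CO2eq via a grid coefficient, are all that's needed to turn the dashboards below into carbon figures.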
Hello, everyone. Today, I would like to take you through the experiment scenarios we conducted to evaluate the energy footprint of a set of GitOps architectures and patterns. First, we examine the concurrent deployment of a sample application with Argo CD and Flux. Next, we compare the power consumption of deploying applications from a monorepo versus a multi-repo approach. Moving on, we explore the impact of using a standalone cluster versus a hub-and-spoke, or multi-cluster, setup for Argo CD. And lastly, we vary the reconciliation interval, comparing a three-minute interval with a 30-minute interval.

For each of these, our hour-long test starts with the GitOps tool in an idle state for the first 15 minutes to ensure a stable starting point before deploying an application. After another 15 minutes have elapsed, a code change is introduced to trigger an update, followed by a rollback after 15 more minutes. At the end of the hour, the application and the GitOps tool get deleted.

Now, let's dive into the architecture of the first scenario, in which we aim to evaluate the energy footprint of deploying the GitOps sample application to a minikube cluster using both Argo CD and Flux simultaneously, but in two different namespaces. This allows us to independently deploy and manage the application using the two tools. The power consumption in the Flux namespace remained relatively consistent throughout the hour, hovering around 0.1 watts. However, there were periodic spikes observed approximately every 10 minutes or so, which could potentially be attributed to Flux's internal operations, such as synchronizing with the Git repository or checking for changes in the application's configuration. On the other hand, within the Argo CD namespace, we observed a different power consumption pattern. Initially, the power consumption started just below 0.1 watts; however, after approximately 15 minutes, there was a spike when the sample application was deployed.
Furthermore, after another 10 minutes, the power consumption in the Argo CD namespace increased to 0.3 watts and remained at that level for the rest of the hour. This sustained increase may be linked to the ongoing activities performed by the Argo CD application controller. The analysis of all namespaces reveals interesting insights. Notably, the kube-system namespace exhibits significantly higher power consumption than the other namespaces, which can be connected to the various system components and processes essential to the functioning of the Kubernetes cluster, such as the kubelet, kube-proxy, and other control plane components. On the other hand, the GitOps tools demonstrate much lower power consumption. Similarly, Kepler itself also shows low power consumption, which suggests that its overall footprint is relatively minimal.

In our second experiment scenario, we focus on comparing the energy footprint and impact of code repository strategies, namely monorepo and multi-repo. For the monorepo approach, we deploy both the guestbook and podinfo sample applications from a single Git repository. This means that all necessary configuration files and charts are stored in one Git repository. In contrast, for the multi-repo approach, we deploy the sample applications from two separate Git repositories; each application has its own dedicated repository containing its Helm charts. It appears that there is not much of a change in power consumption between the two approaches. The power consumption in both the Argo CD and flux-system namespaces remains relatively consistent, regardless of whether the applications are deployed from a single repository or multiple repositories. This implies that the choice of repository strategy does not seem to have a significant impact on power consumption when deploying applications following a GitOps model.
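As a reminder of the cadence behind these measurements, the hour-long experiment described earlier (15 minutes idle, then deploy, update, rollback, and teardown at 15-minute intervals) can be sketched as a driver like this. The step strings stand in for the real kubectl and Git commands in the experiment scripts, and the injectable sleep/act hooks exist only so the schedule can be exercised without actually waiting an hour:

```python
import time

PHASE_SECONDS = 15 * 60  # each phase of the hour-long run lasts 15 minutes

# Placeholder step descriptions; in the real experiment script these are
# kubectl and Git operations against the cluster and the app repository.
STEPS = [
    "deploy sample application",
    "push code change to trigger update",
    "roll back the change",
    "delete application and GitOps tool",
]

def run_benchmark(sleep=time.sleep, act=print):
    """Idle for 15 minutes, then perform one step every 15 minutes."""
    for step in STEPS:
        sleep(PHASE_SECONDS)  # idle before the first step, then between steps
        act(step)

# Dry run: record the schedule instead of actually sleeping for an hour.
timeline = []
run_benchmark(sleep=lambda s: timeline.append(f"wait {s}s"),
              act=timeline.append)
print(timeline)
```

Keeping the phases on a fixed 15-minute grid is what makes the dashboards comparable: every spike can be lined up against a known deploy, update, rollback, or teardown event.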
The Argo CD repo server shows peaks when there is an update to the code in the Git repository, which happens around every 15 minutes, the sleep time we set in the experiment script, while most of the power consumption is attributed to the Argo CD application controller.

In our third experiment scenario, we investigate and compare two architectures with Argo CD: standalone, with a single cluster, and hub-and-spoke, a multi-cluster architecture. In the standalone approach, we deploy a sample application to a single cluster, specifically a minikube cluster where Argo CD is installed. In contrast, the hub-and-spoke approach involves deploying the same application to multiple clusters: two AKS clusters in addition to the minikube cluster. We utilize Argo CD's ApplicationSet feature for this purpose, which allows us to deploy and manage the applications across multiple clusters. In the multi-cluster deployment using the hub-and-spoke architecture, it appears that Argo CD consumes close to twice as much as it does when deploying to a single cluster. The increased power consumption in the multi-cluster scenario can be linked to factors inherent to the hub-and-spoke architecture: managing deployments across multiple clusters requires additional resources, resulting in higher energy consumption. Regarding Flux, Niki will explain why it's not included for now. Over to you, Niki.

Yay, thank you, Al Hussein, who is definitely here among us right now. Okay, so why did we not include Flux in the single-cluster versus multi-cluster experiments? Primarily due to time constraints, but also because the equivalent would be to use, for example, Cluster API. Cluster API, if this is new to you, is a way to create a management cluster and then create a bunch of workload clusters that mirror what is happening in the management cluster.
So you can manage a fleet of clusters that way, and it's also very GitOps-compatible because you can configure everything with manifests in a Git repo. Cluster API and Flux are a great combination of tools to use together, but part of why I could literally not run this on my local machine is that it's very resource-intensive; I just had a Lenovo ThinkPad 13 to work with. Using the cloud would, for example, require using Cluster API for Amazon, the Amazon provider, which works with EKS. And full disclosure: I was previously a maintainer of eksctl, the CLI for EKS. I've been researching which AMI, the image for the node, that is, the operating system, would be the most ideal for running Kepler in the cloud. This is very challenging, again because of the eBPF constraints: you need your kernel headers to be exposed in a read-only, secure way, and you need cgroups v2, so you need quite a lot of configuration to run Kepler in the cloud, and it hasn't been done yet.

The last scenario we tested was the reconciliation interval: three minutes versus 30 minutes. The results for that were, I'm not sure if they were what we expected. For Argo CD, you do see a decrease, so there is some optimization there; for Flux, not so much. And actually, what we realized is that we were testing the reconciliation interval for specific resources, like Git sources and Helm sources, but this is not the same as reconciling everything according to these intervals. A colleague of mine made a really good point that sometimes optimizations are not really optimizations; they may actually be misconfigurations. So we have to be really careful that optimizations don't stop us from using a tool in the most optimal way. Other GitOps architectures that we want to try, like I mentioned before, include using Flux with Cluster API to run on EKS.
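The intuition behind the interval comparison is simple arithmetic: a 30-minute interval triggers a tenth as many reconciliation loops per hour as a 3-minute one, so any fixed per-reconciliation energy cost scales down accordingly. The per-reconciliation joule figure in this sketch is an assumed illustrative value, not a Kepler measurement:

```python
# Rough arithmetic behind the 3-minute vs 30-minute interval comparison.
# The 1.5 J cost per reconciliation loop is an illustrative assumption.

def reconciliations_per_hour(interval_minutes: float) -> float:
    """How many reconciliation loops run per hour at a given interval."""
    return 60 / interval_minutes

def hourly_joules(interval_minutes: float, joules_per_reconcile: float) -> float:
    """Energy spent per hour on reconciliation alone, in joules."""
    return reconciliations_per_hour(interval_minutes) * joules_per_reconcile

print(reconciliations_per_hour(3))    # 20.0 loops/hour
print(reconciliations_per_hour(30))   # 2.0 loops/hour
print(hourly_joules(3, 1.5))          # 30.0 J/hour at an assumed 1.5 J each
print(hourly_joules(30, 1.5))         # 3.0 J/hour
```

This only models the interval-driven loops for the specific resources whose interval was changed, which is exactly the caveat above: if other controllers keep reconciling on their own schedule, the measured savings can be much smaller than this arithmetic suggests.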
Pull-based versus push-based reconciliation would also be really interesting to try.

So, some of the challenges and lessons learned. Operating system and eBPF portability: this is very challenging, and it's quite well documented. I recommend an article by folks from Pixie, who also work with eBPF, on the challenges of deploying eBPF in the wild and accommodating various operating systems, and how difficult that can be. Like I mentioned before, we used Linux machines; macOS is pretty challenging. Kepler does support VMs, but I've tried with kind, with minikube, with VirtualBox, and no luck so far. Hopefully one day. That would make the dev environment much easier for Mac users; otherwise you also need a Linux machine.

Resource limitations: like I said, the test environment for running all of these tools and operations can be really intensive. Again, the dev environment is a challenge here. We had to retry all of this so many times, because sometimes Kepler just returned no data and the Grafana dashboard was literally five dots, so we couldn't really show you that, and our systems often became unresponsive and we had to start over. It can be really difficult. And cloud limitations: we really should be trying all of this in the cloud, right? Currently there's no support for EKS. If there is interest in doing that, or in solving any of these challenges, please do contribute to the sustainability subgroup of OpenGitOps. And lastly, Kepler is supported on OpenShift; Kepler is developed primarily by Red Hat and IBM, so there is support for OpenShift.
There are also some statistical limitations: we need way more data points. We can show you a couple of examples or averages, but ultimately we need many more data points, ideally from tests run on different environments hundreds of times and over much longer periods, so that we can get a better picture of what the energy consumption of the software really is.

You can try it yourself: if you scan this QR code or visit this page, you will find scripts for running all of this yourself. We are planning to move all of this work to the OpenGitOps organization on GitHub very soon, so that's something you can also help maintain, spread the word about, and contribute to, and you can measure whichever tool you are building, maintaining, or using. And lastly, the Environmental Sustainability TAG of the CNCF is going to be meeting on Wednesday around 12:45 outside the SustainabilityCon room, a very targeted location. If you're interested in talking more and joining the TAG, you're very welcome to join us there. Also attend some SustainabilityCon talks if you're at OSS in the next days, and if you scan that other QR code, you can see more talks about Flux happening these days.

I don't know if there's time for a question. Any questions? Yes. No, we didn't compare the energy consumption of different ways of setting up your Kustomization, but that's a really interesting test to do as well. I don't think it would be too difficult: the difficult part is setting up your test environment, but after that, you can really measure anything you want.

Okay, so the question is whether Flux was intended to be more energy-efficient from the get-go, given that the data shows Flux has lower energy consumption. I don't think it was necessarily intentional; I don't think we had this data before now to be able to measure things.
But I know there have been some performance improvements very recently, so maybe that played a role. We haven't tested different versions, before those optimizations and after, but that would also be really interesting information to be able to share. Thank you.

So the question is whether we tested Git submodules. We haven't tested that; what you see is everything that we've tested so far for this talk. But that's another use case that would be really interesting. And I hope we can see more features being tested this way before they're released, to be able to show that their consumption is low compared to others that might double or triple your consumption, and hopefully that becomes a standard test in the future.

Yeah, I don't know if this was picked up, but the question is whether sustainability can become a topic as important as, for example, security, and whether it can become part of the tests and auditing that we run on our software. There is a lot of demand being created right now around sustainability in the cloud-native ecosystem. There are a lot of questions, and it's very new to a lot of people, so we're still doing a lot of education and awareness-raising. Sustainability can also sometimes be a bit of a buzzkill, because it sounds like something you do out of obligation, for what, right? But actually, it's been shown over and over that improving sustainability can reduce financial costs, so we still have to link those together for people. There's also regulation coming that's going to require these metrics whether they're a goal or not. And I've seen a lot of reports saying sustainability is in the top five priorities for the years to come. So I hope so, and maybe we can make sustainability, or measuring the SCI, part of the graduation process. I think that would be really interesting for the CNCF.
Yes, I think we're out of time by four minutes, but there are three questions left, so maybe we can take those afterwards. Sorry, I would love to, but we're a little bit behind. Thank you.