And before I start, the first thing I'll do is answer a question from Amit. Amit asked: is this event going to be recorded for later? The answer is yes and no. The event will be recorded, yes, but one or two sections may be edited, so it may not be the exact same stream; the recording may be edited slightly. But yes, it will be recorded and you can watch the session again in the future. So if you're worried about dropping off early, you're fine, you can get the recording. Don't feel too worried about that. There are also questions about sound here: you should be able to hear me, so if you can't, refresh, check your volume and things like that.

So let's get started. Welcome to DevNation Day; we're focusing on GitOps and CI/CD. I'm Evan Shortiss, a developer advocate, and I'm just here to introduce our awesome speakers. On our agenda today, we're going to have Harriet and Koustav from Red Hat, who will give a Tekton and Argo community update. We have Citi here to talk about their fortified DevOps factory. Ford will be talking about OpenShift lifecycle management using Tekton, Argo and PAC. Then lessons learned from Olina, Sal and Ian on using GitOps and OpenShift. We have AWS talking about Argo CD scalability and testing. And finally, we'll have a Q&A at the end. So that's the overall agenda. We're going to be here for a little bit, and I hope you stick with us. Do not hesitate to use the chat; there are a few of us backstage who can answer questions, and we'll also be able to answer your questions live. So let's get started. I'll hand over to Harriet and Koustav for their community update.

Thanks so much, Evan. All right, I think we'll get started with Koustav talking about Tekton.

Yep, hi everyone. To start with a brief intro about Tekton itself: it's an open-source project which provides a comprehensive set of standardized components tailored for building Kubernetes-style CI/CD systems. Currently it is governed by the CD Foundation; however, one community update from Tekton is that we are trying to move Tekton to the CNCF to gain even more momentum and amplify go-to-market activities. Now, to see how Tekton shines within the OpenShift ecosystem: in OpenShift we offer a supported and tested Tekton operator, and it does not stop there; we enhance Tekton with additional capabilities like seamless integration with the Dev Console, et cetera.

So, who are the OpenShift Pipelines and Tekton users, and what are their use cases? It caters primarily to three kinds of users. One is your platform engineers, or site reliability engineering team, who manage, for example, the Tekton config or the controller resources, and then present that as a service to the end developers: your application engineering team and your release engineering team. One important thing to note here is that the platform engineers provide a lot of tools as part of their internal developer platform. We will hear from Citi, Ford and the other speakers that they are providing Tekton as a service to their end developers, and there will be a lot of tools alongside Tekton: there can be CD tools, there can be scanning tools, all sorts of things. Now, once Tekton is served to application engineering or release engineering, the release engineering team, for example, prepares the application engineering team's pipelines.
Sometimes the platform engineering team provides some golden pipelines or golden tasks, the building blocks of Tekton, to release engineering or application engineering. The point is that for the end users, the application engineering or release engineering teams, the use cases become very broad. They obviously include CI; CI is the top use case that we are seeing with Tekton. But apart from that, there is batch processing, and there is orchestration of different kinds of automation tools, or any tools of that nature. You'll hear from Ford, for example, how they are using Tekton for orchestration. And then there is custom business automation that you can do with Tekton itself. Next slide, please.

Yeah, so what's the value of Tekton in the whole internal developer platform? Why pick Tekton as that tool? There are a couple of reasons, as we see it, and these are the things we are hearing from customers themselves. One is obviously the hybrid model, and cloud and vendor independence: you can take a Tekton YAML file and run it in any cluster, in any cloud. That's one of the biggest value propositions. The other thing is that, when I talk about the Tekton core model, you will understand that Tekton is very flexible in terms of integrating with other tools. Because it is Kubernetes-native in nature, fundamentally all your building blocks, the tasks, are pods themselves, and that's where you write, for example, how the task will behave. Because of that, it has tremendous flexibility to integrate with any tools of your choice. Now, in the internal developer platform that the platform engineers provide, there are a lot of tools that the developers prefer, so you probably need a CI tool which integrates with other tools really well. The other reason is to adopt advanced use cases, and you will see that when our customers speak about it. Also, because Tekton itself is Kubernetes-native, with all the tasks running as pods and a Tekton Kubernetes controller which scales pretty well, scalability and modularity is another value that customers see in including Tekton in their internal developer platform. And with Kubernetes RBAC itself, you will see that it is very easy to build security into Tekton. Finally, Tekton has certain components specifically tailored for keeping the whole DevSecOps narrative in place: one is all sorts of scanning tools, lots of tools inside Tekton Hub itself, and the other is Tekton Chains, which I'll talk about in a little bit. Next slide.

Yeah, so fundamentally, if you look at the Tekton core model, what happens in Tekton is very simple to understand. You have your source code management system, something in GitHub, GitLab or Bitbucket, or anywhere; it can be some custom event as well. There is a component called the Tekton event listener which listens to those events, can do additional filtering on them, and then passes them to Tekton Triggers. Tekton Triggers takes the payload from the event and passes it as parameters to the pipeline itself.
Now, the whole pipeline will be built from the building blocks I mentioned before: tasks. Tasks can run in parallel or sequentially, you can define all sorts of conditions on them, and inside tasks there are steps. Because each task runs as a pod, this is the flexibility I was talking about: if you want to run a different image for each task, you can do it; if you want to mount some volume or workspaces in any task, in the usual Kubernetes way, you can do it. And, for example, as inputs to a task there are parameters, which the task consumes; from a task you emit results, and those results can be taken by another task as a parameter. So that's the whole Tekton core model: there is the event listener, there is the trigger, and then there is the pipeline. On the reusability side of Tekton, tasks are reusable in nature, so there is Tekton Hub, which is basically a collection of all the tasks. Next slide.

Ultimately, if you look at where Tekton is right now: on customization, it is in the second quadrant, which means the customization scope with Tekton is pretty high. But there is also the integration effort. It can integrate with everything, but the effort it takes to make it integrate is a little bit higher. The direction of the whole Tekton community is to move it to the first quadrant, keeping the customization scope high while making the integration effort much lower. That's where the community is moving, and that's where Red Hat is also moving with Tekton.

Now, what I'll do is talk about some of the new things Tekton has introduced, and some of what is coming. One is Tekton resolvers. I talked about platform engineering teams providing some sort of golden pipelines or golden tasks to the end users; that's where the Tekton resolver comes into the picture. It makes the usability aspect much better. If there is a golden pipeline or golden task that the platform engineers make available to the end users, it can live anywhere: those tasks or pipelines can be inside a Git repo; if they want, they can place them inside a namespace in a cluster; or, if they want to go beyond that, they can have them as OCI images, as Tekton bundles, or maybe as a task or pipeline in Tekton Hub. The resolver will fetch those tasks into the user's PipelineRun and resolve them, which makes the usability part easier. This is something Tekton launched recently. Some upcoming enhancements we are seeing for Tekton resolvers include multi-org support, et cetera, but one important thing I've highlighted is caching support in the resolver; that is something the community is working towards.
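For illustration, here is a minimal, hedged sketch tying the core model and the resolvers together: a Pipeline whose tasks are "golden tasks" fetched by Tekton's git resolver. The resolver parameter names (`url`, `revision`, `pathInRepo`) follow the upstream Tekton docs; the repository, task files, parameter and result names are hypothetical, and resolvers must be enabled in your installation.

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: demo-pipeline
spec:
  params:
    - name: repo-url
      type: string
  tasks:
    - name: build
      taskRef:
        resolver: git                # fetch a "golden task" maintained by the platform team
        params:
          - name: url
            value: https://github.com/example/golden-tasks.git   # hypothetical repo
          - name: revision
            value: main
          - name: pathInRepo
            value: tasks/build.yaml
      params:
        - name: source-url
          value: $(params.repo-url)  # pipeline parameter consumed by the task
    - name: scan
      runAfter: ["build"]            # sequential ordering; omit runAfter to run in parallel
      taskRef:
        resolver: git
        params:
          - name: url
            value: https://github.com/example/golden-tasks.git
          - name: revision
            value: main
          - name: pathInRepo
            value: tasks/scan.yaml
      params:
        - name: image
          value: $(tasks.build.results.image-url)   # a result emitted by the build task
```

Each task here runs as its own pod, which is the per-task flexibility (images, volumes, sizing) described above.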
The next part I want to talk about is Pipelines as Code. Again, this is an upstream project built on top of Tekton, and it is available as part of OpenShift Pipelines. It has been introduced recently and we are seeing a lot of customer adoption here. The beauty of it is this: when I explained the Tekton core model, there was the event listener, the trigger and then the pipeline, but setting all of that up can be a little bit of a pain. That's where, as I said, Tekton is moving towards making the integration effort lower. What Pipelines as Code does is provide an out-of-the-box integration with your SCM providers, whether GitHub, GitLab, Bitbucket or others. For GitHub it is done with a GitHub Application; for GitLab or Bitbucket it can be via webhooks. Once you set up the GitHub App with a particular repo and your Pipelines-as-Code controller starts up, it's very easy: you don't have to write all those trigger templates, you don't have to write all those event listeners. Inside the PipelineRun YAML itself, you add annotations. For example, you specify on which event you want to trigger a particular pipeline, a pull request or a merge request, and if you want to do some advanced CEL-based filtering on those events, you can do that too. Furthermore, as annotations in the PipelineRun YAML, you can manage concurrency: you can define how many concurrent PipelineRuns you want for that particular GitHub or GitLab repository, and also how many PipelineRuns you want to keep inside the cluster once, let's say, the PR is complete. And after the annotations, it's the same kind of PipelineRun YAML.

One of the announcements, as you see adoption growing, is around the PAC interceptor. What it essentially means is this: right now, the whole Pipelines-as-Code setup lives inside the Git repo. Inside your source code management repo, inside the .tekton directory, you have the PipelineRun YAML. But you don't have to keep, for example, the TaskRun or even the Pipeline YAMLs there; you can place them in a separate repo and provide them as a golden template. What the PAC interceptor does is not only resolve those PipelineRuns, it can also do additional operations. For example, you want your CI pipeline to run, and after that there should be some kind of scanning task or some kind of Chains task; you can include that in the Pipelines-as-Code interceptor. Apart from that, there are other enhancements the community is planning, including support for multiple GitHub Apps, et cetera. Two important things I want to highlight here. One: as I mentioned, Pipelines as Code supports some concurrency control via the annotations, and there is much more advanced concurrency control that the community is planning. The other is that, as you will hear, many of our customers use multiple clusters with Tekton, so we want some sort of multi-cluster load balancing with Pipelines as Code.
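As a hedged illustration, here is a minimal PipelineRun of the kind Pipelines as Code reads from a repo's .tekton directory. The annotation keys shown exist in current upstream PAC releases; the pipeline content and values are invented. CEL filtering uses a similar annotation (pipelinesascode.tekton.dev/on-cel-expression), and the exact mechanism for concurrency limits should be checked against the PAC docs for your version.

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: ci-on-pull-request
  annotations:
    # Trigger only on pull requests targeting main
    pipelinesascode.tekton.dev/on-event: "[pull_request]"
    pipelinesascode.tekton.dev/on-target-branch: "[main]"
    # Keep at most five finished runs for this pipeline in the cluster
    pipelinesascode.tekton.dev/max-keep-runs: "5"
spec:
  params:
    - name: revision
      value: "{{ revision }}"       # PAC substitutes the commit SHA from the event payload
  pipelineSpec:
    params:
      - name: revision
        type: string
    tasks:
      - name: build
        params:
          - name: revision
            value: $(params.revision)
        taskSpec:
          params:
            - name: revision
              type: string
          steps:
            - name: build
              image: registry.access.redhat.com/ubi9/ubi-minimal   # placeholder build image
              script: echo "building commit $(params.revision)"
```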
The next item is around Tekton Results. Those who know Tekton know that observability is a big pain point in Tekton itself. What I mean by that is that, to access the logs of your PipelineRuns and so on, they need to stay inside the cluster, and that is not viable; in fact, it is not scalable at all. That's where Tekton Results comes into the picture. What Tekton Results essentially does: it's a separate component of Tekton which you can deploy, and it watches the pipeline controller to see whether a PipelineRun is complete or not. If a PipelineRun is complete, you can configure Tekton Results to send the logs to a separate place, to whatever logging infrastructure you might have; right now Tekton Results supports sending the logs to GCS and S3. There is a Postgres DB that comes with Tekton Results, and you can bring your own external Postgres DB as well. From that database you can query all your logs, all your PipelineRun records, everything. This is one important aspect that we believe will solve a lot of problems with Tekton observability. A couple of announcements on that: Tekton Results will get much better per-namespace logging support, since we know that customers provide namespaces as a service to their end developers, and there will be some enhancements on log retention policies, et cetera. Finally, from the OpenShift side, we will integrate Tekton Results with our console, so that even older PipelineRun logs can be viewed in a nice dashboard, and there will be some metrics on top of that. Next slide.

I'll quickly cover Tekton Chains. It has been in the community for a while, but gradually we are seeing a lot of adoption of it. This is where another benefit of Tekton comes into the picture: it is the supply chain security component for Tekton. What it essentially does is create signed provenance for your Tekton runs. It observes all your TaskRun and PipelineRun executions, converts those snapshots into a payload, and then signs that payload. There are a lot of enhancements going on in Tekton Chains as we gradually see customers starting to adopt it.
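For reference, a minimal sketch of configuring Tekton Chains to emit signed provenance, via its chains-config ConfigMap. The keys below come from the upstream Chains docs; treat the values, and the namespace, as assumptions to verify against your installed version (on OpenShift, the operator exposes equivalent settings).

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: chains-config
  namespace: tekton-chains            # upstream default; differs under the OpenShift operator
data:
  artifacts.taskrun.format: "in-toto" # emit in-toto/SLSA-style provenance for TaskRuns
  artifacts.taskrun.storage: "oci"    # store attestations alongside the image in the registry
  transparency.enabled: "true"        # also record signatures in a transparency log (Rekor)
```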
Finally, I'll quickly cover the pipeline updates in the community. There is now a V1 API. Custom tasks have been included in Pipelines, and custom tasks are a very powerful feature in that regard: all your different kinds of things, like manual approvals and that sort of thing, you can do via a custom task. Isolated workspaces have been in Tekton for some time. What has been introduced recently is called pipelines in pipelines: if you want to chain two pipelines in Tekton, previously that feature was not there, so this is something that has been introduced in Tekton Pipelines itself, and it helps with the orchestration part. And finally, there is something Red Hat itself is doing, called the Tekton Ecosystem. As I always mention, we want to make the integration effort lower; even though there is the Tekton Catalog, we want a much better collection of tasks and pipelines, one that will be supported by Red Hat and its partners. We will be launching the Red Hat Tekton Ecosystem soon. So fundamentally, across all of these, from Tekton resolvers to Tekton Results to Pipelines as Code, what they are trying to do is keep the customization scope high and make the integration effort lower, and that's where Tekton is moving. Thank you, that's all.

Over to you, Harriet. I need my microphone on as well. Thank you so much, Koustav.

All right, so at Red Hat, we focus on Argo CD as our GitOps engine and our recommended application and configuration deployment tool. But I'm guessing not all of you here today are that familiar with GitOps. So if this is the first you're hearing of it: GitOps has come out of DevOps; it's a kind of evolution there. It takes the DevOps lifecycle, ties it into Git, and adds continuous reconciliation. So it's mostly focused on the continuous deployment and delivery part of the DevOps lifecycle, but GitOps also brings in elements of continuous monitoring and infrastructure as code, as well as the cultural aspects of collaboration and communication.

There are four core GitOps principles, and these were defined by the GitOps Working Group in 2021. The first is that your system is described declaratively. It means you've written down somewhere, usually in YAML, what your system should look like, be that an application with access control or how a cluster should be configured. Next, we want this desired state to be versioned and immutable. We need our YAML files to be stored centrally and accessibly, and that creates a single source of truth for our system. This usually comes about by using Git, but any system that complies with this principle can be used. Next up, we want any changes that we approve to be applied automatically. Usually that approval process is a pull request to your config repo, and once it merges, we want those changes rolled out automatically. Not only does it pull from Git, it pulls at regular intervals to check for changes, which leads into the last one, which I feel is the real key part of GitOps: you've got a controller that monitors your repo, polls for changes, detects drift between your desired state and the actual state of your system, and can act on that drift. That might be notifying you of the discrepancy, or it could be automatically bringing the system back into sync.

There are a lot of reasons why an organization or a team might look into GitOps as a new approach. Some of the challenges we hear frequently from our customers: changes to your config are making it hard for your QE teams and developers to do their jobs; perhaps you're spending a heap of engineering time on deployment; maybe you need a way to create an audit trail for your configuration, though even in less strictly regulated environments, change management by itself can be difficult; maybe you're currently managing config by hand on each cluster; maybe the history of why things are done the way they are is stored only in the heads of your long-term staff. And then when you go and try to implement automation, it's hard to do that without a supporting framework. Many of our OpenShift customers have faced these challenges and have found GitOps to be a good way to address them.

If you're new to GitOps on OpenShift, we have an operator available that will set you up for success: the OpenShift GitOps operator. It's available on OperatorHub and managed by OLM. When you install it, the operator sets you up with an instance of Argo CD with cluster-wide permissions, and then you can install as many more instances as you need, wherever you need them.
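As a hedged sketch, installing the OpenShift GitOps operator through OLM usually comes down to a Subscription like the one below; the channel and catalog names follow the usual Red Hat conventions and should be checked against your cluster's OperatorHub.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-gitops-operator
  namespace: openshift-operators      # cluster-wide operators live here by default
spec:
  channel: latest                     # verify the available channels on your cluster
  name: openshift-gitops-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```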
So what's coming next? Obviously, our upstream is Argo CD, and upstream releases come into OpenShift GitOps shortly afterwards. There's a lot on the horizon that is very exciting, and I've picked out a couple of highlights. The 2.9 release is coming very soon and has a lot of awesome features. We'll be getting support for rollbacks and history in multi-source applications, and Azure DevOps webhooks will be part of the Git generator. There's a new feature that, instead of making you set up manual resource exclusions when you have insufficient RBAC permissions for something, sets that up automatically for you. We're also getting self-signed TLS certificates for the GitLab provider, and a heap more.

Helm lookups: this is a really popular request from our customers and the community, but it's a really tricky one, as the way Helm works with Argo CD is really baked into the design. It's early days yet, but there is a very exciting POC in progress from one of the Codefresh founders, Dan, and it utilizes a new feature released as part of Helm 3.13 that allows dry run to perform lookups. So keep your eyes peeled for that one.

Source verification policies: this is a new proposal in the upstream community by our own GitOps architect at Red Hat, Jan Fischer. These provide the ability to define how strictly, and how far back, you need to verify commits in a repo. It'll start by using GPG, which is already supported in Argo, but we're looking to expand it to include things like Sigstore and Helm provenance as well.

The last one I've put on here is the scalability SIG. You'll be hearing from our friends at AWS today, who are some of the main drivers in the scalability special interest group in the upstream community. This group has been incredible and has done some really awesome work so far. It's making fantastic progress testing the limits of Argo CD's scaling capabilities, finding the root causes, going after them and fixing them.

All right, we've got two big events coming up very soon: ArgoCon NA and GitOpsCon EU. ArgoCon is co-located with KubeCon again and will be in Chicago on the 6th of November. If you're not able to make it there, the talks will be posted online as usual by the CNCF after the event concludes. GitOpsCon EU is a virtual event this year, happening on the 5th and 6th of December. The schedule will be announced soon, so keep your eyes out for that. I was lucky enough to be chosen for the program committee for both of these events, and I can tell you that the agendas are going to be jam-packed with awesome sessions. If you're looking for more information about GitOps in general, or Argo CD or OpenShift GitOps in particular, there is a bunch of stuff out there, and I will drop the links to all of these in the chat so you can take a look. And that is me. So I'll hand back to Evan.

Thanks, Harriet. So up next we have Jason from Citi to talk about their fortified DevOps factory. So I think it's time to introduce Jason.

All right, thank you very much. Can you hear me okay? Am I on? Yes, you are indeed. Brilliant, okay, good. Apologies, everybody; this is my first time using this platform and there seems to be a little bit of a lag on my camera at times. So if I seem to be saying one thing but looking like I'm saying something else, I might drop the background if that happens, but let's see how we go. My name is Jason Morris. I am the head of DevOps Enablement at Citi for what we call the Institutional Clients Group. And yeah, I'm a developer by trade.
I was doing development for my first 20, 25 years, primarily working in the electronic trading space, so a lot of high-frequency stuff in the Citi trading space. More recently I've gravitated towards DevOps; I've been doing this role now for about five and a half years. I found that I was more attracted to the process of delivering software, and how we do that at high quality and high frequency, than I was to delivering the specific business logic I was working on. So that's what I do now. I was going to be joined today by my colleague Segal Duak; she's my core engineering lead. Unfortunately, Segal is based out in Tel Aviv, in Israel, and I'm sure you'll understand that logistically that's been difficult. So you're stuck with me, and I'm going to be flying solo for the next 15, 20 minutes. Right, now I am supposed to be sharing some slides; apologies, let me just put those up for Harriet. Brilliant, here we go. All right, yes, so that's me and Segal.

So let's get started. Citi, I hope, is a very well-recognized name, maybe more so in certain parts of the world than others, but it is a truly global bank. Those of you who know it are probably very familiar with Citibank; certainly if you're in the US, we have a very, very big presence there, and a lot of people use it for their day-to-day banking. That is really only half of our business, the retail side. We also have a massive Institutional Clients Group, of which I'm part. We are the bankers to governments, to multinational corporations, pension funds, hedge funds, all the very big players. In some ways, I guess, that's where we like to think the real money is. In terms of size, Citi is huge: around 270,000 people globally, about 50,000 of whom work in technology. We're a very, very big technology shop. My job is to try and enable those 50,000 people to deliver software safely, securely and swiftly. The people using it are at varying stages of the CI/CD and DevOps journey, which is important, and we can talk a little more about that as we go. We also have a huge diversity of technologies. Pretty much anything that's been a fad in the last 40 years, we've got it: everything literally from mainframes to AI. We've got Java, we've got Python, we've got Go, we've got Node, we've got Scala, we've got Clojure, we've got F#, you name it. And we've got just as varied a set of deployment targets and servers: we're delivering to physical VMs; we're a very big consumer of OpenShift, which we use as our internal Kubernetes platform; we're increasingly moving to cloud; obviously mobile; we're on Linux, we're on Windows. Like I said, you name it, we've got it. And part of my job is to try and make sense of all of that. In terms of my own operation, I like to think it's pretty big, probably bigger than most outside of big tech: we are managing in excess of 10,000 software projects, we have 35,000 CI/CD pipelines doing that, and we are hosting in excess of a million builds a month. So again, I'm sure we're not the biggest out there, but the scale is a very big challenge for us, and I'm going to talk more about that as well.
My other very big challenge is that banking is one of the most highly regulated environments and industries in the world. Everything we do is scrutinized, both internally and externally. We have regulators; as I say at the top here, we're a big tech company with a banking license, and that license bit is very important. It's a privilege, not a right; it's something we could lose, and we are very keen not to. So we do everything we can to keep our various regulators, across all these different countries and jurisdictions with different rules, very happy. To do that, pretty much everything we do is extensively audited. And again, this is a really big difference from unregulated industries: I don't just have to deliver the right software, I have to prove that I'm doing it. That's a big part of the challenge, and to do it we need to make sure we've got full traceability of everything that goes on.

Now, we are big users of Tekton. We started that journey about three years ago. We don't exclusively use Tekton; we still have previous generations which are using a combination. Some are using Jenkins, OpenShift Pipelines but the Jenkins version, and prior to that we have more TeamCity-based stuff. We're moving away from all of those; Tekton is very much our strategic play, and as I said, we've been doing that now for about three years.

Why did we do that? I think it's a combination of the two major things, top left and top right here. The main driver was that we are a bank: we do not want to be the SolarWinds tribute act. The worst thing that could conceivably happen to us would be to be in the press with a major supply chain compromise. One of the most attractive features of Tekton is that pretty much everything is ephemeral. As opposed to Jenkins, where we had very long-lived processes, masters and agents, in Tekton every task, every step in your pipeline, spins up in a pod, executes, shuts down and disappears again. That makes it very, very hard for someone to compromise it, and the next time it spins up, it comes up with a fresh copy of the immutable Docker image that it runs. That really makes our InfoSec team very happy. The other thing, which I put down there partly as a joke, is that about four or five years ago we took a close look at Google Cloud Build and really liked it. At the time, certainly, our bank was not willing to go into the cloud; they were taking a very cautious and conservative approach to that. But Tekton seemed to be the closest thing out there, with most of the same concepts. The industry certainly seems to be moving that way. We like very much that it's Kubernetes-native: we are a big Kubernetes shop, we use OpenShift extensively, and so it all seemed to fit pretty well.

Right, so then along came Lightspeed. What is Lightspeed? Lightspeed is our internal CI/CD platform. Now, I talked before about the large scale of everything that we do, and the need to keep that controlled, and to demonstrate and prove that it's controlled. A tool like Tekton is great; we like it.
It is very flexible, as was described earlier. But we don't want it to be too flexible, because if we just let everybody do their own thing, then when an auditor comes along and says, right, prove to me that everybody's doing it right, that becomes very difficult. So Lightspeed is a platform that sets up your pipelines for you; it's a curated platform. We provide a set of shared, curated Tekton tasks to all of our users. They don't really get to define tasks themselves, at least not without a lot more oversight and governance around that, but they can use the tasks that we provide. And, as we're going to see in a minute, we also have a lot of control over the pipelines themselves. The other thing Lightspeed does, of course, is all of the onboarding to all of the tools in the chain, and that doesn't just include Tekton: there are others, all of our security scanning tools, our Git repos, all of those things. So you can go to one place and get onboarded to the lot. Ultimately what you end up with is a joined-up pipeline from commit to production deployment. From the commit, there's a pull request; the pull request has to be merged to master; that triggers a Tekton pipeline; the Tekton pipeline will run, build it, execute automated tests, apply scanners, et cetera. And then we use another tool called Harness to handle the deployment side of things, which rolls it out to our various environments. But ultimately it's just somebody pressing next. There's no real manual intervention after that pull request is pushed through, certainly nothing that could affect the code.

And I put this in here just for fun: we believe very much in dogfooding. We build and release Lightspeed using Lightspeed. That completely blew the minds of my auditors; it took them quite a while to get their heads around it. Once they did, they loved it, because all of the same controls I provide for how we deliver software on behalf of thousands of application developers are used to deliver our own platform.

Yeah, so here's the magic trick we've got, and actually it was quite interesting hearing about the Pipelines-as-Code piece, because we basically came to the same conclusion some time ago. We found that the original construct, where you define a pipeline and then you invoke it and it spins up a PipelineRun instance that executes, meant the definition of the pipeline itself was too static. One thing we really liked from the Jenkins world was the idea of the Jenkinsfile: that control file is in your Git, it can be branched, you can have different versions of it, different flavors of it, you can test new features on it, et cetera, and ultimately your pipeline is based on what was in that version of that commit, of that branch. We do exactly that. We have a service we call our pipeline factory. What happens is that when you commit, a webhook fires immediately into our pipeline factory. Now, we have a very cut-down YAML file, much, much smaller than a full Tekton one, which gives us all the core things we need from our users. We then hydrate, or inflate, that into a full Tekton PipelineRun, and we execute the PipelineRun.
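Citi's actual schema is internal, so this is a purely hypothetical illustration of the shape of such a cut-down control file: a small DSL committed to the repo, which the pipeline factory hydrates into a full Tekton PipelineRun. Every field name here is invented.

```yaml
# Hypothetical "Lightspeed-style" control file; all names invented for illustration
apiVersion: lightspeed/v1
kind: PipelineSpec
project: payments-gateway
language: java
build:
  tool: maven
  jdk: "17"
test: true
scans: [sast, dependency-check]     # expands into curated, shared scanning tasks
deploy:
  tool: harness
  environments: [dev, uat]
```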
Now, why do we do that? There are a variety of reasons, some of which I'll talk about in a minute. But one of the biggest ones is that, as you can see here, we've got a very small file, like 12 lines of YAML. Tekton is great: it's very flexible, it's very logical, it's very Kubernetes-native. But it does suffer slightly from the same problem a lot of people associate with Kubernetes, which is death by YAML: it's very verbose. It's very logical, but it's very verbose. We've tried to cut that down to brass tacks, just the bits we think the average user needs, and then we take care of all the ins and outs of the Tekton YAML itself. So it's very approachable, very quick and easy to learn. Oh, and I did mention here, and I'll come back to this, that there is a kind of event stream out the back of Tekton; I think it's related to the Tekton Results piece we were just talking about. We use that extensively for observability of what's going on. Again, I've got thousands and thousands of pipelines running; we need to know what's going on, for a variety of reasons. So observability is key: we capture a lot of that data and use it for analysis and support.

So what have some of the challenges on the journey been? There have been a number; I've tried to pull out some of the headlines. Some of these are peculiar to us and the strange regulated environment we work in, but some of them are not. The first one here is cost of entry. As I said, the Tekton YAML is very good and very flexible, but it's not that easy to just get hello world working, and that can be a big challenge for a lot of people. That's why we've cut it down to our kind of domain-specific language; we've really pared that back. We think that helps make it much more approachable for our teams. It also gives me much more control over what ends up in that final Tekton pipeline, and that's really important when it comes back to proving that everybody's following the same patterns, everybody's doing the same things. And on top of that, as I said before, we supply a curated list of Tekton tasks, shared out to all of the application teams.

Another of the big challenges: again, it's fantastic that this thing runs on Kubernetes; we love that. We like that pods spin up, do their job, go away, and free up those resources to be available somewhere else. One of the great strengths, and one of the big attractions, was the fact that we can size the workload for every task in a pipeline, not just a whole pipeline, but every individual task. So something like compilation, which is more CPU-intensive, gets more CPU than something which is just saving something to a database. Of course, the downside is that you then have to figure out what the right values are, and anyone who's been using Kubernetes for a period of time will know that sizing workloads is itself quite a challenge. You're always trying to balance: you want to give it as much resource as possible for performance, but you also want to get the most out of your dollar spend, so you want to give it as little as possible to just get what you want. And again, at the heart of this, observability is totally key.
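As a hedged sketch of that per-task sizing, Tekton lets you set resource requests and limits on each step via the `computeResources` field in the v1 API (it was `resources` in v1beta1). The task, images and values below are illustrative.

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: build-app
spec:
  steps:
    - name: compile
      image: maven:3-eclipse-temurin-17     # CPU-hungry compilation step
      script: mvn -B -DskipTests package
      computeResources:
        requests:
          cpu: "2"
          memory: 2Gi
        limits:
          cpu: "4"
          memory: 4Gi
    - name: record-metadata
      image: registry.access.redhat.com/ubi9/ubi-minimal   # lightweight bookkeeping step
      script: echo "recording build metadata"
      computeResources:
        requests:
          cpu: 100m
          memory: 128Mi
```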
Security: again, we talked about this, we have to prove everything. I like a lot of the things that both OpenShift and Tekton give us out of the box. They give us the ephemeral builds; we can have dedicated namespaces with limited entitlements to get to them; we can control the resource usage. And again, we have this Tekton event stream off the back of it, which is really great for an audit trail. We've gone through extensive modeling with our CISO, the Citi information security office, so that's our InfoSec teams, and worked hand in hand with them. As part of that, we are introducing Tekton Chains, which Koustav was talking about a few moments ago.

One thing, again, that we've run into is that the curated tasks, in the initial incarnation at least, have to be replicated to every one of the namespaces, and just that job is actually quite challenging. I'm very interested in the resolvers that we were just hearing about; I think that's going to help us a lot, because we can centralize those tasks and have everything reference them rather than keep copies. One thing we have also observed is that, because these tasks are very short-lived, the spin-up time is longer than we want it to be: we're seeing 15, 20, 30-second delays sometimes at the start. Part of that might be down to our unique environment, but where you don't notice an extra 15 seconds spinning up a microservice, you do when every step in your pipeline does that. This is something we're working through; Red Hat have been very supportive and we're getting through it, but I mention it because I think it's worth knowing about. And lastly, again, one of the things we're very hot on is disaster recovery and continuity of business. I think Tekton in its current state is really very cluster-centric. We need the ability to move workloads between clusters very quickly and easily; in an ideal world, I could just load-balance across different clusters. There's still a bit of state in there, and that's something I'm sure we will work through as the product evolves.

So, to summarize our experience: it's been three years, and there have been highs, lows and everything in between, but here are a few key points we've picked up. We love the ephemeral builds, both from a security perspective and from a capacity management perspective, as long as you can work out how to size those things and get them right. For our use case, because of the sheer scale of it, to keep things consistent and concise we've gone with this custom DSL. Your mileage may vary; your developers may hate that. Mine don't all love it, trust me, but it's a necessary evil. And if there's one thing through all of it, as is the case with any kind of distributed, microservices, Kubernetes-based application: observability is absolutely everything. If you're going to go on this journey, I can only recommend that you invest heavily in observability, and invest early, because you will reap the benefits for a long time as you go. And that is the end of my presentation. I will hand back, and hopefully I'll get a chance to chat to a few people later.

Thanks, Jason. Okay, I don't know if Harriet is going to jump in, but I'm just going to mention two things.
The first thing is, you may have seen Gerald in the chat. Gerald is a Red Hatter and he's been answering the questions you've had, so do not be shy: feel free to ask questions and some of us will answer them. Gerald is prolific, so he will probably be the one answering most of them. The next thing I want to mention is that we have Ford coming up: Arthur from Ford, talking about OpenShift lifecycle management using Argo CD, Tekton and PAC. I'm sure you'll clarify what PAC is; maybe it's Policy as Code. So yeah, Arthur, let's get you in here; we're looking forward to seeing your presentation.

Hello. Yeah, PAC would be Pipelines as Code, the extension of Tekton. I work over at Ford; I am one of the platform engineering leads on our team. We manage a fleet of OpenShift clusters, currently at about 50. We have about 2,000 unique applications running across those 50 clusters, spread across about 8,500 namespaces, and we use both Tekton and Argo CD to manage the lifecycle of these clusters. From the Argo CD perspective, we're managing about 75 configurable apps for each of the clusters to maintain all the configuration aspects, because we do not run kubectl commands directly against our clusters under normal circumstances.

But before we can talk about the clusters, we need to talk about where these clusters run. We do bare metal, vSphere, Azure, et cetera, but our primary hosting environment is Google. To manage Google resources, we are also using Tekton: we've got our pipelines set up so we can configure our resources in Google. And since our clusters run in Google, with recent versions of OpenShift we're able to use a concept called Workload Identity Federation, where we can take these Google identities and put them directly into the clusters, so a Google identity is linked to a Kubernetes service account. There are no credentials required to talk to Google, similar to the STS support that previously existed for Amazon, I believe. So the Tekton pipelines manage Terraform configurations, which manage the GCP projects, which host the clusters, which run OpenShift. Obviously there was a bootstrap problem: the very first time we set all this up, it was done manually, but once we had one cluster up and running with Tekton, it could then continue to bootstrap more and more. And since Tekton Results still isn't GA, we have some custom tooling in all our pipelines: once a pipeline is done, it sends information back to GitHub with a link to a bucket in Google that contains all the logs. Our primary mechanism for all this configuration is GitHub: you submit a PR, a pipeline kicks off, does some checks, you merge it, and everything else continues.
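As a hypothetical sketch of that pattern, here is a Tekton Pipeline that runs Terraform against GCP project configuration held in Git. Task names, the repo layout and the image tag are invented, and the "source" workspace must be bound to a PVC in the PipelineRun so the plan file survives between the two tasks.

```yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: gcp-project-terraform
spec:
  workspaces:
    - name: source                    # the cloned Terraform configuration repo
  tasks:
    - name: plan
      workspaces:
        - name: source
          workspace: source
      taskSpec:
        workspaces:
          - name: source
        steps:
          - name: terraform-plan
            image: hashicorp/terraform:1.6    # short-lived pod; identity comes from WIF, not stored keys
            workingDir: $(workspaces.source.path)
            script: |
              terraform init -input=false
              terraform plan -input=false -out=tfplan
    - name: apply
      runAfter: ["plan"]
      workspaces:
        - name: source
          workspace: source
      taskSpec:
        workspaces:
          - name: source
        steps:
          - name: terraform-apply
            image: hashicorp/terraform:1.6
            workingDir: $(workspaces.source.path)
            script: terraform apply -input=false tfplan
```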
Now that we've got our GCP environment configured, we need to actually stand up our clusters. The way the OpenShift installer works, you cannot have a truly declarative setup, a truly declarative infrastructure setup, let me clarify, because the installer itself is a complex mechanism. So we have a more declarative setup where we have an ENV file, which is just the set of configurations stored in GitHub, and that is passed into a custom pipeline that operates the OpenShift installer and creates those environments. Similarly, we're able to spin up these clusters with no long-lived credentials: everything is short-lived, and that works great. After the PRs run and all the checks are made and you're assured everything is good, we run the installation. We have some business requirements at Ford, so we have to mutate the installer on the fly a little bit to adjust for those. After the adjustments, the pipeline itself is realistically fairly simple: you run the OpenShift installer and then you're done. Once that's done, Argo CD takes over, and it's able to configure all other aspects of the cluster.

So when we take a peek at Argo CD: we use Pipelines as Code alongside Argo CD, and we use regular Tekton to stand up the cluster. There are a couple of differences there. With regular Tekton, you get a lot more control and flexibility around how those pipelines are managed, particularly around authentication and control. With Pipelines as Code, there are future enhancements that will bring that same functionality over, but due to certain risk aspects we keep those two split off. With Argo CD, we're able to manage essentially every aspect of the cluster: the versions of the clusters are managed through Argo CD, certificates, ingress, et cetera. And we operate Argo CD on a distributed model: we have an instance of Argo CD running on each of our clusters. We went that route over the centralized model for a few reasons. One being, we don't need cluster-admin tokens for every cluster, because then we'd have to manage those tokens, and that's a security vector we'd have to deal with if those tokens leaked. Running distributed Argo CD, we don't have to worry about managing those secrets and having that attack vector. And, for a bit of scalability, we're able to tune Argo CD on a per-cluster basis to fit the size of each cluster. One thing we really love about Argo CD is the plugin mechanism, because it allows us to enhance Argo CD with our own logic; we'll circle back to that.

So Argo CD is great, but all it really does is take YAMLs from one place and put them on the cluster. So we need a way to make sure the YAMLs that get into our GitHub are good to begin with. We've got pipelines set up through Pipelines as Code, and what they do is some basic syntax validation on those YAMLs. All our secrets are stored in Google Secret Manager and referenced using the Argo CD Vault Plugin, which is a binary that runs anywhere. In our pipeline, it grabs those references from the YAMLs, replaces them, and makes sure they're real. Then we have other custom tasks, custom checks, that run. For example, for a change that modifies the certificates on a cluster, we want to make sure those certificates belong to that cluster, that they're not expired, that they're not missing the private key, and that no part of the chain is missing. We also run kubeconform against our YAMLs, which validates them against the Kubernetes schemas for those resources. Not all the CRDs, such as the Tekton CRDs, have proper Kubernetes schemas, but hopefully in due time the rest of those CRDs will catch up and we can validate those with kubeconform too. And with those custom plugins, we're able to have Argo CD run the same tests that our pipelines do.
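For illustration, a hedged sketch of the Argo CD Vault Plugin pattern just described: the manifest in Git carries only a placeholder, and the plugin swaps in the real value at render time. The placeholder syntax follows the upstream argocd-vault-plugin docs for Google Secret Manager; the project and secret names are invented, and the exact path format should be verified against the AVP version in use.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
type: Opaque
stringData:
  # AVP replaces the placeholder at render time; nothing sensitive is committed to Git
  password: <path:projects/my-gcp-project/secrets/registry-password/versions/latest>
```

The same binary can render manifests in a validation pipeline (argocd-vault-plugin generate) to confirm the references actually resolve before anything is merged.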
Because Argo CD is pulling the secrets directly from Secret Manager to apply to our clusters, we need to make sure that if, for some reason, the secret inside Google Secret Manager was changed, it does not get applied to the cluster if it's invalid for whatever reason. So we've got custom plugins we wrote to essentially run the same checks and stop Argo CD from continuing in those scenarios. Similarly, you've also got a bootstrap problem, particularly with IPI installs, because some of the components have randomized names, so there's some additional logic inside Argo CD to pull those randomized values at runtime and swap them in.

So then, here's how we've structured our use of Argo CD. We use Kustomize, first of all, to manage all our YAMLs; we use Helm in a few places where Kustomize doesn't make sense, but for the most part we're using Kustomize. We've got our base, which contains everything a particular application would need. For example, on the right is a screenshot of configuring Argo CD on one of our clusters: the base contains the basic information required for all clusters. Then we go into our components section; we've got all our apps listed under components. You could put that in your base, realistically, but we've split it out under components, and we've got all our GitHub repos listed there. Then we take it a little step further: we've got different versions under version control, and these are all in bases. So for 1.9.2, we've got all the version configurations for 1.9.2 and all the YAMLs associated with that. In particular, between 1.9.2 and 1.10.0 there was a CRD change, so in this case it was actually nice to have: 1.10.0 is basically a duplicated version of all the configurations in 1.9.2, but with the updated CRD configurations. Each of our clusters gets pointed to a particular version. And we've got our test file as well, which in Argo CD's case just runs a build and a secrets check.
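A hedged sketch of that layout: a base, shared components, versioned configuration directories, and a per-cluster overlay pinned to one version. All paths and names are illustrative, not Ford's actual repo.

```yaml
# Illustrative repo layout:
#   base/                       # common config for all clusters
#   components/apps/            # one Argo CD app entry per managed repo
#   versions/1.9.2/             # configuration matching the 1.9.2 CRDs
#   versions/1.10.0/            # duplicated configuration, updated for the CRD change
#   clusters/prod-east/         # per-cluster overlay
#
# clusters/prod-east/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../versions/1.10.0       # this cluster is pinned to the 1.10.0 configuration
components:
  - ../../components/apps
patches:
  - path: cluster-overrides.yaml   # per-cluster tweaks (sizing, ingress, etc.)
```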
So, we talked about the bootstrap problem. The issue is that some of the overlays require values for fields that do not exist before the cluster is built, so additional RBAC has been granted to those components so that Argo CD can pull that information from the cluster. Another key problem is the cluster UID, which is also randomized but, if you didn't know, not immutable: you could delete your cluster's unique identifier and it would be gone. So we make a backup of that just in case, and that way we can apply it dynamically without having to worry about it being overwritten. And it can be overwritten, because the location where that UID is actually stored is the ClusterVersion, which we also mutate from GitHub, because that's how we update the clusters. Another component of OpenShift which is not really GitOps-friendly is the config map for the monitoring stack. Because it's a config map, not a CRD, you can't run Kustomize commands against it, for example; and if you have multiple controllers mutating the same config map, such as Open Cluster Management, you end up in a kind of weird scenario where it's not following traditional Kubernetes practices. And it's just a YAML inside a config map anyway. So we wrote some custom tooling to effectively clone that config map into a CRD, and all the operator does is turn that CRD into a config map. So we mutate the CRD, we can control all of that through Git, and the operator just handles the conversion to a config map. And that's what we do from a platform perspective for configuring all of this.

What we're moving towards is managing our tenant namespaces the same way. Previously we had an API that would write the namespaces, resource quotas, et cetera, directly to the clusters, but we're moving that towards Git as well. What we're doing there is a very similar pipeline to the one we use for the platform, with a few additional checks, like policy enforcement. So it's a Git repo; all our application teams have their namespaces listed in there, and they're able to make pull requests to change their namespaces. We use the policy enforcement to make sure that any changes they're making are acceptable changes, so right at the PR stage they get either accepted or denied; then we take a look, approve it and merge it as needed. We're using that to configure all of our namespaces and how we onboard users to the clusters.

And then, what we're looking forward to in the future of OpenShift: since we're operating in GCP, we currently operate all our clusters in a single project, but with some of the new enhancements coming, we'll finally be able to split that off into one project per cluster. And on the Argo CD side, we're hoping to start offering Argo CD as a service to our tenants, so they can deploy their own applications with Argo CD. That would be an interesting endeavor to do at scale. And that would be everything that I have.

All right, thanks, Arthur. That was great. There are tons of questions in the chat; I was just monitoring. Some of them are directed at Jason, but I'm guessing they're going to come in for you too in a moment. It's been pretty busy, so that's great; I think people are getting a lot of value out of this. Thanks for your presentation. Carlos, Andrew, welcome, and we're looking forward to hearing about your testing in the upstream with Argo CD. Hey, you're here as well? Hello. Loud and clear, loud and clear. Sounds great. Okay, we're both on. Okay.

Hi everyone, let's get started. So we are here from AWS. My name is Carlos Santana. I'm a senior specialist solutions architect for Kubernetes, anything Kubernetes on AWS. Go ahead, Andrew. Yeah, my name is Andrew Lee. I'm a senior prototyping architect with AWS, and we're here to present on Argo CD scalability testing in the upstream. So I'm going to advance. Here's our agenda: what is Argo CD? I think everyone knows what Argo CD is, but why should we care? Then the motivations and goals for the scalability SIG that we started, or co-started; then our approach to scalability testing, what we learned from it, and then conclusions and future activities. Now I'll throw it over to Carlos.

Okay. To talk a little bit about Argo CD, and the last presentation touched on what Argo CD is: it's a CNCF graduated project. Argo includes many projects: Argo Workflows, Argo Events, Argo Rollouts, Argo CD. Today we're focusing on Argo CD.
One of the reasons that Andrew and I and others at AWS have joined the open-source Argo CD community is that we're working with a lot of end users who are leveraging Argo CD and encountering issues, or a lack of documentation, or who have a lot of questions about how to find best practices for using Argo CD. Some of the end users we have worked with are large organizations currently adopting Argo CD for various reasons: from the native way of doing Argo CD RBAC in multi-cluster environments, where they have one management cluster that references the other Kubernetes clusters, to the ability to have that GUI. I think everybody is very familiar with the Argo CD UI; people fall in love with it, because as they get started with Kubernetes config management, having a visual representation of what's deployed in the cluster is very appealing. So, for multiple reasons, large organizations are adopting Argo CD, and we wanted to get involved in the community to see how we can help those end users take advantage of Argo CD and build a community around it.

Another aspect is that Argo CD is becoming the multi-cluster, multi-tenant solution, as you saw in the last presentation with teams and projects and namespaces, for building internal developer platforms, IDPs. If you want to learn more about how Argo CD and Argo Workflows are being used in IDPs, you can check out a project called CNOE (cnoe.io). It's a recent effort by a group of these large end-user organizations coming together and selecting a set of technology stacks to build IDPs; Argo CD and Argo Workflows are among the main components, and others are Backstage and other CNCF projects. So if you want to join that community too, you're welcome to. That's why we are involved with Argo CD: to push its boundary to cover enterprise use cases, where we're seeing more adoption and end users are asking for help with resiliency, observability and security when using Argo CD properly with Kubernetes. Next slide.

Talking about motivations and goals: I started asking around in the Argo CD community, as a heavy user of Argo CD myself, because I was encountering problems finding good documentation, good patterns or good examples of how to do scalability testing. We wanted to observe and test these large environments, mostly the hub-and-spoke environment where you have a small number of Argo CD controllers managing a large set of applications. We wanted to find the bottlenecks and prototype, which is why Andrew is involved, against different configurations, from a large number of clusters to a large number of applications. Can Argo CD scale? Scaling Kubernetes controllers is something that works differently from scaling a stateless application. With that, we reached out to the community, and the community was very receptive to our proposal to bring different companies together, including Red Hat, Akuity, and other maintainers at Intuit and Codefresh. All of them are very interested in solving this and providing assets to the community: blogs, documentation, and tooling for enterprises to do their own testing inside their Kubernetes environments, get those scalability metrics, and then act on them and tune.
So we recently published a blog post, we're working on a second one, and we're also working on enhancing the performance. The idea is to provide that kind of operator's-manual information for end users and admins of these clusters; Kubernetes admins genuinely need help with this type of tuning. And the last thing to introduce: we started with a benchmark, just understanding what the problems are and how far we can get by tuning the parameters. Andrew is going to cover which parameters to tune based on which use case, and what the trade-offs of that tuning are; when you tune for performance, you're making a trade-off. Then he'll talk about how we intend to contribute to solving these bottlenecks next. So Andrew, take over.

Yeah, so our approach to scalability testing: we stood up an environment with a monorepo Git repo and Argo CD, all running on Amazon EKS. We built a test environment with over 10,000 applications and 97 EKS clusters, and we got observability through key metrics in Grafana. There's actually a dashboard provided by the community in Argo CD that has all the key metrics you need to see what's going on inside Argo CD. The other thing we wanted to make sure we documented and tested is all the key scalability parameters. When I started working on our scalability testing, there was not a lot of information or documentation on the parameters you can tune; it would either be buried in the documentation or I would have to dig through GitHub issues to find some of these parameters. So putting them all in one place (it's in the blog post, but we also want to update the Argo CD documentation) is one of the goals we have for the scalability testing. So let's move on.

Some of the key metrics: as mentioned before, there's a Grafana dashboard provided by the community in the Argo Project GitHub; you can find it there, and it's what I generally used for the scalability testing: making changes to Argo CD parameters, running a sync test, and then viewing the performance through the Grafana dashboard. The first key metric we used was sync time. We have 10,000 applications, all connected to one Git repo. We make a change to that repo, and because we have auto-sync on, Argo CD sees there's a change in the upstream Git repo and pushes all those changes down into the target clusters. That's how we were able to run a sync test: we measured from when applications went out of sync to when there were no more out-of-sync applications, and that's how we determined our sync time.
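To make that sync test concrete, here is a minimal sketch of what one of those auto-synced applications might look like. The repo URL, path, and cluster endpoint are hypothetical placeholders, not the actual test assets:

```yaml
# Hypothetical example of one of the ~10,000 test applications.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-0001
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/monorepo.git  # single monorepo shared by all apps
    targetRevision: main
    path: apps/app-0001                            # each app owns one directory
  destination:
    server: https://spoke-1.example.com            # one of the 97 EKS spoke clusters
    namespace: app-0001
  syncPolicy:
    automated: {}   # auto-sync: a commit to the monorepo triggers a sync
```

With syncPolicy.automated set, a single commit to the monorepo drives every app out of sync at once, and the sync time being measured is how long the controller takes to bring them all back in sync.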
The second key metric is workqueue depth. There are actually two queues in Argo CD for the main operations it's doing: status and operations. The status queue checks the upstream Git repo and reconciles it with the downstream applications deployed to your clusters. And the operation processing queue is where any changes that need to be made to the downstream clusters happen: making the changes to the downstream resources.

The last thing we were watching was CPU usage. What we found was that Argo CD, at least the app controller, is more CPU-bound; we didn't really see any memory usage issues during our scalability testing. But there has been talk in the community about memory usage being really high with Argo CD, so in our next round of testing and the next blog post, we will publish memory usage statistics.

So what did we learn from the scalability testing? One of the first things we looked at was the reconciliation timeout. There was a really great blog post by IBM, from before we started our scalability testing, that keyed in on this reconciliation timeout. So what is it? It's the interval at which the status processors check the Git repo and reconcile it against the downstream clusters. If you set it too low, too aggressive, you'll be checking the Git repo more often and trying to reconcile it with the downstream environment, and you can get into a constant state of reconciliation, which causes problems like CPU usage running higher than it needs to be. How does it affect sync times? It doesn't; this is purely about reconciliation performance. What to watch for, as I mentioned, is overlapping status intervals. What I mean by that is, if we go back a slide, the yellow graph here is the reconciliation queue: it's pinned at 10,000 at all times because the reconciliation timeout is too aggressive. So we suggest users check their reconciliation timeout; if you're seeing your reconciliation queue at its maximum at all times, you should look at this setting.

The next thing we looked at is the status and operation processors. There's a parameter you can set for each, status processors and operation processors, that determines the number of concurrent operations each processor can run; it sets the parallelism of the app controller. What we found was that it didn't have any effect on our sync times during scalability testing, and it probably requires further investigation. My hunch is that it had to do with the type of applications we were using: very simple, two-kilobyte ConfigMaps. With apps that light, we may never have loaded up the app controller enough for these settings to make a difference in sync time. So I think we need to look at this in part two of our scalability testing, and I'll go over it there.

And then QPS and client QPS. What we found is that there's a client-go Kubernetes client making all these calls, both to the Argo CD cluster itself and to the downstream clusters, and it regulates how many Kubernetes API calls can be made. Changing these actually had the greatest effect, short of going into sharding.
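As a rough sketch of where the knobs discussed so far typically live in a stock Argo CD install (the reconciliation timeout, the processor counts, and the client QPS and burst environment variables); all values here are illustrative placeholders, not tuning recommendations from the talk:

```yaml
# Reconciliation timeout: how often apps are refreshed against Git.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  timeout.reconciliation: "300s"   # default is 180s; too low can pin the status queue
---
# Controller parallelism: status and operation processor counts.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  controller.status.processors: "50"     # concurrency of the status (refresh) queue
  controller.operation.processors: "25"  # concurrency of the operation (sync) queue
---
# Client QPS/burst are read from env vars on the application controller.
# This is a merge-style fragment, not a complete StatefulSet.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_K8S_CLIENT_QPS
              value: "100"   # default 50
            - name: ARGOCD_K8S_CLIENT_BURST
              value: "200"   # default 100
```

The trade-off Carlos mentioned shows up directly here: raising QPS and burst buys sync speed at the cost of more load on the Kubernetes control plane API.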
So with still only one application controller, just by changing these QPS and client QPS settings we were able to decrease sync times from 41 minutes all the way down to 12 minutes. It had a big effect. I don't think these settings are in the documentation yet, but if you go through the GitHub issues you can see them discussed a bit. Now, what could happen if you change the QPS or client QPS settings? You could overload your Kubernetes control plane API, and that's why these settings matter. Say you can't shard your Argo CD deployment across clusters (sharding is by cluster): then you'd want to change this setting, but you need to monitor your Kubernetes control plane API.

The last setting we played around with is controller sharding. This takes the application controller and shards it by cluster. Say you have five shards: each shard manages a subset of all the clusters you're managing in Argo CD. By sharding, we were able to decrease sync times from 41 minutes down to eight minutes, so sharding the Argo CD application controller had a big effect on performance. But it requires that you distribute the apps across clusters. In the next round of testing we're going to go into how the sharding algorithms actually work: they determine how many clusters go to each shard, and you can see some imbalance there too. That's one of the drawbacks of sharding.

So, our conclusions and future activities. What we found was that sharding is the key component of scaling Argo CD, and it effectively forces you to have more clusters if you have a lot of applications. If you have 10,000 applications, you don't want all of them going to one cluster, because then you can't take advantage of sharding. So if you're using Argo CD, it pushes you to split your applications across multiple clusters so you can take advantage of the sharding features. And what we found was that in the beginning there was only a single sharding algorithm (it's now called legacy), and it did not balance perfectly across the shards: some shards would be handling way more clusters than others, which caused higher CPU usage on one particular application controller shard and created imbalance. Red Hat has since introduced a sharding framework so that others can contribute their own sharding algorithms, and with that change they also included a round-robin algorithm. Through the testing I've done, round-robin keeps the balance to within plus or minus one cluster, so the number of clusters per shard is essentially equal across shards, and that helps with scalability.
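For reference, a minimal sketch of what enabling sharding could look like, assuming a reasonably recent Argo CD release where the round-robin algorithm is available; the replica count here is an illustrative choice:

```yaml
# Run several controller replicas; each replica becomes one shard
# owning a subset of the registered clusters. Fragment, not a full spec.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 5   # five shards
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "5"   # must match spec.replicas so shards are computed correctly
---
# Select the round-robin algorithm instead of the default legacy one.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  controller.sharding.algorithm: "round-robin"
```

Note that shard assignment here is still per cluster, which is exactly the limitation around application counts that comes up next.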
The only issue with the sharding algorithms we have today is that they shard by cluster and know nothing about the number of applications in those clusters. So if you have one cluster with 10,000 apps and another cluster with 1,000 apps, the algorithm just assigns each of them to a shard, and you can get imbalance that way. What we're looking at is introducing new sharding algorithms, including one that shards by the number of apps, so it takes into account how many applications are on each cluster. Say you have 1,000 apps on one cluster: it goes to a shard, and every subsequent cluster is sent to whichever shard has fewer apps. That's taking the apps into account when making sharding decisions. The other thing we want to do is work with Red Hat on exploring a re-architecture of the app controller, because currently it has to be sharded by cluster. If we can move away from sharding by cluster and shard by app natively instead, I think we can get around some of the issues we're having with sharding and scalability. And with that, I'll turn it over to Carlos.

Yeah, thank you for your time. And this is the call to action to the community. Argo CD, like I said, is a CNCF open-source project, so if you contribute, you get the benefit of joining a large community of practitioners. We have a scalability special interest group; it went through the Argo project's proposal process. We submitted the proposal between different entities and organizations, and it got accepted. We meet twice a month, in the second and fourth week. There's an agenda doc, so if you want to go back over the issues and topics we've been discussing, you can access that. We're on the CNCF Slack; there's the Argo sig-scalability channel, but there's also argo-cd and other Argo CD channels, and I encourage you to join them and ask questions. One of the things I've gained from Slack is learning what end users are doing. For example, last week somebody was asking for help because she had ten clusters but 10,000 apps, so she was running into exactly these scalability problems. That's where you can find people to help with your problem, and where you can help others. There are also links to our GitHub repo, which has an example and a pull request for the benchmark tooling Andrew developed: a CLI to easily run these benchmarks in your own Argo CD environments on Kubernetes. I want to thank Red Hat for the opportunity for the upstream team to join and present our benchmark findings. So if you want to join, come find us in Slack or at our meetings. Yep, thank you, Red Hat and everyone.

Fantastic. Our hosts are coming back. Okay, Harriet. Yes, hello, thank you so much. And I will grab my other hosts as well. And if everyone could remember to unmute this time, that would be lovely. All right, thank you so much to all of our presenters. This has been absolutely fantastic.