 I'd like to thank everyone joining us. Welcome to today's CNCF webinar, Argo at Enterprise Scale with K8s. Am I saying that right, y'all? Sorry, it's early. Sounds good. Okay. I'm Libby Schultz. I'll be moderating today's webinar. We would like to welcome our presenters today: Al Kimner, Principal Software Engineer and Architect at New Relic; Daniel Jimbal, Staff Engineer at New Relic; and Caleb Trotten, Product Manager, Telemetry Data Platform at New Relic. A few housekeeping items before we get started. During the webinar, you are not able to talk as an attendee. There is a Q&A box at the bottom of your screen. Please feel free to drop your questions in there and we'll get to as many as we can at the end. This is an official webinar of the CNCF and as such is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of that Code of Conduct, and be respectful of all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF webinar page at CNCF.io slash webinars. And with that, I will hand it over to you guys. All right, hello and good morning, good afternoon or good evening, everyone. Welcome. Before we get started with the presentation, I wanna have a word from our legal team. So this is our Safe Harbor slide. We can move on. Next slide, please. I'm Al Kimner, Principal Engineer and Architect. I help engineering teams build software and systems that are simple to maintain and scale. My favorite hobby is scuba diving. Hey everybody, I'm Caleb Trotten. I have been an engineer at New Relic for a while and I'm now a product manager with the Telemetry Data Platform focusing on CI/CD. When we're not in a global pandemic, in my free time I like to spend it in a bowling alley. Hi everybody, I'm Daniel Jimbal and I am a Staff Engineer at New Relic. I engage with teams on a temporary basis to help them with their projects. 
And when I'm not in front of a keyboard, I enjoy planning and trying to learn. Cool, and we're excited and grateful to have the opportunity to talk to you today about Argo. We have a packed agenda for our presentation today with two compelling demos. We're going to give you an overview of New Relic's ingestion, streaming and storage architecture, which should set the stage for what problems we have and how Argo fits into that. I'll cover our use of Argo CD and the scale at which we're using it. Caleb will walk us through how Argo Rollouts gives us a better experience than a Kubernetes rolling update for a deployment, and show a demo of a deployment with an automated canary analysis. After that, we'll cover additional needs we have with orchestration at scale and how Argo Workflows helps us there. Daniel's going to cover how we use Terraform and Open Policy Agent with an Argo workflow to safely roll out infrastructure as code changes. Our main objective today is that you'll be able to understand how to safely implement continuous delivery at scale for both Kubernetes resources and infrastructure as code pipelines. Here's a high level diagram of the telemetry data platform at New Relic. The parts of the diagram are made up of a few hundred different microservices. This part of the platform supports ingesting many petabytes of data a day from all over the world, with millisecond response times for querying of the ingested data. I point out both the amount of data we ingest and the response time on queries for the ingested data because all these services are dealing with an incredible amount of throughput, and we are also continuously deploying updates all the time. 
This chart shows the data growth over time for the period in which engineering teams on the telemetry data platform migrated these services to Kubernetes, which brings us to: how does New Relic do continuous delivery in a distributed system that ingests such a massive amount of data, safely and without interruption? This is where Argo CD enters the picture. First, some history about Argo. Argo was created in 2017 at Applatix, which was acquired by Intuit in January of 2018, who open sourced Argo. A few months later, BlackRock contributed Argo Events to the Argo project. Argo then joined the CNCF in April of 2020. So why Argo? Well, at New Relic, we are constantly evolving our systems along with our internal engineering processes and operations. One of those changes was introducing Kubernetes. Kubernetes was a good fit for us because we have been multicloud for many years and our services exist across multiple public cloud providers and private data centers in many different regions. Kubernetes was a natural fit. And since all of the Argo tools are implemented as controllers and custom resources in Kubernetes, Argo seemed like a good fit too, which led us to look at Argo CD. This is a long list of features that made it compelling for us to pick Argo CD for our continuous delivery needs. And this is not even the full list. I just ran out of room on the slide. One of the main drivers was for us to have the ability to easily manage and deploy to multiple Kubernetes clusters with a GitOps workflow. We have lots of Kubernetes clusters and we're creating more all the time. So Argo CD was the easiest way to get up and running and checked all the boxes for all the things we wanted. All in all, we've been extremely pleased with it. A bonus: as of last week, Argo CD version 1.8 just shipped and added horizontal controller scaling. So now we can have even more Kubernetes clusters. 
Working at a company that focuses on observability, I would not feel right without sharing some stats about our current Argo CD instance. We are at approximately 3,000 applications and over 10,000 Kubernetes deployments in the last month. The Kubernetes clusters are very big, with most over 1,000 nodes. We've segmented our services into different workloads, and workloads are assigned to different size Kubernetes clusters. I'm pointing all this out because we have lots of variables, with dozens of internal engineering teams and a whole bunch of services and lots of changes. Argo CD makes the rate of change possible, but we're still missing the safe part of this continuous delivery story. And now Caleb is gonna talk to you about that. Take it away, Caleb. Thank you, Al. So I'm gonna talk to everybody today about Argo rollouts and how that helps us with the safety of so many deployments happening every day. So with hundreds if not thousands of deployments a day, we need a way to make sure that changes roll out safely and don't require an extreme amount of effort from engineers to make sure that a deployment works right. For this, we like to use a canary deploy strategy. If you're not familiar with the canary deploy strategy, it involves deploying one or some small number of instances with a new change, waiting some amount of time to make sure that that new change is safe, and then, once you've determined that it is safe, rolling that change out to the rest of your instances. This can totally be done manually in terms of verifying whether the canary is safe. But we don't wanna have a human involved for 30 or 60 minutes every time a deploy is made when we're making hundreds of them a day. So something that we were really looking for was automated canary analysis. Automated canary analysis allows you to query some metric provider and use that to determine whether the deployment of your canary is safe. 
And then, in the case that it detects that there's something wrong with the metrics, it can automate a rollback. So we looked at Argo rollouts for this, because the standard Kubernetes deployment resource doesn't provide most of this stuff that you see here. The rolling update strategy allows you to roll out things slowly, one at a time, as long as the probe conditions are met, but not really that advanced use case of stopping, pausing, and running analysis in more granular steps. So Argo rollouts does provide this stuff for us. And I'm gonna walk you through now some of the pieces of Argo rollouts and why it was compelling. First I wanna talk about the custom resources that Argo rollouts provides. We're gonna start with some of the lower level stuff, which is experiments, analysis templates and analysis runs. And then I'm gonna cover the rollout resource, which basically encompasses all of these things into one drop-in deployment replacement. And then I'm gonna talk a little bit about all the metric providers that can be used with analysis templates to do the automated canary analysis. And then finally I'm gonna show a demo of all of that coming together. So let's talk about experiments first. An experiment at its core creates two different replica sets. Each of those replica sets has its own pod spec template. So you can deploy something with as small of a change as a different version in your Docker image, or you could have two wildly different pod specs. It's up to you. What you do with those pods with just an experiment is up to you; they'll be run and you can go poke them however you see fit. But where it gets a little more interesting is when you pair an experiment with an analysis template to be run against those pods. So an analysis template describes, through metric providers, what to query and what a successful query looks like. From there, the experiment will use the analysis template and initiate an analysis run. 
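As a quick aside, an Experiment paired with an analysis might look roughly like the sketch below. This follows the shape of the Argo Rollouts Experiment CRD, but every name and image here is hypothetical, not taken from the demo:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: demo-experiment              # hypothetical name
spec:
  duration: 5m
  templates:
  - name: baseline                   # first replica set, with its own pod spec
    selector:
      matchLabels: {app: demo, track: baseline}
    template:
      metadata:
        labels: {app: demo, track: baseline}
      spec:
        containers:
        - name: demo
          image: example/app:v1      # hypothetical image
  - name: candidate                  # second replica set under test
    selector:
      matchLabels: {app: demo, track: candidate}
    template:
      metadata:
        labels: {app: demo, track: candidate}
      spec:
        containers:
        - name: demo
          image: example/app:v2      # could differ by just the tag, or wildly
  analyses:
  - name: success-rate
    templateName: success-rate       # an AnalysisTemplate in the same namespace
```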
An analysis run is really just an instance of that analysis template with arguments filled in, typically with information about one or more of those pods in your stable or canary replica set. And finally, what most folks actually interact with in Argo rollouts is the rollout resource. Like I said, this is a drop-in replacement for deployment. If you didn't use any of Argo rollouts' advanced features and didn't specify a canary strategy, you could just use it like a drop-in deployment replacement. But you're not going to get all the goodies in it until you specify a special strategy like canary or blue-green deployments. Argo rollouts supports blue-green deploys, which I'm not going to cover today, but I encourage you to go Google it and see if that is something that you're interested in as well. So on the right here, we have a really basic example, mostly taken from the Argo rollouts docs, showing that most of the spec is just like a deployment. However, what we see here is an alternate strategy with canary. In this example, what we're doing is deploying 20% of the instances first, pausing for five minutes and then running a one-time analysis using the success rate analysis template, passing in an argument with the service name of the service being deployed. The analysis type that's used here, of running a one-time analysis at the end of that pause duration, is one way that Argo rollouts lets you configure analysis to run, but there are other ways, including running analysis in the background the entire time that your canary steps are progressing. Last, before I get to the demo, I want to talk about all the different metric providers that you can specify in an analysis template. So first, you can run a Kubernetes job. This would instantiate a Kubernetes job on the cluster and just look for the exit status of that job. If it exits zero, your job is successful. If it's non-zero, it failed. 
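Before going further with the providers, the canary example just described on the slide looks roughly like this. It follows the Rollout CRD, but the app, image and service names are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout              # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels: {app: example}
  template:
    metadata:
      labels: {app: example}
    spec:
      containers:
      - name: example
        image: example/app:v1        # hypothetical image
  strategy:
    canary:
      steps:
      - setWeight: 20                # deploy 20% of the instances first
      - pause: {duration: 5m}        # wait five minutes
      - analysis:                    # then run a one-time analysis
          templates:
          - templateName: success-rate
          args:
          - name: service-name
            value: example-svc       # hypothetical service name
```

Everything above `strategy` is exactly what you'd write in a Deployment; only the strategy block is new.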
You can fire off HTTP requests against any endpoint, and as long as it returns JSON, you can specify what JSON path and what values you're looking at in that JSON payload and which ones you consider failure and success. You can also integrate with a tool called Kayenta. Kayenta is traditionally a piece of the Spinnaker ecosystem, although you can run it standalone. The main thing that Kayenta does is perform a different type of statistical analysis on the two different groups of instances that you're comparing, called the Mann-Whitney analysis. Google that if you want to learn more about the statistics, and this can be integrated with, and Kayenta has its own set of metric providers that you can configure there too. Also, you can directly query different metric providers: Prometheus, if you have PromQL queries that you want to run, as well as a number of commercial providers that includes ourselves, New Relic, Datadog, and Wavefront. The demo I'm about to show you is going to be using New Relic as the metric provider because that's what we use here. So let me pop out of the presentation and start showing you some stuff. I'm gonna start first with a rollout resource. So what I'm gonna demo is, I have an Argo CD application with two resources in it, a rollout and an analysis template. This is the rollout. You can see we have five replicas, and then we're going to have this canary deploy strategy where we deploy 20% of those resources, aka one instance. We're going to pause for only 20 seconds. This is a demo on a webinar and I wanna keep it a bit snappy. And then we're going to run a one-time analysis against the error rate of the application. We're gonna be passing in an application name, which is webinar demo app, and the canary hash. This is something provided by Argo rollouts. It is the replica set identifier segment of the pod's name, and the latest value here basically says, give me the pod template hash from the canary group. 
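To make the provider options just listed concrete, here is a hedged sketch of three metrics in one AnalysisTemplate, one per provider style. The endpoints, images and queries are invented for illustration; the field shapes follow the Argo Rollouts analysis spec:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: provider-examples            # hypothetical
spec:
  metrics:
  - name: smoke-job                  # Kubernetes job: pass/fail by exit code
    provider:
      job:
        spec:
          backoffLimit: 0
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: check
                image: example/smoke-test:latest   # hypothetical image
  - name: health-endpoint            # web: inspect a JSON response
    successCondition: result == true
    provider:
      web:
        url: http://example.internal/health        # hypothetical endpoint
        jsonPath: "{$.healthy}"
  - name: p99-latency                # direct Prometheus query
    successCondition: result[0] < 0.5
    provider:
      prometheus:
        address: http://prometheus.example:9090    # hypothetical address
        query: histogram_quantile(0.99, sum(rate(req_duration_bucket[5m])) by (le))
```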
And this is just another argument into our analysis template saying how long to run our query for in our metric provider, so against New Relic. The rest of this spec looks exactly like a typical deployment spec. We have an image, we have environment variables. Most of these environment variables are just hooking up the New Relic agent. And then we have one environment variable that takes the rollout pod template hash and, using the Kubernetes downward API, makes that available as an environment variable in our container, which we are then using to add to the instrumentation by the agent, so that we can pick up in our transactions whether a given pod belongs to the canary group or to the stable group. So this error rate analysis template that we're referencing here, it takes four arguments. It takes the application name and the canary hash; we saw those in our rollout. We have this since, which is defaulting to one minute. For any of these arguments that have a value there, that value is the default, and so you don't have to pass it in. For the arguments that don't have a default, you're required to pass in a value from your rollout. So we're using 20 seconds as the since here. The error threshold we didn't pass in; we're using the default, which is 1.0, a 1% error rate threshold. We're specifying a failure condition, which is that the error rate is greater than or equal to the threshold. So this will fail if the error rate goes above 1%. And then the query that we're giving to New Relic, the shorthand of it, is: what is the error rate of this application with this pod template hash? So in our example, we are saying, what is the error rate of the canary pod that we've deployed? So let me jump over here to Argo CD for a second to this application. I have this rollout running already. 
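Putting the pieces just walked through together, the error-rate template might look roughly like this. It is a sketch, not the exact production template: the NRQL query and the attribute names are illustrative, while the argument defaults match what was described:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate
spec:
  args:
  - name: application-name        # no default: the rollout must pass it in
  - name: canary-hash             # no default: the canary pod template hash
  - name: since
    value: "1 minute"             # default, overridable from the rollout
  - name: error-threshold
    value: "1.0"                  # default: a 1% error rate threshold
  metrics:
  - name: error-rate
    # Fail the analysis when the error rate meets or exceeds the threshold
    failureCondition: result.errorRate >= {{args.error-threshold}}
    provider:
      newRelic:
        query: >
          SELECT percentage(count(*), WHERE error IS true) AS 'errorRate'
          FROM Transaction
          WHERE appName = '{{args.application-name}}'
          AND podTemplateHash = '{{args.canary-hash}}'
          SINCE {{args.since}} ago
```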
I am going to make some changes to it now, and I would caveat that I would typically do this in a more GitOps fashion, but again, this is a demo and I wanna keep it snappy. So we're just going to edit the manifest directly to simulate a new deployment. I'm gonna deploy release four and we're gonna see what Argo rollouts does with this. You'll see first that it spun up a new replica set with one pod and scaled down the stable replica set to four pods. So we still have five pods total running. In a second, what you're gonna see, here we go, is an analysis run was executed, and very quickly we see that the canary replica set has been scaled up to five pods. The old replica set is scaling down to zero. So if we take a look at this analysis run, it's all green, so you're not gonna see a ton of information up front, but if you dig into the manifest, you'll notice that we have an error rate of zero at this time. So let's go the other way. Let's deploy something bad. So I have a version of this application that boots up completely fine, but there's a bug in it that causes all of the background processing that it does to error. So let's deploy that version. Again, just like last time, we get one pod on a new replica set here, our old replica set scales down, and I'm gonna show you something while we're waiting here, which is, this is New Relic, and I have a query here that is showing basically the same thing, the error rate for this application, and we're already seeing that this has just recently spiked up. If I jump back to Argo CD, you're gonna see that another analysis run occurred, and the newer replica set, rev six, scaled down because the analysis run failed, so it automatically rolled back, and the previous stable replica set scaled back up to its full five instances. If we look at the analysis run, we get some events: the analysis failed, and specifically the metric error rate failed. 
And if we look at some of the data behind the scenes, we have a full 100% error rate; of course we wanna roll it back. So that's the basic demo of how Argo rollouts works. Again, this kind of automated metric analysis is really important to us with the scale and the sheer number of deployments that we're doing in a day. Again, hundreds if not thousands; that's not the kind of time and attention that we want to force engineers to pay. This is really allowing us to continue to move fast while making changes in a safe manner. So I'm gonna kick it back to Al now to talk a little bit more about this scale and how we orchestrate changes to it. Cool, thanks, Caleb. That's a pretty compelling demo. I hope this shows everyone exactly how to implement safe continuous delivery using Argo rollouts, and how we do it too. In this section, I'm gonna dive into the additional needs we had with orchestration at scale. So there's a few issues with orchestrating at this scale that New Relic has. How do you safely make changes? How do you add capacity? And how do you isolate failures? A common approach is to scale out deployments to other regions, but this doesn't work if your applications are sensitive to latency and you need to be close to where your customers are. It might seem straightforward to just keep scaling out your Kubernetes clusters by adding more nodes, but that's only a good practice up to a point. Our clusters are already thousands of nodes, so it seems like we need another mechanism. We need to create some boundaries to limit failures in our systems. This High Scalability post highlights the next set of changes we introduced, which is a cellular architecture. We needed to parallelize and isolate by sharding our dataset. Are the slides moving forward? There we go. Next slide. We definitely have a need to support incremental capacity, which cell architecture does very nicely. 
Being able to isolate, so one cell does not impact another cell, is a huge benefit to our operations. This allows us to continue to deploy changes to a small subset of our cells without impacting all cells. Incorporating a cell architecture into the automated canary analysis deployment that Caleb showed means we can have really high confidence our changes are not causing an issue. Not only are we doing canary analysis inside of an application deployment, it's also now inside of a cell that's isolated from our whole environment. So the telemetry data platform looks like this inside of a region, where we can just keep adding cells as we need more capacity and isolation. And we end up with N number of cells. This architecture can be applied to any number of applications. You just have to look at how you route and shard your dataset. Next slide. This is the same data growth chart I shared before, but now it's faceted by cells. The large blue area at the bottom, that cell now serves a significantly smaller amount of overall capacity. There's also a few interesting visuals in this chart. In the top left, the purple and green cells actually taper off and disappear completely. In the middle of the chart, a whole bunch more cells start appearing. This is where we started using Argo workflows for our orchestration of cell builds. The amount of orchestration at this scale requires us to have a flexible orchestration system that many teams can interact with, and that's where Argo workflows comes in. Argo workflows is perfect for this. In the systems we just talked about, moving to a cell architecture, we have approximately 20 teams involved. Each team needs to deploy their services, and most teams need a combination of steps, like creating infrastructure resources such as databases and S3 buckets and other things via infrastructure as code pipelines. They also then need to deploy their applications to Kubernetes that depend on those resources. 
And we want to automate all this, and we want it to be safe and to happen continuously. Each team creates one or many workflows, however they like, with the different steps they need to get their service deployed to a target cell. It really boils down to: since a step is a container, it gives them a really powerful abstraction that's easy to maintain. Next slide. One of the nice features in Argo workflows is that a workflow can call another workflow, or also a workflow template. So here we created a cell build workflow that each team has added all their own workflows to. Again, the team workflows are everything they need to deploy their services to a cell, and we really stress to the teams that all the steps are idempotent, so we can rerun the workflows safely. This parent workflow here is actually a directed acyclic graph, or DAG. Sorry for the tiny image, but the point of showing you this was that some serial steps happen at the beginning and then some parallel steps happen later on as the dependencies are resolved in the DAG. The steps at the beginning are actually an Argo workflow template calling Terraform, and then we start deploying Kubernetes resources using Argo CD and Argo rollouts. So I hope I've done a good job explaining our orchestration at scale and how Argo workflows helps us. What I haven't explained is how to safely run auto-applies of Terraform, which Daniel is going to talk us through and how that works, and show us a cool demo too. Take it away, Daniel. I think I need to stop sharing. Yes, now I'm able to take the screen. Thank you. I'm going to talk about how we have implemented our Terraform pipeline using Argo workflows. When we started with our proof of concept to integrate our existing Terraform code into Argo, we had some requirements to accomplish. We had to use Argo workflows to run Terraform. Every step had to be idempotent. That way it can be run multiple times if needed without affecting the current infrastructure. 
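As an aside, the cell-build parent workflow Al described, with serial Terraform steps feeding parallel per-team deploys, can be sketched roughly like this. The shape follows the Argo Workflows DAG spec, but every template and parameter name here is invented:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cell-build-
spec:
  entrypoint: build-cell
  arguments:
    parameters:
    - name: cell-name
  templates:
  - name: build-cell
    dag:
      tasks:
      - name: create-infra                 # serial: runs first
        templateRef:
          name: terraform-runner           # hypothetical WorkflowTemplate
          template: terraform
        arguments:
          parameters:
          - name: workspace
            value: "{{workflow.parameters.cell-name}}"
          - name: action
            value: apply
      - name: deploy-team-a                # parallel once infra is ready
        dependencies: [create-infra]
        templateRef: {name: team-a-deploy, template: main}   # hypothetical
      - name: deploy-team-b
        dependencies: [create-infra]
        templateRef: {name: team-b-deploy, template: main}   # hypothetical
```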
The solution had to be generic for all teams. That way it's easy for them to use, and every team avoids reinventing the wheel over and over. And we had to make sure that Terraform can run without human interaction, with trust in what is going to be applied. Argo workflows helps us because a workflow is able to trigger another workflow. We can use artifacts between steps to pass data, for example. And Argo workflows can be launched with different inputs, such as the cell name. Then, Argo workflows runs Docker images under the hood. We had to create a new image to fit our needs, using Terraform, tfenv, OPA and Conftest. The Docker image also takes different inputs and doesn't require more interaction. And Terraform: we already had some existing Terraform code, where we were previously creating our infrastructure with another pipeline, but we had to do a few changes to make it even better. We had to switch to Terraform workspaces so we don't have to duplicate the code for every cell, since the code is always the same and only the variables change. And if we need to overwrite some values for a specific cell, we can specify them in a tfvars file. The Terraform workspace gets created automatically if it doesn't exist, as it won't for new cells. Then the Open Policy Agent covers the need of making sure that the Terraform code that we're applying won't do any undesirable changes, such as deleting a cluster, for example. It uses the Rego query language, where teams can specify their acceptance policies. And if the Terraform plan doesn't pass the OPA policies, the process is canceled and it exits with an error. I'm gonna show a quick demo. So this is the Argo workflows interface. And here I have some Terraform code. What I'm gonna try to show is, we are gonna create, or we are simulating that we are creating, a Kubernetes cluster. Then we are deploying some apps into that cluster. 
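The workspace pattern just described boils down to one copy of the Terraform code shared by every cell, with the workspace name standing in for the cell. A minimal sketch, with a hypothetical bucket naming scheme and variable:

```hcl
# One set of Terraform code for all cells; the workspace (selected or
# created per cell by the pipeline) drives the per-cell differences.
resource "aws_s3_bucket" "cell_data" {
  # terraform.workspace holds the current workspace name, i.e. the cell
  bucket = "telemetry-${terraform.workspace}"   # hypothetical naming scheme
}

# Per-cell overrides live in a tfvars file (e.g. cells/<cell>.tfvars,
# a hypothetical layout) passed with -var-file when a cell needs
# non-default values; otherwise the default applies.
variable "node_count" {
  default = 3
}
```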
And in parallel we're creating an S3 bucket. If we go to the main.tf, I'm just using a null resource, because I don't think we wanna spend time creating a new cluster right now. But the interesting part here is this Terraform Rego file, which is the one that allows us to make sure that we're applying changes in a safe way. For example, we have these rules where we are setting different weights. If the plan is gonna delete a resource, it says that it's gonna have a hundred points. If it creates a new one, it's gonna have 10 points. And if it modifies one, it's gonna add one point. We see that our blast radius is 30. So if the overall score across all the resources is over 30, the plan is gonna be canceled automatically. Then, the Argo workflow that we were just seeing. This is the workflow template where basically we launch the Terraform container. We are cloning the repository to get all the data, all the files. And then we are specifying the Docker image. Here we have some secrets that are already on the Kubernetes cluster that allow us to clone the repository. And then here is where we launch the Docker image and we are passing the values to the different variables that we need. For example, we need to pass the Terraform directory where the code is, and the Terraform version if we wanna force one; otherwise it will detect the Terraform version written in the code and will download it automatically if it isn't already present on the Docker container. Then we specify the workspace, which basically is the name of the cell in our case, and where the OPA file is; also we can use Conftest as well. And then the action that we're gonna run, which can be a plan or an apply. Then we have our AWS access keys, which again are stored on the cluster, that allow us to communicate with AWS. Then this workflow here is simulating the one that I showed previously, the usual one. So basically it's a DAG, it uses DAGs. So I have three tasks here. 
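As an aside, a scoring policy like the one just described can be sketched in Rego along these lines. It's modeled on the common Terraform-plan scoring pattern; the package name, rule names and plan-JSON input shape are assumptions, while the weights and blast radius match the demo:

```rego
package terraform.analysis

import input as tfplan

# Weights from the demo: delete = 100, create = 10, modify = 1
blast_radius = 30

action_weight("delete") = 100
action_weight("create") = 10
action_weight("update") = 1
action_weight("no-op") = 0

# Sum the weight of every planned action in the Terraform plan JSON
score = sum([w |
    rc := tfplan.resource_changes[_]
    a := rc.change.actions[_]
    w := action_weight(a)
])

# The apply is only allowed while the total stays under the blast radius
default allow = false

allow {
    score < blast_radius
}
```

Under this sketch, creating one resource scores 10 and passes, while deleting one plus creating one scores 110 and is rejected, matching the demo's outcomes.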
One task creates the Kubernetes cluster, and it refers to the workflow template that I showed previously. We are gonna pass only different values: the workspace, in our case the cell name; the Terraform directory that we wanna apply, in our case the directory where our OPA file is; and then the action, which is apply. Then the second step is deploy apps. In that step we can see that it has a dependency, so the step will be launched after the previous one is finished. And the only thing that changed is the Terraform directory, and here we're using Conftest instead of OPA. And then the create S3 task doesn't have any dependency, which means that at the beginning both the create Kubernetes cluster and create S3 tasks are gonna be run in parallel. So let's go here. First of all, I'm gonna show that we don't have any template. So what we need to do first is upload the template into our cluster. So I'm gonna run template create, and I'm gonna specify where the template is. Now if I list the templates again, I can see that I have one. If I go to the interface, under workflow templates, we can see it also here. Then let's create a new cell. So I'm gonna submit the workflow with the workflow YAML; in this case it's just a workflow, not a workflow template, and I'm gonna pass a parameter, which is the cell name. I can check the logs on the console. We can see different colors, which in our case means that every color is a different step or a different container. Or we can go to the web UI, which is nicer. We can see how it created the Kubernetes cluster. It detected that we are using Terraform version 12.79. Then the workspace webinar-demo doesn't exist, so it creates the workspace automatically. It says that it's gonna create a new resource, and as the OPA score is only 10, because it's creating a new one, it passed the OPA checks and applied the plan safely. Then it deploys the apps, and the same for the create S3 bucket. So now let's go to the code. 
Imagine that I say, oh, I wanna change the resource name, and I'm gonna add a two. If you are familiar with Terraform, you know that this operation is basically gonna destroy the previous cluster and create a new one. So I change the name, commit the changes, and I'm gonna trigger the same workflow again for the same cell. If I open the create Kubernetes cluster step now, it detected the workspace already, because it has been created before, and it says that it's gonna destroy a cluster and it's gonna create the new one that we have set. So the total score is 110, because it's 100 for deleting a resource and 10 for adding one. And then it says that it failed the OPA checks and it canceled the operation. And as it canceled the operation, we can see that the deploy apps step never ran. And that's basically how we are using Argo workflows with Terraform. Cool, Daniel. That's a pretty sweet demo. It definitely showcases how we use Terraform and OPA with an Argo workflow to safely roll out our infrastructure changes. This fits nicely with what Caleb showed, combining Argo CD and Argo rollouts, so we can build new cells in a safe and automated fashion without any degradation of our services at scale. Now we're gonna leave the rest of the time for questions, and no, a hotdog is not a sandwich. So let's see, I'll try to go in order here, I think. We had a question about CRDs. I think it was related to machine templates for K8s and how you manage the complexity of that simultaneously. We do that with Argo workflows. Essentially we'll have a step in a workflow that uses Argo CD and Argo rollouts to push out CRDs, and then we can run any validation that we want. And then you can have another step in the workflow that does whatever you need to do with machine templates, and those are treated as code as well. The next question: how would a list of applications and application details UI look with 200 microservices? 
Is it a drill down UI, or is everything going to be on the same page? Caleb, you wanna take that one? Yeah, I can take that, since we deal with having 3,000 applications in one Argo CD. So any individual application looks like what you saw in my demo, where it's scoped to just the resources that that application controls. The list of applications is one big list, but it is filterable. So you can group them together by project, which, at least in our case, we typically tie to something like a namespace or a team. And then you can also filter it by which cluster you're targeting. You can put arbitrary labels on those applications and filter by that. Kind of a robust filtering system there. So it's not exactly drill down, but it's like a filterable list. I hope that answered your question. I think the next question is from Mohammed. A good habit with CI/CD is performing a DAST, which I think is dynamic application security testing. How can we perform these tests on a canary release? I can take that too, because I think it probably depends on your DAST tool, but I can imagine using either a job metric provider in a canary release or a web metric provider, and either running a Kubernetes job that goes and asks your DAST tool to go do a thing and then inspects the results. So run that scan, look at the results, see if there's bad stuff there, and if there is, fail that canary. Are all the examples from the demos in any public repo? Unfortunately not, but I think we will probably create a public blog post on New Relic and be able to share a lot of the content. I think the next one is related to an answer that I gave previously. Yep, I see that, cool. Okay, how do you manage destroying resources if the workflow is canceled? Kinda depends what the step in the workflow's doing. If you're using rollouts, then you're in good shape because it automatically rolls you back. 
For the Terraform stuff we showed, essentially, if it passes the OPA checks and then fails for some reason, you're gonna get alerted and have to dive in to see what's wrong. If it's another step that's running a container an engineering team has created, it just depends on what that is. It doesn't have that rollback functionality unless you've built it into the step itself. Hope that answered your question. Are we running an Argo for each K8s cluster, or do we have one that manages all of them? So the short answer is we have one that manages all of them. The real answer is we have two that each manage all of them, but we definitely take the singular approach. The only one of those components that is deployed to every cluster is the Argo Rollouts component, because that is a controller that needs to run on every cluster. Everything else is centrally managed in one. Yeah, and when you get to a certain number of clusters and apps, that's where you really need the 1.8 release that was just released last week. So, does Argo provide notification hooks for successful deployments? It does, actually. One of the components in Argo we didn't cover was Argo Events, which has its own hugely powerful ecosystem for creating event-based pipelines to react to anything going on from many different sources, not just Argo itself; you could watch Kafka topics and pub/sub environments and things like that. Which one are we tackling next? I think the one from Alexi, probably: how do we create another environment for the application? Like new instances of the application that consist of hundreds of services and components and additional dependencies; for example, it might require creating a PV resource and may rely on centralized instances of things like RabbitMQ. Can you copy an application to create new instances? Do I copy dev01, test04, staging, et cetera, or create one from scratch every time? I have opinions. Yeah, go for it. Sure. Like, this is why we do GitOps.
We're creating a new version. So especially, I think, with the stuff that we're deploying, a lot of these are Helm charts, and Argo CD handles Helm and Kustomize and other renderers very well. So you wouldn't necessarily be creating one from scratch, but you would be creating a new application with the exact same Helm chart, but with different parameters for your different environments. And that's how you deploy a new one. Here's one for Daniel: where are you holding Terraform state for your workflows? Well, we are storing it on S3, but for the demo I was using MinIO locally, so you don't see it actually querying S3. Yeah, so each workflow that a team owns has its own state backend in an S3 bucket. I think that's using DynamoDB for locking as well. But it's anywhere Terraform can really store it; we definitely like making sure our state is pushed up somewhere so multiple people can interact with it safely. So this one's interesting: are there any features to debug a build container, like CircleCI has? That one I'm not sure of. I think you'll be able to look at the logs, but if you're gonna try to, you know, exec into the container, you might have to introduce your own pause mechanism there or something so you can look at what's going on. So this is a good question: does it compare versions of Kubernetes resources against what Argo created? Manual edits of things like envs, resource limits, requests, et cetera. So yeah, it has a sync component to it. If you edit the state of a resource that is managed by Argo, like Argo CD or Argo Rollouts, if you edit that manually with kubectl, it'll essentially be out of sync with the Git repo it's coming from. And this is where GitOps comes in: if that happens, the next sync will essentially blow away those changes.
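As a sketch of the "same chart, different parameters" answer above, a per-environment Argo CD Application might look like the following. The repo URL, chart path, cluster address, and parameter names are illustrative assumptions, not New Relic's actual config.

```yaml
# One Application per environment, all pointing at the same Helm chart
# with environment-specific values.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-staging          # hypothetical name
  namespace: argocd
spec:
  project: my-team
  source:
    repoURL: https://git.example.com/my-team/charts.git
    targetRevision: main
    path: charts/my-service         # the same Helm chart for every environment
    helm:
      valueFiles:
        - values-staging.yaml       # environment-specific overrides
      parameters:
        - name: replicaCount
          value: "2"
  destination:
    server: https://staging-cluster.example.com
    namespace: my-service
  syncPolicy:
    automated:
      prune: true
      selfHeal: true                # syncs away manual drift, the behavior discussed above
```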
I can actually show this real quick, because I'm out of sync right now in that application that I demoed. Let me do this. So, you know, I made some manual edits to that rollout through this UI, and it now tells me that I'm out of sync with what's in Git, and you can get this diff. Here, let me show the compact version. You can get a diff of what has changed here versus what is in the Git repo backing this application. And somebody had asked about notifications earlier: we specifically use the Argo CD Notifications project that's in the Argo Labs org. And that will send out notifications into Slack when an application goes out of sync for any reason, which is really handy. Let's see, we've got a couple more questions. Libby, how much time do we have still? About seven minutes. Okay, y'all are good to go. Yeah, one says: in your GitOps workflow, how do you manage credentials required by Argo to access the repositories? Are they stored in plain text? What's the setup there for the repository itself? So we're using secrets, Kubernetes secrets, to store all kinds of credentials. And then with Argo CD or Argo Workflows, it's really easy to get them. When you define a pod that you want to use a secret, I think it's exactly the same definition: you say the value comes from a secret key reference, and that's it. Does Argo have LDAP and AD integration? Yeah, this was on one of my slides. It actually has SSO integration and very granular RBAC controls that will let you have multi-tenancy and things like that. And you could even deploy different Argo instances to the same Kubernetes cluster if you really needed to, in their own namespaces, and have some granular control of what teams can access and stuff like that. Here's one for you, Caleb: what other tools did you test before settling on Argo?
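As an aside on the credentials answer above: the secret-key-reference pattern mentioned is the standard Kubernetes one. A minimal example, where the pod, secret, and variable names are all made up for illustration:

```yaml
# The container sees GIT_TOKEN as an environment variable whose value
# comes from a Kubernetes Secret, so nothing is stored in plain text
# in the manifest itself.
apiVersion: v1
kind: Pod
metadata:
  name: repo-sync
spec:
  restartPolicy: Never
  containers:
    - name: worker
      image: alpine:3.19
      command: [sh, -c, 'echo "token is set"']
      env:
        - name: GIT_TOKEN
          valueFrom:
            secretKeyRef:
              name: repo-credentials   # a Secret holding the repo token
              key: token
```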
That's funny, I was laughing because I spearheaded an effort to test a whole bunch of tools on the way to coming to Argo. We looked at some commercial providers: we looked at Harness, we looked at Tekton, we looked at a number of other pipelining tools, specifically in the Argo Workflows space, and that list is a little long. We previously had experience with Spinnaker, so we had done an evaluation of that. It all kind of boiled down to this collection of tools, each solving specific needs that we had very well in their own pinpointed way. So Argo CD did the GitOps and the syncing and didn't try to do anything else, didn't try to be an all-in-one tool, and Argo Workflows did the pipelining, and it did that really well, but it didn't try to be everything. And we found that flexibility met all of our needs, and putting these three things together drew a really nice overall picture. All right, we've got three minutes. You wanna go tackle another one? Yeah, let's do it. I was just reading through some, trying to figure out which one I can answer next and which ones I've already answered. Let's see, I can answer one of these: how do you set up and configure Argo CD to use cluster bootstrapping features, app of apps? So yeah, for the team that manages the Argo installation, all of this stuff is managed through a Git repo that is synced through Argo CD. So there's one bootstrapping step to get Argo CD up and running the first time, and then it's just an automatic sync of all of the Argo components after that, specifically using the app of apps method. So here's a question: do you think Argo could be utilized for setting up base apps of the Kubernetes clusters, like a PaaS layer of the applications that we use in our infrastructure? Logging, Filebeat plus Logstash, ingress controllers, MetalLB for bare metal clusters, storage class definitions for cloud providers, et cetera.
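Before the next answer: the app-of-apps bootstrapping described a moment ago is usually a single parent Application whose source directory contains the child Application manifests. A hedged sketch, where the repo URL and paths are assumptions:

```yaml
# The parent "app of apps": syncing this one Application makes Argo CD
# create and manage every child Application found in the apps/ directory.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: bootstrap                 # the parent application
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/argo-config.git
    targetRevision: main
    path: apps                    # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated: {}                 # after one manual bootstrap, everything syncs itself
```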
Or do you think Argo is good primarily for application state management? I think it's both, really. This is where the GitOps workflow really comes in. The source of truth is really in Git for all of this, and Argo is just the intermediary applying those changes. And that really lets us have confidence about what the changes were, who made the changes, that somebody reviewed the changes. You can have different environments for testing those out. And then Argo is the delivery mechanism, but it's also doing stuff like Caleb showed with Rollouts, which is giving us the metric-based analysis for canaries and other things. And then with Workflows orchestrating it all together, we can kind of do anything we want with it in some regard. I think one of the important things is that we really push that all the things we're deploying are idempotent. That way we can rerun the steps over and over again if we need to, right? Like, it's safe to just rerun them, and they will not make any changes if they don't need to. Right, I think that's all we have time for. Thanks, everyone, for joining us. I'll remind you that the slides and recording will be up later today on the CNCF website. And thanks again for joining us. Thank y'all for your presentation and all your Q&A. And we will see you at a CNCF webinar very soon.