scaling GitOps using Argo CD ApplicationSets, and the lessons we learned during our GitOps journey. Before I begin, I'd like to understand how many of you have used ApplicationSets. OK, that's quite a few. In that case, I'm just going to walk through a couple of YAMLs and how we utilize them, and we'll take it from there.

First, the agenda: the problem we faced, what the business was looking for, why we chose Argo CD specifically, the POC and discovery phase, how we migrated to ApplicationSets, and finally our production checklist and how we decided to go about it.

So, the problem. The main issues the business wanted to address: one was configuration drift in production and zombie issues. What I mean by that is, when you have a production issue, at that moment you want to go and resolve it as a priority. Most of the time, you do it the right way. Once in a blue moon, you make a configuration change directly in production. Everything is resolved, everything is hunky-dory. One day or one week later there's another deployment, and the issue recurs. Those are the zombie issues for us, and GitOps is the best way to resolve them. Then auditability issues: who made the deployment? Who committed the code? Where was the configuration change made? Then Git deployment pipelines: we wanted to move away from just using GitLab or GitHub pipelines to something that automatically deploys our applications. And finally, security: ensuring that only a particular application has the right to deploy your tools and configuration.

What were the benefits? Reduced downtime with canary deployments and the scream test.
While you can check metrics for your production changes, certain bugs can still be introduced into your application. The reason we use canary deployments is that we deploy to a small slice first, and the scream test is basically your end users complaining about it: with a 5% rollout, someone in that 5% will give you feedback and let you know there's an issue. The next part is configuration drift: you avoid it, and you ensure your application is exactly what your repo, your golden source of truth, says. And finally, visual information for developers. Developers don't necessarily need to know all the nitty-gritty of their microservice deployment in Kubernetes; you want to take that overhead away from them and provide a simple view they can understand.

Now, the POC phase. We first came across Argo CD Applications. While the GitOps model and the Argo CD Application CRD did provide benefits, there was still too much maintenance overhead for scaling purposes. Then we came across another way of doing things, the app-of-apps pattern. Initially we found it an upgrade over plain Applications, so we decided to run another POC on it. But again we hit certain limitations: we had to maintain separate repositories for the main app and for the remaining child apps. And that's when we came across ApplicationSets. We went to one of the open source working group meetings, and that's where we heard the group discussing ApplicationSets in Argo CD. It was still in beta, but it pretty much changed the game. It had a lot of the features we required, and we decided to go ahead and run a POC on it. This is also why we find it really useful to be active in your open source working groups.
You learn a lot from different people trying to solve similar problems, and you learn how different companies and different teams approach them. So what are ApplicationSets? ApplicationSets are a simplified way of deploying your applications. They ensure you don't need to repeat a lot of manifests: you can use one single ApplicationSet manifest and deploy to multiple environments. For example, in this YAML you can see dev and pre-production; you can deploy this one application, using the list type of generator, into multiple environments.

The next one we'll cover is the SCM provider generator. This uses your SCM provider, GitHub or GitLab, and scans all your respective repositories. You can set all-branches to false or true. Generally, in a dev environment we mark it true, so that all feature branches get deployed, but for production you only want to deploy the main branch. The token is the token for your respective Git repositories. And finally, we use a filter, path-exists. This is useful if you have certain legacy applications that you're gradually migrating over; you can key it on any particular folder name, you don't have to specifically call it argo-chart. We find it useful when migrating from the old way of doing things to the new way.

Next is the Git generator. This is very handy for monorepos: you can give it multiple paths, and if you have multiple Helm charts or Kustomize deployments within them, it will utilize that and start deploying all the applications. You can also choose to exclude a specific path; as you can see, we exclude DNS and external-dns in this one. The next one is the matrix generator. This is our go-to, and it's what we utilize everywhere.
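The list generator and SCM provider generator described above might look roughly like this. This is a minimal sketch, not the speaker's actual manifests; the names, repo URLs, group, and folder name are illustrative:

```yaml
# Hypothetical list-generator ApplicationSet: one manifest, many environments.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: my-service
spec:
  generators:
    - list:
        elements:
          - env: dev
            cluster: https://dev-cluster.example.com
          - env: pre-prod
            cluster: https://preprod-cluster.example.com
  template:
    metadata:
      name: 'my-service-{{env}}'
    spec:
      project: default
      source:
        repoURL: https://gitlab.example.com/team/my-service.git
        targetRevision: main
        path: 'deploy/{{env}}'
      destination:
        server: '{{cluster}}'
        namespace: my-service
---
# Hypothetical SCM provider generator with the path-exists filter
# used for gradual migration of legacy repos.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: scm-discovered
spec:
  generators:
    - scmProvider:
        gitlab:
          group: my-team
          allBranches: true            # true in dev so feature branches deploy
          tokenRef:
            secretName: gitlab-token
            key: token
        filters:
          - pathsExist: [argo-chart]   # only repos already migrated
  template:
    metadata:
      name: '{{repository}}-{{branch}}'
    spec:
      project: default
      source:
        repoURL: '{{url}}'
        targetRevision: '{{branch}}'
        path: argo-chart
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{repository}}'
```

The `pathsExist` filter means only repositories containing that folder generate an Application, which is how the gradual migration described above stays opt-in per repo.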
The matrix generator basically combines two separate generators to give you a full-fledged system. In this example, you use the Git generator to deploy multiple applications, which is how we do our Kubernetes cluster bootstrapping, and then in the list generator below, you can deploy to a demo cluster, a dev cluster, a staging cluster, and so on. The other thing you'll realize is that you need to mention the sync policies; we enable automated sync and all the configuration required for it.

Now, this is about making Argo CD production ready. This is the "dog ate my homework" story. The production setup: initially, we kept Argo CD on the same cluster it was deploying to, so multiple Argo CDs for multiple clusters. Gradually, for the production cluster, we realized we wanted a bit more security, so we moved our production Argo CD to a separate orchestration cluster, where we keep other orchestration-based tools and common tooling. We also have a DevOps environment alongside the dev environment: the DevOps environment is where the DevOps team can make mistakes. This is essential because you do not want to bring down the dev cluster or your staging cluster.

The next bit is how you bring up all your respective tools. While bootstrapping, we start with a few essential tools so that Argo CD can deploy the remaining applications on its own. We start off with Argo CD, external-dns, and cert-manager to bootstrap the main cluster, and then Argo CD deploys all the other applications there will be: whether that's your Grafana charts or your NGINX ingress, all of that is done automatically by Argo CD once you give it the repository path. The next piece is the Argo CD Vault plugin.
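The matrix generator pattern for cluster bootstrapping described above could be sketched like this. Again, repo URLs, paths, and cluster names are placeholders, not the speaker's real configuration:

```yaml
# Hypothetical matrix generator for cluster bootstrapping:
# (Git generator over app folders) x (list generator over target clusters).
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-bootstrap
spec:
  generators:
    - matrix:
        generators:
          - git:
              repoURL: https://gitlab.example.com/platform/bootstrap.git
              revision: main
              directories:
                - path: apps/*
                - path: apps/external-dns
                  exclude: true       # example of excluding a specific path
          - list:
              elements:
                - cluster: demo
                  url: https://demo-cluster.example.com
                - cluster: staging
                  url: https://staging-cluster.example.com
  template:
    metadata:
      name: '{{path.basename}}-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://gitlab.example.com/platform/bootstrap.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: '{{url}}'
        namespace: '{{path.basename}}'
      syncPolicy:
        automated:          # the automated sync policy the talk mentions
          prune: true
          selfHeal: true
```

Each folder under `apps/` crossed with each cluster in the list yields one Application, so a single manifest fans out to every (app, cluster) pair.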
We tend to use HashiCorp Vault to pull secrets for our applications, using the sidecar method. One thing to keep in mind: do not try to use the same service account that the Argo CD repo server gives you, because of the token. We use Kubernetes authentication for this purpose, and if you use the same token and ever have to delete and recreate Argo CD, the token gets refreshed and you need to reauthenticate the whole setup.

The next part is your repo secret and your cluster secret. During bootstrapping, we also set up repo secrets; once a repo secret is set up, Argo CD can connect to your GitHub or GitLab easily. The cluster secret is so it can connect to your cluster; it sets up everything the cluster connection needs. The next thing is the webhook. To make sure you, or your developers, are not waiting around for the sync interval to kick in, you set up a Git webhook. This ensures that as soon as there's a push or a pull request, Argo CD knows it needs to sync. Then there's the cluster token, as I mentioned earlier: using the Vault plugin annotations, we pull those secrets directly from the respective Vault cluster.

The other essential part, as I mentioned earlier, is the matrix generator for your monorepo or your cluster bootstrap. We used a Kustomize-plus-Helm combination: while your Helm chart allows you to make most of your changes, certain values are not exposed for change, and that's where we also utilize Kustomize to make those additional changes. And finally, we have our SCM provider with a list generator inside the matrix. This can scan your entire Git provider to deploy all your different applications, and in this case you can see it names them in a very specific manner.
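The repo secret and cluster secret mentioned above are ordinary Kubernetes Secrets that Argo CD recognizes by label. A minimal sketch, with placeholder names and URLs rather than the speaker's real values:

```yaml
# Hypothetical declarative repo secret: lets Argo CD connect to the Git host.
apiVersion: v1
kind: Secret
metadata:
  name: my-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: https://gitlab.example.com/team/my-service.git
  username: argocd-bot
  password: <git-token>        # in a setup like this, could be injected from Vault
---
# Hypothetical declarative cluster secret: registers a target cluster.
apiVersion: v1
kind: Secret
metadata:
  name: staging-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: staging
  server: https://staging-cluster.example.com
  config: |
    {
      "bearerToken": "<cluster-token>",
      "tlsClientConfig": { "insecure": false, "caData": "<base64-ca>" }
    }
```

Because these are just Secrets, they can themselves be bootstrapped through Git, which is what makes the "Argo CD deploys everything else once you give it the repository path" flow possible.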
You have your feature branch name, then your application name, and then your environment name; in this case, it's the DevOps environment. The other part you'll see is the value files. We create a separate one for environment variables and a separate one for the image tag, and the remaining standard configuration stays mostly the same across all our different environments. We also enable apply-out-of-sync-only. This helps ensure it's not constantly applying every single thing; only if something is out of sync will it go ahead and sync it up.

Last and finally, we have our OIDC integration. This is to ensure you have your SSO set up. You want to use your Google Groups, your Microsoft groups, whichever setup you prefer, and you can take it from there. This ensures your cluster is pretty secure: your DevOps team has admin access, and your developers have only read-only access. That's it. Thank you. Any questions? Raise your hand.

Hello, thank you, great talk. I'm from Avalops, and we also use ApplicationSets a lot. We have this problem where you try to merge applications, that is, different generators within ApplicationSets, with Kustomize, and you don't have merge keys there. So the question is: is there a plan to implement merge keys for generators, maybe in a future beta for AppSets?

Sorry, could you repeat the question?

Yes. We do use matrix generators, and parts of the configuration live in different files, which we try to merge with Kustomize. The missing thing is the merge key: Kustomize can only replace the list, not merge it, because of how the CRD is designed. So the question is, is there a plan to fix the CRD?

I'm not sure about a plan to fix the CRD; that would be within the Argo project itself.
As far as we are concerned, we use Kustomize only for the main Helm charts, basically to apply certain patches, not for our internal Helm charts. Our internal applications are deployed using our own Helm charts, for which we create a template, so there we can add as many values or changes as we want.

Come to the Argo project booth tomorrow, because it's staffed by all the maintainers, and we can talk to you about that specific issue. Other questions?

Hi, my name is Israel Bolenko, from Mobileye in Israel. We have a very good integration with Argo CD; I have to say it's a very good project, and we use it very intensively. But a couple of things are missing for us, two main things I think you may need to consider. First, conditional statements inside ApplicationSets. That would be a very good feature, and it's missing for us; when I need conditions, I have to do very complicated tricks to work around it. Second, multiple sources for configuration files. Recently a feature was added where you can combine two file generators inside a matrix using a prefix. That feature helps when you have only two config files in the path, but if you have a lot of config files in the path, it's impossible, and we needed tricks to do it. I actually use my pipeline to generate, from many different sources, a single config file and use it as the source of truth for everything; I generate a config file myself that contains everything I need, and then I use it. So if you add a feature where you can use many config files inside a generator path, it would be great. It would also be great if you could combine the directory generator together with the file generator; that's also missing. So those are three main things that, if added, would be fantastic.
Again, as mentioned, the Argo maintainers will be at the booth tomorrow, and they have a workshop. If your questions are focused around features you'd like added to Argo, I would talk to the maintainers; there are a bunch of us around, and we're going to be at the Argo project booth tomorrow. That's a great conversation topic to have there. Looks like we have another question.

Hi, thanks for the presentation. I have a question: I saw you were bootstrapping your cluster through Argo CD. What about the stuff that's outside of Kubernetes, for example AWS IAM roles, or the equivalent in Google Cloud? Are you combining another solution with Argo CD to bootstrap those things?

When you say bootstrapping, could you give a more specific example?

Sure. For example, in AWS, when you're bootstrapping a cluster, some of those workloads might need IAM roles to read from other services.

So in GKE, there's an option where you can put in certain IAM configurations and it creates your workload identities for you; internally it uses something called CNRM. Once you create these manifests within Kubernetes, it goes and ensures that your GKE setup also creates the corresponding IAM role, the IAM role binding, et cetera. So certain things you might require, workload identities for example, you can implement from the bootstrapping cluster, though not necessarily every single thing. For the rest, we use Terraform to begin with.

Hi, Daniel from Germany here. Thanks for the talk, very insightful. I have a question regarding performance issues you might have faced when migrating to production, because I can imagine that ApplicationSets with a matrix generator tend to get really big if the dimensions themselves are big enough. Did you have to do some tweaking, some tuning? What are your experiences in that regard?

Yes.
You generally have to follow the HA patterns, and if you find any slowness, you increase resources, as the previous talk mentioned as well; they had to tweak it and increase the total number of Redis nodes, et cetera. It depends on your individual use case; you might have to increase the sizing based on your application setup.

Hi. I saw you using ApplicationSets for installing the main cluster dependencies first and then the applications. We also do that. But how do you make sure there's no race condition? I think I saw Linkerd and cert-manager: how do you make sure they are installed and running before the applications that need Linkerd or cert-manager are installed? Because sometimes there's a race condition with ApplicationSets.

Depending on what exactly you're trying to install, I would preferably create a separate ApplicationSet and ensure it's set up prior to the second one, keeping it in a different repository. That particular Linkerd case we have not yet completely implemented. But generally speaking, we bootstrap three applications first: external-dns, cert-manager, and the ingress. That sets up the whole Argo CD part of it, and most of the other applications are set up using the ApplicationSet. The race-condition items you would preferably have to deploy via a pipeline, so that you don't have this issue.

Also a good use case for app of apps plus ApplicationSets, using them together. There's a question over here; I think we have time for one more.

Hey, thank you for the talk. I would like to know how you integrate CI with the Argo CD process.

In terms of CI, we basically just copy the respective files over. Within our CI/CD setup currently, we are not using Git workflows.
Within our CI/CD setup, whatever essential files there are, we generate them automatically within the CI. And within GitLab, you have the ci-skip option, so it only performs the action you want without running the whole infinite loop where every commit reruns the entire setup all over again. So we take whatever file or environment variable needs to be generated, copy it over to the next values file, generate that file, and commit it back into Git.

Thank you, Amit. Everybody give him a round of applause. Appreciate it.