Hello everyone, and welcome to our session, Managing Your Cluster the GitOps Way. We'll introduce ourselves. My name is Nishant Kumar. I work for Microsoft as a senior software engineer, and I'm a developer on the Azure Operator Distributed Systems product, and I have with me... Hi, I'm Uday T. Kumar. I work for Zeta Technologies, where we are revolutionizing the world of banking. I'm an engineering manager heading a team of DevOps engineers. Thank you, Uday. So talking about the agenda: I'll talk about what GitOps is, why we use it, some of its benefits, and the GitOps principles. Then Uday is going to talk about a specific tool, Argo CD, cover its architecture and features, and provide a demo. After that I'll cover some GitOps challenges, and we'll end with comparisons with some other open-source tools. So what is GitOps? It is a way of implementing continuous deployment for cloud native applications, and it's a term that was coined by Weaveworks in 2017. The core idea of GitOps is having a Git repository that contains declarative descriptions of the infrastructure currently desired in the target environment, and an automated process to make the environment match the described state in the repository. So if you want to deploy a new application or update an existing one, you only need to update the repository, and an automated process handles everything else. It's like having cruise control for managing your applications. And Kubernetes specifically enables GitOps by fully embracing declarative APIs as its primary mode of operation. And why GitOps? Because you can deploy faster and more often. Since all of your artifacts are stored in your Git repositories, you can be sure there are no manual changes or random scripts running in your environment.
So it's really good: you go into the Git repository, make your changes, click the button, and you're sure it's going to reconcile the current state with the desired state. And you also get faster error recovery for free. Let's say the production environment is down; that's a big deal, but with GitOps you have a complete history of how your environment has changed over time, and this makes error recovery as easy as issuing a git revert and watching your environment being restored. The Git record is then not just an audit log but also a transaction log, and you can roll back and forth as many times as you want. Yeah. If you don't mind, can we take the questions at the end? Yeah. Thank you. It also makes our deployments secure. GitOps allows you to manage deployments completely from inside your environment. For that, your environment only needs access to your repository and your image registry, and that's it. You don't have to give your developers direct access to the environment, because, you know, kubectl is the new SSH. And then you have self-documenting deployments. Have you ever logged into an environment and wondered what's already deployed here? That's a real pain. But with Git repositories, everything is declared there. If you want to find out what's deployed in the environment, you don't even need to log in; you just check the Git repository and everything is out there. Some principles of GitOps. First, the entire system must be described declaratively. Kubernetes is just one example of many cloud native tools that are declarative and can be treated as code. And with your application's declaration versioned in Git, you have a single source of truth, so your apps can be easily deployed and rolled back to and from Kubernetes. Even more importantly, when disaster strikes, your cluster infrastructure can be dependably and quickly reproduced.
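As a concrete illustration of the "described declaratively" principle, a minimal manifest stored in Git might look like this. This is a sketch; the names and image are illustrative, not taken from the talk:

```yaml
# Desired state, versioned in Git: three replicas of a web server.
# An agent continuously reconciles the cluster toward this description.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```

Rolling back is then just a matter of reverting the commit that changed this file.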
Second, that system state is versioned and immutable. As I mentioned earlier, everything is stored in Git, serving as your canonical source of truth, so you have a single place from which everything is derived and driven. Third, approved changes can be automatically applied to the system. Once you have the declared state stored in Git, the next step is to allow any changes to that state to be automatically applied to your system. What's significant about this is that you don't need cluster credentials to make changes to your system. Fourth, your systems are continuously reconciled. Once the state of your system is declared and kept under version control, software agents can inform you whenever the current state does not match the desired state. The use of agents also ensures that your entire system is self-healing. And by self-healing, we don't just mean node or pod failures, because those are handled by Kubernetes itself, but even manual errors. If someone comes into an environment and deletes a deployment, by mistake or otherwise, the reconciliation loop will check what is in Git and undo the changes that were made by a human. Then there are two ways of deploying to a cluster: either a pull-based or a push-based system. With push-based, you use a CI/CD pipeline to push changes to your environment, and the pipeline is triggered by a code commit or merge. With pull-based GitOps, an agent running inside your environment continuously polls a Git repo or container registry for changes, and when it detects a mismatch between the defined state and the running state, the agent pulls the defined configuration into the environment.
Pull-based, first of all, increases security, because the agent is running inside the cluster, so there's no need to store your credentials in an external CI system. It also makes the environment more consistent, because push-based GitOps typically only works in one direction, from the Git repo to your environment, while pull-based deployments work in both directions: the agent pulls from the Git repo into your environment, so it can detect and remediate configuration drift when changes are made to the cluster manually or from other sources. Here are some of the popular open-source tools: Argo CD, Flux, Jenkins X, Weave GitOps Core, PipeCD, and there are many more you will find on the internet. But in this talk, Uday is going to focus specifically on Argo CD, talk about its architecture, and provide a demo, and then I'll talk about some of the challenges and comparisons. So I'll hand it over to Uday. Thanks, Nishant. So Argo CD is a very widely used tool, one of the many open-source tools that we contribute to, customize, and use at Zeta Tech. It allows you to use any Git-based repository, whether it's GitHub, GitLab, or Bitbucket. Specifically with respect to Kubernetes, the manifests can be specified in any number of ways: Kustomize, Helm, or just plain vanilla YAML or JSON manifests. It also supports any custom management tool, which can be configured as a plugin for the purpose of config management. Argo CD can work with many of the commonly used ingress controllers, including Ambassador, NGINX, Contour, and Traefik. In fact, if you go through the Argo CD documentation, it has sample manifest files which act as a quick-starter reference, and I suggest you do check the documentation out. Argo CD supports two major deployment types: multi-tenant, with a UI and CLI, and core, which only comes with the CLI.
The multi-tenant installation is the most common installation type, where multiple dev teams can access it and collaborate, and it is driven via the API server, which serves the web UI as well as the CLI. Now, this further has two modes, HA and non-HA. The HA manifest is basically a variant of the non-HA one, but it adds high availability and resiliency, as the name suggests. The non-HA mode is strictly not for production use; it is more for demo or evaluation. No prizes for guessing which mode I'm going to be using in my demo today. Now, for both of these modes, there are two ways in which they can be deployed. There is an install.yaml, where Argo CD is installed in the same cluster as the rest of the applications, and there is a namespace-install.yaml file, where Argo CD is installed in a completely different cluster from the applications. The difference is whether somebody wants it in the same cluster or not, for various reasons, security amongst other things. The last mode, of course, is core. Core, as mentioned earlier, does not include the API server at all (we'll discuss a little more about the components in the next slide) and therefore does not have a UI. So this is essentially for when cluster admins want an independent setup and want to run a specific instance of Argo CD. It's a preferred way because it's lightweight, it does not have the HA components built in, and everything is done via the CLI. Let's look a little at the architecture, which will help you understand what we were talking about earlier. There are three major components: the API server, the repository server, and the application controller. The API server, as most of us would guess, is what enables most of the communication with the rest of the world. It is basically a gRPC/REST server which exposes the API consumed by the web UI, the CLI, and also CI/CD systems.
It takes care of a number of things: application management and status reporting; invoking application operations, including rollback and any user-defined actions; repository and cluster credential management, where credentials are stored as Kubernetes secrets or can be delegated to external or third-party authentication or IAM tools; authentication and delegation, as I mentioned; RBAC enforcement; and, if somebody has Git webhook events, it can also act as a listener and forwarder for them. The next one is the repository server. As the name suggests, this is an internal service which maintains a local cache of whatever is present in the Git repository itself, and it holds the application manifests. It is responsible for generating and returning Kubernetes manifests when given certain inputs. What are those inputs? Your typical Git-based inputs, such as the repository URL, a revision, the application path, and template-specific settings like parameters, environment variables, or Helm values.yaml. And finally, there's the application controller. It is basically a Kubernetes controller which monitors running applications and compares the current live state with the desired target state as specified in the repo. It detects any out-of-sync applications; you will get a better idea of what actually happens when we go through the demo. What I mean by an out-of-sync application here is that suppose we've made a change in the GitHub repository, and it is not what is deployed currently. That's where it says, OK, this is the desired state, and this is not what it currently is. And it can optionally take action as well. Why optionally? Because sync can be manual or automatic; that's something, again, you will get a better idea of during the demo. It is also responsible for invoking any user-defined hooks for lifecycle events.
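The desired target state that the application controller reconciles toward is itself declared as an Argo CD Application resource, and the manual-versus-auto sync choice lives in its syncPolicy. Here is a minimal sketch; the repo URL, app name, and namespace are placeholders, not the ones from the demo:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/gitops-demo.git
    targetRevision: HEAD    # revision the repository server renders
    path: .                 # application path inside the repo
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:              # omit this block to keep sync manual
      prune: true           # delete resources that were removed from Git
      selfHeal: true        # revert manual changes made in the cluster
```

Creating an app through the web UI, as in the demo later, effectively fills in the same fields.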
Hooks are for advanced deployments, when you have canary or blue-green deployments in place. Now, here are some of the major features. When it comes to the features, I tend to categorize them into five major categories and try to remember them using an acronym: S-A-L-A-D, SALAD. Yes, guys, it's a post-lunch talk, so I went with SALAD. S stands for state: automated configuration drift detection. When an application moves away from its desired state, as I mentioned earlier, Argo CD finds that out and syncs it. It also provides visualization; there is a big yellow icon which comes up saying this is out of sync. And, as I mentioned, for advanced or complex application rollouts, it supports hooks for pre-sync, sync, and post-sync. Then there's the API part, for A, where, as we've already discussed, there's a web UI which provides a real-time view, a CLI for automation or CI integration, and webhook integration for GitHub, Bitbucket, or GitLab. The L is for logs and metrics: it can always look at the health status and give an analysis of what is going on, it provides audit trails, and it also supports Prometheus metrics. The second A stands for auth. There's SSO integration via various methods and third-party tools: OIDC, OAuth2, GitHub, GitLab, LinkedIn, Microsoft, et cetera. It supports multi-tenancy and RBAC policies for authorization, and access tokens for automation as well. And finally, the D stands for deployment: automated deployment of applications to specified target environments; rollback, or rolling anywhere, to any application configuration committed in the Git repository; multiple config management and templating tools, as we said, Kustomize, Helm, or even plain vanilla JSON files; and the ability to manage and deploy to multiple clusters. Let's now go through the demo. We tried to do a live demo, but the demo gods were not with us, so you can see how the installation is done.
Basically, a namespace is added, and we need to do a kubectl apply of the install manifest from GitHub. I've done that already over here. And when we do a kubectl get all in the Argo CD namespace, we can see a whole bunch of things that are deployed: the services, the deployments, the pods, the replica sets, and a stateful set. We'll grab hold of the load balancer IP; this is already deployed, but just for the sake of the demo, I'm putting it out there again, and you can see it comes up. I've already configured the GitHub repository. All we need to do is go to Repositories and click on Connect. You can see that I've set up something specifically for this demo; currently it has nothing but a simple README file. And now I've already configured some YAML files, which we will use. I'm creating a new app on Argo CD itself. I'll give it a name; this is going to be a simple Nginx application, so just nginx-test. The project will be default. The sync policy we'll keep manual for now. For the repository URL, since the repository is set up, the URL comes in as a prompt. For the path, I can put in any optional directories I want; I've just put in a dot for now. The cluster URL will be the Kubernetes default URL; I've used MicroK8s over here. And I've put in the namespace as gitops, which I've specified as well; it could just as well be in the default namespace. You can see it's healthy and synced, because currently there's absolutely nothing in the repository except for the README file. Now, as I mentioned, there's an Nginx YAML file which I've already set up to save us some time. I'm going to copy it into the Git repository and then push it to GitHub, and I'll open it up in a bit and show you what we've done. We're just going to push this and run some commands which you may have seen a hundred times.
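The Nginx manifest being pushed in the demo might look roughly like this. This is a sketch reconstructed from the talk (two replicas, a container port, and a Service in front); the exact names and image tag are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
---
# A Service so the pods are reachable via a cluster IP.
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
spec:
  selector:
    app: nginx-test
  ports:
    - port: 80
      targetPort: 80
```

Once this file lands in the repo, the app shows as out of sync until it is synced.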
We've kept it in just for the sake of the demo, to show that it's not something pre-baked or pre-configured. You can see the YAML file over here: the containers have been set up, the port has been set up, and notice the number of replicas is just two for now. Then it's git add, git commit, and git push. And there we are. While that happens, let's go back to the application, and you can see that it's just showing nginx-test. If I do a refresh, it now shows out of sync, because the manifests have been pushed to GitHub and it is saying that the desired state and the current live state are not in sync. We can click on App Diff to see what the difference is: the services and the deployments are showing whatever I had shown earlier in the YAML files. Now we hit a quick sync. Sync has a couple of options, like prune. Prune is for when you have a certain application and you want those pods to be killed and new pods created, for example when you are upgrading from one Nginx version to another. You can see that as soon as I have synced, the services have come up and two new pods have come up as well, which are getting created. I will now just do a kubectl get all, and we can check the status of that on the CLI. We can see that in the namespace gitops, the status of the pods is ContainerCreating. That should come up in a minute. Now we can see it is in the Running state. Let's go back to the UI. Yes, it's in a running state. And if we pick up the cluster IP and enter it, voila, Nginx is up and running. Now this is where we get to see what happens when there is a drift between the current state and the desired state. I'm going to modify the YAML, changing the replicas from two to four so that we actually create a drift between the current state and the desired state, and then the usual Git commands: git add, git commit, and git push. So let's again find out if the status has been synced.
But before that, let's take a look at what is on GitHub. You can see that the file has been updated; the number of replicas is four. Let's do a quick refresh to see if everything is hunky-dory. It's not, because it's again out of sync. The app diff this time clearly points out that the number of replicas is the only thing that has changed, from two to four. We can hit a sync, and the options are still the same. As soon as we hit sync, we should see two more pods coming up. There we see one has come up, and there's the fourth one. Let's quickly check on the CLI. We can see that two new pods have come up; both their statuses are ContainerCreating. The app details show us various options. One of them is to enable auto-sync, where Argo CD would do the job of refreshing, checking every now and then, and then doing the actual sync. We can delete the application from there as well. And we can see that all four pods are up and running and in a healthy status. With that, over to Nishant. Thank you, Uday. I will talk about some of the GitOps challenges. So far, what Uday has shown is that you have a Git repository, you provide a source within Argo CD, and Argo CD ensures that the desired state is reconciled for you. So the first challenge is how you create those manifest files which reside in your Git repositories, and the second one is how you do secret management. I will talk about both of them. There are two popular open-source options for managing your manifest files, which are the building blocks of your applications. One of them is Helm: with the help of Helm, you can package your applications and their dependencies, and it has a templating language, so you can use that to deploy, upgrade, and install your applications in your environment. The second option is Kustomize.
Kustomize uses pure Kubernetes YAML files. With Helm, what you get is a values.yaml file, where you can substitute all of the values that you want to update. If you want to update an image, you can go and change it in values.yaml, and it will be reflected in all the other places that image is used, whether in your deployment or any other files. Similarly, Kustomize has a kustomization.yaml file; again, it's a single YAML file where you can provide your substitutions. The main benefit I see with Kustomize is that you don't get the complexity of the Helm templating language. If you really want to do some complex logic, the Helm templating language can be very overwhelming, but with Kustomize, since it's just pure Kubernetes YAML, it's pretty straightforward: if you understand a Kubernetes manifest file, you already understand Kustomize. Something else you can do is overlays: if you want different configurations for different environments, say a dev and a production environment, that is natively supported in Kustomize. It's really a good tool, and I would recommend you go and explore it. The next thing is secret management. You might be aware that there are multiple open-source secret management tools, like HashiCorp Vault, External Secrets, and so on. If you want to store your secrets in your Git repositories, without a proper solution you can't store them directly, because a Kubernetes secret is just base64-encoded by default; it is not encrypted. So either you have to use a third-party solution in your environment, or you can use another tool called Bitnami Sealed Secrets. The special thing about that tool is that you can actually store your secrets along with your manifest files in the same Git repository.
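The Sealed Secrets workflow described here starts from a normal Secret, runs it through kubeseal, and commits the resulting SealedSecret. A sketch of both sides; the names are illustrative, and the encrypted value is obviously a placeholder:

```yaml
# Input: a plain Kubernetes Secret (base64 only, NOT safe for Git).
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: gitops
data:
  password: cGFzc3dvcmQ=   # just base64, trivially decodable
---
# Output of kubeseal: safe to commit, because only the controller
# running inside the target cluster holds the key to decrypt it.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: gitops
spec:
  encryptedData:
    password: AgB...placeholder-encrypted-blob...
```

Applying the SealedSecret lets the in-cluster controller produce the real Secret on the fly.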
A sealed secret is a one-way encrypted secret that can be created by anyone but can only be decrypted by the controller running inside the target cluster. Once it is encrypted, it can be safely uploaded to your Git repositories. Sealed Secrets is composed of two parts: a cluster-side controller, or operator, which is used to decrypt your secrets, and a client-side tool called kubeseal. In the example, on the right-hand side you can see a normal Kubernetes secret; using kubeseal you can encrypt it, and on the bottom right it creates a sealed secret. The kind is SealedSecret, and the data section is completely encrypted, so you can now upload that secret into your Git repositories. Then, once you apply that secret in the environment, the controller, which is already running there, will decrypt the secret on the fly and make it available to your pods. So, again, an interesting tool that you can explore. And here are some comparisons of the different tools upstream. This might be a little outdated, since the open-source community is continuously evolving, but Flux, Flux v2, Argo CD, PipeCD, and Jenkins X are some of the really popular tools that I think you can go and explore. I would like to thank all the awesome blog posts available on the internet, because they were really useful in putting together the content for this talk. So thank you for attending. Any questions? So, as I said, for the cluster admin you can always have a separate CLI-based deployment. And there are a whole bunch of ways in which your Argo CD, or rather your Kubernetes cluster itself, can be deployed. There are a lot of Helm charts, which is what we typically tend to use.
And in fact, in OpenStack itself there is one such project where all the charts are written so that you can deploy OpenStack on top of Kubernetes. You can have Kubernetes itself deployed, keep that entire set of manifests in a GitHub repository, use Argo CD, choose whichever mode you want, and have the entire thing deployed. Have you done the same in production? Not in my current organization, but in a previous organization we have done that. I think we may have just crossed time, so if there are any more questions, you can come to us. Thank you.