Well, hey, everyone, hope you are all having a great time at GitOpsCon, and thank you for tuning in for our session. Let's not waste any more time and get to our talk without further ado.

We all know that GitOps has become the default way to manage configuration in cloud-native environments, with tools like Argo CD and Flux keeping Git and Kubernetes in sync. But deployment quality has not necessarily improved. That is partly because GitOps lacks end-to-end traceability when operators make changes directly on target environments, which makes it hard to identify where things broke. GitOps also lacks standardized pre- and post-deployment checks against error budgets, SLOs, or external dependencies. Today we are going to tackle these problems and work our way through GitOps with Keptn.

But before we jump to the crux of our topic, let us first introduce ourselves. Hi there, I am Shivank Shandelia. I am a DevOps and open-source advocate, and I was a Google Summer of Code mentee this year under CNCF. I am also a Cloud Native New Delhi community organizer, where we often organize events advocating for the cloud-native ecosystem and open source. I contribute to cloud-native projects such as Keptn, Armada, Kubernetes, Meshery, and more, and I love to create content via YouTube, technical blogs, Twitter, and LinkedIn. Over to you, Rakshit.

Hey, folks. My name is Rakshit Pandwal. I am an undergrad from India and an approver for the Keptn project. Earlier this fall I was a CNCF intern for the Kyverno project, and in the summer I was a Google Summer of Code mentee for CNCF under the Keptn project itself. Other than that, I do community work: I am the chapter lead for CNCG Chandigarh and a team member of the Layer5 community. You can find me on socials at Rakshit Pandwal everywhere, Twitter and LinkedIn. So yeah, we can move ahead.

Cool. So before we talk about Keptn, let's first understand the problems we are facing with GitOps. What exactly is wrong with it? The first thing that comes to my mind is the complexity of adding tests and checks. GitOps promotes the idea of managing infrastructure and applications through declarative configurations stored in a Git repository. While this approach has many advantages, ensuring the quality and reliability of those configurations is crucial, and the complexity of adding tests and checks to a GitOps workflow can arise for several reasons, among them the diversity of configurations and collaboration challenges. Configurations are diverse: they might cover infrastructure as code, application manifests, policies, and much more. Adding tests and checks across these different configuration types can be challenging, especially when each type may require a different testing approach. Similarly, with collaboration, multiple teams contribute to a GitOps repository, and coordinating the addition and maintenance of tests can be complex. Which brings me to my second point: integration challenges with our observability platforms. When I think about that, what immediately comes to mind is the dynamic nature of our deployments.
GitOps, as we all know, emphasizes continuous deployment and updates. This dynamic nature of deployments can pose challenges for integration with observability platforms, because ensuring that monitoring and logging configurations are automatically updated with changes in the Git repository is non-trivial. On the other hand, when we talk about configurability, observability tools often require specific configurations for optimal use. When integrating those configurations into a GitOps pipeline, aligning them with changes in the repository and ensuring consistency across environments can be complex. Which brings me to my third point: problems with the efficiency of the deployment process, which can stem from pipeline optimization or scalability. GitOps relies heavily on automated pipelines for deployment, and ensuring the efficiency of those pipelines, minimizing build times, and optimizing resource utilization are continuous challenges. As for scalability, as the number of applications and services increases, maintaining an efficient deployment process at scale becomes a challenge in itself. This includes efficient synchronization of configurations, handling dependencies, and managing the overall complexity of the deployment pipeline.

To tackle all these problems, Keptn is what comes to our rescue. Keptn is an incubating project under the Cloud Native Computing Foundation. It is an operator designed to help teams manage their deployments in Kubernetes by adding the notion of applications. Keptn defines an application by attaching workloads to it, which lets us add pre-deployment evaluation checks before deploying any workload or application to a cluster, find out when an application, and keep in mind I am talking about the application, not the workload, is ready and running, check application health in a declarative way, standardize pre- and post-deployment tasks, and get out-of-the-box observability for our deployment cycle. The beauty of Keptn is that it does not require any complex configuration. All you need to do is define your application, add particular annotations to the workload, and mark the namespace where Keptn should observe your deployments and applications.
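To make that setup concrete, here is a rough sketch of what enabling Keptn for a namespace and a workload can look like. This is reconstructed from the Keptn Lifecycle Toolkit docs rather than from the talk itself: the podtato-kubectl namespace, the workload names, and the image are placeholders, and in the docs versions we have used, the namespace is marked with a keptn.sh/lifecycle-toolkit annotation.

apiVersion: v1
kind: Namespace
metadata:
  name: podtato-kubectl
  annotations:
    # Tell the Keptn lifecycle operator to observe workloads in this namespace.
    keptn.sh/lifecycle-toolkit: "enabled"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podtato-head-entry
  namespace: podtato-kubectl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: podtato-head-entry
  template:
    metadata:
      labels:
        app: podtato-head-entry
      annotations:
        # Keptn builds its KeptnWorkload and KeptnApp resources from these;
        # the app.kubernetes.io/name, part-of, and version labels work too.
        keptn.sh/app: podtato-head
        keptn.sh/workload: podtato-head-entry
        keptn.sh/version: "0.1.0"
    spec:
      containers:
        - name: entry
          image: ghcr.io/podtato-head/entry:0.1.0  # placeholder image tag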
Well, now let's see how Keptn actually works. For this, over to you, Rakshit.

Okay. So Keptn, earlier known as the Keptn Lifecycle Toolkit, mainly involves two operators, and the work of these operators is different. First we can quickly go through the lifecycle operator. The lifecycle operator, as the name suggests, helps you manage the lifecycle of your workload or application. How does it help? Suppose there is an application; for it you can create pre-deployment evaluations and pre-deployment tasks, and after the deployment there can be post-deployment tasks and post-deployment evaluations too. For a later task or evaluation to run, the earlier evaluation or task needs to complete: if you want these three tasks to be triggered, the first evaluation has to pass. So suppose this is the lifecycle of an application. Here, the first evaluation is check-available-cpus. An evaluation can match against any metric; check-available-cpus checks whether we have enough CPU resources available on our machine. Suppose we have defined that a certain number of CPUs should be available in our system; if that matches the metric obtained from the observability platform, the evaluation passes. Later on we can have a task, which might simply check whether the entry service is available. Once the pre-deployment tasks and evaluations are completed, the application is deployed. After the application has been deployed, the post-deployment evaluations and post-deployment tasks run. These can be anything you want; for example, a task can send a message to your Slack channel. Here we have a task that simply sends a Slack notification to a specific channel. You just need to create a webhook and configure it with Keptn, and then it can automatically send the notification to your Slack channel; we will show a rough sketch of these resources in a moment. That is the basic crux of the lifecycle operator.

Then we have the metrics operator. What the metrics operator does is standardize access to all your observability platforms: it fetches metrics from each of them and exposes them via the Kubernetes custom metrics API or in Prometheus format. Why is this needed? Because companies often use different observability platforms for different metrics. They might be using Prometheus, Dynatrace, or Datadog and integrating them with HPA or Argo CD; suppose we want a certain metric to automatically scale our application, or we want to use that metric in our Argo CD. Here we can see HPA connected to Dynatrace, Prometheus, and Datadog, and Argo CD connected to Prometheus, Dynatrace, et cetera. The Keptn metrics operator acts as a bridge between all these observability platforms and your GitOps or autoscaling tools: it queries the metric from the providers and then exposes it via the custom metrics API.

As for the architecture of Keptn, it mainly involves four parts. The first is the lifecycle operator, as we discussed. Then we have the Keptn metrics operator, then the Keptn cert-manager for providing the certificates to the operators, and then a scheduler for scheduling the pods, deployments, et cetera.
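As promised, here is roughly what the lifecycle pieces from that walkthrough could look like as Keptn resources. Treat this as a hedged sketch based on the Keptn docs: the API version, the available-cpus KeptnMetric, the slack-notification Secret, and the script body are our assumptions, not the exact resources shown in the talk.

apiVersion: lifecycle.keptn.sh/v1alpha3
kind: KeptnEvaluationDefinition
metadata:
  name: check-available-cpus
  namespace: podtato-kubectl
spec:
  objectives:
    # Pass only if the referenced KeptnMetric satisfies the target.
    - keptnMetricRef:
        name: available-cpus        # assumed KeptnMetric name
        namespace: podtato-kubectl
      evaluationTarget: ">4"        # e.g. require more than four CPUs free
---
apiVersion: lifecycle.keptn.sh/v1alpha3
kind: KeptnTaskDefinition
metadata:
  name: notify-slack
  namespace: podtato-kubectl
spec:
  function:
    secureParameters:
      secret: slack-notification    # assumed Secret holding the webhook URL
    inline:
      code: |
        // Minimal sketch: POST a message to the Slack webhook taken
        // from the Secret, which Keptn exposes as SECURE_DATA.
        const data = JSON.parse(Deno.env.get("SECURE_DATA") || "{}");
        await fetch(data.slack_hook_url, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ text: "Keptn: deployment finished" }),
        });

These definitions would then be wired to the workload through annotations such as keptn.sh/pre-deployment-evaluations: check-available-cpus and keptn.sh/post-deployment-tasks: notify-slack.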
Now, enough talk, let's move on to the demo. What we are going to see in today's demo is this: we will have a sample podtato-head application deployed on our cluster, and the Keptn metrics operator deployed on the cluster too. We will go through the CRDs and how the metrics operator works. We also have Prometheus deployed on our cluster, and for this demo we are using HPA; if you have a use case that requires Argo CD or any other GitOps tooling, you can configure that instead, but in this demo we will be using HPA. Then we will define a KeptnMetric, and if that metric matches, HPA will automatically scale our deployment to three replicas. Okay, we can move on to the demo. Wait, let me hide this.

So what do we have here? A basic application that we will deploy onto our cluster: a basic Deployment and then a Service to expose it. Then we have the metrics provider. The KeptnMetricsProvider is basically a CRD in which you define your observability or metrics provider; here we have defined it as Prometheus, along with the local URL at which it is deployed on the cluster. In the KeptnMetric, we define the metric that we want to fetch from the metrics provider. We mention the provider name, my-provider; as you can see, its name was my-provider. Then comes the query, the metric we want to fetch; here we are just fetching the sum of the CPU resources. Then fetchIntervalSeconds, the number of seconds between the operator's repeated queries to the observability platform. And then we have the range field. In range we define the interval: say we want to fetch the metric for the last five minutes, we mention that in interval. Then we have step, which we can set to one minute, so the operator will fetch the metric for the last five minutes at one-minute gaps, giving a total of five results. On those five results an aggregation function is applied, in this case average. There are average, maximum, minimum, median, and percentiles; you can go through the documentation if you want to know more about this feature.

And then we have the HPA. In the HPA we have referenced our metric, named cpu-throttling. The target value should be one, the minimum replicas is one, and the maximum replicas is three. So if the metric value is one or greater, the replicas will automatically scale up to three, and if not, it stays at one.

Okay, so we can just try this out: kubectl apply -f for the metrics provider first, then the KeptnMetric. Okay, now these have been created, and we can quickly check the value of our metric with kubectl get keptnmetric. As you can see, the value coming back is 2.84, which is greater than one, so the replicas for our application should scale up to three, because we specified the maximum replicas as three. Let me just check the pods. Okay, oh, I forgot to apply the HPA. Now the HPA has been created, and as you can see, earlier there was one replica; it should be working in a moment. Okay, it's working. The other two replicas have started coming up, and yeah, the demo worked: one is running, two are running, and three are running. So that's how we integrated our KeptnMetric with the HPA. Similarly, you can configure these metrics inside your GitOps tools such as Argo CD, Flux, et cetera, anything you want.
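For reference, the manifests from this demo would look roughly like the sketch below. It is reconstructed from the walkthrough: my-provider, cpu-throttling, the target value of one, and the one-to-three replica range come from the demo, while the API versions, the Prometheus URL, the exact PromQL query, and the target Deployment name are our assumptions.

apiVersion: metrics.keptn.sh/v1alpha3
kind: KeptnMetricsProvider
metadata:
  name: my-provider
  namespace: podtato-kubectl
spec:
  type: prometheus
  # In-cluster Prometheus URL; placeholder, adjust to your install.
  targetServer: "http://prometheus-k8s.monitoring.svc.cluster.local:9090"
---
apiVersion: metrics.keptn.sh/v1alpha3
kind: KeptnMetric
metadata:
  name: cpu-throttling
  namespace: podtato-kubectl
spec:
  provider:
    name: my-provider
  # Placeholder query; the exact PromQL from the demo was not recoverable.
  query: 'sum(kube_pod_container_resource_requests{resource="cpu"})'
  fetchIntervalSeconds: 5
  range:
    interval: "5m"     # look at the last five minutes...
    step: "1m"         # ...sampled at one-minute gaps, five results in total
    aggregation: avg   # averaged into a single value
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podtato-head-entry
  namespace: podtato-kubectl
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podtato-head-entry     # assumed target Deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Object
      object:
        metric:
          name: cpu-throttling
        describedObject:
          apiVersion: metrics.keptn.sh/v1alpha3
          kind: KeptnMetric
          name: cpu-throttling
        target:
          type: Value
          value: "1"             # scale out while the metric is at or above one

Applying each file with kubectl apply -f and then running kubectl get keptnmetric cpu-throttling shows the fetched value, just as in the demo.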
Okay, good to go. And yeah, let's move on to the summary. Over to you, Shivank.

Thank you, Rakshit, for the amazing demo. So let's summarize, shall we? We have been discussing a lot of things, and some of them are complex at that; at least they were for me when I started contributing to Keptn. So let's see what we learned. First, we learned what the problems with GitOps are. Second, we got to know about Keptn. Third, we got an amazing explanation of the Keptn architecture from Rakshit, and then an amazing demo discussing the solutions to the problems we covered before this summary. If you want to contribute to Keptn and get involved with the project, you can scan the QR code on your screen, and you'll be part of an amazing community. And well, I think that's it. Thank you for tuning in and for being an amazing audience. Well then, bye-bye. Thank you, folks.