Hi everyone, thank you for coming to this talk. My name is Brian Fox, I'm a freelance engineer. If you came here hoping to see my more famous namesake, the creator of Bash, then I'm really sorry to disappoint you. But if you do wanna reach out to me afterwards, my details are on this slide: I'm on the CNCF Slack, I'm on LinkedIn, and if you hunt around, you might find me elsewhere as well.

Right, so today, in the spirit of CDCon and GitOpsCon coming together, I'm here to play a little bit of devil's advocate and make the case that perhaps you don't need to GitOps everything. The title of this talk is Terraforming Argo CD: the GitOps bridge. And whilst I'd love to lay claim to the term "the GitOps bridge", I actually stumbled across it in the AWS EKS Blueprints repository, where they describe it as follows: in essence, certain AWS resources may still need to be created via Terraform, and these resource values will need to be passed from Terraform to Argo CD.

In this talk, I'll refer to both the physical and the metaphorical bridge. The physical bridge is what this quote describes. As the phrase indicates, the approach is not strictly compliant with the GitOps principles; it's a means of bridging the gap between traditional infrastructure as code and a full GitOps implementation. But we can also extend the metaphor: when using the approach, we can abstract the details of Kubernetes such that teams can deploy to Kubernetes whilst working in a familiar Terraform-based workflow.

But before diving into detail, let me give you a little bit of background. At the time I came across this phrase, I was working on a distributed system where the services I was involved with were largely inspired by the self-contained systems approach. These were, by and large, containerized workloads that ran on ECS, and they made heavy use of managed services for things like caching, databases, and storage in general.
Teams were responsible for owning the end-to-end lifecycle of their services, and the repository served as the boundary and the single source of truth for everything required to build and run a service. We were heavy Terraform users, which meant that app code lived alongside infrastructure code and environment configuration in the same repository. This pattern worked really well for us and enabled a fairly simple CI/CD pipeline: changes were committed to Git, CI fetched those changes, then built and tested them. If everything passed, we tagged the repository, published an artifact, and kicked off the deployment process. The CD server was simply responsible for orchestrating the rollout of changes across multiple environments, where we had agents running in each AWS account that essentially fetched the repository and ran terraform apply.

But we, like many others, were lured by Kubernetes and decided to migrate our workloads off ECS. As soon as we started that process, questions arose. The first was what to do with our existing Terraform code. Should we be deploying this via a Kubernetes operator? Did we need to redo it using a Kubernetes-native solution like Crossplane just to be able to deploy to Kubernetes in a GitOps manner? Rewriting was something we wanted to avoid, and we figured that simply adopting Kubernetes for application workloads was complex enough. So we decided to limit the scope of the migration and the cognitive load on teams by minimizing tooling overload.

So then the question became: how do we leverage our existing Terraform code and integrate deploying to Kubernetes into our existing pipelines? But that raised questions like: how do I make use of managed services the way I did when running on ECS? If I provision a database via Terraform, how do I supply the database configuration to my workload? How do I assign my workloads an IAM role or a security group?
These are questions that you see coming up over and over again online, and there doesn't seem to be a nice answer within the Argo CD ecosystem, particularly if you don't want to write the values out to Git. Finally, there were questions about how to handle common concerns, such as how to expose a service, handle secrets, create DNS records, and the like. In essence, how do we configure our common set of operators with resource values coming from Terraform?

This brings us to the Terraform provider for Argo CD. Now, this is a standard Terraform provider (I see Flux people in the audience) which provides a means of provisioning and maintaining resources in Argo CD. It contrasts with the Flux provider, which goes against the norm: it actually bootstraps Flux, but doesn't provide a way of maintaining anything in Flux. It does what Terraform is ultimately great at doing, orchestrating API calls against the Argo CD API, and it allows us to create resources in Argo CD without having to write any YAML. From our perspective, this was great. Using the provider meant there was zero impact on our existing pipelines. And as I'll discuss, it enabled us to abstract Kubernetes to the point that we could leverage existing organizational knowledge rather than everyone having to become a Kubernetes expert from day one.

So in what way did we make use of the provider? Well, we ended up using it in two ways: first from a platform perspective, in terms of setting up Argo CD, and second from a service perspective, where developers could create and maintain applications in Argo CD. From an Argo CD setup perspective, we ended up with a hub-spoke model, where we had Argo CD deployed to a shared cluster and services deployed to individual namespaces in environment-specific clusters. The provider in turn facilitated the setup of this deployment topology, in that we could use Terraform to provision the underlying clusters and register them in Argo CD at the same time, in the same plan and apply.
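A minimal sketch of what that can look like with the Argo CD Terraform provider, assuming an EKS cluster created by a module elsewhere in the same configuration (the provider address, cluster name, and `module.eks` reference are illustrative, not the talk's actual code):

```hcl
# Illustrative: register a freshly provisioned EKS cluster in Argo CD
# in the same plan/apply that created it.

provider "argocd" {
  server_addr = "argocd.example.com:443" # shared Argo CD instance (illustrative)
  auth_token  = var.argocd_auth_token
}

# Credentials for the cluster provisioned elsewhere in this configuration.
data "aws_eks_cluster" "spoke" {
  name = module.eks.cluster_name
}

data "aws_eks_cluster_auth" "spoke" {
  name = module.eks.cluster_name
}

# Register the spoke cluster as a deployment target in Argo CD.
resource "argocd_cluster" "spoke" {
  name   = "dev"
  server = data.aws_eks_cluster.spoke.endpoint

  config {
    bearer_token = data.aws_eks_cluster_auth.spoke.token

    tls_client_config {
      ca_data = base64decode(data.aws_eks_cluster.spoke.certificate_authority[0].data)
    }
  }
}
```

With something like this in place, a single terraform apply both creates the spoke cluster and registers it with the shared Argo CD instance.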
We could also provision projects, roles, and, importantly, project role tokens that enabled developers to self-service the creation and maintenance of applications in Argo CD. This approach avoided the security concerns at the time around access to the Argo CD namespace in the shared cluster.

But what about applications? Obviously, given that we wanted to limit the cognitive load on teams and integrate into their existing pipelines, making everyone write YAML was out of the question. And whilst we did end up creating a common Helm chart, using that chart declaratively wouldn't have integrated nicely into our existing pipelines, and we'd still have forced teams to figure out the correct configuration of AWS resources when using it. So we ended up creating a common Terraform module that wrapped the Helm chart. This module essentially abstracted away all of these concerns: it was responsible for provisioning the necessary AWS resources and passing the correct set of values to the Helm chart via the Argo CD application resource in Terraform. This made it simple for teams to move workloads to Kubernetes while slowly introducing them to Kubernetes terms and concepts, like: what is an Ingress? What is the relationship between a service account and an IAM role?

So what does that look like? I'm gonna dive into some code now. It's a lightning talk, so here I go. Given that, I'm gonna have to rush through all of this, but I do encourage you to check out the repo in your own time. Let's start by looking at Argo CD. I apologize for the scale and the graininess of this. Here I have a simple web app that's been deployed: it has a deployment, it has a service, and we've exposed it via an ingress. And if I go here, you can hopefully see that we have podinfo deployed. From an app team's perspective, what does that look like? Is this big enough?
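Roughly, the team-facing module call, and the Argo CD application resource the module creates underneath, might look like this (the module name, inputs, chart location, and ElastiCache wiring are illustrative sketches, not the actual code from the talk):

```hcl
# What a team writes: a service definition in plain Terraform,
# with opt-in capabilities instead of raw Kubernetes YAML.
module "podinfo" {
  source = "./modules/service" # common wrapper module (illustrative path)

  name    = "podinfo"
  project = "team-a"

  container = {
    image = "ghcr.io/stefanprodan/podinfo"
    tag   = "6.5.4"
    port  = 9898
  }

  # Capabilities: expose via ingress, wire up an ElastiCache Redis cluster.
  expose_ingress = true
  redis_cache    = true
}
```

```hcl
# Inside the module: provision the AWS resources, then hand their values
# to the common Helm chart via an Argo CD application resource.
resource "argocd_application" "this" {
  metadata {
    name      = var.name
    namespace = "argocd"
  }

  spec {
    project = var.project

    source {
      repo_url        = "https://charts.example.com" # common chart repo (illustrative)
      chart           = "service"
      target_revision = "1.x"

      helm {
        # Simple values can be set as individual parameters...
        parameter {
          name  = "replicaCount"
          value = "2"
        }

        # ...while complex structures are YAML-encoded Terraform objects,
        # e.g. the ElastiCache connection string and the pod security group.
        values = yamlencode({
          env = [{
            name  = "PODINFO_CACHE_SERVER"
            value = "tcp://${aws_elasticache_cluster.this.cache_nodes[0].address}:6379"
          }]
          securityGroupId = aws_security_group.pod.id
        })
      }
    }

    destination {
      server    = var.cluster_endpoint
      namespace = var.project
    }
  }
}
```

The chart can then template a value like `securityGroupId` into the SecurityGroupPolicy resource that shows up in the Argo CD UI once the apply completes.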
Everyone can see? Yep. Teams define the service name as well as the project they want to deploy to. They define a container definition, and they define any additional capabilities that they would like; in this case, we want to expose the service via an ingress. Underneath the hood, we have an Argo CD application resource, and we can see that this simply defines an application where the source is of type helm. We can set the parameters that are used when templating our Helm chart either individually, as in the case of the replica count here, or by YAML-encoding a Terraform object for more complex structures.

Now, podinfo also provides a cache endpoint, an integration with Redis, and at the moment this is offline. So what would I need to do, as an application team, to make use of an ElastiCache Redis cluster from within EKS? Firstly, I need to pass the connection string through as an environment variable. I also need to add a security group to my pod. And I need to enable some security group rules, ingress and egress, to allow communication between my pod and my ElastiCache cluster. Underneath the hood, the module is also handling some of the additional complexity that comes with the use of security groups for pods, by ensuring that my security group has outbound DNS access and allows communication from the kubelet on our nodes to the liveness and readiness probes within our pod.

So let's try and apply this. Will this work? Do I have internet? Yes, okay, things are working. While that's applying, let's chat quickly about trade-offs, or rather, where this approach falls short of the GitOps principles. If you were at the keynote this morning, hopefully this should come as no surprise. Is this approach declarative? It is. Is it versioned and immutable? Yes. Is it pulled automatically? We could split hairs here, but I'm gonna say yes.
Argo CD is automatically pulling our application workload, and the rest of our infrastructure is pulled automatically as part of our CD pipeline. But is it continuously reconciled? And the answer to that is no, not everything. Our workload, which is being deployed by Argo CD, is; but our infrastructure and the Argo CD application itself are not. And this is the common shortcoming that you see when talking about Terraform in the context of GitOps.

So let's jump back to our application now. Well, almost done. Yeah, there we go, great. Have a look here: hopefully, if I click refresh, there we go, we can see that we have an additional resource now. We have the security group policy, and we can see that it's being configured with the ID of a security group that we created in Terraform. And if I go here, hopefully I get a response from my cache now. Which is great, that's awesome.

But what about the GitOps principles? Well, we can illustrate the shortcomings of this approach quite easily as well. If I happen to go and delete a pod now, it comes back. Whoop, there it gets recreated. No surprise here; this is standard Kubernetes behavior, and would have happened even if we'd imperatively created our deployment via kubectl. But if we delete the deployment instead (I should have had that pasted), again, it now comes back, and this time it's thanks to Argo CD and the fact that it is continuously reconciling our declared state with the live state in Kubernetes. But if I were to delete my application here now, well, then that's not coming back. And therein lies the rub with this approach. But does that really matter? The answer is likely gonna depend on you and your use case. But for us, we actually found that this perhaps-bug in our workflow turned out to be a bit of a feature.
We found that developers often wanted to be able to play around in lower environments and manually edit applications, leveraging the full power of the Argo CD UI. So here I can disable auto-sync, and because this is not reconciled from Git, we can actually do this. The fact that we don't have continuous reconciliation in place at the application level allows us to test changes, knowing that these manual changes will be overwritten the next time the application is deployed. This is great because it enables quicker feedback in lower environments. And, if I go to edit now, it also enables teams to become comfortable with Kubernetes in a more incremental and controlled manner; for example, here I'm editing the fully rendered YAML rather than having to learn Helm to be able to do so. If I delete... did I click edit? Backspace, awesome.

Now, it's important to note that we can also leverage the Argo CD RBAC model here to limit this capability to just developers in the lower environments; we don't want this happening in prod. By contrast, it's far more onerous to allow this type of behavior if everything is stored in Git.

So to conclude, I hope that I've been able to give you a reason to consider that maybe you don't need to GitOps everything. Depending on your goals, you can reap a lot of the benefits without having to deal with a lot of the complexity. Perhaps keeping it simple is the best course of action, particularly when starting out and adopting new technologies. And with that, I think I'm actually on time. I'd love to open the floor if anyone has any questions. I can't actually believe I did this on time. Awesome.

Yeah. So, you know, I had an additional slide, next steps, because this is a bridge, and it was seen as a stepping stone towards a full GitOps implementation. And yes, the next steps involved how we would then change this.
Since we did this work, the ecosystem has developed quite a lot and there are a number of new tools out there. There are Terraform controllers, from KubeVela for instance. There's Flamingo plus the Flux Terraform controller, if you wanted to go that route. As an enterprise solution, we might be able to use HashiCorp's Terraform controller. So there are a number of approaches, and yeah, we did look at it. I've since moved on from this team, so I don't know if they have. I totally didn't think I was gonna finish in time, so I was cutting slides, and that was the one that went. Yeah, thanks.

Yep. Are you talking about using rollbacks within Argo CD, or progressive delivery within Argo CD? Yeah, the idea of doing a rollback within Argo, versus relying entirely on the Terraform state and essentially doing a redeploy from there. Okay, so we didn't use those features within Argo CD; I don't even think they had been released at the point in time when we were doing this. Our general approach to rollback was through our CD pipeline: when we initiated a deployment, we passed through a specific version, and so rolling back for us would either be reapplying the previous version, or we would fail forward if it was a quick bug fix. But we didn't have the advanced use case that you're talking about. Okay. Cool. If that's it, thank you, everyone.