Thanks, everyone, for attending our talk on Crossplane and Argo CD. My name is Jesse Suen. I'm one of the co-creators of the Argo project and co-founder of a company called Akuity, which offers fully managed Argo CD in the cloud. And speaking with me today is Viktor, who you probably already know from his very popular YouTube channel. Okay. Yeah. So my name is Viktor. I'm from Upbound, the company behind Crossplane. And this is a bit of a difficult subject because we're talking about tips and tricks. How many of you are using Crossplane today? The rest of you might have trouble following along, because this goes into all the troubles this guy has been facing with Crossplane. Anyway, let me give you a very, very quick introduction to Crossplane, and I'm going to frame it as a bit of history. At the very beginning, we had configuration management tools: Chef, Puppet, Ansible, all those things. Some of you are probably still using Ansible. They were all based on the idea that things are mutable. I know they can do immutable things, but they're based on mutable principles. You can call that the first generation of what later became infrastructure as code, and they were mostly built for bare metal, real servers, before virtual machines and before the cloud and all those things. Then we got the second generation of such tools, which we call infrastructure as code: Terraform, Pulumi, CloudFormation, all the good stuff that many of you are probably using. Now, with the emergence of Kubernetes, we are moving into the next phase, and that next phase is using Kubernetes as a control plane. Using Kubernetes with all the good things that you all... How many of you are not using Kubernetes? Okay, I'm just checking. With all the good things and all the things you like, right?
Now, extensible APIs, custom resource definitions, drift detection, reconciliation, all the stuff that we like, right? And the whole idea is that thinking of Kubernetes as something that just runs containers is wrong. Containers are just one of the things the Kubernetes scheduler can manage, and we have many, many others. One of those others is Crossplane. So what we're trying to do with Crossplane can be described through two main areas. One area is one-to-one mapping between resources in Kubernetes, custom resource definitions, and something on the other end. So if on the other end you have an EC2 instance in AWS, then you have a custom resource that you can apply in your Kubernetes cluster and use all the good things that we like about Kubernetes to manage that EC2 instance, or whatever else that is. Ordering pizzas, it's up to you; it depends on the providers. Now, the second and equally important, or more important, part of Crossplane is the ability to create compositions. Compositions are a way for you to create your own custom resource definitions, with controllers, that define what something means to you. What does it mean to you to have a cluster, to have an application, to have a database? What does it involve? When you combine all those things, you can expose your work, your operational knowledge, through those custom resources to everybody else in your company. Effectively, through compositions we're enabling you to create your own internal developer platforms and provide services to application developers, testers, or whoever else is working with you, instead of waiting for somebody to open a Jira ticket. How many of you like Jira tickets? Okay, cool, cool. Good audience. With that in mind, I'm going to leave it to Jesse, because he's been working with Crossplane and Argo for a while and has some really good insights. Yeah, thanks, Viktor.
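To make that one-to-one mapping concrete, here is a rough sketch of what a managed resource for an EC2 instance can look like with a Crossplane AWS provider. This is an illustration only: the exact API group and field names depend on which provider and version you install, and the AMI ID is a placeholder.

```yaml
# Sketch of a Crossplane managed resource mapping 1:1 to an EC2 instance.
# API group/fields vary by provider version; values are placeholders.
apiVersion: ec2.aws.upbound.io/v1beta1
kind: Instance
metadata:
  name: demo-instance
spec:
  forProvider:
    region: us-east-1
    instanceType: t3.micro
    ami: ami-0123456789abcdef0   # placeholder AMI ID
  providerConfigRef:
    name: default                # which ProviderConfig (credentials) to use
```

Once applied, the provider's controller creates the instance and Crossplane continuously reconciles it against this spec, which is the drift detection and reconciliation behavior described above.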
So today I'll be covering how Akuity uses Crossplane, and some lessons and learnings based on our experience with it. Let me start by explaining why we chose Crossplane. The main reason was that we wanted to manage our infrastructure the same way we manage our applications, using things like Argo CD and GitOps. So we wanted to apply the same tooling, practices, and processes to our infrastructure, especially because we want to encourage these DevOps-based workflows. These days, the line between your infrastructure and your applications is getting more and more blurry, and there's a lot of coordination that needs to happen between your cloud resources and your Kubernetes resources. The best example of this is IAM: Amazon has IRSA, and Google has something called Workload Identity. And even though we're a small team and we don't really have a separate platform engineering team, we still wanted to provide a simple interface to developers, to basically lay the groundwork for scaling up, providing standardized self-service infrastructure, and exposing a much simpler set of knobs. This enables us to treat our infrastructure more like cattle, so it's not a big deal to create more and more of these things. Crossplane enables treating these things like cattle. These are the types of resources that we deploy today using Crossplane. We're primarily an AWS shop, and we only use three Crossplane providers: the AWS provider, the Helm provider, and the Kubernetes provider. We do have a presence on GCP, but it's quite small, and currently those resources aren't managed by Crossplane. I think in the future that could change, and we'll be pretty well prepared to use Crossplane for managing GCP as well as AWS.
So probably the most powerful feature you want to be using Crossplane for is its ability to compose resources into higher-level resources, what they call compositions. And here are some of the compositions that we created for our own internal purposes. We started with Upbound's reference architecture. They publish a bunch of reference implementations for AWS, GCP, and Azure, and we tweaked a bunch of things to suit our own needs. For example, we changed it to run in three availability zones instead of two, we enabled KMS secrets encryption, we added the AWS add-ons, and then we also created our own add-ons, which I'll explain in a little bit. If you're looking to get started, I highly suggest using those reference implementations as a good starting point for modeling your compositions. You'll also see this one composition up there called the IRSA composition. That's specifically to handle IAM roles and policies associated with Kubernetes service accounts, and I'll get to why that's needed in a bit. Along with those compute and networking compositions, we also have this concept of an add-on composition for Kubernetes clusters. Here I'm showing six such add-ons, and each add-on is different from the others. For example, one will require different IAM policies, so its policy manifests will be different. Some are installed using Helm; others are installed through raw Kubernetes manifests. And you can see which of these lower-level objects are composed into the higher-level composition. Just to give you an idea of what one of the add-ons looks like, here's the spec for our Karpenter add-on. If you're not familiar with Karpenter, it's basically a better version of Cluster Autoscaler, primarily for AWS. In this spec, you can see we just expose a few of the Karpenter parameters that we're interested in.
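As an illustration of what such an add-on spec might look like, here's a sketch. The API group, kind, and field names are hypothetical, since the actual composition is internal; the point is that only a few knobs are exposed to the user.

```yaml
# Hypothetical claim for an internal Karpenter add-on composition.
# API group, kind, and field names are invented for illustration.
apiVersion: addons.example.org/v1alpha1
kind: KarpenterAddon
metadata:
  name: demo-cluster-karpenter
spec:
  clusterName: demo-cluster
  capacityTypes: ["spot"]                   # allow spot instances
  instanceCategories: ["c", "m", "r", "t"]  # restrict to a few instance classes
  cpuLimit: 100                             # max autoscaling capacity in vCPUs
```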
Looking at this example, this cluster wants to use spot instances, it wants to use only these four instance classes, and it has a max autoscaling capacity of 100 vCPUs. All of these values get passed all the way down to the Karpenter ConfigMap and its Provisioner, but how that happens is abstracted away from the user deploying this. And this is just an illustration of how this nesting of compositions works. You can see here we have this top-level EKS composition, and inside that you have your normal managed resources, like the cluster, node groups, and the OIDC provider. But then you can include your own self-written compositions, like my Karpenter add-on, which is composed of a Helm release, an instance profile, and a Provisioner. And that thing encompasses another composition we wrote, the IRSA one, and inside that are your policy, your role, and your role policy attachment. So let's talk about probably one of my biggest annoyances about how privileges are handled in AWS. AWS has this feature where you can give your pods IAM privileges without using long-lived static credentials. They call it IRSA, IAM Roles for Service Accounts. Google has their own version of it called Workload Identity, and they work basically the same way. The way it works is you have your normal IAM policy and your IAM role, and they're attached to each other. To give IAM privileges to a pod, it has to go through an OIDC provider, which you create along with your cluster. Then you need to reference this OIDC provider in two different places in that IAM role. And then you have your service account; the service account is the one that you want to give cloud privileges to. This service account has to have an annotation which references the ARN of the IAM role we want to impersonate, and that IAM role should then have a reference back to the service account, so this bi-directional trust is established.
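The service-account side of that bi-directional trust is a standard EKS annotation. A sketch, with a placeholder account ID and role name:

```yaml
# The service account references the IAM role it wants to assume.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: kube-system
  annotations:
    # Standard IRSA annotation; account ID and role name are placeholders.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/external-dns
```

The matching IAM role's trust policy then has to name the cluster's OIDC provider and restrict the token's subject to `system:serviceaccount:kube-system:external-dns`, which is the back reference described above.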
So this is a real pain to manage, but we use Crossplane to help coordinate and manage all these references. To simplify things, we have this composition which we call an IRSA. And this IRSA resource accepts all of those previous things I talked about as inputs: what service account name you want to give permissions to, the namespace it lives in, the OIDC provider of the cluster where that namespace and service account are, and then the IAM policy, the permissions that it wants. What the IRSA object does is create the underlying AWS managed resources with all of those references and back references in place where they need to be. The end result is that we just have to create this one IRSA object, and then the composition takes care of all these child resources and cross-referencing. This slide is just showing the fact that we also use Crossplane to deploy managed resources directly. Managed resources are the lowest-level provider resources that have a one-to-one mapping with the cloud provider. There are going to be a lot of times where you just want to create cloud resources not tied to any other objects. In our case, we want to create Route 53 entries, routes, transit gateways. Or maybe we just want to create a bespoke IAM role to give to one of our services. The example we use: let's say you create two VPC networks, in our case an EKS VPC and a database VPC, and later on we want to connect those two. Then we'll use Crossplane to deploy a transit gateway and the attachments to connect those two VPCs. So you might be wondering, okay, where does Argo CD come into the picture? So far, I've been talking about add-ons and how we use Crossplane to install those. But what about Argo CD? And the answer is we actually use both.
There are some trade-offs that you should understand when deciding whether you want Crossplane to install something through a Helm chart or whether you want Argo CD to do it. The advantage of using Crossplane to install an add-on as part of the cluster object is that everything is handled in that composition. You just apply that one top-level EKS object, and then 20 minutes later you have a fully functional cluster. It has all the add-ons it needs; it's ready to use out of the gate. And this includes things like IAM privileges for the controllers that need them. But the flip side is that these add-on versions are more cumbersome to manage. They're tied to the EKS composition that you wrote. So let's say tomorrow you want to upgrade the AWS Load Balancer Controller: you have to modify your composition to pick up the new Helm chart version and then publish that new composition. So the upgrade process for that add-on is much more tedious. Also, if it's part of the composition, it means that all the clusters created from that composition get the same version, so you don't have the flexibility of picking different versions for different clusters. They're all tied to what you chose in that composition. On the Argo CD side, the biggest benefit you get is that you can use a GitOps-based workflow for managing your add-ons. This means you can control the versions for different environments, and use GitOps to modify the ConfigMaps used by those add-ons. And unlike a composition, you actually get to choose which add-ons get deployed in a cluster. With the composition, you're basically deciding these N add-ons are part of it, and there's no way to turn them off and on; it's an all-or-nothing decision. One of the drawbacks of using Argo CD to deploy add-ons is that it can't do that coordination, the referencing and passing of IAM details to all the resources. So that's one limitation of Argo CD.
And second, if you're using Argo CD to manage your add-ons, it's going to be a second step: you create your cluster first, followed by installing the cluster add-ons. So I mentioned we use both, and we decide on a different strategy depending on the type of add-on. There are three different kinds. The first is what I call critical add-ons. These add-ons are essential for a cluster to function. For example, without autoscaling, you probably can't even install more add-ons, because the cluster doesn't have the capacity to automatically scale up as you throw more pods at it. For these critical add-ons, we basically bundled them as part of the EKS composition that we wrote, and we consider them just part of the distribution. Next is what I call the IAM-only add-ons. These add-ons are ones that need some coordination with AWS resources, namely IAM, and they need to do that whole cross-referencing stuff that I showed in the previous slide. These are also things I consider optional; I don't need cert-manager or external-dns in every cluster. So what we do is use Crossplane to install just the IAM portions. And if we do want the actual controller running in the cluster, we'll follow up by using Argo CD to install it. By then, the service account is already pre-created for it, and we don't have to deploy or do anything in the cloud. And finally, we have the other stuff. These add-ons have no dependency on AWS, and we just treat them like normal applications, fully managed by Argo CD. Here are some examples of add-ons in each category. Currently, the only three that we consider critical and that belong in every cluster are Karpenter for autoscaling, the AWS Load Balancer Controller, and External Secrets. Every cluster that we provision gets those three by default. And then we have these add-ons which a cluster might need.
For those, like external-dns, cert-manager, and ADOT, they all require some AWS privileges, so we'll pre-create those policies for them, but not the actual workload objects, because those are handled by Argo CD. And finally, that last column has examples of cluster add-ons which have no tie-in to AWS and can be managed like a normal Kubernetes application. Next, I want to talk about what we feel is a best practice for using Crossplane. One issue you might face is that releasing a new composition or updating a provider can be risky, because when you update those things, all instances of that composition get updated at the same time. And so it could mean that you break all your clusters at the same time. So what we do is run multiple Crossplanes, one per environment: test, stage, prod. And then we progressively promote a composition through the environments, just so we can control the blast radius if something goes wrong. Here's an illustration of this. Here we have those three environments, or AWS accounts, and you can see there's a Crossplane in each account to manage the infrastructure of that account. When we roll out a new version of our composition, it goes first to test, then stage, then prod. And meanwhile, Argo CD is deploying to all the clusters that you see in this picture. Now I'll go over some tips and tricks, specifically for using Argo CD and Crossplane together, because there are definitely some things you should turn on and off when you're using them together. Probably the most important tip I can give is this Argo CD feature called annotation-based tracking, which is not the default, by the way. It probably will be the default in the future, but as of now it's not on. If you've been using Argo CD and Crossplane, one thing you might notice is that Argo CD has a pruning feature, and it wants to delete resources which it thinks are no longer managed in Git.
It does this by looking at a label on the live object and saying: oh, that label was tied to this application, but the object is missing from Git, therefore I should prune it. The thing is, Crossplane carries over the label that you apply on your claim resource to the composite resource, which is kind of like a child of the claim. Because of that, if you're using label-based tracking, Argo CD turns around and says: oh, I'd better go delete it, it must be something they removed from Git. What annotation-based tracking does is, instead of labeling the object with a simple name, we annotate it with a lot of metadata. Based on that metadata, we can understand whether a label or annotation was just carried over from some parent object, and we will decide not to prune it. So this is a feature you definitely want to turn on. Even if you're not using Crossplane, this is a better way to use Argo CD, and it will be the default in the future, I'm sure. Yeah? You want to speak, you said? No, no, I said next week. Oh, next week. I thought you said you wanted to speak. Okay, I have some performance tricks. One thing to know about Crossplane is that it can install hundreds of CRDs, and you're probably not going to be using most of them. The problem is that Argo CD wants to list and watch those resources even if you're not using them, because you might be, and it needs to discover them. This causes a lot of memory pressure on both the Argo CD application controller and the Kubernetes API server. And this is just an illustration of what's happening. Argo CD has a feature called resource exclusions, and the idea is that you can basically tell Argo CD to pretend these resources don't exist. How this helps is that if Argo CD doesn't know they exist, it won't monitor them, which lowers the number of connections and the pressure on the controller.
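Both of these knobs, annotation-based tracking and resource exclusions, live in the `argocd-cm` ConfigMap. A sketch (the excluded API group is just an example; exclude whichever provider groups you don't actually use):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Track resources via annotations instead of the shared app label,
  # so labels copied down to child objects don't trigger pruning.
  application.resourceTrackingMethod: annotation
  # Tell Argo CD to pretend these resources don't exist, so it stops
  # listing/watching CRDs you never deploy.
  resource.exclusions: |
    - apiGroups:
        - "cloudfront.aws.upbound.io"   # example of an unused provider group
      kinds:
        - "*"
      clusters:
        - "*"
```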
Another use of resource exclusions, and this one improves the usability of the user interface: Crossplane has this object called a ProviderConfigUsage. For the most part, it's an implementation detail of Crossplane; you don't ever manage those objects, and you probably shouldn't ever care that they exist. But the problem is they actually show up in the Argo CD UI, because they're always children of the low-level managed resources. So one thing you can do to declutter the Argo CD UI is to also ignore these types of resources, and you'll basically reduce your resource tree by one whole level. Another problem with a high number of CRDs is that it makes something called API discovery pretty slow. You might actually notice this with kubectl itself: kubectl has to ask Kubernetes, hey, what kinds of CRDs do you have in the system? If you have hundreds of these things, it can actually get slow. Argo CD does the same discovery, but right now there's a bug, or performance problem, which makes it inefficient. You can work around this by just bumping up the controller's client QPS, which we're currently getting throttled on. This should be a temporary workaround that you won't need in the future. Health checks. Okay, so if you don't know, Crossplane has very homogeneous statuses for all of the resources it creates. They all get these conditions, like: is the resource synced, is it ready? Argo CD has this feature which lets you write a snippet of Lua script that evaluates the object and returns whether it's healthy, degraded, or progressing. Here's a simple resource health check that you can write for that DB instance that I showed in the previous slide. As you can see, it just iterates over the conditions and checks whether the resource is synced, and if it isn't, it causes the resource to show up as degraded.
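A health check like the one described is registered in `argocd-cm` under a key of the form `resource.customizations.health.<group>_<kind>`. Here's a sketch for an RDS instance from an Upbound AWS provider; the exact API group depends on which provider you run, and the Lua pattern follows Argo CD's standard custom health check shape:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Lua health check keyed by <group>_<kind>; the group varies by provider.
  resource.customizations.health.rds.aws.upbound.io_Instance: |
    hs = {}
    hs.status = "Progressing"
    hs.message = "Waiting for resource to become Ready"
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, condition in ipairs(obj.status.conditions) do
        if condition.type == "Synced" and condition.status == "False" then
          hs.status = "Degraded"
          hs.message = condition.message or "Resource failed to sync"
          return hs
        end
        if condition.type == "Ready" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = "Resource is up to date"
        end
      end
    end
    return hs
```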
And then you can see in this little screenshot that it actually gets surfaced in the Argo CD UI, so you can quickly identify problems in your Crossplane resources and get much better visibility. Okay, some challenges we faced. One thing is that if you want Crossplane to adopt existing resources, it can be challenging, especially for AWS. This is because AWS has this behavior where it generates random IDs for some things, like VPCs, subnets, and security groups. So when you recreate a resource with Crossplane, let's say you wanted to take over an existing one, or maybe you're migrating to a different Crossplane cluster, when you apply it a second time, it doesn't know that the existing one already exists and should be adopted. So now you're left with two VPCs. There are ways around this: you can annotate your resources with an external name so that you can take them over. But this kind of breaks the whole GitOps user experience, because now you're incorporating IDs into your manifests when you didn't have to originally. One tip I do recommend is to use formulated names for your composite (XR) resources instead of relying on generateName, so that your AWS resources get human-readable names. There are some limitations in Crossplane, but I think they're being worked on and will be addressed in the future. As of now, a composition can't conditionally create resources depending on input parameters. For us, I would love the ability to enable or disable a cluster add-on based on some true/false values in the parameters. Another example is that we might have a dev version of a database that has a different number of instances than the prod version. So today we resort to scripting and code generation to create two versions of the composition that are slightly different.
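To illustrate the adoption workaround mentioned a moment ago: it uses Crossplane's standard `crossplane.io/external-name` annotation. The API group and VPC ID below are placeholders:

```yaml
# Point the managed resource at an existing VPC so Crossplane adopts it
# instead of creating a second one.
apiVersion: ec2.aws.upbound.io/v1beta1   # group/version depend on your provider
kind: VPC
metadata:
  name: main-vpc
  annotations:
    crossplane.io/external-name: vpc-0a1b2c3d4e5f67890   # placeholder VPC ID
spec:
  forProvider:
    region: us-east-1
    cidrBlock: 10.0.0.0/16
```

This works, but as noted above, it means hard-coding a cloud-generated ID into a Git-managed manifest, which is exactly the part that breaks the GitOps experience.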
Another limitation is that Crossplane doesn't have the ability to reference another resource and feed its status information into the input of the current resource. If you're working entirely within the same composition, it's fine, because all the references happen in the same YAML. But if you want a more loosely coupled process, or maybe you have one VPC network and another VPC network and you want to attach them, today what you have to do is copy the VPC ID, paste it into the transit gateway attachment, and then they're connected. It would be great if we could have something where I can just reference those other two things rather than copying around Amazon IDs. So far we've been running Crossplane for the better part of a year, and it's been working great for us. These are improvements that we want to make internally; these are not things about Crossplane. Currently we run a dedicated EKS cluster as a Crossplane control plane, so the more environments we have, the more expensive it becomes, because you're dedicating an EKS cluster for this. It should be possible, I know it's possible, to just run Crossplane in a smaller cluster, like a K3s cluster, and those K3s clusters can themselves run inside a real EKS cluster. And so then we can reduce our footprint to just one EKS cluster instead of three or four. Okay, so that's our learnings and experience with Crossplane. I think we have some more time for questions. A few minutes for questions, yes. About Crossplane or Argo CD or anything else? We have time for a couple of questions. Can you put your hands up? I see one there. With Composition Revisions arriving soon in Crossplane, do you think you're going to keep a separate instance of Crossplane to test between environments? Sorry? With Composition Revisions arriving in Crossplane soon, it's behind a feature flag today, do you think you're going to keep multiple instances of Crossplane to test for each environment?
Or are you going to keep them all? I'm not sure yet; it's too early for me to say. Ask me again in a couple of weeks. Please come to the front for any more questions; I think it's pretty difficult to hear. Okay, let's thank our speakers. Thank you. Thank you for coming. Thank you.