All right. Welcome, everybody, to our talk on Kubernetes and Crossplane and building a platform API on top of Kubernetes. There are still some free seats available here in the front on the right side, and a couple there on the left, if you don't want to stand in the back. Today we want to talk about our experience in adopting Kubernetes controllers for managing things that are not core Kubernetes resources. We are doing that with a CNCF project called Crossplane, and they have quite the presence here, so a quick show of hands: who has heard of Crossplane before? All right. Hard to ignore Crossplane at this conference, that's for sure, and that's good. We'll start with a quick intro to who we are. My colleague Hannes, maybe you want to go first? Yeah. My name is Hannes Bluth. I'm a cloud architect at Accenture, and I live in Frankfurt, Germany. My name is Jan Willies. I live in Berlin. I'm a platform architect at Accenture. Most of the stuff I do is open source; I'm a contributor to the Crossplane project. I also run the Berlin CI/CD meetup, so if you're in Berlin and interested in connecting with other folks who are running CI/CD, feel free to reach out. It's been on hiatus for the past two years due to some pandemic events, but we plan to pick it up this summer again. We are also organizing the first-ever CI/CD devroom, so if you are joining FOSDEM in Brussels, hopefully in person next year, feel free to catch us at the CI/CD devroom. All right, so a very quick introduction to Crossplane. You've probably heard many intros to Crossplane by now; this might be the third one. But let me quickly explain in layman's terms. What Crossplane does is extend service provider APIs, pictured here at the bottom, into the Kubernetes resource model, into the Kubernetes database. So as a user, I can run kubectl get clusters and I get my EKS cluster back from AWS. 
Similarly, I can create Kubernetes clusters via the Kubernetes platform. I just apply familiar YAML, and there's a controller running that takes this YAML representation, talks to the AWS API via the AWS Go SDK, and manages the state at AWS for me. It continuously does so; that's the Kubernetes pattern, reconciliation. Every, well, it's configurable, but every 60 seconds by default. So if there's any drift or any change, it is overwritten or updated back to the state I declared. This is a Kubernetes example: I can provision Kubernetes via Kubernetes. But obviously, I can create any object at any external API provider. It could be an S3 bucket, a cloud storage object, an AKS cluster, a Lambda, and so on. Anything, really, is mirrored into the Kubernetes ecosystem. OK, let's take a closer look at the Kubernetes example. I love to take the Kubernetes example because it's a very complex one. Usually an S3 bucket or a cloud storage bucket is just a very simple object; you just apply it and there's no magic behind that. But provisioning a Kubernetes cluster at a cloud provider or on-prem is usually a very complex thing to do. Pictured here at the bottom is an example stack of a fictional company which you might run. There are two cloud providers, AWS and Azure. There's the code repository, GitLab; Grafana for dashboarding; obviously your organization probably runs Kubernetes, so there's the Kubernetes API; Vault for secret storage; Styra for policies; Argo CD for GitOps; and so on and so forth. All of those are mirrored into the Kubernetes ecosystem, so you have them available as one-to-one representations. However, you usually don't want to push this complexity to every user of your platform. What you most of the time want to do is at least apply some compliance and security. A famous example is the visibility of an API. 
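To make this concrete, a managed resource for an EKS cluster might look roughly like this. This is a sketch based on the community provider-aws; the exact API group, version, and field names vary between provider versions, so treat them as illustrative:

```yaml
# Hypothetical Crossplane managed resource for an EKS cluster.
# Group/version and field names follow the community provider-aws,
# but may differ in your provider release.
apiVersion: eks.aws.crossplane.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  forProvider:
    region: eu-central-1
    version: "1.24"
    roleArnRef:
      name: my-cluster-role          # references an IAM Role managed resource
    resourcesVpcConfig:
      endpointPrivateAccess: true    # private by default
      endpointPublicAccess: false
      subnetIdRefs:
        - name: subnet-a
        - name: subnet-b
  writeConnectionSecretToRef:        # kubeconfig details land in this secret
    name: my-cluster-conn
    namespace: crossplane-system
  providerConfigRef:
    name: aws-default
```

Once applied, the provider's controller reconciles this object against the AWS API on every sync interval, exactly as described above.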
In the Kubernetes case, every Kubernetes cluster should be a private Kubernetes cluster and not a public one; that should be the default. With an S3 bucket it's the same: it should be private by default, for example, and should be encrypted via KMS. Those are defaults you can set. And you also want to make it simple. In the case of a Kubernetes cluster, there are many, many different API calls you need to make: at AWS to create the cluster, then at your internal SaaS services to register the cluster once it's ready, and to install operators and controllers to make your cluster compliant and integrated with the rest of your enterprise. That's what happens here in the integration part and in the custom platform part. At the top, you see the simple object which users of the platform apply when they want to provision a Kubernetes cluster. It's a very simple one. But under the hood, it goes to AWS and creates an IAM role, an IAM policy, a security group, an OIDC provider, the actual EKS cluster, and all the EKS cluster add-ons which are available. It then goes to Kubernetes because once the EKS cluster is ready, it's not ready to hand out: it still needs the compliance and integration software which you have to deploy. That happens via the Kubernetes API in the same kind of call. There the compliance software is deployed, for SIEM or security scanning or so, along with all those controllers for integration which are not available as EKS add-ons. And then the Kubernetes cluster is registered at, well, in this case at Styra, which is a control plane for OPA policies. And it's registered at Argo CD so that users can use GitOps to deploy their applications into the cluster. They get everything out of the box. But all of that is hidden from the user by the platform team through this integration. 
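The simple user-facing object at the top of that diagram could be a namespaced claim along these lines. The composite type name and its parameters are made up for illustration; a real platform team defines its own schema:

```yaml
# Hypothetical claim a platform user applies to get a full,
# compliant cluster; type and fields are illustrative.
apiVersion: platform.example.org/v1alpha1
kind: KubernetesCluster
metadata:
  name: team-a-cluster
  namespace: team-a
spec:
  parameters:
    size: small        # platform-defined t-shirt size
    nodeCount: 3
  writeConnectionSecretToRef:
    name: team-a-cluster-kubeconfig   # where the user picks up access
```

Everything else — IAM roles, OIDC provider, add-ons, Styra and Argo CD registration — is expanded from this object by the platform's composition.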
So in the end, Crossplane, to achieve this, is kind of a low-code platform. You cannot edit this stuff via a UI yet, but you can offer an API without writing actual code; you describe it in YAML. And it allows you to provide abstractions and add governance and security to them. All right, that all sounds fun and games. But what are the top nine challenges we encountered when running this approach at scale and in production? At scale means we are running, well, obviously not Google scale, but a couple of thousand managed resources. I guess that's not tiny, and it's also not Google scale; maybe it's something in between. And we want to go over some challenges we faced when implementing that. We started with Crossplane two years ago, which gave us a very, very good understanding of the product. We adopted it pre version 1.0 and were able to shape the product a bit to our needs, and some of the stuff you see in today's Crossplane is based on our feedback. The first thing is the Crossplane deployment model. Crossplane is a Kubernetes extension, and by that it's installable in a Kubernetes cluster. So you might decide that simply every application cluster gets a Crossplane installation. The applications are the blue boxes on the right and the Crossplane installation is the colorful representation. What that gets you, from an application point of view, is a very good integration between infrastructure and application, because an application can now directly read the connection details of its database or S3 bucket from a secret in the same cluster, for example. You also have very good hard isolation between tenants because, well, the Kubernetes tenancy model is strongest when tenants run in different Kubernetes clusters. However, how do you bootstrap those clusters? 
So in the end, you can offer the broad range of cloud provider or SaaS provider offerings into the cluster. However, a Kubernetes cluster is just one of those offerings, and with this approach it's expected that you're already running a Kubernetes cluster. So how do you get to those clusters? The next option, in the middle, is a central API for infrastructure provisioning: users go to the central API, provision their cloud resources or SaaS resources, and then use those resources directly. They go to a central Crossplane instance, create a bucket, an EKS cluster, and so on, and then connect to those directly. That's great: bootstrapping problem solved, and I have a control plane for all my infrastructure. However, now you have a bit of a soft isolation problem between tenants. Granted, you're not running any workloads in the central cluster, which is offered by a central platform team. Since there are no workloads, it's not like you'll run into resource starvation, which is a common problem when running a multi-tenant Kubernetes cluster. However, you are still running a single controller for all the tenants, and you can still get into issues when, for example, one tenant creates a huge load of API resources and then you get rate limited or something. Because CRDs are cluster-wide in Kubernetes, as you all probably know, and cluster-wide means there's only a single controller reconciling those objects. So there's no way, at least no way in Crossplane, to shard controllers per CRD, and that's the issue with this model. There are some scalability challenges. The third option, on the bottom, is the obvious solution: you run a Crossplane instance per tenant. Everybody gets a Crossplane instance. 
However, it's a kind of tricky thing, because you don't want to run that Crossplane instance in an actual Kubernetes cluster: it doesn't run any workloads, and from a resource point of view it would be very cost-intensive to run a dedicated Kubernetes cluster just as the control plane for every tenant. That depends on how many tenants you have, obviously. So those control planes can be virtualized, and there are some ways to virtualize them via open source tools. A lot of folks use vcluster by loft.sh for running virtual Kubernetes clusters, or there's the Cluster API Provider Nested from the Kubernetes special interest group, which does that too, and probably a thousand other ways. You could also use an offering from Upbound to achieve that. And then you can run the controller either inside the virtual cluster, or run it outside and connect it to the virtual cluster. You get the hard isolation, that's solved, and you get a separation between control plane and data plane. However, it's a bit more complex to set up, and probably your adoption goes a bit along these lines. All right, the approach to compositions, that's something we faced. Compositions are the feature for abstracting resources. When you build your automation, maybe you are on an application team and you want to start with the application. So you build from the top down: application on top, cloud resources at the bottom. You want to deploy the application, and it should come as a stack, a vertically integrated stack: the application uses a database, an object store, a queue, maybe Kubernetes, maybe not, maybe cloud functions or so. That's something you want to aggregate; basically you abstract the application. That's very useful if you're starting with Crossplane from an application team or in a small company. 
However, if you're starting in a large organization, that's very cumbersome and challenging to implement, because now you're the SME for Crossplane, the one with experience running it, and you'd need to go to every application team in the enterprise and build those abstractions for them. That's simply not going to scale. So instead you start from the bottom. You are the service provider and you want to offer those resources to your users. Taking the Azure example: from AKS you build a compliant AKS object, a compliant AKS API, a compliant blob store, a compliant Cosmos DB, and a compliant load balancer. Those are just examples. Application teams can then pick and choose what they need, from a service-catalog kind of thing. And then, from a maturity or adoption point of view, the third example here would be the one where adoption is already quite far along: the platform team creates those compliant API objects at the bottom, and application teams who are already familiar with this, or are borrowing resources from the central team, write their automation on top of those compliant objects. Well, speaking of all those compositions, this might not be a surprise for many of you, but Kubernetes is, as someone said at a conference, "my favorite YAML database." So expect that you need to manage even more YAML. Probably you're already managing a lot of YAML, for sure if you're running stuff like the kube-prometheus stack, but it will now get even more. Crossplane is a low-code platform; you don't need to write Go code just to use it. However, you still need to describe those APIs and those compositions, and that's done in OpenAPI, in YAML. So there can be a lot of bugs, as you all probably know, with indentation and whitespace and types and so on, depending on how you manage the YAML. 
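Defining such a compliant, bottom-up API in Crossplane is done with a CompositeResourceDefinition (XRD). A minimal sketch, with an illustrative group name and schema (the apiextensions.crossplane.io/v1 kind is real; the rest is made up for the example):

```yaml
# Sketch of an XRD exposing the compliant "KubernetesCluster" API;
# group, names, and parameters are illustrative.
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xkubernetesclusters.platform.example.org
spec:
  group: platform.example.org
  names:
    kind: XKubernetesCluster
    plural: xkubernetesclusters
  claimNames:                      # enables the namespaced claim users apply
    kind: KubernetesCluster
    plural: kubernetesclusters
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    size:
                      type: string
                      enum: [small, medium, large]
                    nodeCount:
                      type: integer
```

A Composition then maps this small schema onto the dozens of underlying managed resources, which is exactly where the extra YAML accumulates.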
So if it's just basic templating, or writing that manually, that's maybe something which will change as you adopt this. Some stuff I'm very excited about is CUE, where you can do validation and have type safety and all this kind of stuff. However, since it has a very good Go integration, it also defeats the low-code pattern a bit, because now you still go a bit into writing code. But you get a lot more safety this way. User expectations: when you apply a pod spec, you get a container almost immediately, or at least you should. But infrastructure is slow. If you apply, for example, an RDS database, it will easily take five minutes or more, and there are some famous examples where infrastructure takes 10 to 50 minutes to get ready. That's something some users might not expect from the Kubernetes API, because from what they know, the Kubernetes API is quick and gives an immediate reaction. So some might just delete and recreate those objects in a very fast way, and infrastructure usually doesn't like that. And then, obviously, depending on the approach, your users might not know Kubernetes. There are some users in your org who don't use Kubernetes. They are running Google Cloud Functions or Lambdas or App Runner or Cloud Run, or some data batch products, AWS Batch or something. But now they need to describe their stuff in Kubernetes YAML. While they might not run workloads on Kubernetes, they still need to know the API. You can get around that by offering a UI, so that they can more or less create those resources via a workflow or via the UI, but still, in the end, they need to know some API to create this stuff. Okay, I'm looking a bit at the time, so I'll skip ahead. 
Another thing: now that you're using Kubernetes not only for applications but also for your platform and infrastructure, you can leverage the breadth of the Kubernetes ecosystem. And that might be, well, it sure is my first example of showing the CNCF landscape in a good way, because many, probably all, of those things are integrated with Kubernetes. I brought two examples. For example, since you now have this central layer where you apply all your stuff, you also have a central place for managing policies. You should still use, for example, the IAM stuff on AWS and Google Cloud, and roles and responsibilities at GitLab and Argo CD and all those SaaS offerings you might use. However, you can shift a bit more to the left and apply those things in a common language at the same layer. So you write the same kind of policies for the source code repository, for the GitOps software, for Grafana, for Kubernetes, for the application. On a very low layer that means the RBAC stuff from core Kubernetes, but you can also, and probably should, run a more fine-grained policy engine like OPA, like Kyverno, or any of those things. What you can do with those is have quotas, for example, for your infrastructure. Sometimes the billing processes in large orgs are not very cloud-native, maybe, so there might be some subscription model. Someone says, okay, I have an S/M/L subscription, and S is, I don't know, two Kubernetes clusters and 10 nodes. This you can easily implement at this layer. Other examples are running kube-janitor to remove demo environments after a day or two, that's pretty easy, or Kubecost to manage costs. This gets very easy, and those were just small examples. I'll now hand over to Hannes. All right. 
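As a sketch of what policy at this layer can look like, here is a Kyverno ClusterPolicy that enforces the private-by-default rule from earlier on the hypothetical cluster claim type. The Kyverno schema is real; the matched kind and its parameter field are assumptions from our example API:

```yaml
# Illustrative Kyverno policy: reject cluster claims that request a
# public API endpoint. "KubernetesCluster" and "privateEndpoint" are
# hypothetical names from our example platform API.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-private-clusters
spec:
  validationFailureAction: Enforce   # block, don't just audit
  rules:
    - name: private-endpoint-only
      match:
        any:
          - resources:
              kinds:
                - KubernetesCluster
      validate:
        message: "Cluster API endpoints must be private."
        pattern:
          spec:
            parameters:
              privateEndpoint: true
```

Because infrastructure requests are plain Kubernetes objects here, one admission policy covers every tenant and every provider behind the platform API.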
So the next topic where we had to deal with some issues was CRD maturity. CRDs are very nice for extending Kubernetes, but they were built to be part of the container orchestration at first, and then they evolved to be more than that, where people started building operators rather than controllers that just lived inside Kubernetes. And with that, as Jan said, there are providers that implement all the resources for their specific cloud provider. With AWS, that's over 700 APIs; with Azure, over 600. You end up with multiple thousand CRDs in your cluster, and that just doesn't scale very well. One of the issues we had was the Kubernetes behavior around immutability. In the OpenAPI schema you can specify that a field is immutable, but it's not enforced by Kubernetes. Sometimes your cloud provider also has fields that can't be changed; I don't have an exact example at hand, but I remember that you couldn't change the type of node that you're running on, or something like that. So I can say in my CRD description, please don't allow this to change, but you still can change it. And then the user is confused about why it's not updating in AWS, because that's what you would expect from the API. The next thing: when you're working with APIs, if they evolve, you want to version them, because you might have breaking changes. CRDs allow you to write conversion webhooks, but until recently that wasn't available in Crossplane. It's being worked on, but it's not there yet. And on the other hand, even if you have conversion webhooks, it's often more reliable to do the migration manually and help your users migrate over time, similar to the deprecation process that Kubernetes itself uses. All right, I will also skip the rest here because of the time. Concerning scalability, there was a talk yesterday. 
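As a side note, newer Kubernetes versions do offer a way to get real, API-server-enforced immutability: CRD validation rules written in CEL (beta since Kubernetes 1.25). A schema fragment, with the field name invented for illustration:

```yaml
# Fragment of a CRD's openAPIV3Schema: with CEL validation rules,
# the API server itself rejects changes to the field. "instanceType"
# is a hypothetical field standing in for a provider's immutable one.
properties:
  spec:
    type: object
    properties:
      instanceType:
        type: string
        x-kubernetes-validations:
          - rule: "self == oldSelf"
            message: "instanceType is immutable"
```

With this in place, an update that changes the field fails at admission time instead of silently never propagating to AWS.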
It was yesterday afternoon, "The CRD That Broke the Camel's Back"; I'm sure you'll find some information there. Since we're using YAML and the Kubernetes API, we can now deploy everything through GitOps instead of CIOps. We don't use a pipeline that executes a Terraform script anymore; you can use the same tool, Argo CD for example, to synchronize everything we have in our repository to our cluster. And that allows us to use the same mechanisms for applications and infrastructure. But again, this goes back to user expectations. If you say, okay, you can use the same behavior or the same API for both, they might expect those to be offered in the same cluster. And then, with what we discussed earlier about how we architect the different Crossplane control planes, they might expect to be able to deploy both into the same cluster, for example with a Helm chart, but that might not work. There's some interesting development around multi-cluster architectures, fleet management, and cluster federation, and that is something where we need to figure out how to make this manageable across multiple clusters as well. And then there's of course the question: where does it end? Where do we stop using GitOps? Do we deploy our secrets with GitOps? We can use SOPS, for example, to encrypt everything in our repositories and then let it deploy. But it might be more reasonable to use a secret provider like Vault or a cloud secrets manager or something similar, to just request the secret and use that as a credential instead. And of course, using GitOps for your node group with a scaling configuration just creates more conflicts, where your provider or your Argo CD is trying to sync your repository, and on the other hand you have the autoscaler telling the cluster, please scale down, and Crossplane in between syncing the declared state back. 
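One common mitigation on the GitOps side is to tell Argo CD to ignore the fields the autoscaler owns. This is a sketch of an Application fragment; the ignoreDifferences mechanism is real Argo CD, while the group/kind and field path are assumptions based on the community provider-aws NodeGroup resource:

```yaml
# Argo CD Application fragment: stop diffing the desired node count,
# so the autoscaler and Crossplane can manage it without Git fighting back.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure
spec:
  ignoreDifferences:
    - group: eks.aws.crossplane.io     # provider-aws NodeGroup (assumed)
      kind: NodeGroup
      jsonPointers:
        - /spec/forProvider/scalingConfig/desiredSize
```

Note this only silences Argo CD; the Crossplane provider still needs to be configured to leave that field alone, which is the other half of the conflict.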
So you have a couple of conflicting controllers as well. There are options around that: you can set fields to be ignored in some sense, but that is an issue that needs to be solved whenever it occurs. And then, I think, what we also learned: since we're offering a platform to our users, it needs to be documented. It's not enough to just write a short entry saying, oh, please use Crossplane and pull your CRDs from the server, because kubectl get crds is already an operation that a normal user can't do, since CRDs are cluster-scoped. So we need API documentation, similar to the doc.crds.dev stuff, that really reflects all the APIs, for Crossplane for example. Then there is usage documentation: what are the behaviors of what the user is deploying, what are the side effects, how do they interact. And of course, whenever something changes or is deprecated, we need release notes; we need some sort of API and operator lifecycle management. There was another talk on this, on Wednesday I think. And the last thing is contributions. Since we are working in an enterprise situation, it's important to be sure that we can use open source and that we can contribute to open source. In our case we were lucky: we were able to engage with Crossplane very early on to shape the product as well, so we could make sure it fulfills the needs that we had at that point, or that you need for building a platform. We have also been contributors to provider-aws and a lot of its resources. And we have open-sourced a couple of other providers, like provider-gitlab, provider-argocd, and provider-styra. And with provider-grafana, we built it using the Terrajet implementation, and that was then taken over by the Grafana community, who are now maintaining their own provider, which is something that might be nice for the other cloud providers to do as well. Thank you for listening. 
And if you want to reach out to us, feel free to do so. And if there are any questions, I think.