All right, all right, let's get this thing going. So we are here at the end of maybe a long day for everybody, on their first full day here. And if you have been to the other Crossplane-focused sessions today, you may have seen me now for the third time. You may be getting sick of me by now, but I promise I'm not sick of any of you. So let's get this third talk going. We're going to talk about a success story, basically: streamlining infrastructure with Crossplane, and a transformation story. So here's what we are going to cover. This is all about Clément's company, Consensys, and the team he's a platform engineer on. We're going to talk about where they started, some of the pain points they had before finding Crossplane and the things they were hoping Crossplane could solve for them, and then the journey they took to get to the successful platform that they've built now. We'll dive into three key areas of the functionality they've built, and then we'll wrap it up with a conclusion of it all, some of the key lessons, and then a Q&A. All right, so my name is Jared Watts. I am one of the creators of the Crossplane project and a founding engineer at Upbound, the startup that created Crossplane. This is my second open-source CNCF project that I'm a part of; the other one is the Rook project. So I love building open-source communities, and I split my time between California and Belgium. One of those has good waves; you can guess which one that might be. Here, come on up. And so, as Jared was saying, I've been working at Consensys for the past two years, and I really like creating stuff, and it's not only confined to work; I really like music and photography. So without further ado, let's get started. So, a little bit about Consensys.
Consensys is a blockchain technology and Web3 software development company. It was founded back in 2016, and we have about 800 employees today. Our main offerings are MetaMask, a self-custodial wallet (self-custodial meaning you're the owner of the keys), which enables you to connect and interact with Web3 applications; Infura, a Web3 development platform, so a set of APIs and tooling to develop Web3 applications; and our latest offering, Linea, a zkEVM L2 network. To keep it simple, it's a network that enables scaling on top of Ethereum. So let's talk about our Crossplane journey. Our starting point looked like this. We had teams, and we had an embedded SRE model, so we would work as part of multiple teams. And the environment looked pretty much like this: one team using Terraform, another one using Ansible with CloudFormation, and a third one using Pulumi. Also, the maturity differed greatly across teams. For example, Team C had a dedicated SRE and DevOps, Team B only DevOps, and Team A didn't have anyone. Also, the interaction that we had with teams was that basically we didn't offer any service, so they would come to us asking: I need a database, I need to scale. And one of the tasks that we had was also maintaining what they had. And if there was a vulnerability, well, we needed to patch all of them, and it was quite inefficient because it was deployed in a different way for each team. So the inefficiency that we were seeing is that, having all those tools, there was a lot of tool-specific knowledge. That's quite a learning curve when you arrive on the team, but working as part of multiple projects is also quite a cognitive load. Also, as we saw, it was mostly human interaction: voice, tickets, messages, so it was a little bit inefficient. Also, what we were seeing is that teams were reinventing the wheel.
They were solving the same problems, but in different ways, because they didn't use the same tech stack. Also, the main inefficiency was our time to market. As we didn't have a standardized way of doing things, it was hard to know how to do them. Also, teams were building for their own projects, so it was kind of a pet approach, and it wasn't reusable for the company. So our plan was pretty simple: take a platform approach, which mostly consists of having golden paths and having self-service. How did we want to do it? Well, pretty easy: leverage Kubernetes, for the simple reason that it has a lot of benefits. It has an extension model with the API, it supports versioning by design, and we have discovery of the APIs. Also, Kubernetes has a great isolation model with namespaces and authorization with RBAC. Also, what's interesting in Kubernetes is the reconciliation model. For consuming the platform, well, we wanted to do something pretty simple at first, so we could use Argo CD. What did we want to target first? Well, new projects and very simple ones, so stateless first. So how would you do this before Crossplane? Before joining Consensys, I was working at Federico, and we had built this whole platform on top of Kubernetes and automated the lifecycle for the developers. We would have operators managing the registry, let's say Artifactory, and the secret store, Vault, and we had built all those operators. This was quite a learning curve, because we needed to learn Go and a little bit of the Kubernetes internals, and it was quite challenging to manage. Also, those components were specific, purpose-built. We did open-source some of those operators, but there were some specificities to our environment, so they weren't really portable.
So we've got an idea about where Clément's team started: the problems they had, starting their journey trying to attack some of those problems, and running into some issues and challenges along the way. This is when you start looking at the Crossplane project and seeing a couple of its different focus areas and how those are going to help the journey that his team has started on. So the very first thing we're gonna look at here is that Crossplane is a framework that allows you to build your own platform without writing code. We just saw on the last slide that his team started writing custom operators and writing controllers and all this sort of stuff. That's more software to maintain, more complexity. And so a better approach for a lot of scenarios is to simply be able to declaratively describe what you want the platform to do. Not write the imperative code that instructs line by line what to do, but just describe the situation you wanna get to, and let Crossplane deal with all of the provisioning, managing, reconciliation, all that stuff, right? Now, you can go past that; you can imagine some scenarios where declaratively specifying what you want Crossplane to do isn't gonna cover exactly what you want. So you can go beyond that and start writing a little bit of custom code specific to your platform's needs, with the new feature in Crossplane called composition functions. But you can do a lot with just a declarative approach. Beyond that, though, we also need to look at how Crossplane is an API-first design, similar to Kubernetes itself, right? Where you as the platform engineer can codify your golden paths, like "this is how we do infrastructure at Consensys." You can codify that into Crossplane resources, and then you can enable your developers to self-service and get that infrastructure on demand when they need it, but the golden path has been captured somewhere, right?
So that's the separation of concerns between what the platform engineer needs to deal with and what the developers are faced with: the simple abstractions that you expose to them. And once we're all in on an API-first design, all speaking the same language, then we can integrate really nicely with the rest of the tools in the Kubernetes ecosystem, and we're taking a consistent approach with applications and infrastructure, and you have similar experiences across those, like label matching and things like that. All right, so some people may have seen some of this stuff today, but as a quick refresher for people that aren't super aware of the composition model in Crossplane: it's got this idea of composing together multiple resources into a simplified abstraction that your developers get to use themselves, right? So our developer here on the left side of the diagram, she's gonna just be able to create a simple claim, like "I want some sort of infrastructure resource." Then underneath the covers, a Crossplane composition that you, the platform engineer, authored will specify exactly which resources compose that high-level abstraction the developer is asking for. To make that more tangible, we can look at an example where all the developer has to worry about is: yes, I have a deployment and a service and whatever for my app, and I also need Postgres. So in the same way that I'm creating my app, my deployments, my container stuff, I'm going to also create a Postgres instance. That is a simple example of an abstraction of a platform API that the platform engineer has composed together over time, right?
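As a rough sketch of what such a developer-facing claim could look like (the API group, kind, and field names here are illustrative, not necessarily Consensys's actual platform API):

```yaml
# Hypothetical developer-facing claim: the developer asks for Postgres
# right alongside their app manifests, without knowing about RDS, VPCs,
# or security groups. Group/kind names are invented for illustration.
apiVersion: platform.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: my-app-db
  namespace: team-a-dev
spec:
  parameters:
    storageGB: 20
  # pick which platform-authored Composition satisfies this claim
  compositionSelector:
    matchLabels:
      provider: aws
  # where the app reads its connection details from
  writeConnectionSecretToRef:
    name: my-app-db-conn
```

The developer only touches this small resource; everything below the API line is the platform engineer's concern.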
So that small Postgres instance that the developer deals with is something very simple; underneath the covers, we see here that it means the platform engineer has made a composition for AWS: an RDS database, the database parameter group, security group, all that sort of stuff. It could be AWS, it could be GCP, Azure, whatever, but the developer is faced with a simple model, a simple abstraction of Postgres, and then all the golden paths, the configuration, the organizational policy, all that stuff is under the covers, underneath the API line. So to give you a rough idea of how we made progress on this, here's the timeline. We started back in August 2022, POCing on AWS, and in parallel we also started internal discovery. We talked with stakeholders across the company, because we wanted to better understand their needs and how we could support them. Back in December, we had written some documents and were requesting comments across the company, and we started developing the MVP. Then in March, we had our first version of the platform. Really easy: we onboarded our first client, and it was a very simple application. We just had the Kubernetes cluster and a registry to deploy the application, and that was deployed to production. And then we started adding more and more resources to support stateful applications. So we added support for RDS, for S3 buckets, for mail with SES, and key management with KMS. And then in June, we deployed our first stateful application to production. After this, we started the development of additional resources for blockchain workloads. Blockchain workloads are a bit more complicated, in the sense that you need to customize the node groups a bit, have access to storage classes and autoscaling, and have some caching, for example with Redis or Memcached.
And then in September, we had the POC of running blockchain nodes on the platform, and we also started the development of a UI and a Backstage integration, because I think there are some very interesting synergies between Backstage and Crossplane. So now let's talk about one of the key features that we've built into the platform, which is multi-tenancy. The idea is that we have this single control plane that holds the compositions, so that's our automation. Then we use namespaces to represent the different environments that we have, and the claims in those namespaces use the compositions from the cluster. Also, we wanted to have a one-to-one relation with our cloud provider accounts, so that when a claim is inside a namespace, it corresponds directly to one cloud account. How did we do this? We are building a provider config reference inside our composition. If you look on the right, we have our claim inside the namespace team-a-dev, and as part of our composition, we are patching from the namespace of the claim to the providerConfigRef on the managed resource. And that's how we select the appropriate provider config for this namespace. A nice advantage of this is that you have control over the naming convention, and this makes it easier for our users, and also for us for troubleshooting purposes, to have good naming on those resources. So if you look on the left, we have our claim in its namespace, and inside our composition, what we do is patch the metadata name of our managed resources, using the namespace and the name of the claim to build that name. Also keep in mind that if you are using the same managed resources as part of other compositions, you might have conflicts. You can, for example, prefix with the XRD name or the composition name if you have conflicts. So now let's look at how we manage all of this. Surprise: we use Crossplane to manage our tenant system.
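A minimal sketch of the two patches just described, assuming the standard `crossplane.io/claim-namespace` and `crossplane.io/claim-name` labels that Crossplane puts on composite resources (the managed resource kind and exact field paths are illustrative):

```yaml
# Inside a Composition: copy the claim's namespace into the managed
# resource's providerConfigRef (namespace -> AWS account mapping), and
# build a predictable resource name from namespace + claim name.
resources:
  - name: rds-instance
    base:
      apiVersion: rds.aws.upbound.io/v1beta1
      kind: Instance
    patches:
      # the claim namespace (e.g. team-a-dev) selects the matching ProviderConfig
      - type: FromCompositeFieldPath
        fromFieldPath: metadata.labels[crossplane.io/claim-namespace]
        toFieldPath: spec.providerConfigRef.name
      # name the managed resource <namespace>-<claim-name>
      - type: CombineFromComposite
        combine:
          variables:
            - fromFieldPath: metadata.labels[crossplane.io/claim-namespace]
            - fromFieldPath: metadata.labels[crossplane.io/claim-name]
          strategy: string
          string:
            fmt: "%s-%s"
        toFieldPath: metadata.name
```

As noted in the talk, if several compositions create the same managed resource kind, a prefix (XRD or composition name) in the `fmt` string avoids name collisions.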
So we have this cluster-scoped resource, because we want uniqueness of tenants inside our control plane. On the right, what you see is the managed resources that we generate for this tenant resource. We're gonna create a management namespace; it's not for the user, it's only for us. Then we're also creating an observability tenant. Before starting this platform, we had deployed an observability platform using the LGTM stack, so Loki, Mimir, Grafana and Tempo, and we built a custom provider to be able to create a tenant in that platform, or retrieve existing credentials if the tenant already existed. Also, what we are doing as part of creating this tenant is deploying an Argo CD instance, so that teams can consume the platform right away. And lastly, we are creating subdomains and certificates, so that when teams deploy a cluster, they can expose their applications without any further configuration. Once we have created this tenant, we're able to claim environments. That's why we didn't have a namespace on the tenant, but now we have the team A namespace. And this environment, yeah, let's call it dev. So what do we do as part of this composition? We are creating a namespace, and this one is gonna be where the user consumes the platform. Also, we are building the provider config to access the right AWS account, and we are doing the baseline setup on the account: creating roles and policies, connecting to OIDC. And finally, we are configuring Argo CD projects. So if we take a step back, it pretty much looks like this: on the left we have our management API, and on the right we have the consumption side. Now let's talk a little bit more about our Argo CD isolation. As I said, the tenant creates an Argo CD instance, and then the environment creates projects. A project is the isolation mechanism within Argo CD, and we are creating two different kinds of projects.
One is for infrastructure resources, and the other one is for the workload. So now let's see the difference between those two. Let's take the first one on the left, the infra one. If we look at it, the destination is gonna be in-cluster, because it's gonna contact the control plane, and we are gonna patch the namespace with the corresponding tenant and environment. So basically it's gonna be the namespace dedicated to the user. Also, since this is the control plane, we don't allow any cluster-scoped resources to be provisioned, but we do allow some namespaced resources, and this is gonna be kind of the catalog that we offer as part of our infrastructure API. Let's say we have Kubernetes cluster and S3 bucket. Now if we look at the other one, the application project, the destination is gonna be everything except the control plane. And then we're gonna blacklist some of the admin namespaces on the workload clusters that we manage, so for example kube-system, you know the ones. Again, we don't allow any cluster-scoped resources, but once they've created a namespace, developers are kind of admin inside that namespace; they can provision any resources. So now let's talk about the second feature that we implemented, which is claim references. What do I mean by this? Let's say I created a cluster, and now I want to create a database and connect this database to the cluster. If you look at the SQL instance claim, the spec has a cluster field, and the value is gonna be the metadata name of the corresponding cluster that we want to connect to. If you look at the cluster, we have a spec region defined, but not on the SQL instance, and we're gonna select the appropriate region based on the cluster. Also, what we are doing is connecting the DB to the cluster VPC using label matching. And finally, we are generating credentials, a service account binding, in the workload cluster, so that an application can access the database.
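To illustrate the claim-reference idea, a sketch of what such a pair of claims could look like (group, kinds, and fields are hypothetical, modeled on the description above):

```yaml
# The cluster claim carries the region; everything that references this
# cluster inherits it instead of repeating it.
apiVersion: platform.example.org/v1alpha1
kind: Cluster
metadata:
  name: team-a-dev-cluster
  namespace: team-a-dev
spec:
  region: eu-west-1
---
# The database claim only names the cluster it should attach to. The
# composition resolves region and VPC from that reference, and writes
# credentials into the workload cluster for the application to use.
apiVersion: platform.example.org/v1alpha1
kind: SQLInstance
metadata:
  name: orders-db
  namespace: team-a-dev
spec:
  cluster: team-a-dev-cluster   # metadata.name of the Cluster claim
  engine: postgres
  # note: no region here; it is selected based on the referenced cluster
```

The appeal of this pattern is that the developer expresses intent ("this database belongs to that cluster") and the composition handles networking, placement, and credential wiring.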
If we take this a step further, this is what it could potentially look like for a whole application. The starting point will be the cluster on the top left, and then we have an API to define namespaces. It references a cluster to deploy a namespace into, and developers are also able to deploy service accounts as part of a namespace. This service account is then used by other resources to grant permission to access the resource. What I didn't show you previously is that the SQL instance references a cluster, but you can also reference a service account to grant the permissions. For resources that don't need a direct attachment at the network level, or that can be connected to multiple workloads, like a bucket, we implemented a resource that we call a service account binding. This one takes as input the reference of the bucket and the reference of the service account. And so having this, our service account will be able to access both the database and the S3 bucket. Also, we have kind of an extension model for Kubernetes; for example, developers are able to configure additional storage classes if they want. All right, so we're about to talk about a particular part of Clément's platform here that heavily uses the feature called environment configs within Crossplane. So I wanted to take a step back and talk about that feature and what it's meant to solve. A good way to think about the environment config feature is that it provides runtime information to your Crossplane compositions. You can write a single composition and have it run in different environments with different environmental context, different environmental information, like the dev environment, staging environment, production environment, whatever it may be. And that single composition can behave differently because of this environmental input that's coming into it.
So what it looks like is kind of Crossplane's version of a ConfigMap. It allows arbitrary unstructured data, key-value pairs, that kind of stuff, right? So it's a way to stash information from wherever you need to, so that Crossplane compositions can access it and manipulate it and all that sort of stuff. There are two big scenarios to think about here for the sources of data going into an environment config. One of those is things outside of Crossplane entirely, so external data sources and systems. A common one, to make that more tangible, is your CI/CD system, GitOps, that type of thing. Your CD system could, as part of deploying things to the control plane, deploy an environment config with information about that particular dev or prod environment, whatever it is. And another one, which we're gonna see a specific example of from the Consensys platform, is within compositions themselves. A composition, as it generates resources, can write information about them, status and data and stuff like that, to an environment config, and then later on a totally different composition can use that information from the environment config. So it's a way to share data across compositions as well. This feature has been around for a little bit; we first released it as an alpha feature in 1.11, which was back in January, so coming up on a year now. One thing we didn't expect was how many people were gonna start using it. We released an alpha feature, and people started putting it in production pretty quickly. So it's very popular, and we have a bit of work cut out for ourselves in the 1.15 milestone, which we're working on just as of this week, to mature it to beta and make sure that it has the right shape and the right API for the usage it really needs.
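A minimal EnvironmentConfig sketch, with invented values, showing the ConfigMap-like shape being described (the `apiextensions.crossplane.io/v1alpha1` API version matches the alpha feature discussed here; labels and keys are illustrative):

```yaml
# Written by the cluster composition, read later by other compositions.
# The labels let a consuming composition select this config by
# namespace + cluster via an environment selector's matchLabels.
apiVersion: apiextensions.crossplane.io/v1alpha1
kind: EnvironmentConfig
metadata:
  name: team-a-dev-cluster-env
  labels:
    namespace: team-a-dev
    cluster: team-a-dev-cluster
data:
  accountID: "123456789012"                                # used to template IAM policies
  region: eu-west-1                                        # reused by e.g. the database composition
  oidcEndpoint: oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE  # used for service accounts
```

A later composition (such as the SQL instance one described next) would carry an environment selector whose `matchLabels` pass the claim's namespace and `spec.cluster` to pick out exactly this config.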
So if you have feedback about the composition environment, the environment config stuff, there's a SIG, a special interest group, for it. We encourage you to join that on the Crossplane Slack, because this is the time to get that feedback in before we continue maturing the feature. All right, so now let's look at how we built the reference system. As part of our composition, we create the managed resources. Let's say we have an XCluster resource, so this is our cluster automation, and we have managed resources to create this cluster. And also as part of the composition, we create this environment config. And if you look at the patches and the claim resource that we have, basically it copies some of the fields over. The data that we're interested in, as part of the environment config, is the data that we are gonna reuse in other compositions. That could be, for example, the account ID that we need to template some policies; the region, which we saw in the case of the database; or the OIDC endpoint, which is used, I think, for the service accounts that we have. Also, what we are adding as labels is the corresponding namespace and the cluster. We have this so we're able to retrieve the environment config as part of another composition. So this is our SQL instance, and if you look at its configuration, we have a selector. The selector uses matchLabels, and it's pretty easy: we're passing the namespace and the spec cluster, which corresponds to the name of the cluster, to select the appropriate environment config. Now let's talk about cluster components. So we talked about the Argo CD tenant instance; that's the bottom one that you see. We didn't talk about the Argo CD instance that we have at the top. That is the one that manages our control plane. And in the middle, you have the namespaces that represent our different environments.
What happens when a cluster is created as part of those tenants is that it gets added both to our Argo CD instance in the control plane and to the tenant instance. The tenant instance is so that users can deploy workloads to their cluster, and the control plane one is to manage cluster components. So here's how cluster components work. Let's start at the bottom. We have a claim, and this will generate the XCluster. As part of our composition, we have a Cluster managed resource that is part of the Argo CD provider, and this manages the cluster entry that lives inside Argo CD. And then inside Argo CD, there is this concept of application sets. An ApplicationSet will generate an Application based on the clusters that we have, so we have this dynamic behavior where a cluster being added generates a new application for that cluster. And then finally, the application syncs back to the cluster corresponding to the claim. So we do manage some admin components that are installed by default on the cluster, but we also have an add-on mechanism if teams want additional components. If you look at our claim on the left, we have an add-on field with, for example, downscaler; that's a controller that we use to turn off workloads on the weekend. And it has a true or false value. As part of our cluster composition, this gets passed down to the cluster Argo CD resource, and it gets set up as a label on the cluster. We do this because, as part of an Argo CD ApplicationSet, you can use the cluster generator and select clusters based on those labels. So for example, when we defined our downscaler deployment, we said that we want to match the downscaler=true label. And what's really nice with this is that if they change the value, it removes the application. Also, what's pretty nice is that you can use the default values of the XRD.
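The add-on mechanism just described can be sketched as an ApplicationSet with a cluster generator (the repo URL, paths, and namespaces here are illustrative placeholders, not Consensys's actual setup):

```yaml
# Generates one downscaler Application per Argo CD cluster that carries
# the downscaler=true label (the label the cluster composition sets from
# the claim's add-on field). Flipping the label prunes the Application.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: downscaler
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            downscaler: "true"
  template:
    metadata:
      name: "downscaler-{{name}}"   # {{name}} = Argo CD cluster name
    spec:
      project: default
      source:
        repoURL: https://example.org/cluster-addons.git  # hypothetical repo
        path: downscaler
        targetRevision: main
      destination:
        server: "{{server}}"        # the matched cluster's API server
        namespace: downscaler
```

The same pattern works for non-boolean labels, such as an autoscaler label selecting between different autoscaler add-ons.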
So for example, we didn't define the reloader add-on on our claim, but the default value is false, so it was still added to the cluster labels. Also, you can do more complex things than booleans; for example, we also have an autoscaler field with Karpenter, and you could implement different cluster autoscalers for your clusters. All right. So we have seen a bunch of details about the platform that Consensys has built, and the success story with it. That was a lot of practical specifics and technical details, so let's hop back up and wrap this session up with some high-level conclusions. We saw that Consensys was struggling when they had a different infrastructure and platform approach for each one of the teams. To be successful, you've gotta take a consistent approach and have a consistent platform story that works for all of your teams. Another thing I think we saw that was really interesting is that they took the approach of a unified single control plane that is multi-tenant, right? So it can handle the needs of all the environments for all the teams, and the sub-environments for each team too, like production and staging and all that stuff. So it is really nice to consolidate and have a single control plane, a consistent experience that's dealing with everything your teams and your organization need, but you have to take the right approach with multi-tenancy, so you've got the proper isolation and things are set up correctly, right? We also saw that a platform will not scale unless you take the time to automate and codify what the golden paths are. You define that this is the way our organization does infrastructure, and you capture that, you codify it, you get it into a repeatable pattern, so that your development teams, new teams, new environments, all that sort of stuff, can take advantage of those golden paths, self-service, and get them when they need them.
And obviously that requires automation, right? If you have a human in the loop there, that's going to slow things down; you go back to the days of days or weeks for provisioning new environments, as opposed to minutes. And then finally, as we see time and time again, if you've standardized on the Kubernetes API, everyone's speaking the same language, all the tools play really nicely together, and you get a simple, consistent story across all of those as well. All right, so I think we've got... this thing says 35 minutes, but I don't think that's correct. Like, five minutes for questions. So if anybody has Q&A, there's a microphone right there, and we'll be happy to answer everything for you. You popped up real quick, go for it. Yeah, so one of the things about the multi-tenant approach that you have: how do you manage a situation where a dev team wants to deploy some kind of operator that would deploy a cluster-level resource? Do they have their own cluster, and in their tenant Argo CD they can deploy that? And I guess if the answer is they can do whatever they want, then what are you specifically managing in the control plane Argo CD? Oh no, they cannot do whatever they want there. They won't be able to install a cluster component; they don't have the permission. So they cannot create cluster-scoped resources on those clusters. What we will do is implement it for them, and then it's available to the whole company. Gotcha, okay, so you give them a way to self-service that operator into the environment. Yeah, right. Thank you, great talk. Thanks. I'm new to Crossplane, only really saw any of it today. I was curious about the environment config that you were discussing. You said it was a lot like a ConfigMap, and then when we saw it, it actually just looks like a ConfigMap. So why is that not just a ConfigMap?
I've seen other tools where they say, leverage this from a ConfigMap or from a Secret, so why have your own? Because it seems like if I wanted to have values that maybe I would propagate to another service, having it in the environment config makes it not able to do that. Yeah, so one of the biggest reasons driving that design: when we first had the idea of, hey, we want to populate Crossplane compositions and Crossplane things with data from arbitrary places, that was the driving use case for the environment config functionality. And so the very first thought was, okay, cool, we'll just let you use ConfigMaps and teach the composition machinery how to talk to ConfigMaps, and so on. But a problem that brings up, with the multi-tenancy story and the RBAC model for Crossplane, is that if you let composition authors access arbitrary Kubernetes resources, like ConfigMaps or Secrets and stuff like that, you very quickly run into a problem of: oh wait, they can access any ConfigMap, or we have to lock this down in some way. So the decision to have a strong, Crossplane-specific type for the environment config was largely driven by security. It's a Crossplane-specific type, and as opposed to your compositions being able to access any ConfigMap throughout your cluster and go spelunking around, it's just Crossplane-specific stuff that was designed to be part of the platform and accessed through compositions. So that was the driving factor. That makes sense. As a follow-up, then, is there a mechanism to export some of the values if you wanted to chain this together with another tool that would read from a ConfigMap? Yeah, and that's a great question, because that feeds exactly back to: it's all Kubernetes API stuff, and any tool can read an environment config.
It doesn't have to know specifically what it is. It's got a data field, and then unstructured data, key-values, underneath that. So anything, you know, kubectl could go look at it; anything that talks to the Kubernetes API can access it as well. So it integrates nicely with the ecosystem: even though it is a custom resource that is Crossplane-specific, it still integrates nicely everywhere. Thank you. Yeah, thank you. It's really funny: at my company, we kind of built the same thing. I just gave a very similar talk to my internal teams. Like, we care about this very same flow, so I see a lot of the same ideas. I'm glad I'm not the only one who thought of them, because that makes me feel a little bit smarter, I guess. I really like the environment configs and everything. I guess the one question I have is: have you run into any scaling issues? That's one of the things I've been kind of worried about, just the lack of being able to deploy multiple controllers to control the different namespaces; it's been a little bit lacking on the Crossplane side. So for now, we don't have a huge number of resources. I think we have about 10 or 15 clusters across, I guess, seven tenants, and that's about 1,500 managed resources, so it's not huge. But still, it uses quite some resources. But maybe Jared can talk a little bit more about the optimization that they are currently doing. Yeah, so historically in Crossplane, there have been places where you run into scale issues. The biggest one that we ran head-on into, like, really badly, was around the number of CRDs, because you teach Crossplane how to manage anything, and Amazon itself has, like, 900 resources, right?
And so that created an enormous problem for us, because the scalability thresholds of Crossplane, sorry, of Kubernetes in general, were thought through for a number of things, like number of resources, number of pods, number of nodes; all those are well documented and well understood. Number of CRDs was not part of that focus, not part of that list. So it's like, cool, I'm gonna teach it Amazon and Google and Alibaba, and the control plane would fall on its head, because there were too many CRDs, and the processing of them all to get them exposed as endpoints in the API server was just not efficient enough to handle that. So we did a number of things in upstream Kubernetes to deal with some of those inefficiencies and make that a better process. But then we also took an approach in Crossplane of... the families. Sharding. Yeah, exactly, provider families, right, to separate those out so that you can just pick a scoped set of resources that you wanna deal with, the ones that are important to you, and not bring in, you know, thousands of them all at once. So that was a major scaling issue that has largely been solved now. Yeah, the families are pretty cool; I've been using them. I guess the only thing I would like to see, maybe in Crossplane in the future, is to be able to kind of namespace some of the providers. So the same CRDs, but more control over what they're operating on, so I can get some guarantees about which controller, you know, so I don't have one controller fighting over all of the CRDs across the entire cluster. Anyway, thank you, great talk. And one last thing is that in the Terraform-based providers, there is a lot of ongoing work on resource utilization. The interaction with Terraform right now is through the CLI, and what they are doing, I think there was an alpha about a week ago, is to interact directly with the API.
And so they're seeing a lot of... not stability exactly, but it's much more efficient and faster to reconcile on this newer version, and I think it should be available in a few weeks. Yeah, yeah. Is that 1.15 or 1.14? No, it's not tied to the Crossplane release; it's part of the provider releases. So it's out of band; it'll be before 1.15, I think. Sweet, so that's all the time we had in this session, so we'll wrap it up. Thanks everybody for coming, but I think we'll probably stick around and hang out a bit if you wanna come up. Thank you. Thanks, dude. Thanks for having me.