Welcome to KubeCon. We have the honor of giving one of the first talks here on Monday morning. So glad that a couple of you came and are not sleeping in or still traveling, because the majority is probably still in transit. We took the liberty and came in on the weekend, so glad that you can be here with us today. We will be talking about the platform at Deutsche Bahn with Crossplane and the Crossplane provider for Argo CD, a project which we have been running now for a good three years. And we want to share a bit about the challenges we had when building the platform with Crossplane and Argo CD, the integration we came up with, the solution, and how you can benefit as well if you want. My name is Jan Willis. I'm from Berlin, Germany. I'm a platform architect at Accenture. I've been tinkering with open source my whole life, starting with Kubernetes in 2014/15 with the first projects, and for a couple of years now I've been doing Crossplane, and I will happily go into more detail about what we have built. Meanwhile, my colleague Dennis will tell you more about the platform. Yeah, hi, my name is Dennis. I'm from Frankfurt, Germany. I have over five years of experience in building and running platforms and cloud environments, and I've been working at Deutsche Bahn for more than two years. First of all, if it's okay, I would like to give a short introduction to what Deutsche Bahn is. Maybe it's not so well known here in North America, but if you ever visited Germany and took a train there, chances are it was with Deutsche Bahn. Deutsche Bahn is not only Germany's largest railway operator, it's also a major player in Europe's transport sector. The areas of operation include, among others, passenger and freight services on rail. And with the Strong Rail strategy, Deutsche Bahn aims to significantly increase traffic on rail, that means bringing traffic to rail.
As you may imagine, this also includes the modernization and digitalization of railway operations, which is where our platform comes into the picture. With that being said, let's take a look at the current state of our platform. We have one platform team with around 11 developers and DevOps engineers. We are currently managing over 50 Kubernetes clusters, backed by AWS EKS. We have 50 different developer teams using our platform, and we allow these developers to use over 45 different kinds of AWS services via self-service. In regard to Argo CD, we currently have 100 different AppProjects with their applications, and these applications interface with 180 different repositories. For monitoring purposes, our stack consists of Grafana, Mimir and Loki. We have 48 different Grafana organizations and are managing 17 terabytes of metrics and 16 terabytes of logs. In Crossplane, we currently have 54 compositions which result in over 2,000 lines of code, but most of them are generated by Helm, which we use to provide our Crossplane packages, and we also have about 10,000 managed resources. This means we are handling over 10,000 external resources, like AWS S3 buckets or applications in Argo CD, with our platform. That was a short introduction to our platform, but which components make up a platform? First of all, we have observability, a very important part of the platform: we need to make sure we see metrics, we get our logs, and tracing is an important part as well. Then each application needs credentials, so you have to think about where you can store your credentials and how you can retrieve them. There is also infrastructure management, that means how you can deploy your infrastructure with the platform. Traffic management means how you can reach your application, and the next part is deployment, so how you can deploy your application on the platform.
Also important are compliance and security: we are a large company with a lot of rules, so you have to make sure that compliance is part of your platform, as well as security. Deutsche Bahn, for example, operates critical infrastructure such as the railway network, and we have specific requirements, so compliance and security are important parts of our platform. With this being said, I would hand over to Jan to explain more about which components we use in our platform. Thanks. So yeah, building on what Dennis said already, a pretty important part are the dependencies of an application: how do we manage compute, storage, network, all these parts? With that I want to introduce the platform API, where we basically go from the bottom to the top. At the very bottom we have our service providers, which offer us several features, for example on the cloud providers: compute, storage, networking, messaging, et cetera. They have a very, very large API surface, but they don't offer everything that is needed. So we have other APIs which are very important for application management and deployment: obviously the Kubernetes API itself, the Git API, in our case GitLab, a policy engine, here Styra, then Argo CD, Grafana. It's not an exhaustive list, but it gives you an idea that there are many different APIs involved when deploying, or not only deploying but managing, applications. And those need to be integrated somehow, because if the product teams or service teams, after the first POC or after a migration, actually want to deploy their applications, you have to be, I don't know, a genius to understand all those APIs. Kubernetes alone is very complex by now.
And the platform team, which owns the layer above the service providers, set out to abstract those a little bit, well, abstract and integrate, with the goal of making it easier for application teams to actually use and manage their applications: to integrate all those service providers, but also to give them metrics and all the kinds of nice things you expect. Another stakeholder of this platform layer, depicted here in green, is what Dennis mentioned, the compliance and security departments. Since the platform team provides a single API abstraction layer on top, we have very good insights into how people are using our platform, because we provide a live API which is request/response. We can set some defaults, for example. A famous example is the S3 bucket: if anyone provisions an S3 bucket, it will be non-public by default, so as not to accidentally leak data. But we can also integrate into CMDB systems, for example, because Kubernetes as the layer here, and we will go into a bit more detail soon, is kind of the abstraction layer. We have a database for querying all of our infrastructure. And obviously on top are our users, and since it's a Kubernetes abstraction, they can use all the Kubernetes tooling that is available. For example, they can describe their stuff in Helm charts, in Kustomize, or even in plain JSON manifests and use curl, or, with the client-go SDK available, they can write their own automation in Go or in any other language which has a Kubernetes SDK. Another thing is GitOps. Obviously, here at ArgoCon, GitOps is the main topic, and I'm not sure for how many years already ArgoCon has been co-located with KubeCon. But it's a thing, and especially for production environments we are using GitOps quite heavily.
And with this approach, mainly using the Kubernetes layer as an API framework, we can extend GitOps not only to applications, like pure Kubernetes objects, but to all kinds of objects which are part of the Kubernetes API. For example, since we can provision S3 buckets via the Kubernetes API, we might as well provision them via GitOps, and we will go into this glue layer in a couple of minutes. And we can build our website on top, like a portal. We are using Backstage for that, but in the end you could use whatever you want. So, Argo CD and Crossplane are a perfect match. I don't think I need to introduce Argo CD here very extensively. Crossplane maybe: who has heard of Crossplane before? Okay, maybe half of you. Basically, I've been talking already a bit about Kubernetes objects and how we can extend Kubernetes to not only have this core set, I believe it's maybe around 60 objects in a vanilla Kubernetes installation by now, but extend it with our own objects. And that is exactly what Crossplane does. It builds on Kubernetes and extends it with its own objects, so you can store whatever objects you want to create in the Kubernetes API. And with the service providers, that's exactly what we are doing: we are mirroring the external APIs into the Kubernetes ecosystem. For example, in the case of AWS, we are mirroring the AWS API surface into group/version/kind Kubernetes YAML, if you wish. So it's a very standard way of accessing different service providers, because they all look the same. For example, if you take a Kubernetes Deployment, its API group at the top is apps, the version is v1, the kind, the object, is Deployment. Then you have a metadata name for whatever name it has or should have, and in the spec you get to describe what exactly this object is.
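The common shape just described, API group, version, kind, metadata name, and spec, can be sketched as a plain Python dict; the names `my-app` and the image tag are illustrative, not from the talk:

```python
# A vanilla Kubernetes Deployment expressed as a Python dict, to show
# the common structure every Kubernetes-style object shares.
deployment = {
    "apiVersion": "apps/v1",   # API group "apps", version "v1"
    "kind": "Deployment",      # the object type
    "metadata": {"name": "my-app"},
    "spec": {                  # describes what exactly this object is
        "replicas": 2,
        "selector": {"matchLabels": {"app": "my-app"}},
        "template": {
            "metadata": {"labels": {"app": "my-app"}},
            "spec": {"containers": [{"name": "my-app", "image": "nginx:1.27"}]},
        },
    },
}

# The apiVersion splits into the API group and the version.
group, version = deployment["apiVersion"].split("/")
```

Because every object follows this same shape, the same tooling (kubectl, Helm, GitOps controllers) can handle any of them uniformly.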
And with the Crossplane provider ecosystem, for example with provider-aws, we get to describe what this S3 object, this S3 kind, is exactly made of. Obviously it has a name too, and then some other properties like a location, size, retention policy, and so on and so forth. So, Argo CD is the perfect fit for managing this front layer for accessing the platform, and here I want to describe a bit how we go about this integration. On the left we have our Crossplane ecosystem and on the right we have the service provider API; the diagram is rotated 90 degrees now, with the service providers on the right side. But I want to make the mirroring part clear: we only approach external APIs with the provider ecosystem, and in the case of Argo, that's the HTTP API. So, provider-argocd, which we wrote as part of the project and are still maintaining and developing features for, basically mirrors the Argo CD API into the Crossplane ecosystem. Now, you might wonder: but I can already create Kubernetes objects and they show up in Argo CD, via the Kubernetes API, because Argo CD uses Kubernetes as its database, if you want. So I can more or less connect directly to the database and create an Argo CD Application or an Argo CD AppProject, because those are the two types which Argo CD has as native, registered Kubernetes types. So there are two ways of managing or approaching Argo CD, and they are not fully equivalent; you cannot configure all the same things in both. For some things you need the HTTP API, other things you can do with both APIs, and for other parts, I think, you can only use the Kubernetes API. So with provider-argocd we wrote software which mirrors the missing pieces back into Kubernetes. That's exactly why we wrote it, and part of that was our journey when we started.
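To make the mirroring concrete, here is a sketch of an S3 bucket as a Crossplane managed resource in the same group/version/kind shape; the `apiVersion` and field names vary between provider versions, so treat them as illustrative rather than authoritative:

```python
# Sketch of an S3 bucket mirrored into the Kubernetes API as a
# Crossplane managed resource; apiVersion and fields are illustrative.
bucket = {
    "apiVersion": "s3.aws.upbound.io/v1beta1",
    "kind": "Bucket",
    "metadata": {"name": "my-team-data"},
    "spec": {
        "forProvider": {"region": "eu-central-1"},
        "providerConfigRef": {"name": "default"},
    },
}

def gvk(obj):
    """Return (group, version, kind) for any Kubernetes-style object."""
    group, _, version = obj["apiVersion"].rpartition("/")
    return group, version, obj["kind"]
```

The point of the mirroring is exactly that `gvk` works on a Bucket, a Deployment, or an Argo CD Application alike: every external API ends up looking the same once it lives behind the Kubernetes API.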
Argo CD was at version 1.5 or 1.6 or so; it was two or three years ago. And it didn't have dedicated objects for everything. There was a ConfigMap which had a huge list of entries, all strings and blobs, and those were the repositories which Argo CD manages, their upstream Git repositories. So, for example, if you wanted to add another repository, you needed to more or less check out the ConfigMap from Kubernetes, make your adjustments, and then push it back to Kubernetes. And this is obviously not very tenant-safe, or very hard to make tenant-safe, and it's prone to errors. Another part is security, which is a bit of a Crossplane-specific concern: a lot of the things which we store in Argo CD, for example credentials to Git repositories or credentials to connect to the Kubernetes clusters, both of which Argo CD needs to have, we obviously want to store encrypted, and not only encrypted when connecting via HTTP, but also at rest in the database. Since Argo CD uses Kubernetes as the database, if you have all this stuff in Secrets, you're good. But in the integration layer for Crossplane, when you describe your own APIs and make them easier for the customers to use, a lot of these things are not in Secrets, because they are just plain Kubernetes objects, like deployments or anything else, and those are not encrypted at rest. With provider-argocd, we are able to integrate this so that the credential information stays in Secrets and is encrypted at rest. That was a major plus for us. I want to go into two use cases, because we only have a couple of minutes left. One is the registration of Kubernetes clusters. For example, if we create a new Kubernetes cluster, we want to have it automatically registered at Argo CD. And the standard way to connect to a Kubernetes cluster is the kubeconfig.
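A kubeconfig already carries the server URL, CA data, and credentials, so a cluster registration can be derived from it. The sketch below only loosely imitates what such a derivation looks like; the `apiVersion` and field names of the Cluster resource are assumptions for illustration, not provider-argocd's actual schema:

```python
# Sketch: deriving an Argo CD cluster registration from a standard
# kubeconfig. Resource apiVersion and field names are illustrative.
kubeconfig = {
    "clusters": [
        {
            "name": "prod-1",
            "cluster": {
                "server": "https://prod-1.example.com:6443",
                "certificate-authority-data": "LS0tLS1CRUdJTg",  # placeholder CA
            },
        }
    ],
}

def to_argocd_cluster(kcfg, name):
    # Pick the named cluster entry; everything Argo CD needs is in it.
    entry = next(c for c in kcfg["clusters"] if c["name"] == name)
    return {
        "apiVersion": "cluster.argocd.crossplane.io/v1alpha1",  # illustrative
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            "forProvider": {
                "server": entry["cluster"]["server"],
                "config": {
                    "tlsClientConfig": {
                        "caData": entry["cluster"]["certificate-authority-data"],
                    }
                },
            }
        },
    }

cluster = to_argocd_cluster(kubeconfig, "prod-1")
```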
So, when you create a Kubernetes cluster anywhere, it doesn't matter whether with vcluster, with Cluster API, or at one of the cloud providers, you get back a kubeconfig. And unfortunately, Argo CD doesn't read a standard kubeconfig. You need a dedicated cluster object in Argo CD, which differs in a few parts, but a standard kubeconfig has all the information in it that you need to register the cluster. So with provider-argocd, you can basically create any cluster, reference the kubeconfig, and it will create the corresponding connection in Argo CD, and you have the cluster registered and connected with all this information. That was just merged recently, and it's very cool for integrating this stuff. Another thing we have is onboarding new applications. For example, a new application team wants to create a new microservice, and then you go on and say: okay, I need a couple of things at different APIs. For example, I need a compute environment, which may or may not be Kubernetes; the CI/CD integration, which in our case is the GitLab integration, the GitLab runner; then the Git repository, obviously; and finally the Argo CD repository, application, cluster registration, et cetera. And ideally I want to start not from scratch but from a blueprint which already has all those modules, in this case, I believe, Java JARs, which connect to the rest of my company's ecosystem. With Crossplane, I'm able to have exactly this high-level object. In this case it's standard Kubernetes, on the lower left, also deployable via GitOps. And then, after this dotted line, the platform basically fans out to all those different APIs, creates the objects, and manages the dependencies, for example very important parts for the connection of those things, the updating when tokens run out, et cetera.
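The fan-out can be sketched conceptually as a function that turns one high-level claim into several lower-level resources, with values from one resource wired into another, which is roughly what a Crossplane Composition does declaratively. The resource kinds and fields here are illustrative, not the real schemas:

```python
# Conceptual sketch of the fan-out: one application claim becomes
# several resources across different APIs. Kinds/fields illustrative.
def compose(claim):
    repo = {
        "kind": "Repository",   # Git repository, e.g. at GitLab
        "metadata": {"name": claim["name"]},
    }
    argo_app = {
        "kind": "Application",  # Argo CD application via provider-argocd
        "metadata": {"name": claim["name"]},
        # A value produced by one resource is wired into another, the
        # way a Composition patches fields between composed resources.
        "spec": {"sourceRepoRef": repo["metadata"]["name"]},
    }
    runner = {
        "kind": "Runner",       # CI/CD integration, e.g. a GitLab runner
        "metadata": {"name": f"{claim['name']}-runner"},
    }
    return [repo, argo_app, runner]

resources = compose({"name": "new-java-service", "type": "java"})
```

In the real platform the claim is itself a Kubernetes object, so it can be deployed via GitOps like everything else, and the controller keeps the fanned-out resources reconciled over time.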
A couple of challenges I want to go into: there are missing HTTP APIs in Argo CD which we would like to use but cannot use for now. For example, onboarding accounts: I think it's not a ConfigMap but a Kubernetes Secret, but it is basically just a list of all those accounts. It would be very nice for us to have a dedicated HTTP API or, alternatively, a dedicated Kubernetes object as the backend. For us it's also a bit unclear what goes into the Argo CD HTTP API versus a dedicated Kubernetes object versus a Kubernetes object which has a label attached to it and is then read by Argo CD, so a bit of Argo CD internals; it's not very clear. And since we only have 30 seconds left, I want to go to the last slide. As a bit of a summary: with provider-argocd, we connected one of the most important parts of our platform to the rest of the platform, and by open sourcing the provider, we get contributions and a wider spread of our approach to other companies. And with that, we're done. Thanks. So we have a few minutes for questions. There's a mic up there, or if you raise your hand, I can run over to you. There's a question. You can stand up at the mic there, and then you'll go next. Single question: can I integrate Argo CD with Terraform instead of Crossplane for infrastructure as code? Oh, sorry. Yeah, so she's asking about integrating Argo CD with Terraform instead of Crossplane for infrastructure as code. Is it possible? I believe so. We chose Crossplane for our integration layer because it provides us with a self-service API, whereas with Terraform you don't have that, at least in the open source version. I do know that it's possible to integrate Terraform into Crossplane in different ways. However, I'm not sure about Argo CD directly. It's possible, so just try it out.
It's possible to integrate Terraform with Argo CD; it's also an option you can choose, yeah, as an extension. I actually have two questions. First one: if a new API comes out, how long does it take you to expose it to application developers? Do you mean a new internal application at Deutsche Bahn, or a new object at Argo CD? Say S3 added a new capability and your developers need to use it, but you have the intermediate layer in between. That's a good question. In the case of AWS, there is a code generation pipeline. AWS has this project called ACK, the AWS Controllers for Kubernetes, and Crossplane has provider-aws, which is basically the same, but not from AWS. AWS has a code generation pipeline which generates code for both of these projects, which are both Kubernetes controllers. As soon as something is available in the AWS models, you can go and generate the new extension, the new fields, in the API and use them. For Argo CD it's similar, because when Argo CD releases a new version with an API extension, via the provider Kubernetes, so via direct access, you would immediately have it, but you're more or less not type-safe; you're still in this string world. With provider-argocd, we had a contribution from SAP where they use Goverter to more or less use the Go objects, the struct types, from Argo CD directly in provider-argocd. It's not a full code generation approach, but it's very close, so you don't need all this copy-and-paste boilerplate for that kind of stuff. On the question of time, I would say it depends on how fast you need it. The way to do it is clear, and if you get the pull request merged, it could be done in a single day, more or less. Thank you. And one more question. If you go a few slides back: how do you propagate between different providers?
Say you need to create a security group and then you need to pass the security group ID into the application to configure something, so it's not within a single provider but cross-provider communication. Yeah, so Crossplane has a feature called Compositions, which basically allows you to use, in the simplest form, templating to connect those things, like a Helm chart, for example, but to compose many different objects into a single, easier one, which is, for example, the application blueprint I showed you: just a new application of type Java, and that's it. And that we use extensively to abstract the infrastructure part. And this is a Crossplane feature, right? Yes, it's a Crossplane feature. Thanks. Correct. So the question was: Crossplane needs a management cluster to be usable, since it's an API. It's not a client-only approach like, for example, Terraform; it offers an API. And the question is, how do we provision the management cluster? It's a bootstrapping problem, or challenge, and basically what we did was: we seeded first with a kind cluster from a local instance, then from the kind cluster we created, via Compositions, a cluster, in our case at AWS, and then we migrated the state into this cluster so that it manages itself, because the state is at AWS and the cluster is at AWS. We only connect to it from the client side. So we are able to do this kind of dance to have a cluster self-managed; a control plane cluster manages itself, basically. Thank you. I think you mentioned before that you're also using Backstage, is that right? Yep. I was just wondering, in that kind of use case where a user creates a new application, where does Backstage stop and where does Crossplane start? Yeah, that's a good question. We have many different levels of knowledge in the company.
For example, the tech-savvy crowd is using direct API access quite heavily and extensively, so they are not afraid of using Kubernetes objects and so on, but we also have folks who just need to develop a small, very simple piece of software, and they don't want to learn all this stuff at the bottom. For them, we have a scaffolder in Backstage where you can basically create those Kubernetes objects, which are then deployed via Argo CD to the Crossplane platform cluster. Basically, if I go back here, it's on the top right. So your interface is a graphical one, and you scaffold; that's a plugin in Backstage which basically allows you to scaffold Kubernetes objects. Okay, and Crossplane is still involved from that point on? Yes, Crossplane is still involved at a lower level, but some users prefer not to interact directly with this low level of abstraction. Thank you. Hey, we have time for one more quick question before the next talk. Hey guys, I was just wondering: you talked a lot about a self-service model providing cloud provider infrastructure for your end users. How do you gate against somebody provisioning 1,000 S3 buckets or running up the bill to an extraordinary amount? So the question is how we, as a platform team, guard against someone accidentally or on purpose creating 1,000 S3 buckets. In the case of production systems, it's very clear: we don't hand out credentials to those production accounts. So there's no direct AWS access, or access to any of those APIs really. The only way to approach those systems is via the platform API, which is a Kubernetes-style API based on Crossplane. And with that, we get access to a variety of policy mechanisms and control mechanisms for how many objects can be created. A simple one in Kubernetes is RBAC, but that doesn't give you things like being allowed to create only five or ten S3 buckets. That we can do with add-ons for the Kubernetes API.
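Such a quota-style guardrail can be sketched as a simple admission check. In practice this is expressed as policy-engine rules at the API, not application code; the limit and names below are assumptions for illustration:

```python
# Illustrative shift-left guardrail: the platform API rejects a request
# before anything reaches the cloud provider. The real platform uses a
# policy engine; this Python check only shows the idea.
MAX_BUCKETS_PER_TEAM = 10  # assumed limit for illustration

def admit_bucket(team, existing):
    """Return (allowed, reason) for a bucket-creation request."""
    count = len(existing.get(team, []))
    if count >= MAX_BUCKETS_PER_TEAM:
        return False, f"team {team} already owns {count} buckets"
    return True, "ok"

# Current inventory, as the platform API would observe it.
state = {"team-a": [f"bucket-{i}" for i in range(10)]}
```

Because every request flows through the platform API, the check runs before any external resource exists, which is what makes the shift-left approach work.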
In our case, that's OPA and Styra. So we have safeguards in a shift-left approach, so that the API itself already says no if you want to create more than 1,000 buckets. Obviously, some of those guardrails can also be set up at the service providers' own APIs. For example, I believe at AWS you could have some internal mechanisms to control that as well; at others, for example Argo CD, there is no internal mechanism, so you are dependent on those shift-left approaches. Yeah, in our case, it's OPA, more or less. Thank you. Thanks. Alright, thank you.