Who's here for the legal talk in the audience? No hands. There will be plenty of legal, and there's a coincidence: there's a legal talk in the next room. We're not from legal, but we like legal, so it's everywhere in this talk.

All right, before we start, a short introduction: who is who here on stage? Hey, I'm MJ. I work at Cast AI on cost optimization, and I'm interested in control planes; this is why I'm here.

I'm Sebastian, one of the founders of Kubermatic, and yeah, everything we do is all around control planes.

More or less the same for me, Stefan. I'm at Upbound, doing control planes. You probably know me from CRDs; I was very involved there, so if you use them, that's probably partly my code.

So our talk today is why Kubernetes is inappropriate for platforms. This is, of course, a big claim, and we'll also cover how to make it better. This talk is about ideas. It's not about selling a project or a product or anything like that. It's about not just accepting Kube, but thinking about how we would have to change Kube to get something better for platforms. That's our motivation today. And we were thinking, OK, the last days were all about AI and ML, so let's build an AI and ML platform here. Keep in mind, we like Kube, but we also want to show you what could potentially be improved, or what we dislike.

So let's start with this experiment. If you want to build a platform, the first thing we need is some way for our developers to create objects. We have this in Kubernetes: we have CRDs. We can create an object called Model, and different teams can create it, different users can create it. So it's there. Perfect. The next thing: as we want to provide our platform to different teams, we need some separation. We also have this in Kubernetes, called namespaces. We can give each team its own namespace, they can deploy this object, and it's completely separated.
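As a rough illustration of the first step described above, such a Model object could be defined via a CRD; the group, kind, and fields here are hypothetical sketches, not the actual definition used in the talk's demo:

```yaml
# Hypothetical CRD for a "Model" object a platform could expose.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: models.ml.example.com
spec:
  group: ml.example.com
  scope: Namespaced        # each team gets its own namespace
  names:
    kind: Model
    plural: models
    singular: model
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                baseModel:
                  type: string
                parameters:
                  type: integer
```

Because the CRD is namespaced, each team can create its own `Model` instances in its own namespace, which is exactly the separation story the talk builds on.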
We can also separate it by RBAC; we have this in Kubernetes too. And of course, what we want is a uniform API. Potentially we need even more objects, and everything should follow similar principles every time. Kubernetes gives this to us. If in the future we want to add another AI service, or a database service, it should not be a completely different way to consume and operate that service. So this part we have. Perfect. Let's go further.

The next step: we now have these objects. What do we do with them? In Kubernetes, we have the reconciler pattern, the control loop, to interact with them. We build operators on top of this that take actions: they deploy stuff on Kubernetes, or, if you know Cluster API or Crossplane, they can even talk to other APIs. There are tools for this, the most popular being controller-runtime, but there are many more, like Kubebuilder, or Metacontroller if you don't want to write so much code and want to keep it simple. And if you really want to go deep, there's plain Go. You don't even need to use Go for it; there are many other ways: Rust, Python, Java. So there's a big ecosystem of tooling to work with this. Cool.

So, multi-tenancy. In Kubernetes, we all know that multi-tenancy is implemented with namespaces. But there's a question mark: is this really multi-tenancy? When your tenants are split by namespaces, and you're developing on that internal developer platform, you interact with the platform via clients and SDKs, and it's explicit: your developers usually have access to one namespace or all of them, and the SDK has to be specifically typed. But that's not the real challenge; you can work with that, your teams can create a wrapper around it. The real challenge happens when you start interacting with CRDs within the clusters.
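The reconciler pattern mentioned above boils down to a level-triggered loop: observe the actual state, compare it to the desired state, and act to converge the two. A minimal language-agnostic sketch in plain Python (all names hypothetical; real controllers would use controller-runtime or a similar framework against the API server):

```python
# Minimal sketch of the Kubernetes reconcile loop: drive the observed
# state of the world toward the desired state, one step at a time.
def reconcile(desired: dict, observed: dict) -> list:
    """Compute the actions needed to converge observed onto desired."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

def apply(observed: dict, actions: list) -> dict:
    # In a real controller this would call the Kube API; here we mutate a dict.
    for verb, name, spec in actions:
        if verb == "delete":
            observed.pop(name, None)
        else:
            observed[name] = spec
    return observed

desired = {"model-a": {"replicas": 2}, "model-b": {"replicas": 1}}
observed = {"model-a": {"replicas": 1}, "model-c": {"replicas": 3}}

# Loop until converged, as a controller would on every watch event.
while (todo := reconcile(desired, observed)):
    observed = apply(observed, todo)

assert observed == desired
```

The key property, and the reason the pattern scales so well, is that the loop is idempotent: running it again when nothing changed produces no actions.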
As soon as you install cluster-wide services like cert-manager, Crossplane, and ArgoCD, and you as a platform owner need to upgrade those components, you now have to synchronize with each and every development team to make sure they're OK with the upgrade, OK with the versions. If you want to introduce clashing CRDs, you have to go out and basically get consensus across your platform to do that. In the end, this makes the platform owner a very unhappy person, because usually that persona has to deal with those things.

So, a short status check. This looks OK-ish; we live in that world. But there are already cracks visible in this model if you want to build platforms. We know this pain; we've lived with it for years. And OK, we could continue like that. But this picture here is actually not the picture of today, right? People have more clusters. A single region with multi-tenancy via namespaces: we had that seven years ago, and it didn't look different. A single source of truth in one cluster. The real challenge today is multi-region and multi-cloud; everybody wants to do that. And you need more isolation than namespaces give you. There are ways to get that, essentially more clusters of some kind, and we'll talk about that in a second.

More clusters means cluster sprawl, and this brings complexity. You have to think about how to share data; the volumes maybe live in just one cluster, and you cannot talk between services via the Kube API. Keeping config consistent needs tooling, and applying policy gets more complicated. So there are many complexities, and we represent them here in this picture as bridges. We have to reconnect those clusters, right? There are connections, logical connections, maybe even network connections. They are connected; they're not living on their own. And to start with that, you have to create the clusters. There is a giant ecosystem already just to create clusters.
And you will maybe have one or two of those tools in use, maybe others. But you create more clusters, more tools are available, and basically, if you have more clusters, you have to tame them. So there's another class of tools to do everything I showed previously: config, compliance, ArgoCD and other GitOps tools, Crossplane; they help you with policies and with application deployment. Cluster federation helps you to a degree with compute, federating deployments across clusters. And there are cluster managers, like the Open Cluster Management project, for example. You see there's a big set of tools, and they are all very scoped; they have their use case. And you all know this cartoon at the bottom: isn't there something which can unify them? Of course, that's the immediate idea engineers have.

But those tools are not just technology. The real problem for platforms is that every tool dictates a view on personas, and that view maybe doesn't match what you actually want to build. Maybe they were developed three, four, five years ago, when platforms were not yet a thing, so they were not built in a way that's really compatible with platforms. So personas, basically people, are the actual challenge in this area. We'll talk more about them later on: platform owner, service provider, and user. We focus on those three. And what makes it complex? Kube was built with basically one persona in mind, right? There was this ops person deploying an application on a cluster, an admin who can basically do everything. But these platform personas have very limited, partial responsibilities. So you have to find a model for authorization, using RBAC maybe, or other tools, Kyverno, policy things, and you basically have to implement exactly the responsibilities they should have, and not more, because then it's a security problem. And it gets even more interesting when you think about third parties.
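The "limited, partial responsibility" point above is usually expressed today with namespace-scoped RBAC. A sketch of what that could look like for the hypothetical Model CRD; the namespace, group, and role names here are made up for illustration:

```yaml
# Hypothetical Role scoping a tenant team to its own namespace only:
# the team can manage Model objects there, and nothing cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ml-team-models
  namespace: team-ml
rules:
  - apiGroups: ["ml.example.com"]
    resources: ["models"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ml-team-models
  namespace: team-ml
subjects:
  - kind: Group
    name: team-ml-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ml-team-models
  apiGroup: rbac.authorization.k8s.io
```

Note that this only works within one cluster; nothing in RBAC itself spans the multi-cluster setups the talk goes on to describe, which is exactly the gap being pointed out.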
So you want to use a third-party tool, a service which is installed on that big setup. And this big setup is your setup. If you ask your neighbor how their platform looks, it will be completely different, right? Everybody is proud to have solved the platform problem in some way by a clever use of tools, but it's very diverse. And the tooling we have in the ecosystem, like a Helm chart, doesn't make sense in this world. It cannot talk about multicluster; it's just not made for that. So one question you could ask: what is actually a package, a service, that is able to span clusters? How do you install it? How does it know about your clusters? And you get those other bridges, those other streets there; so many ways to reconnect the clusters. But this territory between the clusters is basically undefined, right? If you go to the booth hall upstairs, every multicluster tool builds its own bridges, and there is no common language.

So complexity explodes. The thing is hard to support. It can break down easily just from new requirements, because you chose tools which implement exactly the requirements you had at that point in time. But tomorrow there's a new requirement, and those tools don't cover it anymore, because it's hard: they're not integrated, they just do what you wanted at the time. And it's hard to integrate into them; installing something new and supporting the same personas is hard. In general, I think it's not a good experience. You can build a platform which kind of works, to a degree, but it's not a good experience for any of those roles. So you have to acknowledge again: Kube was not built for that. We are trying to build platforms with a technology that was built for containers; that was the purpose all along. Platforms are something we came up with in the last year or two, and we try to make Kubernetes work for that. But remember, we like Kube.
The ecosystem is, of course, very big, with lots of companies and projects around, and we want to keep that. But the real question is: if we built a Kubernetes, a Kube, today, for platforms, it wouldn't look like Kubernetes. It would be similar in certain respects, but it wouldn't be Kubernetes, for sure. So, one last time on this: we know where to go. We have a rough idea of what a platform should do; we did something similar before. But this time, the ambitions are different. We want to build something which can lift an ecosystem up to platforms. Personas change, obviously. And Kelsey once mentioned that Kube is not meant as an endgame. It gives a pattern; it basically builds a platform for container workloads. But it's for container workloads, and the big question is, what is the next step? What if we leave this container orchestration area and build platforms for other things?

And coming back to Lego: we are on the left side. We have small building blocks, built for single-cluster environments, and everybody can build a car or something. But again, if you ask your neighbor about the car they built with the same tools, it will be very different, and they don't integrate. It's great for creativity; everybody is proud of their car. But we have to get to the right side, a grown-up approach, where basically we build towns. Towns are an environment every component lives in, where I can have an off-the-shelf component which knows: this is a platform, there are clusters, there are APIs, and everything. This is a vision, basically. We should get back to a place where we can collaborate, with a common language for talking about platforms, where we can build a service which just deploys onto this big platform, worldwide, multi-region, multi-cloud, and so on.

And one of the challenges, as Stefan mentioned already: nowadays, we have many more personas in the game.
When we started with Kubernetes, typically it was one developer who was responsible for everything. But now we have at least three different kinds of people involved, and there could be even more. It starts with the platform owner. The platform owner really holds the keys to everything: they connect everything together and provide this generic platform that the users can consume. But the platform is nothing without services. So then we have the service provider. These are dedicated teams who actually provide the services; in our case, it's the AI/ML team providing this as a service to the developers, and also operating it. All the complexity, how to upgrade, how to maintain it, is handled by the service provider, because they are the experts on the tool. And then we have the users. They want to use this. They are developers, or data scientists in the AI/ML space, or application owners building higher-level services on top. So if you're talking about platforms, you have at least these three personas, but there could be even more.

Let's look at this in more detail, starting with the platform owner. The platform owner's main focus is really enablement. They give you a well-defined, flexible platform to make consuming the services as easy as possible for the user. They also want to abstract away the complexity. As a user, you don't want to deal with how to deploy this AI/ML service on a cluster; you're potentially not even a Kubernetes expert. You want to consume the service. So the goal for the platform owner is: how can I provide this generic platform that the service providers can then use to build higher-level services on top? And another part is that it needs to scale.
Scale horizontally across different providers, potentially across different regions, adding new services, adding new users. It's not a platform only for AI; in our case, we later want to add databases or storage as a service to the platform. So the platform needs to be built in a way that really scales, where adding other services works and other services can be consumed in a similar way, especially without reinventing the wheel for each and every service. It should be easy to add new services with similar patterns and similar ways, not a situation where every service later has its own API and its own way to provision. As a developer, as a user, you really want one way to do this, so that you can also build higher-level tooling on top, which keeps it homogeneous: one way to consume, one way to provide. And as a user, you don't care whether you're consuming our AI service or, later, a database service as well; it should be easy to integrate into your tool stack, and you want that simplicity.

I mean, we're all here at KubeCon. It's similar to what Kubernetes did for containers: Kubernetes really enabled this whole ecosystem. And I think we need something similar for the platform. We need a standard way so that everyone can deploy and add services. So we really need to build it for the user. They should not be forced by the platform team into any opinionated way. They should use the services, and if there are multiple services, they should use whatever is best for them, not what we on the platform team think, because that's where the value comes from. It's primarily about giving them the services, and if your services are good, they will consume them. They will use them.
If it's easy to consume, easy to upgrade, easy to add new services, they will definitely start consuming it, and potentially even get to the point where they later add their own application as a service to the platform as well, so that others can in turn use their services, and you stack it up, service by service.

Yeah, next one: the service provider. Imagine you are developers and you want to build some tooling; when I say tooling, think of a policy engine or something like that. In Kubernetes, you have controllers, obviously. You can run controllers in Kubernetes, but they are basically limited to the cluster scope. They cannot go outside. I mean, they can, but tooling like controller-runtime is just not built for that; there isn't even multicluster support in controller-runtime today. So what we want, and this is of course a vision, is to have this rail track for the service provider, behind the scenes of the users. The users are on those small cutout plates there, and we are on the track, building the service via controllers. To do that, we need some system, again like what Kubernetes does for a cluster, where you can take the requests from the users and build something in a consistent way, whether it's one region or multi-region. Basically, you need awareness of the APIs and the tooling you have for this platform use case. And if you have that, you are efficient, right? You can safely operate the service and the setup. So what we need is a set of tools which works not only for a single region, but also for this bigger setup. You see those trains there on the track; those are the controllers. And you need a way to deploy them without even knowing that there are seven clusters in this region and 25 in that one. Today, there is just no way to make controllers aware of that, especially not with standard tools; you cannot deploy a cert-manager globally. There's just no way to do that.
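The "deploy without knowing the cluster count" idea above can be sketched as a placement fan-out: the provider declares one global service, and the platform, which is the only party that knows the topology, resolves it into per-cluster deployments. A plain-Python sketch with entirely hypothetical names and topology:

```python
# Sketch: the platform, not the service provider, knows the topology.
# The provider declares one global service; placement is resolved here.
TOPOLOGY = {
    "europe": ["eu-cluster-1", "eu-cluster-2"],
    "us": ["us-cluster-1"],
}

def place(service: dict, topology: dict) -> list:
    """Expand one global service declaration into per-cluster deployments."""
    # If the service names no regions, it goes everywhere.
    regions = service.get("regions") or list(topology)
    return [
        {"cluster": cluster, "name": service["name"]}
        for region in regions
        for cluster in topology[region]
    ]

deployments = place({"name": "ml-training-controller"}, TOPOLOGY)
# One declaration fanned out to three clusters, without the provider
# ever naming a cluster.
assert len(deployments) == 3
```

The point of the sketch is the division of knowledge: if a fourth cluster joins a region, only `TOPOLOGY` changes, and every globally declared service picks it up on the next reconcile.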
And the vision is basically: we need a system which can talk about those global services as a product. A product you build yourself, a product you sell, or something you buy from a vendor. And if you think about a vendor, you buy something from somewhere else, like that consultancy you hire because they have this very useful turnkey solution for some use case, and you want to deploy it. I mean, you have a big platform. There's lots of data inside, maybe even lots of secret data, and Kubernetes cannot really protect that; what Kubernetes itself can do is pretty limited. But the system must, of course, allow secure third-party services.

And if you go into details here and zoom in a bit: what are the primitives? We will not show them all here; we'll see some of them in the demo later on. But the question, of course, concerns the CRDs we saw in the beginning. Are they actually the right thing for this setup, or do we need something else? Of course, we can ship CRDs around, synchronize them, and install tons of Helm charts. But is this the right abstraction? Maybe not. And RBAC, same question. If I'm a service provider, I only want to see what I have to see. Otherwise, there's a risk that I leak data, or that some exploit in my application does something bad to the platform. So I want to see only what I need, like the claims of the users and nothing else; anything I don't need for my service, I don't want to see. With RBAC, this is hard.

All right, this all sounds very abstract. There's always compute, of course. Compute is central, obviously, and in Kubernetes, it's the thing: the center of the world. In a platform, there must be compute, but it's just another service. So think about APIs which provide compute. This can be a Kubernetes API or anything else. Maybe you remember that when the cloud movement was starting, this term of utility computing was a thing.
And basically, that's what ArgoCD, Flux, and other tools like Crossplane are modeled around. They are running somewhere, but ArgoCD or Flux can deploy stuff elsewhere; it's always somewhere else, on some compute service. And there are different kinds and flavors of compute services. I guess many will use vCluster, for isolation reasons, for example. It's just an API to create the vCluster and get it into the system in a consistent way. If you run the application, you don't have to know it's a vCluster, right? It's a detail, something the application team can just do behind the scenes. You want compute which is Kube-compatible. And of course, VMs and similar things also exist, federations; there are many more variants. The core idea here is: Kube is attached to the platform. Attached; we will see it very visually later on when MJ runs the demo. Compute is not the center of what we are talking about anymore. Compute is a service on top of a platform, not the center of the world.

Cool. So let's look a bit at the user's journey. We talked about two personas, but in the end, it's users who will use the platform. As a user, what do I want? I want spaces to do my job, to work in, to interact with. In this concept, we call them workspaces. They should be distributed, and as a user, I should decide where I want my workloads to run, where my jobs should be, whether that's an on-premise data center or a cloud. Distributed, isolated, but at the same time all logically connected. The last thing I want to do is keep jumping between different kubeconfigs and different access modes. It should be seamless and in my control. I should choose which APIs to consume, and where. And you might argue that this picture is not one that abstracts the complexity. But as a user, this is what I want to see: only the things I interact with.
There are a few blueish blocks, which represent the attached compute; a single API endpoint; I can navigate from one workspace to another. So isolation, and at the same time connectivity. And if I'm jumping from application to application, from region to region, from cloud to cloud, I should not care about implementation details under the hood. That's the responsibility of the platform owner and the service provider teams.

Yeah, and what we describe here is a vision. This is not Kube, right? Kube cannot do that. Just by adding more clusters to our platform, I will not get this user experience: this experience of how to use a platform, how to consume APIs, and so on. So it looks like we need new primitives. Primitives which are not just about containers, but built for a world which is inherently multi-tenant, multi-region, and multi-cloud. Kube was never built for that. And again, the big goal is to regain a state where we speak the same language, where we can innovate together. What we all like about Kube is the API, and we want to have this kind of API.

And there is something there already: we have kcp, Kubernetes-like control planes. It's now a CNCF Sandbox project. It's really a framework to bootstrap Kubernetes-like anything-as-a-service, or platforms, to build higher-level services on top of it. And there's even more work being done upstream in Kubernetes to make the API server more generic, so that you have really generic control planes which don't carry all the details you need for containers, because for a platform, you potentially don't need containers; you don't need Services. In the real compute cluster, you later need containers again, of course. And I want to underline the word framework. This is not a product. It's a project, but it's not something you usually just install. It's a framework to bootstrap experiences, to build an experience for a platform. So what MJ will show in a second is basically an example.
It's an example platform experience built with kcp. kcp is a library, if you want, to do those things. And it doesn't have to be a platform; it can also be a product a company builds, Kubernetes-compatible products which are not about containers, on that basis. And the shared core component is kcp.

OK, demo time. Thanks for sitting through all our visionary talk; let's see this in action. In this case, we have a platform deployed across two locations globally. So we have two regions. One is named root, and it's based in Europe, because it's KubeCon in France. And we have a second region named beta, and it's in the US. I will be acting as all three personas in the demo, so I'll try to represent them as best I can. Let's see how it goes.

First persona: I'm the platform owner. The service team came to me and said, hey, we are the ML team, the AI team. We want to start providing a model-training API to the customers of your platform. And I said, sure, I can do that. So just to show: I have two locations, root and beta, with a single API, represented here. And I show the current view of my workspaces as a nested tree. These are platform system workspaces, nothing to do with ML or AI.

So MJ, this is one kcp, right? And it's running already worldwide, in two regions. Yes, it's a single instance, spanned out across the locations, and I will show later on how the users interact with it and choose where they want their workloads to run.

So the first thing I need to do is bootstrap my ML team's configuration into the platform. That's a custom tool; you won't find it anywhere, it's custom code. And nothing stops this from being self-service, so teams could come in and onboard themselves, too. In this case, the platform owner wants more control, so he says, OK, I'm going to onboard you. But this tool is basically aware of the clusters, right?
Or the regions, so it can have APIs for that. So let's look at the different view we have now. And I'm nervous typing. Yeah, ws, of course, is workspace. So yes, I can see the new workspace appeared, ml, for the ML team, and a sub-workspace, training. So I created this playground for the ML service team to go and provide the service. And how does this look? If I go now inside and get the APIExport... no. So it bootstrapped an API. It created this API-as-a-service object, and it's called training. It's already bound to both locations. This is enough to enable the service team, the ML team, to start serving behind these APIs, to create their own services. Globally.

So this is important, right? They can write controllers, and they are simply aware of the topology in the different regions. So at this point, I'm shifting personas to be more like the service provider team; let's take some layers off. I'm switching to config, just to show where I am. From the service side, I have this structure created for myself as a user. And my team runs everything as Kubernetes controllers; we like Kubernetes controllers. So we're going to serve those APIs using the standard Kubernetes controller pattern and reconcile. I need to deploy the controller to reconcile the APIs globally, across the whole estate. So let's just jump into compute-prod. All these commands look very much like a file system, right, changing directories and so on. I went into the cluster where my controller will be running; let's spin it up there.

So you have a cluster now in kcp, right? Is this a workspace? To use Linux terminology, I mounted a compute cluster in my workspace as auxiliary compute. Compute became part of my ecosystem, because I don't want to be jumping between kubeconfigs; it's the same experience for everybody. And it looks like the rollout just finished. So I am deploying the controller; let's see, ML. The controller is running.
So the service team, the ML team who provides the APIs, did their job. Now I'm a user. As a user, I care about only one thing at this point: I need to run my ML jobs and get results. And they need to run in the US and in Europe, because of data rules and everything. So let's switch to the user role now. As a user, same workspaces; let's create two workspaces. And if you notice, I have two location selectors: one named root and another named beta, and type ml-training. So I'm applying: hey, platform, give me a workspace, a place for me to work, which is ML-enabled. If I now do ws tree, I see the two new ones appeared. And I have the CRD created, which I read about in the documentation; this is how you do these things. I need to train some chat application on Llama 2, some parameters, same CRD. And I need to train it in both locations because of different data: a French-accent, language-based chat, and a US-English one.

So I go into Europe's workspace, a simple command, just get in. And I create the CR, applying the YAML. And I see it got accepted by the Europe location, because I just instantiated it into the workspace; I didn't specify anything else apart from "give me this workspace there". This means the platform itself knew that the workspace needed to land in the Europe location and do the job there. Let's do the same now for the US, just to show that it's not a canned demo. I'm creating the same model, the same thing. And I can see it got accepted by the US location. From a Kubernetes standpoint, this looks and feels like the Kubernetes experience: all the CRDs are there, the API is there.

And one last thing I want to show: the models API was provided to my workspace in the form of bindings. This means that when the service team created the export, you could then bind one to many. Teams just interact with the APIs, and the providers do the heavy lifting. And you don't see anything about the controllers, right? Yeah, just invisible. It's the service team's responsibility to handle those things.
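The export/binding pair shown in the demo could look roughly like the following; this is a sketch based on kcp's APIExport and APIBinding kinds, with the caveat that the exact schema varies by kcp version, and the workspace path and schema name here are assumptions modeled on the demo, not copied from it:

```yaml
# Service provider side: publish the "training" API from the ml workspace.
apiVersion: apis.kcp.io/v1alpha1
kind: APIExport
metadata:
  name: training
spec:
  latestResourceSchemas:
    - v1alpha1.models.ml.example.com   # hypothetical resource schema
---
# Consumer side: a user workspace binds to that export, and the API
# is then served in the workspace as if it were a local CRD.
apiVersion: apis.kcp.io/v1alpha1
kind: APIBinding
metadata:
  name: training
spec:
  reference:
    export:
      path: root:ml        # hypothetical workspace path of the provider
      name: training
```

This one-export, many-bindings shape is what lets the provider's controllers stay invisible: consumers see only the bound API, never the machinery behind it.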
And if we look now here: the model in the US, completed, running, job done. So that's how it could look if somebody built an ML platform as a service.

Yeah, so very quickly: kcp is a CNCF Sandbox project, we talked about that. It's based on the Kubernetes source code, it's based on CRDs, so everything here is a Kubernetes-like API. All the tooling, controller-runtime, everything you saw in the presentation, just works. ArgoCD just works. And the primitives we are building here are inherently multi-tenant, multi-region, and multi-cloud, so it's more than what Kubernetes offers. Workspaces are at the center. Workspaces are our unit of the user experience, to work on and to work in. Everything is in one; you saw the switching around between the workspaces in the hierarchy. There's one endpoint behind that. Of course, the endpoint can be HA, in different regions or on different cloud providers, but it's logically one endpoint. And it scales: we saw two shards here, the thing was already running on two shards, but you can have 100 shards in theory. And the APIExport and the APIBinding: they're based on CRDs, but they're not CRDs. They're more than that, because we need different primitives for API management. We saw that already.

Yeah. We are also around after the talk; talk to us, we want to get your input. We also have stickers with us. You can find us on the Kubernetes Slack, in kcp-dev. Of course, it's a Sandbox project, so the code is open source; go to GitHub, kcp-dev/kcp. Follow us on X, kcp or our individual handles. And if you have questions later, come to our booths, the Upbound booth, the Kubermatic booth, or the Cast AI booth; there you can find us. What we need and want is your feedback: feedback on the talk, how you liked it, but also in general, what are your thoughts? What are you doing with platforms, and where would you like to use this?
So that kcp, and this whole idea in general, can evolve. Thanks, everyone. Thank you.