Yeah, thanks for joining my talk. And I must admit I'm really grateful to God that I'm standing here right now, because on Sunday I woke up to a very, very bad food intolerance reaction, really bad symptoms from something I ate on Saturday, and I was lying in bed thinking, I can't make it to GitOpsCon. And then on Monday I woke up as if nothing had happened, completely recovered, and now I'm here. So yeah, I'm really grateful to be able to do this here for the first time in years.

I am Max, I work at D2iQ as a Kubernetes engineer. I'm also a Flux maintainer; I've been one for half a year, nine months maybe. At D2iQ we basically have two products, both sold under the umbrella term DKP, the D2iQ Kubernetes Platform. One enables a single-cluster experience, and the other product, building on top of that, facilitates a multi-cluster experience. So multi-cluster basically runs through our veins, if you will, and it's one of my favorite topics of the last years.

However, it is pretty cutting edge. All the companies talking about multi-cluster here are, from my experience, exceptions to the rule: multi-cluster is still very cutting edge and very early in its stage. The main customers of D2iQ are big enterprises and government agencies, and they are slowly ramping up their Kubernetes deployments in a single-cluster way. Those that do already have fairly mature day-two operations, either with the help of vendors like D2iQ or with the help of an awesome platform team that they run internally. Single cluster, let's say, is a solved problem. A few enterprises are now entering the phase of experimenting with multi-cluster environments. They want to use multiple Kubernetes clusters, in particular for hard multi-tenancy, something Michael nicely laid out in his talk today as well. But they're only dipping their toes into the waters of multi-cluster. It's a very early stage, and very few have mature day-two operations in place, with day zero being the experimentation phase, day one being the setup phase where you install your first clusters and play with them, and day two being actually taking it to production.

From the experience we made with our customers and from talking to other people, there are mainly four use cases I was able to derive that you want to facilitate in multi-cluster environments. You want to treat your clusters as cattle. I just had a quick conversation with Chris before the talk about customers loving their clusters, and he said customers shouldn't love their clusters, no one should love their cluster. Treat it as cattle: if it breaks, shut it down and create a new one; if the version gets too far behind, don't try to upgrade it, just kill it and create a new one. But that means you need to make creation, deletion and modification of a cluster very, very simple, and replacement of an existing one as simple as creating a deployment and deleting it again.

You also want to deploy a fixed set of components onto each managed cluster. So the scenario I'm laying out here is that you have a management cluster and then several attached clusters, or adopted clusters, however you want to call it.
And in order to make them manageable, for example to roll up metrics and have insight into the cost of each cluster, you need to deploy a certain fixed set of management components onto each cluster. That is something that is mandated by the management cluster. But you also want to provide something I call here a centrally managed application catalog. You have an ops team, or maybe a vendor, that maintains an application catalog for you that you can offer to your users to deploy to every cluster, something like: I need this ingress controller. They maintain it as part of the central application repository, together with certain configuration that maybe fits the enterprise's internal environment best, and you can just pick from it and say, I want to deploy that to this cluster or to this set of clusters.

But central to a multi-cluster environment is that you want to make it easy for users, for the developers, to deploy their workloads onto the clusters. You have a tenant, maybe multiple tenants per cluster, and they just want to run their applications. And obviously this is where GitOps comes to the rescue. I liked how Michael put it in his talk before: they kind of converged to Git, or to GitOps, it doesn't have to be Git, in a natural fashion. It just comes naturally that you converge to the GitOps model.

But there are some things that have to change with regard to the mindset you apply to the model. You can't exclusively rely on Kubernetes RBAC anymore. In fact, you actually shouldn't, because users don't need to be able to access the Kubernetes API anymore. What you need to apply now are policies for access to the Git repositories, because that's the entry point for all the users deploying their workloads. You also need to employ multiple repositories; we saw that in several talks before. If you really want to isolate your tenants, you need to provide a Git repository per tenant, obviously. And you may want to provide a way to configure certain applications differently. Maybe you have the central application catalog, but you want users to be able to say: yes, I want to deploy MetalLB, but I need to be able to configure the address range of MetalLB. So that's pretty important: users need to be able to configure the applications that they deploy in a very simple manner.

And of course you want to isolate tenants from each other on each cluster. You may still want to use your cluster's resources in a reasonable way and put several tenants on the same cluster, and then you have to make sure to isolate them. That's one principle that stuck with me from the first time I heard it: you don't need to, and maybe you don't want to, give users access to the Kubernetes API anymore. Maybe only in a read-only fashion so they can see the status of their resources, but other than that they shouldn't be able to, or at least shouldn't want to, create, edit or delete any resources, because everything goes through Git anyway. With Flux, if you edit a resource, the kustomize-controller will kick in right after that and just revert your changes anyway.
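To make the tenant-isolation point a bit more concrete, here is a minimal sketch of how a single tenant could be wired up with Flux on a managed cluster. The names, repository URL and API versions are my assumptions, not taken from the demo repository: the tenant only ever touches their own Git repository, while targetNamespace and serviceAccountName keep whatever they push confined to their namespace.

```yaml
# Hypothetical per-tenant wiring on a managed cluster (illustrative names).
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: tenant-a
  namespace: tenant-a
spec:
  interval: 1m
  url: https://git.example.com/org/tenant-a-workloads
  ref:
    branch: main
  secretRef:
    name: tenant-a-git-credentials    # read-only credentials for the tenant's repository
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: tenant-a
  namespace: tenant-a
spec:
  interval: 5m
  path: ./
  prune: true
  sourceRef:
    kind: GitRepository
    name: tenant-a
  targetNamespace: tenant-a           # keep the tenant's objects in their own namespace
  serviceAccountName: tenant-a        # apply resources as a namespace-scoped service account
```

The service account impersonation is what backs up the "read-only API access" idea: the tenant pushes to Git, Flux applies with limited permissions, and admission policies can catch anything that tries to escape the namespace.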
What I wanted to create first, as a prototype for myself and something we wanted to experiment with internally, was something like a multi-cluster template: we define a certain set of workflows and personas and then we implement that model however we want; we can use Flux for that, we can use Argo CD for that. I liked how the GitOps Working Group and the OpenGitOps project converged on that definition of GitOps with its four principles, and maybe this is something we can come up with as a community for multi-cluster environments, because everyone is doing it differently right now, because it's so new. Red Hat is doing it very differently from D2iQ, and AWS is doing it differently from everybody else again. And there are many ops teams at this point trying to figure out the best way to do multi-cluster. So maybe that's something the community can at least provide help with.

So I identified several workflows that you want to enable or implement with your multi-cluster setup. The first is pretty obvious: you want to deploy a management cluster. You want to spin it up and install certain components onto it, for example Flux. Then you want to be able to adopt a cluster. You have several clusters and you want to make them managed, and you want this single pane of glass that shows you all your clusters, how much each cluster costs, how much each workload costs. You want to see metrics, you want to see the logs, maybe you want to roll up the logs to the central management cluster, you want something like SSO; all of that has to come from the management cluster. That's what I call adoption, or attachment; there are several terms for it.

As I said before, you may also want to be able to group your clusters and deploy the same set of workloads onto a group. So you have several groups, several clusters in each group, and then you define the workloads that each cluster in each group gets. You would have, say, a dev group where all the dev clusters get one set of applications deployed, and a staging group that gets a different set of applications deployed. You also want to be able to configure the workloads per cluster and control which workloads are deployed onto each cluster, so even though you have groups, you still need to be able to target a single cluster. That's about configuration of the workloads. And then, as a development team, you want to be able to just deploy your application onto a single cluster.

And then there are different personas acting on each cluster. You have the management cluster administrator, the person with the highest privileges in this hierarchy of multiple clusters; that's the person that deploys the management cluster and makes sure the fixed set of common components that each cluster receives is well maintained, and everything like that. Then you have the workspace administrator, the person that administrates a certain group of clusters. And you have the managed cluster administrator, basically the tenant responsible for maintaining a single cluster; maybe the development team, maybe a single person out of that development team.
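To make the workspace and persona model a bit more tangible, a management Git repository for this kind of setup could be laid out roughly like this. This is a hypothetical layout for illustration, not necessarily the one used in the demo repository:

```
management-repo/
├── common/              # fixed set of components every managed cluster receives
├── clusters/
│   ├── test/            # per-cluster kubeconfig secret and sync manifests (managed cluster admin)
│   └── staging-01/
└── workspaces/
    ├── dev/             # resources shared by every cluster in the dev workspace (workspace admin)
    └── staging/
```

Each persona then only needs write access to "their" part of the tree, or to their own repository referenced from it.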
And of course I didn't want to leave it at that and just say, yeah, that's a nice concept, and run with it. I also wanted to dip my toes into it, actually implement it, and see how hard it really is given the tools we have in the community right now. As a Flux maintainer, it came quite naturally to me to use Flux for that.

If you don't know what Flux is yet: it's basically a set of components that run on a cluster and let you reconcile Git repositories or other artifact sources such as a Helm repository or an S3 bucket. Then you have a helm-controller that creates Helm releases from manifests in that source, and a kustomize-controller that basically uses Kustomize to create resources on the cluster from bare-bones Kubernetes manifests in the repository. That's Flux, and that's what I love about Flux, why I fell in love with Flux: it's so simple. It's just a set of controllers, and each controller is very, very targeted at one use case and only one. It's like an implementation of the UNIX principle, which I really, really dig. Each component, like the source-controller, comes with its own CRDs, and it's easy to understand. You don't need days or weeks to understand Flux; you just run flux install, it has a great CLI, and you can go right away. If you want to dig deeper into Flux, there's a great documentation page created by Kingdon Barrett from the Flux project. It's a huge page, a lot of text, but it explains Flux from A to Z, and it's a really great page; I really love that we've had it up for the last two months or so.

The demo I'm showing now, pretty quickly because we don't have too much time, is hosted in this repository. I created a QR code because I thought the name might be a little too long to type; I don't know if the QR code works in this room, but you can ask me for the link later if you want to. Everything is there, so let's just dive right into it.

So what do we have? On the left side I'm connected to the management cluster, and on the right side I'm connected to another cluster. Both are just kind clusters, for simplicity's sake and because you never know how the internet connection is at a conference. On the right-hand side, I'm going to make this bigger, you can see it's a bare-bones kind cluster, nothing exceptional is running in it. On the left side we have some Kustomizations already running.

There's a script in this repository called "create management cluster", and what I wanted to do with this demo is see how far I can get with programming as little as possible, just using what's out there as much as possible. Again, as Michael said before, they didn't have many programmers in their department, and I guess that's true for many enterprises. So what I tried to do is use what's out there and program as little as possible. Create management cluster is basically four lines of shell script that create a kind cluster, install Flux, and then apply two resources to it. That's it. The rest comes from the structure and the layout of the Git repository itself.
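The script itself isn't shown in detail here, but based on that description it boils down to something like the following minimal sketch. The exact file names and paths are my assumptions, not taken from the demo repository:

```sh
#!/usr/bin/env bash
set -euo pipefail

# 1. Create a local kind cluster to act as the management cluster.
kind create cluster --name management

# 2. Install the Flux controllers (source-controller, kustomize-controller, helm-controller, ...).
flux install

# 3. Apply the two bootstrap resources: the GitRepository pointing at the
#    management repository, and the root Kustomization that syncs everything else.
kubectl apply -f management/git-repository.yaml
kubectl apply -f management/sync.yaml
```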
So let's see how I attach a cluster. That's a little more shell scripting, but it's the closest I could get to making it simple; maybe we can find ways to make it simpler. It also installs Flux using the CLI. That's actually not strictly necessary, because you could keep all the Flux manifests in the Git repository and have the attached cluster just sync those and install Flux that way, but bootstrapping is always a bit difficult.

Then the script creates a secret holding the kubeconfig of the to-be-attached cluster. It creates a Flux Kustomization and stores it in the sync file. This Kustomization is special in that it uses a very awesome Flux feature where you can say: this is a Kustomization that targets a different cluster, not the one that Flux itself is running in. You pass it a Kubernetes secret that holds a kubeconfig file, and Flux will take that and apply all the resources from that Kustomization onto the other cluster. That's the starting point from which everything gets installed, from the management cluster's point of view.

And then it stores what I called the remote sync. These are the two resources that get applied on the managed cluster: a GitRepository, and this is the one that is cluster-specific. For simplicity's sake I'm using the same repository, just a different branch; in an actual production environment you would obviously use a different repository and give the team that is to be the tenant on that cluster access to that Git repository. But I didn't want to create ten Git repositories just for a demo, so I hope it gets my point across. And then it creates a Kustomization. This is one of the intricacies of Flux: you have Flux Kustomizations and you have Kubernetes Kustomizations. When we first talked about Kustomizations at D2iQ, everyone was confused because nobody knew what the other person was talking about: is this a Flux Kustomization or a Kustomize kustomization? When you start using it, get that straight first. Then it pushes all of this, and for speed's sake it kicks off a reconciliation right away. If you don't do this, you just wait for the source-controller to kick in during its regular reconciliation loop.

So that's attaching a cluster. I have prepared something here: we're going to attach a cluster and call it test. I have to provide two different kubeconfigs here because I'm using kind, and the usual kind config uses localhost, or 127.0.0.1, for the API server, which obviously can't work from inside the cluster for Flux to use. So I'm providing two kubeconfigs: one is the internal one to be used by Flux, and the other is the one to be used by the shell script directly, for example to run flux install. That's not necessary if you're using actual clusters. And the -w flag lets you add the cluster to a certain workspace. As I said, a workspace is a group of clusters that has its own Git repository, and all the clusters in that workspace receive the contents of that Git repository.

Let's just run it and watch on the right side what is happening. So Flux is installed, so it's no longer complaining that it can't find the resource definitions, the CRDs, on the right side. On the left side it's still verifying the Flux installation, waiting for it to get ready. Now it's cloning the management cluster repository. I can't talk as fast as it's progressing.
It's adding the resources to the Git repository and kicking off a reconcile, and right at that point the attached cluster receives everything it needs. You can see, it's a little hard to see, maybe I'll make this a little smaller, a GitRepository called cluster; that's the cluster-specific Git repository. Then the common Git repository is the one that's the same across all the clusters and holds components such as Kubecost, for example, or the Kubecost agents, or whatever you want to deploy across all attached clusters to make them manageable. And the third repository is the workspace repository, and this one gets its own namespace. Each workspace is an isolated tenant in a namespace on that cluster.

To make it actually isolated, and we already saw Michael showing that, I'm referring to that talk a lot today because it had so many awesome points, I've been using Gatekeeper in this example to really isolate tenants. So you cannot escape your namespace by creating, for example, a Helm release that would install stuff in another namespace. Gatekeeper, or rather the policies I defined in the management repository, enforce that Flux doesn't let the user create anything outside of this namespace. And then there's the workspace Kustomization that basically just points to the root folder of the workspace Git repository, and you can put anything there that you want to deploy in the workspace. There's a single Helm release running right now, which is Gatekeeper, which is a prerequisite to using workspaces and isolating tenants.

So now the cluster is attached, now it's managed, and we can see it in the management Git repository. There's a directory here that has the secret, which holds the kubeconfig of the cluster, and the sync manifest that points to that secret. Let's see: it points to the remote directory of clusters/test, so that's specific to each cluster, and it uses the secret ref. And the remote directory has, in turn, the sync manifest of the cluster-specific Git repository. All right, everybody following?

Now let's see how the detachment works. It's pretty simple: it basically just removes the remote Kustomization, and as soon as it's done with that, you will see all the resources on the attached cluster vanish. Everything is gone, the cluster is detached, and it's in the same state as before you attached it, which is pretty nice. It's basically just removing stuff from the repository and waiting for Flux to reconcile and be done with it. What it doesn't do is uninstall Flux, so as not to remove the CRDs and any associated workloads that might come from somewhere else, which shouldn't really happen, but I just wanted to make sure I don't delete anything a cluster user has deployed before.

What does this look like from a 10,000-foot view? You have the management cluster, you have the management cluster Git repository, and the management cluster administrator persona that has access to the management repository and to the common repository, which holds the stuff every cluster receives. Then we create a workspace, which basically means I create a workspace repository and some resources in the management repository.
You have the workspace administrator persona that has access to that. Then we spin up a cluster; it has its own cluster repository. Then we connect the cluster to the workspace, we connect the cluster repository to the cluster, and we connect the common repository to the cluster. So there are three repositories being reconciled by the cluster, each for a different target group. Another cluster in the same workspace would look like this: also connected to the common repository and to the workspace repository, but with its own Git repository for cluster-specific stuff. These might be two different teams working on different applications but still in the same workspace, for various reasons; there are a million reasons for wanting to group clusters. And there's a third cluster that's outside of the workspace, so it only reconciles the common repository, to make it manageable, and its cluster-specific repository.

If that sounds complicated: I tried to come up with a visualization of what my demo repository actually does, and this is all the resources being created, with the colors showing which cluster receives them; the blue ones are for the remote cluster, the red ones are on the management cluster. It really is confusing. I got so confused when I initially bootstrapped that model: which Kustomization does what, at what point, and what do they depend on? And it turns out that making things simple for each persona, so that they just have access to their Git repository and put their stuff in there, is really hard to make possible with the tooling that's out there today. That's one of the key takeaways, and why I wanted to come up with a multi-cluster template: because everyone is doing it differently, because it gets complicated really quickly, and because there's so much you can do wrong. I think as a community we can provide this template to users who actually want to use multiple clusters.
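To tie that picture together, this is roughly what the "remote" Kustomization mentioned earlier could look like on the management cluster side. It's a sketch with illustrative names, namespaces and paths rather than the exact manifests from the demo repository: the kubeConfig.secretRef is the Flux feature that makes the Kustomization apply its resources to the attached cluster instead of the cluster Flux itself runs in.

```yaml
# Management-cluster side: a Flux Kustomization that applies manifests onto
# the attached cluster via a kubeconfig stored in a Secret.
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: cluster-test-remote
  namespace: clusters
spec:
  interval: 5m
  path: ./remote/clusters/test       # cluster-specific directory in the management repository
  prune: true                        # removing this Kustomization detaches the cluster again
  sourceRef:
    kind: GitRepository
    name: management
  kubeConfig:
    secretRef:
      name: cluster-test-kubeconfig  # Secret holding the attached cluster's kubeconfig
```

The resources under that path would then be the attached cluster's own GitRepository and Kustomization objects, which is how the cluster, common and workspace repositories end up being reconciled on the managed cluster.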
There are some takeaways that we at D2iQ took from implementing a multi-cluster approach with Flux, very quickly. Embrace upstream open source: don't try to come up with your own solution if there is already something out there, and if there is, contribute to it when it's missing something. Keep an ear to the ground: don't just stick to what you have, but see where the community and the rest of the world are going. That's how we came to use Flux; it gained traction, we saw it was an interesting project to look at, and we actually incorporated it into our product at D2iQ. I love making things simple for users, so I love creating very fine-grained, very focused CLIs, and the tooling that's out there makes that easy. At D2iQ we build CLIs with Go, and there's tooling like Viper and Cobra to build CLIs in a very standardized way; make it simple for your users by creating these small little CLIs. And don't stop experimenting: play around with stuff, and maybe eventually it'll be incorporated into your platform one day, or maybe you just throw it away, but you always gain knowledge.

That's it. It was really hard to pack everything I wanted to bring across into this half-hour talk. You can reach out to me via email, the old-school way. I'm on Twitter, and that's basically my nickname everywhere else as well. I'm active in the CNCF Slack; you can post your questions in the Flux channel if you have Flux-specific questions. There's also SIG Multicluster in the Kubernetes Slack, which has gotten a little quiet recently, but that's a place where many very talented people with a lot of multi-cluster experience sit. And last but not least, if you want to work on this stuff, you can get hired by D2iQ and work on the next level of multi-cluster. Thanks.

Awesome. So, do you want to take questions or not? I would, if we have the time. We've got a couple of minutes, I think. Any questions?

Hi, we've been there, we've done all of that. Have you thought about actually scaling that to thousands of clusters, with things like staged rollouts or incremental or progressive rollouts across hundreds of clusters at some point? Because the problem we've run into with exactly this approach is that if you change something somewhere in your repositories, it gets rolled out very quickly to all of your clusters, and if you break something, the impact you create is quite big.

Yes, absolutely. One thing we built into the D2iQ Kubernetes Platform is something that prevents exactly that. For example, when you upgrade a certain application, you can make it roll out gradually, and that's a reason to use workspaces: you can start by rolling it out to a single workspace, and if it works there, you can go to the next workspace, and then the next. So that's one of the things you can use to gradually roll out changes; applying changes to all clusters at once is most of the time not the best decision, right? You're right, we know. Thank you.

I noticed that you're watching multiple repositories, like the common repository and the workspace repository. You gave the example of using Kubecost as something you might want to apply everywhere. What if you had Kubecost version 1.0 in one repository and Kubecost version 2.0 in another? Would these reconcile loops start fighting each other, or does one take precedence over the other?
Yeah, I mean, you surely have to make sure you don't deploy the same application into the same namespace from different repositories. So the common repository would host something like that, and you have the responsibility for that repository: make sure that Kubecost in that repository works across my clusters. But sure, you have to find a way to police that, and that is also something I've been thinking about and haven't really found a solution for: what happens if something is deployed in a workspace and also in a cluster-specific repository? They would obviously be overriding each other. So that's a problem you still need to solve yourself; I haven't solved it yet.

All right, that's all the time we've got for questions. Thank you, Max, and thank you everybody for watching. Another round of applause, please.