Hi and welcome to our talk about patterns of multi-cluster Kubernetes. It's awesome to see so many people. We weren't necessarily expecting a packed room, but it's really cool. I'm Dan McKean, a product manager at MongoDB. And I'm George Hantzaras, an engineering director at MongoDB. Together we lead MongoDB's effort to support Kubernetes, specifically through Kubernetes operators.

You've probably spotted that there's a third person in the presentation but only two people here. Our colleague Mircea contributed a lot of information and insight to this presentation. He couldn't make it today, so this is our way of crediting his contributions, both to our multi-cluster journey as a product and to this presentation. He also follows SIG Multicluster closely, and there's obviously a lot of overlap between the topics discussed there and what we're going to present here, but we're going to try to keep this to our perspective and tie the theory to our own decisions on multi-cluster. We're not claiming to be absolute experts. And if you are a member of the SIG, please don't raise your hand; we don't want to be even more nervous.

MongoDB's interest in multi-cluster Kubernetes started a few years ago, before either of us actually joined the company. It came about because we had customers wanting to run MongoDB within Kubernetes across multiple clusters. This was something we already supported with MongoDB itself on VMs or bare metal, but we didn't yet support it on Kubernetes. And although a few customers did go ahead and build it themselves, the time and complexity of doing so made us realize we needed to do it formally and properly. So that's where it came from. We've come a long way since then, we've invested a lot of time and effort, and we've reviewed and revisited a lot of our decisions. In this presentation, we're going to try to squeeze all of that experience into this half hour.

So we're going to cover a few things today, and to keep it interesting, we're going to keep alternating between theory and our own decisions and what we're doing. We'll present some concepts and considerations before showing you how we chose to implement them. For that, we're going to look at our own multi-cluster implementation in the Kubernetes operator for MongoDB Enterprise. We want to hold our hands up at this point and admit that it's not all open source at the moment, but we are heading in that direction.

As you can see, it's quite a packed agenda, and in all honesty, each of these topics could easily have been a 35-minute talk by itself. So we're going to do our best to give you an overview and explain our experience and perspective, with some time for Q&A at the end. We'll start with the motivations and benefits of multi-cluster, talk a little bit about the architectures, and then dig into the network-centric pattern, which we'll explain in a minute. We'll talk about each of the three key challenges that revolve around implementing multi-cluster: cluster inventory, workload distribution across clusters, and networking. Then we'll cover the considerations for multi-cluster operators and controllers before finishing off with our own thoughts on the topic.

So multi-cluster offers a number of benefits. First, improved performance and reduced latency, by hosting services closer to the end users.
Companies can significantly improve application performance and user experience by reducing latency. Next, compliance and regulation: some regions have strict data sovereignty rules that dictate where data must be stored. Some even require that data is stored in a certain way, such as across different clouds, to ensure the data is never lost. And finally, probably the biggest and most obvious reason for most companies: high availability, disaster recovery, and data redundancy. Hosting both the services and the data across locations ensures that a failure in one has minimal impact.

We wanted to dig a little deeper into MongoDB itself to illustrate how this actually works in practice and why we decided multi-cluster made sense. MongoDB offers two topologies that work well with multi-cluster: replica sets, which offer data redundancy and high availability by maintaining multiple instances of the same data, and sharded clusters, which split the data into shards, where each shard is effectively a replica set. This ensures that each shard has redundancy and that the data can be split for performance or geographical distribution. Both replica sets and sharded clusters are supported by the MongoDB Enterprise Operator, with replica sets supported for multi-cluster today and sharded clusters coming soon. That gives geographic distribution of data across different Kubernetes clusters, which can be in different availability zones, different data centers, or even different cloud providers. And once sharding is supported, it gives the ability to segment data and localize it in a specific Kubernetes cluster. We got a real leg up from the fact that MongoDB already has these capabilities; we just had to make them work within Kubernetes.

When it comes to multi-cluster, there are two major architectural models, though within each there's a lot of variation in how things can be done. On the left, we have the Kubernetes-centric model, more often known as cluster federation. It's all about establishing some sort of shared control plane, where the goal is to enable developers and operators to treat the multiple clusters as one when deploying and managing workloads. To do it, you have a federation control plane, and at a high level there are two main components: an API server that provides the single unified API endpoint for users to interact with the federation and manage workloads, and a controller manager that actually manages the lifecycle of federated resources.

On the other side, we have what we call the network-centric model. The concept there is distributing workloads across multiple distinct clusters, each of which has its own control plane. Kubernetes operators deploying and managing workloads on the clusters have to interact with each cluster independently, using each cluster's API server. The underlying clusters are managed individually by the administrator, so we don't try to provide the view of one single cluster. While neither of the two models is simple when it comes to actually deploying and managing workloads, federation simplifies things by allowing you to treat everything as one cluster, with the multi-cluster tooling putting all the complexity under the hood. On the other hand, distinct clusters can offer more resiliency, as one cluster failing has no direct impact on the rest of your infrastructure.
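Since we'll be referring to MongoDB deployments throughout, it may help to see the single-cluster baseline first. A plain replica set is declared to our Enterprise Operator with a custom resource roughly like this sketch; the field names are as we recall them from our docs, and the version and referenced names are purely illustrative. The multi-cluster equivalent appears later in the talk.

```yaml
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: my-replica-set
spec:
  type: ReplicaSet          # a sharded cluster would use type: ShardedCluster
  members: 3                # three data-bearing members for redundancy
  version: "6.0.5"          # illustrative MongoDB version
  opsManager:
    configMapRef:
      name: my-project      # ConfigMap pointing at the Ops Manager project
  credentials: my-credentials  # Secret with Ops Manager API credentials
```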
Now, in our case, we're focused on distributing workloads across distinct Kubernetes clusters, to support users who've not yet deployed federation solutions and who want the increased resiliency of totally separate clusters for their data. So the network-centric model is the primary focus of the next slides as well.

The network-centric model, or workload distribution, comes with its own set of patterns and concepts. The hub cluster pattern refers to a model where you have one central Kubernetes cluster, typically known as the hub, managing one or more additional member or spoke clusters. We're mostly going to call them member clusters through the rest of this talk. In this setup, something in the hub cluster, usually a Kubernetes operator and controllers, is responsible for centralizing the management tasks and acting as the multi-cluster control plane for workloads running across the member clusters. This includes deploying applications, managing the workloads, setting global configuration, potentially distributing shared secrets, controlling access, integrating into and leveraging CI/CD pipelines, centralized monitoring and logging, and potentially more. Applications can be deployed across the member clusters from the hub cluster, taking into account factors like their workload requirements, the need for geographic proximity to users, optimizing the use of resources, their high availability needs, and so on. There are different schools of thought about whether workload resources should run on the central cluster or not, but among the customers already deploying MongoDB multi-cluster, which uses the hub cluster pattern, we've not seen anybody who isn't deploying their workloads on the central cluster as well.

Another model can be described as the replicated application architecture. In this case, an entire workload, or even an entire stack, is replicated to another cluster, and in many cases the whole cluster itself might even be cloned to another location. Depending on whether the database backing the application is also replicated, centralized, or somehow distributed, it's possible that this architecture could still provide some of the benefits like improved performance, reduced latency, load balancing, and so on. If the secondary cluster is not actively serving traffic, it becomes more of an active-passive setup, which many of us might be familiar with. That's a less cost-effective option, though.

And lastly, multi-cloud, which is arguably less of a pattern on its own and more of an implementation choice that can apply to either of the other two. Here, you can see the hub cluster pattern implemented over three clouds: two private clouds and, in this case, Amazon EKS as an example third cloud. It's something we're seeing from a few companies with the highest requirements for resilience. It also seems that a lot of customers are using this kind of hybrid on-prem and public cloud approach as a tentative step towards trusting the public clouds with their most critical workloads. The biggest challenge with multi-cloud is ensuring that your choice of tools and mechanisms is supported across the different environments. This makes it incredibly important to choose tools and mechanisms that are compatible with base Kubernetes, because that's the consistent element across different Kubernetes-based distributions.
This does limit how much you can use the value-add capabilities, whether private or public, that different clouds typically layer on top of their own Kubernetes offerings.

In the case of the MongoDB operator, we've chosen to go with the hub cluster pattern, the first pattern you saw. The central cluster has a few different responsibilities. First off, it hosts the Kubernetes operator and acts as the control plane for multi-cluster deployments. It hosts the MongoDB multi-cluster custom resource in which the deployment is defined for the operator. Optionally, it can host the MongoDB management server that we call Ops Manager. And as we mentioned before, it can also host members of the MongoDB deployments; it can host members of a replica set. Member clusters do just one thing: they host MongoDB replica set members. And while the data plane is already resilient to a cluster failure, we're currently working on making it easier to recover both our operator and the management server, Ops Manager, to another cluster. So until we can offer a highly available operator, this will at least make disaster recovery quicker and easier.

Now we're moving on to the three key challenges that need to be solved to successfully implement multi-cluster. The first of those is cluster inventory. Cluster inventory is what it sounds like: it's all about keeping track of the clusters included in a multi-cluster environment. But it covers more than just listing them. There are cluster credentials, which are arguably the bare minimum; you need these in order to access the clusters and to manage workloads across them. Between the cluster addresses and the credentials, you can actually deploy and manage workloads. Another arguably firm requirement is cluster health checks; there's really no point in trying to deploy to a cluster that's missing. Many solutions, ours included, provide an inbuilt mechanism to monitor clusters and redistribute workloads to healthy clusters if a single member cluster goes down. And lastly, cluster resource tracking. This is less of a hard requirement and very much a nice-to-have, but it's something most multi-cluster efforts seem to be aiming for. It's very similar to how Kubernetes monitors nodes within a single cluster, assesses their resources, and works out which it can schedule or reschedule pods to, to ensure they have sufficient resources. In the multi-cluster case that's more complicated, as you're not just talking about the nodes in one cluster; you're talking about multiple clusters. It's also valuable to be able to configure rules that govern when and where things should be moved, to give users control over where things are running. And it's useful to support affinity and anti-affinity, to ensure that workloads run where they're needed.

Going back to our use case: for the cluster inventory, we have a relatively simple approach that we've built in-house. Firstly, we require the creation of a number of RBAC resources on each of the member clusters. This creates the service account used to manage resources via each cluster's API server. Next, we need a kubeconfig secret for each of the member clusters; you can see an example of that on the right. The kubeconfig needs to contain the address of the cluster and the certificate created with the service account, and at the bottom you can see the user to use, including the token created for the service account.
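Since that slide isn't visible in the transcript, here's a rough reconstruction of the idea: a service account on each member cluster, and a kubeconfig for it stored as a Secret on the hub cluster. All names, namespaces, and addresses here are illustrative rather than copied from our tooling.

```yaml
# On each member cluster: the service account the operator will use
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mongodb-operator-multi-cluster   # illustrative name
  namespace: mongodb
---
# On the hub cluster: a kubeconfig for that service account, stored as a Secret
apiVersion: v1
kind: Secret
metadata:
  name: member-cluster-1-kubeconfig      # illustrative name
  namespace: mongodb
type: Opaque
stringData:
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: member-cluster-1
        cluster:
          server: https://member-cluster-1.example.com:6443  # cluster address
          certificate-authority-data: <base64-encoded CA bundle>
    users:
      - name: mongodb-operator-multi-cluster
        user:
          token: <service account token>  # the long-lived token mentioned above
    contexts:
      - name: member-cluster-1
        context:
          cluster: member-cluster-1
          user: mongodb-operator-multi-cluster
    current-context: member-cluster-1
```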
Now, while we see some companies automating this, we also provide a kubectl plugin that does it all for you: creating the namespaces, automating the application of the RBAC on the member clusters, and creating the kubeconfig secrets on the central cluster, the hub cluster. It's worth mentioning here that our automation uses the legacy long-lived API tokens for the service accounts. In the future, we'd like to switch to using the TokenRequest API to refresh the credentials used to access the member clusters.

Now we're on to the second key challenge that needs to be solved to successfully implement multi-cluster, which is workload distribution. That relates to how workloads are actually distributed across the clusters. For example, in this simple diagram, we've got a hub cluster pattern across three clusters, shown vertically. Each of the three workloads, shown horizontally, has a different distribution across the three clusters. There are a number of considerations that can factor into how you distribute a workload: the required level of redundancy, the required geographic spread, either for resilience or to locate workloads near users or other systems, and the resources available on each of the clusters. For example, workload one, in blue, may be serving users in each of the three locations where the clusters are running, and it may also need three replicas to provide the level of performance that's needed. Workload two, in yellow, may primarily serve users in the location of the central cluster, but want the resilience of running copies in each of the other two clusters. And finally, workload three, in orange, has arguably the simplest requirements, primarily providing a service in the central cluster with resilience needed on only one of the others. It's also possible that there may be affinity or anti-affinity considerations in play, to ensure that certain workloads remain co-located with, or distributed away from, each other.

Going back to the MongoDB operator, we've chosen a declarative approach for workload distribution across multiple clusters. You can see that in the custom resource we include the clusterSpecList parameter. Users set how many members of the replica set they want on each of the member clusters, and we then deploy the appropriate stateful sets on each of those clusters. In the case of a member cluster failing, the operator can automatically redistribute those workloads, those stateful sets, across the available clusters. This is currently enabled at the operator level, so it's not shown here. Automated redistribution might seem like it undermines the GitOps workflows that are commonly used, but the changes are stored as annotations, and they persist while the administrator restores the lost cluster. They can then manually reconfigure the distribution.
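For readers without the slide, the custom resource we've been describing looks roughly like the sketch below. The clusterSpecList field is the one mentioned above; the kind, API version, and all names are reconstructed from memory of our docs rather than copy-pasted, so treat the details as illustrative.

```yaml
apiVersion: mongodb.com/v1
kind: MongoDBMultiCluster          # the multi-cluster custom resource
metadata:
  name: multi-replica-set
spec:
  type: ReplicaSet
  version: "6.0.5"                 # illustrative MongoDB version
  opsManager:
    configMapRef:
      name: my-project             # illustrative Ops Manager project ConfigMap
  credentials: my-credentials      # illustrative Ops Manager API credentials Secret
  clusterSpecList:                 # declarative distribution across member clusters
    - clusterName: cluster-1.example.com   # matches a kubeconfig context name
      members: 2                   # two replica set members on this cluster
    - clusterName: cluster-2.example.com
      members: 2
    - clusterName: cluster-3.example.com
      members: 1
```

The operator then creates a stateful set per cluster for these members, and together they form a single replica set.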
Multi-cluster is pretty much impossible without the networking needed to connect the clusters. There are actually two subtopics within this larger topic: inter-cluster connectivity, connecting the clusters and their workloads together, and egress and ingress to and from the clusters. They have a pretty similar set of considerations. There's network topology: it's critical to consider how connectivity is already, or can be, established between the clusters, and also to external services and users. There's security: what kind and level of security do you require between users and workloads, or between the workloads themselves? Whether traffic goes over private networks or the open internet changes things considerably. Performance: how can you maximize the performance of the connectivity, and is it sufficient for the workloads you're running multi-cluster? And finally, reliability and resilience: how will you ensure that the connectivity is reliable and fault-tolerant?

Looking into inter-cluster networking, there are a number of different options. First, shown in the top left, we've got the VPN option, creating a secure tunnel between the member clusters. On the top right, we have VPC peering, which creates a private network connection between the different VPCs our Kubernetes clusters live in. On the bottom left, we have load balancers, which expose services externally so that the clusters connect through the internet. And lastly, we have the service mesh option. Service mesh is the most popular option amongst our users. Meshes can provide routing, load balancing, and security for communication between the clusters, and even for services and users that are external to the clusters. We've found that it's rarely the case that MongoDB is the first workload spanning multiple clusters for our users, and most of them already have a service mesh like Istio or Linkerd in place. There are also cloud-provider-specific service meshes, like AWS App Mesh, Anthos Service Mesh for Google Cloud, and so on.

Moving on to ingress and egress, which we're going to cover quickly, there are a few different options. Again, top left, we've got load balancers. They expose the services on a public IP, providing ingress and egress. For multi-cluster ingress, you typically need a global load balancer that distributes and balances the traffic between the different clusters. On the top right, we've got service meshes again, the most popular choice among our own users, which enable inter-cluster connectivity and provide routing, load balancing, and security for external access. Bottom left, we've got network policies, which can control the traffic to the clusters and between the clusters, but again they need something to distribute the traffic between the clusters. And finally, bottom right, we have ingress controllers, great for managing ingress traffic, but again needing something to distribute the traffic between the different clusters when you're running multi-cluster.

In the case of our operator, we wanted to allow users to choose how they handle networking. We've based our own quick start on the use of a service mesh, specifically Istio, which we use to do our own testing, but ultimately users just need to meet the minimum networking requirement, which is FQDN resolution between the workload pods across the different clusters. We also have guidance for using load balancers, but so far we've seen most companies choose a service mesh for network connectivity. If they don't, as we just saw, there are a number of options they can use, but it's ultimately up to them to ensure that the connectivity between clusters is there and working to support the deployments.
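As a concrete sketch of the network policy option mentioned above: a policy like this would admit MongoDB traffic from a peer cluster's pod network. The labels, namespace, and CIDR are illustrative, and the policy only permits traffic; something else, such as a mesh, VPC peering, or a global load balancer, still has to route it and provide the cross-cluster DNS resolution.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-mongodb-from-peer-cluster
  namespace: mongodb                  # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: my-replica-set-svc         # illustrative label on the MongoDB pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.245.0.0/16       # illustrative pod CIDR of a peer cluster
      ports:
        - protocol: TCP
          port: 27017                 # default MongoDB port
```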
Now we're going to move on to multi-cluster controllers and operators. This could be a huge talk in and of itself, and it's actually one we're thinking about doing, perhaps at the next KubeCon + CloudNativeCon in Paris next year. There are a number of considerations when designing controllers and operators. Consistency is a massive one: you need to maintain a consistent state across all of the clusters. That can be challenging if different clusters are running different flavors or versions of Kubernetes, and they may even have different hardware and software configurations. Again, this makes compatibility critical. Then there's resilience, mainly to cluster failure; this ideally covers both the workload and your multi-cluster control plane, whatever you're using. There's scalability, to support a large number of clusters and workloads. And finally, a big one, security, to protect both the control plane and the workloads being run.

In terms of functionality, there are a few key areas that unsurprisingly align with the topics we just covered. First off, cluster inventory and workload distribution. Ideally, this should take into account the available resources on each cluster and the performance requirements of the workloads. Then failure handling. Ideally, the workloads remain healthy even with a cluster down, but at a minimum, a user should be able to reconfigure as needed. Preferably this would be automated: able to detect failures and take corrective actions, such as rescheduling workloads to the remaining healthy clusters. And monitoring and observability. Administrators need to be able to monitor both the multi-cluster control plane and its managed workloads. This means your solution should emit metrics and logs that can be used to track performance and health, ideally via external and commonly used systems like Prometheus, Alertmanager, and so on.

Now we're taking a slight departure from the pattern of covering a bit of theory and then looking at what we did at MongoDB. Instead, we're going to take a quick look at a couple of prominent options that already exist for implementing and deploying multi-cluster. These solutions are agnostic of any particular vendor's offering, which means they don't take into account the logic of a service like MongoDB, for example, but they do allow you to run existing workloads across multi-cluster environments. Karmada and Open Cluster Management are two of the more prominent CNCF projects that we're aware of. I don't doubt there are others, but these are the two we're aware of and picked to talk about. They've been a source of inspiration for us, for both the existing and future iterations. They're both great solutions for implementing multi-cluster distribution in a unified manner, without relying on custom controllers for each piece of software you're trying to deploy. Like I said, some tools like MongoDB have this kind of multi-cluster functionality built in, which has the advantage of catering to our own logic and requirements, but obviously at the cost of not necessarily being able to support other services running multi-cluster. We're not going to go into massive detail here, since both of these do a lot, but there is a lot of overlap, such as considering cluster availability and resources, automated or manual failover of workloads between clusters, and cross-cluster service discovery. They also both offer solutions for cross-cluster networking, which, as we mentioned, we don't; we're relatively unopinionated about that. Each does it in a slightly different way, though. One slightly subjective difference between the two is that Karmada is a bit more abstracted, while Open Cluster Management offers more control through prescriptive configurations.
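To give a flavor of what the Karmada side looks like in practice, this is roughly the shape of a propagation policy, adapted from Karmada's own quick-start examples; the resource and cluster names are illustrative.

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
spec:
  resourceSelectors:          # which control-plane resources to propagate
    - apiVersion: apps/v1
      kind: Deployment
      name: nginx
  placement:
    clusterAffinity:
      clusterNames:           # which member clusters receive the workload
        - member1
        - member2
```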
They can both run with an agent on each of the spoke or member clusters, which pulls the config from the central cluster and applies it, but Karmada can also run in an agentless mode by pushing configuration to the member clusters, again through the respective API server of each cluster. And one last difference that we can see being useful to some users is that Karmada supports retrofitting a single-cluster workload to multi-cluster.

So, having looked at a couple of great existing solutions and talked about the considerations for building your own multi-cluster offering, or just choosing one off the shelf, we wanted to reflect on some of our decisions at MongoDB by sharing some of the changes we'd have liked to make in the past, or at least want to make going forwards. Perhaps some of these might help you on your own journey in identifying what you need and what you need to do.

As you saw today, we offer a relatively prescriptive approach to workload distribution; we mentioned the declarative approach, where you say how many replica set members you want on each of the member clusters. This works, but we would love to abstract it a bit more. It is a bit counterintuitive if you think of Kubernetes primitives and the philosophy behind them. Today we compute an index of which instance of the database lives on which cluster. Users can see the index in annotations, but can't edit it, even if they would like to. But what they really want is to decide where the primary instance of their database lives, since that's the only one they can write to, and that's the important part. They don't really need to edit the index; they just need to prioritize the list of clusters and have the operator manage things so that the primary ends up in the right place. That's one of the things we hope to offer in the future.

Resource awareness is another missing piece for us. Although we check the availability of the clusters, both to potentially reallocate workloads when needed and to deploy them in the first place, we don't currently check the available resources. Essentially, this pushes the resource allocation problem onto the users, which is, again, something we'd like to make easier for them in the future, as it's especially something you don't want to have to think about in a DR scenario.

On a similar note, we do offer automated failover, like we said, but it's not highly customizable; it's also on or off at the operator level. And while it's valuable, we'd like to make it per-workload, and iterate to give more rules and more control over when a workload should be redistributed, and even potentially how it should be redistributed.

And lastly, we'd like to make the operator highly available, to have it span multiple clusters. It's easier to restore the operator on another cluster than it would be for our database or management server, but it's still a single point of failure in this setup. Having the operator also run on other clusters could have an added benefit: we would be able to offer a pull model, as we described before, similar to what Karmada does. This would allow secondary operators on each member cluster to intermittently pull configuration from the central cluster and apply it locally, instead of the current model of pushing from the central cluster to the member clusters.
This would be really good for scaling to support many Kubernetes clusters, but it would also allow the secondary operators on member clusters to make changes in the event of losing the central cluster: electing a new primary, putting a database into read-only mode if a single cluster is cut off, and so on.

So we're very close to the end, which also means you're all quite close to getting lunch, which I'm excited about. Before we wrap up with some Q&A, we wanted to share some final thoughts on what we see being the future of multi-cluster. Some of this may already be more in progress than we're aware of, but this is what we imagine coming up. The first and most significant thing we expect is that it becomes significantly easier to run multi-cluster systems, likely through more powerful and user-friendly multi-cluster management tools. There's always a drive across our ecosystem and community to simplify, automate, and abstract the complexity further and further away to make things much easier to manage. We've already outlined a couple of existing projects making great progress here. We don't think the simplification will stop with the cluster admins, either. You also need to consider the developers, who want a simplified and abstracted way to deploy across multiple clusters without knowing all of the finer details of how it works at a deeper level. Most application developers don't have the time or the inclination to become Kubernetes experts.

The next is almost a cliche now: increased use of AI and ML. It's already started; there are a load of booths in the expo hall that are already working on this and integrating it into what they're doing. But we're expecting it to be used in multi-cluster management, and arguably in Kubernetes use in general. This can free up administrators to work on more strategic tasks. We're also expecting better network support for multi-cluster Kubernetes. This is one where there's already a lot of work happening: there are a lot of interesting CNCF projects, some cloud-agnostic commercial offerings, and of course the cloud providers already have many solutions of their own. And finally, the bullet that's been there all along: better support for hybrid and multi-cloud deployments. As we mentioned, we're seeing some of our users spanning more than one cloud provider, some even using it as a stepping stone towards cloud adoption, some on-prem, some in the cloud. Multi-cluster Kubernetes management tools need to provide better support for these types of hybrid and multi-cloud deployments.

And that's it. The QR code and the link on the right both go to the same place, where we hope you'll find various related reading materials: from docs about our operator, if you're interested in that, to info about SIG Multicluster. There's a load of information on their site, and it's really interesting reading; if you enjoyed this, you'll probably enjoy that as well. There's also more about Karmada and Open Cluster Management, because we talked about them and we're fans. But now I think we do have a little bit of time for questions. If you do have any, please make your way to the microphone, as this is being recorded; that way people will be able to hear your questions as well as our answers.

So MongoDB has two modes of operation: all the writes go to the primary, and then the reads can come from secondaries. Sorry, can you get a little bit closer to the microphone?
So yeah, MongoDB has two modes of operation, strongly consistent and more like eventually consistent; reads can come from stale secondaries, right? In multi-cluster setups, what's the typical deployment? Do your customers run them in a strongly consistent manner or an eventually consistent manner? And how is that specified by your users? It varies by use case, but through our operator, whether it's multi-cluster or not, they can configure how many nodes need to be written to before a write is confirmed. So effectively they have control over it on a per-application basis, and it does vary by use case whether they need the speed or the guaranteed consistency. The reason I ask is: how do customers specify strongly consistent or eventually consistent to the federated services or the Kubernetes abstraction? Is there a way for them to tune that? Yeah, it's the same way across both types of solutions. I think the terminology is write concern, and I think the default is that a write has to be written to the majority of the members of the replica set, but they can alter that. They could say that the write concern is one, in which case as soon as it's written to the primary they get an acknowledgement that it's written. Or they could say that it's all, in which case they get maximum consistency, and it's guaranteed to be written everywhere before the application gets a response saying it's been confirmed.

You did speak about the improved resilience of the multi-cluster design. I just wanted to get an idea of how you do maintenance modes across multiple clusters. Sorry, I missed that. How you do maintenance modes across multiple clusters: cluster upgrades, pod replacements, et cetera, that kind of thing. How you orchestrate and manage that process. Did you catch that? No? Sorry, we're not hearing you that well. Okay, the summary is: how do you handle maintenance modes across the multi-cluster design you've chosen? Did you say maintenance modes? Yes, maintenance, yeah. Like windows, rather: maintenance windows. In what sense do you mean maintenance modes? No, maintenance windows is the question. Oh, maintenance windows. All right, okay, I see. Well, that's not something we've necessarily got a great solution for built into Kubernetes; MongoDB already takes that into account. Our upgrades, for example, are done in a rolling fashion, even across multi-cluster. If you need to upgrade, the operator will handle that and do it in a rolling fashion; MongoDB already has support for that. Most other impactful changes are done in the same rolling fashion as well, and that covers most of the maintenance that's required. That's also covered by the part where we said you use a declarative configuration to define how many replica set members live in each Kubernetes cluster. If, for any reason, one cluster is brought down, for maintenance, upgrades, whatever, then those members are going to be rescheduled to the remaining clusters. The control plane manages that automatically. Yeah, the operator manages that.

When the workloads move, do you try to attach the same volumes? How does storage work, basically? How do the new replicas coming up in the new cluster get seeded with the right data? Well, that's something that happens not because of the operator, but because of MongoDB. Essentially, as we said at the beginning, we want the different Kubernetes clusters to be isolated and not managed in the same way.
We're talking about different PVCs, so it's not easy to... Essentially, what happens is that you spin up a new replica set member, and MongoDB takes care of replicating the data there. Yeah, it's as if you added a new member to your MongoDB deployment, and it takes care of it. It's similar to if you scaled up and added new members to the replica set: they would just synchronize the data. That's automated, as they're effectively identified as new members of the replica set.

At the moment, we haven't gone that far in addressing that. That's something we're very conscious of, and pod disruption, as you say. It's one of the iterations, but if we listed all of the changes we'd like to make, we'd have been here all day. But yeah, thank you.

At the moment, we're using static API tokens to actually manage the workloads on the member clusters. As we mentioned, we'd like to switch to a more automated model, with automated rotation and so on. For now, we avoid the issue by keeping them static; if someone were to rotate those tokens on the member clusters, we would effectively lose access to manage the member clusters. That's one of the reasons we want to move forward and iterate on that.

How does the hub cluster call each of the API servers? If I understand the question correctly: the first step of setting it up is that we set up RBAC in all of the member clusters, and we create the roles to give access to each member cluster's API. We issue the tokens, and those are all stored in Kubernetes secrets in the hub cluster, and those are used to make the API calls. I don't remember that. Yeah, I'm not sure. But this is one of the advantages of moving to a pull model: if we could have the operator be highly available, then we'd effectively have an agent running in each of the member clusters, and a lot of those considerations get easier.

Well, essentially, it's just a Kubernetes cluster where you install the operator, and then that becomes your hub cluster for the MongoDB deployments. There's no other official designation, other than the fact that the operator is running there. And then the other question is, how do you handle it? So that's another thing on our to-do list, eventually. At the moment, you'd have to go onto each of those clusters to pull the relevant logs for our workloads running there. That's why we propose you use something external, like Prometheus, because then you can add an extra layer like Thanos or something else, and then you can have a unified view of what's happening in the different clusters. Thank you. Thank you.

You're one of the scary people for us, anyway. So, when I was doing research and talking to Mircea in particular about this, I asked: are these tools solving some of the problems that we want to solve? Now that they're gaining maturity, why wouldn't we move over and stop worrying about building our own? And we had this long and in-depth conversation about the fact that there's so much bespoke logic to MongoDB that we want to take into account; building those considerations on top of those tools would probably be as expensive as just continuing our journey with our own solution, which already takes a lot of that logic into account. It's also a matter of reducing the number of tools our customers need in order to use it. We've seen cases where people don't even want to use Helm. So we try to use only the bare minimum, to make it easier for everyone.
In general, we're very unopinionated about third-party tools, because if we recommend anything, let alone require people to run anything, then they expect us to know loads about it and support them on it. And honestly, we don't have the time or the expertise to be able to do that sort of thing. So yeah, thanks.

I have a question about the hub cluster. How did you set up watches across different clusters while running a single copy of an operator? Did you use some kind of framework, or is it all your own? And do you have any issues with latency, et cetera, watching remote clusters? So we're using, and don't quote me on the terminology, a health endpoint on the member clusters to basically check their availability. We'd like to go deeper and check their resource availability as well. In terms of latency, no, we haven't experienced any latency problems, either in the control plane of our multi-cluster setup or in the workloads, but we can't take credit for the workload aspect; that's MongoDB's inherent functionality. But there's no MongoDB operator on the member clusters? No, there isn't. The operator on the central cluster is literally just using the API server on each of the member clusters to manage the stateful sets it's running there. Although they're individual stateful sets across all the clusters, they effectively amount to one replica set. And it's watching, right? For changes across one group of clusters. Yeah. Okay, thanks. Well, because we deploy a stateful set within each Kubernetes cluster, a lot is managed automatically using the inherent stateful set functionality; if a pod is lost, it will be replaced, and so on. We get a lot of that for free, because Kubernetes is brilliant. And if somebody deletes a stateful set there, you can create it again? Yeah. We're still reconciling periodically to make sure that what's running on the member clusters is correct. Okay, thanks. Thanks. Sorry, I don't know if we're positioned badly relative to the microphone, but we've been struggling to hear a lot of people.

So, regarding rolling upgrades across multi-cluster: can you verify that a new feature is running well on a single cluster before the deployment moves beyond the first cluster? Or is it so seamless that there's no chance to test on a separate cluster to see whether the feature is deployed? So, if I understand correctly, the question is whether we have a rollback mechanism when we do a rolling upgrade on multi-cluster. Do you have the answer? I know we have visibility of it, and we expose it in the status of the custom resource that you deploy in the central cluster. I don't know if that answers the question. So it will be exposed from the central cluster that a feature is deployed to maybe one or several parts of the multi-cluster deployment? So maybe I can limit it to just one cluster, so that I can test whether the feature is deployed there, or roll back the deployment if some error comes up, or something like that. So you upgrade only one member, and if it goes okay, then upgrade the others? Yeah, actually, that's what we're currently doing now. So I'm wondering if that's implemented in your system, or maybe it's an automatic process where you roll out rolling updates to all the clusters. I'm wondering if there's a point where I can check that the new feature is running well in the cluster it's been deployed to. So for the actual functional features of MongoDB itself, we don't have that at the operator level.
There's a lot of interaction with the management server, which provides the automation config and all of that. At the upgrade level, for instance, if we upgrade a given member of a replica set, we wait until it comes up and reports healthy before we move on to any of the others. So if it failed and got stuck, we wouldn't proceed. I think that's all the questions. Thank you, everyone, for coming.