Hi, everyone. Welcome to this SIG Multicluster intro and deep dive session at KubeCon. I am Stephen Kitt, a software engineer at Red Hat. And with me is Jeremy Olmsted-Thomson. I am a software engineer, and I work on Google Kubernetes Engine. Welcome to our maintainer track session at KubeCon North America 2023.

In this session, we'll cover what this SIG is about generally, the problems we're working on, and what we're thinking about next. And we'll spend some time bringing those of you who are new up to speed on current activity. There are a bunch of projects happening right now that we're very excited about. We're seeing a lot more engagement from folks from different groups, from different backgrounds. It's an exciting time. So we'll go through what we're covering right now in the SIG: we've got a new website; we'll talk about the cluster set and what that is; we've got an API, which I call the About API, for cluster metadata; our multi-cluster Services project; and an exciting new project around building blocks for orchestration in the multi-cluster space. And then, most importantly, we'll cover how you can contribute.

So what is this SIG about? In traditional Kubernetes, you have single clusters. We want to think about a bigger picture than that, where you have multiple clusters and you want to do interesting things with them all. That involves, of course, initially being able to expose workloads from multiple clusters to each other, using tools that are different from the traditional ingress and egress models in Kubernetes. You also need to share cluster metadata so that clusters know about each other and how they all individually fit into the bigger picture that you're trying to build. And in general, you want to break down all the walls that exist between clusters, in a controlled manner. This involves lots of different functional areas and lots of different usage scenarios. But what we want to do as a SIG is identify what the core primitives to describe all this are, what the semantics are that are universally applicable across all the scenarios we want to be able to describe, and what the best way of describing all this is, in a way that enables interesting use cases without imposing specific implementations. And we really want, and need, your input: from people who are trying to use multiple clusters together, what your use cases are, what your stories are, the difficulties you run into (or don't), what you find helpful when you're trying to think about all this. And we're really interested in finding out what you're working on right now, or what you plan on working on in the future.

So I'll take a minute to talk about our approach as a SIG, and I think this has evolved over the years. The most important thing we've learned is to avoid premature standardization. The Kubernetes community has gotten really, really big. This is consumers, people managing their own clusters, cloud providers, tool builders. Flexibility is definitely needed; that's one thing that's become clearer and clearer over the years, and we see more and more evidence of it. So the idea is: let's focus on APIs. Let's focus on building a common language that we can use to connect clusters, to connect concepts, one that enables future tools to be built and enables new patterns and practices, not so much the implementation. Different clouds and different environments have different networking constraints, for example, and it can be very hard to standardize there.
There are so many different patterns that evolve, whether you're on-prem, whether you're cloud-based, whether you're doing something interesting that we haven't thought of, like running Kubernetes on Raspberry Pis. You name it, there are so many ways that Kubernetes can be deployed. Focusing on implementation, especially when stitching clusters together, can make the problem space too constrained. So we really focus on those APIs and the core problems.

This is another piece: avoid solving optional problems. This is super important. It can be compelling, especially for engineers, and especially for me (I can speak from experience on this), to try to solve all the problems. But getting to something useful that we can all agree on, with these diverse backgrounds, is much, much easier if we just start with the core problems, the problems that everybody has no matter what, and figure out how to address the optional problems in the future as they become less optional, as the core solutions are consumed and we see what's next.

We also have a goal of keeping multi-cluster as consistent with single-cluster as possible. I think this is important for expanding existing tools, rather than coming up with new concepts that are just different names for ideas that roughly overlap with things in single-cluster Kubernetes. So wherever possible, we reuse concepts. And the last piece is to try to work backwards from specific problems into something bigger. Today, we'll talk about how some of the work we started with, one problem, has evolved into bigger problems involving more SIGs and more people.

You'll find a lot of this information on our nice SIG website, which is multicluster.sigs.k8s.io. It has higher-level documentation for end users, and also project status updates. The goal is really to describe the different APIs the SIG cares for, what they provide, and how they can be used; to help implementers, those who are trying to provide real-world services implementing these APIs, understand the goals of the APIs; to provide descriptions of our tooling; and, for the implementations that do exist, to catalog them so that end users can find them. And we're really interested in feedback on this website. What are we missing? What are the patterns that we should be describing? Currently, it's mostly about what the APIs are, like I said, and not so much about how they all fit together and how they're actually used. So we're very interested in feedback on that. And open questions too, of course: problem spaces that fit in with the SIG's concerns that we're not addressing, and ways to use the existing APIs that we haven't described.

So now I'm going to talk about the cluster set. The cluster set is a concept that we came up with a few years ago, from the realization that we needed some basic consistency in multi-cluster deployments to start figuring out what we could build next. It's basically the core group of clusters that the concepts and projects we work on in SIG Multicluster rely on. The cluster set represents a pattern of use from the field. It comes from talking to operators, from our own experiences, from talking to customers, users, everyone who's been around Kubernetes trying to deploy multiple clusters, and learning from what worked, what doesn't work, and the pain points we've seen.
The key here is that a cluster set represents a group of clusters that are governed by a single authority. Now, the scope of this authority varies across deployments. It might be an organization that has its own cluster set, it might be a whole company, or it might be the company's platform admins. The important thing is that there is an authority who can make strong statements about the cluster set.

Within a cluster set, there's an assumption of a high degree of trust. This is not about external consumers; Kubernetes supports those well with Ingress already. If you don't care that the consumer is a Kubernetes cluster, you don't care about the metadata you can get from workloads deployed within Kubernetes, and you want to treat the consumer as external, that's great; that's been well supported for a long time. We're trying to solve the problem for within the organization, when there is some degree of trust, when you do want to know more details about each consumer, when you might care about identities and access-control consistency across clusters.

And the core concept that makes this work is namespace sameness. The idea of namespace sameness is that a namespace basically means the same thing in every cluster in which it exists within a cluster set. Permissions and characteristics are consistent across clusters. This is really important. So if I can access namespace foo in cluster A, and namespace foo also exists in cluster B, I should be able to access both. And if a namespace is used for a web service front end in one cluster, that same namespace probably shouldn't be used for payment processing in another cluster. So: same use case, same characteristics, same permissions. Namespaces don't have to exist in every cluster; that's completely fine. But in the clusters in which they exist, they should behave the same. And you may even have some namespaces, like the default namespace, that just have all kinds of things in them, or system namespaces, like kube-system, that are inherently local. That's OK as well, so long as, if those namespaces are local in one cluster and don't have fleet awareness, they are local in every cluster and don't have fleet awareness. So the important thing here is consistency.

The idea is that the cluster set, with the namespace sameness building block, gets us one step closer to cluster fungibility: the idea that you don't have to care about the specific cluster that you're working in, that clusters may be replaceable. By relying on namespace sameness, you know that if you can access a namespace in a cluster, it's always going to be the same. And that takes a huge operational load off, is what we've found.

Now, given that, we just need to figure out how to solve cluster identification. And this is covered by the About API, which allows you to attach metadata to individual clusters. The cluster is such a fundamental concept inside Kubernetes, and it's assumed at such a basic level, that Kubernetes itself doesn't actually have a way of talking about clusters at all. In traditional Kubernetes, a cluster doesn't even have an identifier. This is what the About API addresses. You can see here we have all the individual clusters within our cluster set, and they all have a way of attaching identification information and associated metadata to each individual cluster. So you can see all four clusters are part of the same cluster set, but they all have different names, and then additional metadata, which is added on top.
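To make that concrete, here is a minimal sketch of what attaching an ID and cluster set membership looks like with the ClusterProperty resource (we'll say more about these well-known property names in a moment; the cluster name cluster-a and cluster set name production-fleet are made up for illustration, and the exact apiVersion depends on the version of the CRD you install):

```yaml
# Well-known property: this cluster's unique ID within its cluster set.
apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io
spec:
  value: cluster-a
---
# Well-known property: which cluster set this cluster belongs to.
apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: clusterset.k8s.io
spec:
  value: production-fleet
```

Since these are ordinary cluster-scoped resources, any workload or controller running in the cluster can read them to discover which cluster, and which cluster set, it is operating in.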
The About API is described in Kubernetes Enhancement Proposal (KEP) 2149, it's available in the About API repository, and it's currently in beta. It is a cluster-scoped ClusterProperty CRD with a very, very simple schema: just a name and a value. You can store cluster identification, and any other properties that you're interested in that are cluster-scoped, or rather that are attached to the cluster as a concept. These replace annotations that you could typically find on objects that are similar, or at least feel similarly scoped. There are some well-known keys that are used as names: for the cluster ID, for example, or for organizational ownership, which cluster set a cluster belongs to; that's clusterset.k8s.io. And beyond that, where the cluster sits in the overall world of your cluster sets: where it is on the network or networks, which environments it's used for, and what the cluster's purpose is. This is really a core building block for higher-level orchestration, and we'll get back to that in just a little while.

But before that, I'll talk about the Multi-Cluster Services (MCS) API. Earlier, I mentioned how we had built some tools to connect clusters together and try to break those walls down between clusters. The Multi-Cluster Services API was kind of the first big push of the SIG into this space. It was developed under the realization that if you want to connect clusters together, and deploy applications across clusters with dependencies across clusters, services are kind of the core multi-cluster building block. You can't do a lot if you can't talk to other applications. So we decided to start by asking ourselves: what are the core problems in multi-cluster service discovery? And how might we extend the existing Service concept across clusters, really targeting that kind of high-trust, east-west case?

What we came up with was a very simple API, actually. It's called a ServiceExport, and it's the responsibility of an implementation to figure out the actuation. The idea is that by creating a ServiceExport resource, which is really just a resource whose name maps to a service, we allow a single Kubernetes service to span, or be consumed by, multiple clusters. When we did this, we focused only on the API, because different platforms have very different traits, and there are a few implementations out there now, on different platforms, that have various constraints. The core of all of them, though, is that you can create a ServiceExport resource, and that service then becomes consumable across all of the clusters in the cluster set. And this is very powerful.

One of the constraints we came up with for the API is that consumers should only ever rely on local data. This means that regardless of the implementation, whether it's Submariner or GKE, which I happen to work on, or any of the others, you can consume a service just by seeing what's in your cluster. We have a corresponding ServiceImport resource, created in a cluster as a service is actuated in that cluster, that makes it easy to consume. This is a very important resource that I'll talk about in a minute. But the result is that ClusterIP services and headless services basically just work as expected across clusters, which means that if you've built a multi-tiered application with dependencies on other services, you can start moving those services around across different topologies, and everything continues to work. So it makes it very easy to move from that single-cluster model to a multi-cluster world.
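As a rough sketch of how small the producer-side API surface is, here is what exporting an existing Service could look like, assuming a Service named my-svc in a namespace called demo (both names are hypothetical, and the API group version may vary with the implementation you use):

```yaml
# Export an existing Service to the cluster set. The ServiceExport's
# name and namespace must match those of the Service being exported.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-svc
  namespace: demo
```

The implementation reacts by creating a matching ServiceImport in the consuming clusters, and, per the MCS API KEP (KEP-1645), the service becomes reachable at a cluster-set-scoped DNS name of the form my-svc.demo.svc.clusterset.local, with consumers relying only on data local to their own cluster.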
And that ServiceImport resource ended up being even more useful. We worked with SIG Network and the folks there on the Gateway API, and we actually figured out that we could get the Gateway API to point at the ServiceImport, and take the Gateway concept, the successor to the Ingress API, and allow it to point to multiple clusters. So this becomes a really powerful tool with multi-cluster services. We solve that east-west, cluster-to-cluster communication, and then, with the Gateway API added on top, we make it possible to consume these services externally, without having to care about the topologies and the clusters underneath. (There's a rough sketch of what that can look like at the end of this section.)

So far, we've talked about having services, and having clusters that are described. And that works really well if you have an existing set of clusters with existing services on them, and you want to be able to build workloads that use services across different clusters. Using the MCS API, as Jeremy just said, you can describe which services are available where. And thanks to the namespace sameness principle, you know who can access what. Your workloads can have certain expectations about what they're going to find, even though they don't necessarily know, or even care, where things are; and they shouldn't have to care. But in practice, that's all well and good when you already have things in your clusters. When you're trying to build bigger systems across multiple clusters, you quickly end up with a desire to be able to drive changes across those clusters.

This is where orchestration comes in, and it's often talked about as federation. The goal there is to have some central component that says: I want to see this state, or these features, available in these different clusters. So here, for example, to be able to say: I want these services in green to be available on these two clusters. And so you're pushing configuration into all the clusters in a cluster set.

There have been multiple attempts to address this. The first one was the Cluster Registry, but it didn't really have a well-defined use case, and so it faltered. Another attempt, which was more based on an actual concrete implementation, was KubeFed. It did see some use, but it ended up trying to address too many concerns, and again, development stalled, because it was too difficult to determine what its core features were supposed to be and what well-defined set of use cases it was trying to address. So obviously, since there have been two attempts already, and all the implementers of multi-cluster solutions have their own private way of dealing with this, there is clear demand for multi-cluster orchestration. But it's also obviously very hard to define it in a way that makes sense to everybody, and where you end up with APIs that work across different types of use cases and across different implementations.

So we're really excited to see a new proposal to try to address this, which is called the Cluster Inventory API. It's backed by a number of different groups and projects, and it's being discussed in this SIG right now. It tries to provide a way to say: here are certain workloads that I want to have on different clusters. And it tries to address the problems that come up with that in a fashion that's consistent with the SIG principles, where, as we've described already, we try to come up with an API that addresses the core concerns without enforcing implementations, and so on.
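And here is the rough sketch of the Gateway integration promised above: an HTTPRoute whose backend is a ServiceImport rather than a plain Service, so traffic accepted by the Gateway can reach the service wherever it is exported in the cluster set. This is only an illustration under assumptions; all the names here are hypothetical, and the exact group, kind, and versions supported for multi-cluster backends depend on your Gateway implementation:

```yaml
# A hypothetical HTTPRoute steering north-south traffic to a
# multi-cluster backend via the MCS ServiceImport resource.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: my-svc-route
  namespace: demo
spec:
  parentRefs:
  - name: external-gateway       # a Gateway exposed to external clients
  rules:
  - backendRefs:
    - group: multicluster.x-k8s.io
      kind: ServiceImport        # the MCS resource standing in for the Service
      name: my-svc
      port: 8080
```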
Coming back to the Cluster Inventory API: there is a real opportunity to help influence this proposal, to help give it meaning, and we are really interested in feedback on it. So please come work with us.

So what's next? This is really where we look to you. We've been thinking about a few things that come up as recurring themes. A big one is canonical patterns. What should we be encouraging the community to do with the tools that we've built, now that we have them? What recommendations can we make around patterns or workflows, based on our experience, and based on folks using our tools who are familiar with the space? What has worked? What should we be sharing more broadly? Another area that's come up many times now is leader election. Once you have deployments across clusters, especially once you start having this orchestration, there's going to be some coordination required, some form of consensus. How do we manage that? How do we coordinate in multi-cluster deployments? And then the big open question: what else do we need? We want to shift operations above the cluster and across clusters. What are the other tools that are missing to make that a reality?

This is, as we've already said several times, where you come in. How can you be a part of the SIG? We have some well-established APIs, and things that we are still working on. But we want to know what you are working on. You've got real scenarios, real use cases that you're trying to address. If you're making progress towards actually implementing them, or if you already have something, then you've got tools that you've built or that you're building. You've probably also identified tools that would be really useful but that don't exist, and we are very interested to find out what those are. What problems have you solved? What problems are left to address? What problems do your customers have? What are the big missing pieces that we haven't even talked about, that you've noticed? And is there anything that the SIG can do to help you, or anything that you can do to help the SIG? We've talked about multiple different areas in this presentation where we're looking for feedback, or looking for help to give direction to future proposals, because we're well aware that the current APIs that the SIG has either defined or contributed to don't yet provide a complete solution for multi-cluster use cases. So we are very keen to get more input from more people.

And we have a number of places where you can come and meet us. There's the home page that we already talked about, where you can find out more about the SIG, what it has done, and what it's working on. If you want to actually come and talk to us, we have a Slack channel, and you can see the address there. We also have a mailing list, which you'll find on Google Groups, that you can join. It might not seem all that active, but there are people reading it, so if you post messages there, you won't just be shouting into the void. And we have bi-weekly meetings on Tuesdays at 12:30 PM Eastern, 9:30 AM Pacific, 4:30 PM UTC. They're usually well attended. They do get canceled if there's no agenda, so if you want to talk about things, be sure to add your information to the agenda; you'll find the information about where that lives on the home page. So, as usual with Kubernetes SIG activities, the way to find out about all this is to go to the website and join the Google group.
And that will automatically push invitations to you for the bi-weekly meetings. The meeting invites have links to the agenda, which you can fill in ahead of time, or just look at to see what we're going to talk about. You'll also find notes from previous meetings there, so you can see what's been discussed in the past. And you'll get a link to the Zoom calls so that you can actually join and participate. Thank you. Thanks, everyone, for your time, for listening. Please post your questions; we'll be standing by for the next few minutes to answer them. And enjoy the rest of KubeCon. Yeah, thank you.