Hi, welcome to the SIG Multicluster intro and deep dive. I am Laura Lorenz and I work at Google. I'm Paul Morie. I'm Stephen Kitt. And I'm Jeremy Olmstead-Thompson; I also work on GKE at Google. So today we want to cover what the SIG is about. We especially want to highlight some of the next problem spaces we're looking to take on, and we want to cover our current activity: make sure everybody knows about some of our flagship projects and some of the newer projects we've recently been working on, and give an update on some of the KEP-related content we've been working through. Then we want to do a little bit of a deep dive into conformance tests and Gateway plus MCS in particular, to share how those things work together and how they could help you interact with some of the SIG's projects. And finally, we want to share how to contribute, because we want to see you at SIG Multicluster.

So what's this SIG about? It's primarily about figuring out, collectively, what the Kubernetes-native way should be to handle a number of scenarios. The first one is exposing workloads from multiple clusters to each other, so that multiple clusters can be joined and, for example, use each other's services. Another is sharing multi-cluster data: where it lives, and where each cluster sits relative to the others. And, in general, breaking down walls between clusters. This of course touches many different functional areas. But this is an evolving space, so we're still working to figure out what the best and most durable primitives are, i.e. the best way to represent all the objects we need to care about. And since this is an evolving area, we want, and actually need, your input: real user stories. What you're trying to do, the problems you're running into doing multi-cluster work, trying to architect multi-cluster solutions, deploy multi-cluster workloads, anything like that. We are keen to hear about it, so come and tell us what you're working on, and we'll go over how you can do that at the end of the presentation.

So how do we go about solving these problems? Multi-cluster is a huge, basically endless space, and there's a lot of complexity in stitching together clusters across different environments. To make this more tangible, and to keep us from getting stuck on big complicated ideas that maybe aren't solving real problems, we like to start with specific problems and work backwards into something bigger. A good example, which we'll talk about a little later, is multi-cluster Services, where we started with connecting services together between clusters, solving just that one problem, and then using it as a building block to figure out what's next. So we avoid premature standardization: we focus on the APIs that solve that specific problem and only define the requirements that really need to be defined to solve it, leaving as much room for implementation as possible. This is the avoiding-optional-problems part: if something isn't actually required as part of the core solution, we want to leave it open for interpretation and let various implementations and platforms figure out how to solve it in their own way. And one of our primary goals here is to keep multi-cluster consistent with single cluster wherever possible.
I think the goal of this SIG is to make it easy to extend the same constructs you might use in a single cluster to this new multi-cluster world.

So let's talk about our next problem spaces. You'll notice, as we talk through these, that we've attempted to select work focused on very specific problems, as Jeremy was saying. For example, in multi-cluster networking, we're looking for more sophistication now that we've done some of the groundwork. With regard to network policy, we want to look at applying policy uniformly across clusters. We also want to look at stitching together clusters on different networks, and I'm sure that resonates with anybody who has attempted to make things work across clusters on what you might call an organically developed network topology. When it comes to multi-cluster controllers and multi-cluster leader election, we're looking for use cases and specifics that can inform the SIG about recommendations we can make or work we can use as a reference. In the area of the Work API, we're looking at spreading groups of resources to different clusters. We also know there's some interest in multi-cluster registries and control planes, but I want to note that these are very, very broad problem spaces, and they're ones with deceptively intuitive naive use cases. In the past, when we've gotten into details, we found a real lack of alignment on which problems to solve and which approaches to take. So for these in particular, we're very interested in finding the most essential problems to solve. Let me emphasize that if you're interested in these areas, we could really use specifics to help guide and inform the discussion in the SIG. And finally, there's some work going on around StatefulSet slices for migrating StatefulSets between clusters. So hopefully it's coming across in the slide that we're interested in really specific, tangible, concrete problems.

All right, I'm going to talk a little bit about some of the projects we have going on right now in the SIG. The first topic is one of the earlier established ones, so if you've been to this presentation before, you may have seen it a couple of times, but to reiterate for anybody who's new: one of the core concepts that SIG Multicluster builds our solutions around is the idea of a cluster set. A cluster set is a pattern of use we've observed from the field and from talking with people interested in multi-cluster deployments: a group of clusters governed by a single authority. Importantly, that means they have a high degree of trust within them, and that's a point we can leverage to build more standards on top of this unit. In particular, a property called namespace sameness applies to clusters in a cluster set. What namespace sameness means, in brief, is that within a given namespace, permissions and characteristics are consistent across clusters. This will come up specifically later when we talk about multi-cluster Services, where the principle of namespace sameness means a service of a given name in a given namespace in cluster A should have the same characteristics as a service of the same name in the same namespace in cluster B. They have namespace sameness: they're expressing similar characteristics within that namespace.
It doesn't necessarily mean that the same namespaces have to exist in every cluster, or that the same workloads exist in every namespace, or anything like that. It's just that if they do, they should behave similarly. That's the crux of namespace sameness. And cluster set membership, that is, which cluster set a cluster belongs to, is a really important property for our solutions to build on top of. One notable tie-in to the next slide is that the cluster set a cluster is a member of should be stored cluster-locally, somewhere the cluster knows about, so that any components can operate with the knowledge of what cluster set they're in. And one of the projects of the SIG is to provide a place to store that: the about.k8s.io ClusterProperty called clusterset.k8s.io.

So going to the next slide, let me detail that a little more. Since we're establishing all of this context around clusters, like cluster set membership, that multi-cluster tooling needs to build on top of, we saw a need for a place to actually store it. We came up with the About API as a location to store, technically, any arbitrary cluster metadata, but there are a couple of properties in particular that we think are important for the work we're doing today. This is written up in the form of a KEP; you can go see KEP-2149 for all the details. It's also available as a CRD you can install from sigs.k8s.io/about-api. The idea is that this CRD describes a cluster-scoped kind called ClusterProperty with a very simple schema: just the metadata.name and some value, where the value can take on all sorts of different forms. Resources with certain well-known names are expected to follow a specific structure for the value, but in general the CRD doesn't strictly require one. The two properties we specifically laid out in the KEP are the ones shown here in the yellow box and the blue box: one to uniquely identify the cluster itself, and one to identify its membership in the cluster set. For the first, cluster.clusterset.k8s.io, the idea is that a resource of this name should contain a value that is an identifier for that cluster within its cluster set. And the property called clusterset.k8s.io should always have a value that represents which cluster set the cluster is a part of. That way, we have these two pieces of information locally and easy to access in the CRD. But generally, the entire About API, including the two specific resources described in the KEP, is there to provide a reference for any sort of multi-cluster tooling to build on top of. It's explicit in the KEP that this is available as a well-known place to store these properties, or any other properties you might have for your own implementation of anything that might otherwise have been implemented as ad hoc annotations on semantically adjacent objects. So, as a little example, you could use some other suffix besides k8s.io for your cool implementation of whatever, and store some other type of data that's relevant to the cluster as a whole, or that you feel is better suited to the About API. The name/value schema is very flexible for whatever people might want to do outside of the well-known properties described in the KEP.
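To make that concrete, here's a rough sketch of what those two well-known properties could look like once the about-api CRD is installed. The API version reflects the alpha CRD at the time of writing, and the values ("cluster-1", "prod-fleet") are purely illustrative:

```yaml
# Sketch only: "cluster-1" and "prod-fleet" are hypothetical example values.
apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: cluster.clusterset.k8s.io   # well-known: this cluster's ID within its set
spec:
  value: cluster-1
---
apiVersion: about.k8s.io/v1alpha1
kind: ClusterProperty
metadata:
  name: clusterset.k8s.io           # well-known: which cluster set we belong to
spec:
  value: prod-fleet
```

Because these are ordinary cluster-scoped resources, any component running in the cluster can read them locally, e.g. with `kubectl get clusterproperties`, without calling out to some external registry.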
So the next project I want to talk about is the Multi-Cluster Services (MCS) API. I mentioned it a little in the prior slides as well, because the concept of the cluster set and the utility of the About API came in great part from the discussions around how to solve multi-cluster services. This is also in the form of a KEP, KEP-1645, and the API, a reference implementation, and the e2e tests and other tests we'll talk about later are available at sigs.k8s.io/mcs-api. This is an API that describes the building block of how to expose a service from one cluster to another, which is a very specific problem the SIG wanted to address. The way we did this was with the multi-cluster Services API, which describes the behavior of a multi-cluster service: how can an API express that a service should be available in other clusters? If we do so, what changes need to occur for the things people already expect from services, like DNS? How does multi-cluster DNS work compared to single-cluster DNS? In the end, what we were able to achieve is an API that expresses how a single service should span, and be consumed by, multiple clusters. Similar to the approach slide we talked about earlier, it was really important on this project to focus on the API and the common behavior, and leave a lot of room for implementations to fill out any details that didn't need to be common to the standard. There are several different implementations, some of which you may have seen in prior demos from this maintainer track or other talks, and each of them varies in implementation details while still centering on this API and the common behaviors laid out in the KEP. It was also important to us that consumers, meaning clusters and workloads consuming a multi-cluster service, only ever rely on local data. That ties in to some of our decisions, for example providing the About API as a cluster-scoped CRD that holds local information about the cluster's metadata. And the other important part is that ClusterIP and headless services work as expected, so there's continuity with the single-cluster experience for these types of services, and it feels really natural for people already used to the Service API to use the multi-cluster Services API.
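As a hedged sketch of the shape of the API: the two kinds it defines are ServiceExport and ServiceImport. Exporting a service comes down to creating a ServiceExport whose name and namespace match the Service; the Service name "web" and namespace "demo" below are made up for illustration:

```yaml
# Hypothetical example: export the existing Service "web" in namespace "demo".
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: web        # must match the name of the Service being exported
  namespace: demo  # namespace sameness makes this meaningful across clusters
```

The implementation then materializes a corresponding ServiceImport in the consuming clusters, and per the KEP the service becomes resolvable under the clusterset.local zone, e.g. at `web.demo.svc.clusterset.local`, alongside the familiar single-cluster `web.demo.svc.cluster.local` name.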
All right, let's talk about the SIG Multicluster website. I want to give a major shout-out to Nikola Pinto for all of his hard work on the website, which is at multicluster.sigs.k8s.io. The website has higher-level documentation for end users, project status updates, and content to help connect implementers to our APIs and tooling, plus a catalog of implementations of the different APIs for end users. Thanks a lot, Nick, we really appreciate your help.

All right, I wanted to throw up a little detailed information (I know there are a lot of words on this slide) about some of the projects I talked about before that are built around a KEP. They have specific graduation requirements that we've been steadily working through in the SIG to graduate them to their next stage via the KEP process. Just to give a little update to everybody who's here: the About API and the MCS API are the two following this graduation process. For the About API, we're currently at a stage where we're actively working on, and maybe by the time this goes live, through, the next steps of PRR approval and a beta API review. For this project in particular, since we're using the k8s.io domain, we've already undergone an API review at the alpha stage, and we'll do another round at the beta stage to move on to the next step.

For the MCS API, I'm going to skip the first item for a second and talk about the last three. It has a hard dependency on the About API; that's what the cluster ID KEP is referring to. So the two are synced with each other, one depending on the other. The MCS API, because it's a KEP, also needs to go through the PRR approval process to get to its next stage. And while the MCS API doesn't have the strict API review requirements the About API has, a review is still a voluntary step we're interested in going through. The last thing for the MCS API, more of a content blocker than a procedural blocker, is the e2e tests. I want to hand this over to Stephen to talk about the e2e tests for the MCS API, and especially some work going on to build beyond the e2e tests into something really valuable for implementers and end users to evaluate and improve MCS API implementations: conformance tests. Major shout-out to Stephen, who's going to give you the update, and also to Nick, who worked a lot on these e2e tests as well. Thank you so much to both of you. So, Stephen.

Thank you, Laura. The MCS project already has some end-to-end tests, but these emerged from developing the proof-of-concept implementation of the MCS controller, so they're fairly basic and carry some assumptions that aren't necessarily valid compared to the spec as it ended up. However, they can already be used as a sanity test of MCS implementations: if you have your own MCS implementation, you can start two clusters, join them in whatever way is appropriate for your solution, and run the end-to-end tests against those two clusters, and you'll get quick results telling you whether you match the tests' expectations or not. But that's not all that useful compared to the spec itself. So we started working on an actual conformance test suite. The goal there isn't to test the implementation we have in the MCS API repository; it's to provide a tool that can be run against any MCS implementation to give you a report on how well it satisfies the spec's requirements, recommendations, and suggestions. It will include (once we've finished developing it, or with your help if you want to get involved) tests that model realistic flows of data and connectivity between clusters. It will also include references to the spec for every test, not just the failing ones, so you can determine exactly which part of the spec is being tested; and when your implementation doesn't match what the spec says, you can go check what the spec actually says and figure out whether you should fix your implementation, or perhaps come and ask us to change the spec. There are, obviously, a number of non-spec attributes that can affect the results, or that you might want to vary when running the conformance suite, and these are all configurable. This is things like timeouts: for example, there's an expectation that when you export a service in MCS, that service will eventually be accessible across the cluster set, but that might take more or less time depending on the implementation, so it can be varied. Also the number of clusters: you might want to run tests against two clusters, three clusters, or more, and use the conformance suite as a sort of scale test as well. You can even use some parts of the conformance suite with a single cluster, because that does make sense in the MCS spec. So that's pretty much it for the conformance tests. As I alluded to, this is still being developed, although by the time this video is shown, hopefully we'll have made further progress; but I imagine there will still be work for interested people to join in on.
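Since the suite is still in development, there isn't a settled configuration format to show, but a sketch gives the flavor of the knobs just described: timeouts and cluster counts. Every field name below is invented for illustration; check the mcs-api repository for what actually lands:

```yaml
# Purely hypothetical sketch of conformance-suite configuration; none of
# these field names are real, they only mirror the examples from the talk.
clusters:                    # kubeconfig contexts for the joined clusters
  - kind-cluster-a
  - kind-cluster-b
timeouts:
  exportPropagation: 90s     # how long until an exported service must be reachable
```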
So now I want to talk a little bit about the Gateway API and MCS, and how Gateway has extended into the multi-cluster world. First, a big shout-out to Rob Scott and the other folks at SIG Network who have been driving this forward over the past couple of years now. The Gateway APIs are really cool: a flexible way to define gateways, figure out how services should be exposed beyond the cluster, and build routes. Basically, it's the service networking model and how it should apply to Kubernetes in a more flexible, forward-looking way. For quite some time now we've been talking about how Gateway can apply to multi-cluster services. In fact, this is something we kicked around back in the early alpha days of Gateway, and it's actually already supported in some implementations, like GKE. Basically, what we wanted to do is support multi-cluster services via gateways, just like you would a Service in a single-cluster Gateway implementation. The way we do this is by allowing you to target ServiceImports with a Gateway, just like a single-cluster Service: the same way the Gateway API exposes a service beyond a single cluster, you can use it to expose a multi-cluster service beyond a cluster set. This gives you a lot of power. You can define flexible routes and rules, but the really cool features are things like using the Gateway API to do weighted balancing and canarying traffic between multiple clusters, or between multiple instances of a service. If you want to do a blue/green service rollout, you can now do it in the multi-cluster world, where those services may be spread across multiple clusters: old service in the old cluster, new service in the new cluster, canaried and rolled out in a gradual way. So there's a lot of power introduced with the Gateway API. Again, a big shout-out to the folks at SIG Network, and also to some other folks who've been involved, like the Istio community.
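To sketch what that weighted canary could look like: in implementations that support MCS backends, an HTTPRoute can reference ServiceImports as backendRefs instead of Services. Everything here, the Gateway name, the "store-v1"/"store-v2" ServiceImports, the port, and the 90/10 split, is illustrative, and support for ServiceImport backends varies by implementation:

```yaml
# Illustrative sketch: canary 10% of traffic to a new multi-cluster service.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: store-canary
  namespace: demo
spec:
  parentRefs:
    - name: external-gateway           # hypothetical Gateway defined elsewhere
  rules:
    - backendRefs:
        - group: multicluster.x-k8s.io
          kind: ServiceImport           # MCS backend instead of a plain Service
          name: store-v1
          port: 8080
          weight: 90                    # 90% to the current version
        - group: multicluster.x-k8s.io
          kind: ServiceImport
          name: store-v2
          port: 8080
          weight: 10                    # 10% canary to the new version
```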
Right, so before we finish this presentation: how can you get involved? As we mentioned earlier, we need your input. We're very interested in hearing about your use cases, your problems, and your ideas, so come and share them with us at our meetings, which happen bi-weekly on Tuesdays; you can see the times there on the slide. Of course, before you do that, you might want to find out more about us on the home page we talked about earlier. You can come and talk to us as well if you don't want to wait for a meeting, or if the timing isn't convenient for you: join the Slack channel, where we have conversations at any time of the day or night. You can also send email; join the list at the address given there, and if you join the list you'll automatically get invitations to the bi-weekly calls, so they'll show up in your calendar. That will give you access to the meeting agenda, the notes from previous meetings, and the Zoom link to actually come and join us. We're looking forward to meeting you, so don't hesitate to come along; we're all very friendly. Thank you so much for coming to our presentation on SIG Multicluster, and we will see you at the Q&A. Thanks, everybody. Thanks, everyone. Thank you.