Hello, I'm Jeremy Olmsted-Thompson, and I'm Paul Morie, and this is the SIG Multicluster intro at KubeCon North America 2020. In this talk we'll cover what this SIG is about and how you can contribute. We will also cover turning off one's camera in order to conserve storage space. We'll cover what has happened so far in the lifetime of the SIG: we're going to talk about KubeFed, and we're going to talk about the concept of a cluster registry. Then we're going to talk about what's currently happening: things like cluster sets, cluster ID, namespace sameness, the Multi-Cluster Services API, and the Work API, and what they mean.

So what is the SIG about? Well, multi-cluster is an extremely broad topic that means different things to basically everyone, but our best definition is just making multiple clusters work together somehow. It touches many different functional areas. We're really focused on trying to figure out what key primitives you need in all multi-cluster scenarios to make that work. And we need your input: we're looking for real use cases, and we're looking to understand the projects you're working on. But first we're going to tell you a bit about ours.

Many of you are probably familiar with KubeFed, or Kubernetes Federation. There have been a couple of versions of it. The first was built before we had custom resource definitions, and it was really about spreading a single resource of one of the built-in types, pushed to different clusters from a central cluster that hosted Federation, and sometimes overridden per cluster. Federation v2 took a similar approach in the sense that there are template resources that are spread to multiple clusters with overrides, but it heavily leverages custom resource definitions to create a new federation API surface for any type. This was a response to one of the shortcomings of v1: since we didn't have CRDs, there was no way to just stand up a new part of the control plane for a new resource. So v2 leveraged CRDs to create new API surfaces for any type.

If you take a look at the right-hand corner of the screen, you can see an example of a federated deployment. Under spec, there's a template that probably looks a lot like you would expect a field in a spec called template to look. Then there are sections for placement and overrides, and those are what they sound like: placement says which clusters this resource goes to, and overrides say how it changes in a particular cluster. Here, overrides is a list with one item: the cluster name is cluster2, and the override says the spec.replicas field should be five instead of three.
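For reference, here's a minimal sketch of the kind of federated deployment being described, assuming KubeFed v2's types.kubefed.io/v1beta1 API group; the field names are approximate, and the authoritative schema is in the KubeFed repository:

```yaml
# A federated Deployment in the KubeFed v2 style: a template wrapping
# an ordinary Deployment, plus placement and per-cluster overrides.
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: test-deployment
  namespace: test-namespace
spec:
  template:                 # an ordinary apps/v1 Deployment spec
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: test
      template:
        metadata:
          labels:
            app: test
        spec:
          containers:
          - name: app
            image: nginx
  placement:                # which clusters the resource goes to
    clusters:
    - name: cluster1
    - name: cluster2
  overrides:                # how it changes in a particular cluster
  - clusterName: cluster2
    clusterOverrides:
    - path: "/spec/replicas"
      value: 5              # five replicas here instead of three
```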
So it's a similar API regime and approach, but trying to address that limitation of v1 by using CRDs to generate new API surfaces introduced a new problem: having a distinct API surface for the federation APIs means that existing resources require transformation before they can be used with it. You couldn't just take a Helm chart, for example, and use it with KubeFed v2 without some alteration or transformation. Another point to mention here is that we attempted an integration with the cluster registry, but the correct boundaries were never apparent. In this presentation we're telling the story in a slightly new order; we haven't talked about the cluster registry yet, but we're about to. So just remember that we tried to integrate KubeFed v2 with the cluster registry, and there was never really a clear, good set of boundaries. That's a theme we'll come back to.

With all that said about KubeFed v2, there's really no one-size-fits-all solution in this problem space. The model works well for some users, and there are definitely users and vendors out there using it. The folks currently working on KubeFed are considering how to add pull reconciliation. To explain the contrast between the words push and pull: in the push model, something runs in the cluster hosting KubeFed, making client connections to the clusters where resources are supposed to go and pushing resources to them. In the pull model, you might instead run an agent in each cluster that is supposed to receive those resources, watching the cluster that hosts the federation API surface and pulling them in. Again, this touches that cluster registry concept, and there's no clear boundary around which parts of that problem a cluster registry solves. I do want to pause here and give a shout-out to Jimmy and Hector and others from D2iQ for helping to move this project forward. Thanks a lot; your work is appreciated.

All right, so, that cluster registry thing we've mentioned a couple of times; you've probably heard of it. The history is that in between KubeFed v1 and KubeFed v2, there was a point where basically the only thing the community could agree on in this very spacious problem space was that if you're working with multiple clusters, it makes sense to track them in some registry-like thing. This is a very deceptive problem, as problems with computers can be. It seemed simple; it seemed very easy at the time that we would just build a registry, and you can sort of imagine what it might be. It turned out not to be easy. As I said, there was never a clear boundary between what a cluster registry should do and what should be left to things that consume a cluster registry. If you go look at the cluster registry project, you'll see there's an API, but there are no controllers to back it, because we weren't sure what they should do.
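To make that concrete, the now-archived cluster registry project defined roughly the following Cluster resource. This is a from-memory sketch of the v1alpha1 shape, so treat the exact fields as approximate; the point is that nothing reconciles it:

```yaml
# Approximate shape of the cluster registry's Cluster resource:
# an API for tracking clusters, with no controllers behind it.
apiVersion: clusterregistry.k8s.io/v1alpha1
kind: Cluster
metadata:
  name: cluster2
spec:
  kubernetesApiEndpoints:
    serverEndpoints:
    - clientCIDR: "0.0.0.0/0"
      serverAddress: "https://cluster2.example.com:6443"
status: {}  # never populated; what should populate it was the open question
```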
That's maybe a good point for us to talk about the approach we're taking in the SIG today. With everything Paul just mentioned, we've decided to rethink our approach. Instead of trying to solve problems right away, we're taking a step back and thinking about how to avoid premature standardization. We don't want to solve problems that don't really need to be solved; we want to focus on specific functionality that meets real, specific needs. So we've decided to work backwards from specific problems into something bigger only where necessary. We're not looking to solve all of the problems. We've got a few use cases that seem very clear and well-defined, ones that users and other community members continuously bring up, and we're trying to figure out how to address those concrete use cases and see what evolves organically beyond that.

So let's talk about cluster sets. Note that even though it sounds like the other Kubernetes "set" resources you may be used to seeing, this is a pre-API concept: it does not currently correspond to a real resource, but it's a useful concept to talk about. A cluster set represents a pattern of use that we see: a group of clusters governed by a single authority. That could be a team, a company, any organization, even an individual, but it's some set of clusters about which you can make concrete statements, owned by someone who can make essentially absolute decisions about the behavior of the set. Within the set, we assume a high degree of trust. Think of it as the extension of the trust you'd see within a single cluster. That's not necessarily a case where everything can talk to everything or everything needs to be exposed, but you should only run applications in the same cluster set if you would run them in the same cluster. They don't need to talk to each other, and obviously there are still permissions in play, but they're owned by the same authority, who feels comfortable running those applications together.

We also introduced the concept of namespace sameness, which applies across all clusters in a set. What this means is that, for a given namespace, its permissions and characteristics are consistent across all of the clusters in the cluster set. You shouldn't have a namespace foo that is used for one thing, with one meaning, in one cluster, while in another cluster in the same set it's used for something else with completely different ownership. Namespaces should be the primitive you use to share ownership across clusters, and they should behave consistently in each cluster where they're deployed. That said, a namespace doesn't have to exist in every cluster; it's just that in any cluster where it does exist, it should behave in roughly the same way.

One of the first use cases for the cluster set is the Multi-Cluster Services API. This started as a KEP and is evolving toward alpha right now. Services are a multi-cluster building block. It's a specific problem, but it's a specific problem that basically everybody has in the multi-cluster case: how do I extend the Service concept to multiple clusters? It builds on the concept of namespace sameness and allows a single service to span, or be consumed by, multiple clusters within a cluster set, in a similar way to how you'd consume a ClusterIP service in a single cluster today. We deliberately focused on the API and common behavior; we didn't define an implementation, and we left a lot of room for implementers to make different decisions. The goal is that whenever you use this API, with any implementation, on any platform, however you deploy it, you get similar characteristics and can count on standard behavior. We actually see a few implementations in progress already: Submariner has been working on an implementation in Submariner Lighthouse, we had a demo at the SIG from Cisco showing their implementation of the API, and Istio has plans to introduce the API in an upcoming release.

The control plane in our design can be centralized or decentralized, but consumers only ever rely on local data. This is a big point. We left the way you build your control plane open for all of the implementations I mentioned, and any more that may come; it's not dictated by the API. But we did want it to be consistent that consumers only ever rely on cluster-local data when consuming a service. We're in alpha right now, on the way to beta, and we'd love your input: come by our meetings, join our group, give us your feedback.
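As a rough illustration of the alpha API surface (multicluster.x-k8s.io/v1alpha1 at the time of this talk; details may change on the way to beta), exporting an existing Service, and the import an implementation then surfaces in consuming clusters, look something like this:

```yaml
# Mark the existing Service "my-svc" in namespace "my-ns" for export
# to the cluster set; name and namespace mirror the Service itself.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: my-svc
  namespace: my-ns
---
# The implementation (not the user) creates a ServiceImport in each
# consuming cluster; consumers resolve it using cluster-local data.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: my-svc
  namespace: my-ns
spec:
  type: ClusterSetIP
  ports:
  - port: 80
    protocol: TCP
```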
Another project in flight within the SIG is what we call the Work API. This is a different pattern for distributing resources than the one we identified earlier in Federation. Instead of distributing individual resources, the main idea of the Work API is that you distribute a collection of resources that are related to one another. Think about the difference between distributing a single deployment to many clusters versus distributing a deployment together with the secret or config map it references, and maybe a service in front of it; you'd want to distribute those things together. I'll also add that this is in a pre-alpha state right now, with a KEP coming together. When you go look at it, you should not expect to see a super featureful thing being described, because we are working backwards, with the initial concentration on finding the right API surface to model this problem for a single cluster. Take away all the dimensions of scheduling, overrides, and powerful selection primitives for where something should go, and just think about how to make sure we applied a collection of resources on a cluster and got a meaningful status about that work (no pun intended): whether those resources were fully applied, and whether there was a problem. That's where our initial concentration lies.

If we then work backwards from that leaf node, doing this correctly on a single cluster, we can think about higher-level primitives than work for a single cluster: scheduling to some part of a cluster set, or defining a primitive for specifying overrides, if we want to do that type of thing. But as we walk backwards through those higher-level problems, we start to touch that registration concept, or at least an identity concept: having a coordinate to use when referring to the particular cluster where something should be scheduled. So this is still a work in progress, and we need your input. There's a fairly strong chance that at least one or two people watching this presentation have built something like the Work API before, or are at least interested, so please come join us; we'd love to have you participate. And I do want to give some shout-outs here: to Valerie Lancy for her contributions and for catalyzing getting the ball rolling on this, and to Chojin for his contributions moving things forward and drafting the KEP.
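Directionally, the shape under discussion looks something like the following: one object wrapping a collection of related manifests to apply together, with status reported back. This is a pre-alpha sketch; the kind, group, and field names here are illustrative rather than settled.

```yaml
# Illustrative Work-style object: a Deployment and the ConfigMap it
# references are applied together on a target cluster, and the status
# of the whole collection is reported back.
apiVersion: multicluster.x-k8s.io/v1alpha1  # group/version not settled
kind: Work
metadata:
  name: my-app
  namespace: default
spec:
  workload:
    manifests:
    - apiVersion: v1
      kind: ConfigMap
      metadata:
        name: my-app-config
      data:
        setting: "value"
    - apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: my-app
        template:
          metadata:
            labels:
              app: my-app
          spec:
            containers:
            - name: app
              image: nginx
```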
So we come back to this central question and problem again: it's easy to think about a cluster-registry-like thing, and it's really hard to materialize it into functional software. The boundaries aren't clear, and it's tempting to build it first because it seems easy. For those reasons, and given our previous history attempting to solve problems in that neighborhood, we're going to avoid characterizing a registry as a first step, and avoid that gravity. And I think it's pretty safe to say, Jeremy, that we still feel the tug of that gravity sometimes in the SIG.

But we need some coordinate to use in APIs. What coordinate would that be? So we're talking about the concept of a cluster ID. The trick here is that we do not have a registry API, so how do we come up with a coordinate without one? We're taking that step back and trying to find the real common use case. What we really need is a way to uniquely identify a cluster within a cluster set for the lifetime of its membership. This isn't attempting to define the identity of a cluster forever; we want to confine the scope to the cluster set, the boundary within which we already have use cases and can reason about things. It does seem that this ID needs to be discoverable from within the cluster: we want a cluster to know who it is relative to the cluster set, and to give multi-cluster tooling running in the cluster a reference to build on. There are some characteristics we know we want, for example having the ID be a valid DNS label. That's important to multi-cluster services, as a way to disambiguate backends for headless services between clusters. Other uses might be a coordinate for scheduling work with the Work API, or simply an annotation on metrics and logs so you can determine which cluster is having problems and aid debugging. But again, we're taking a step back, trying not to solve optional problems, and really just getting to the basic, consistent use case we need in order to enable the next level of multi-cluster tooling.

So, in case it's not clear: we need and want your input. We'd love to have you involved in our part of the community. Share your use cases, problems, and ideas. You can check out our homepage under sig-multicluster in the kubernetes/community repo. We've got a Slack channel, also called sig-multicluster, and, hold on to your hat, we have a mailing list which is also called kubernetes-sig-multicluster. If you'd like to join us, our meetings are Tuesdays at 12:30 Eastern, 9:30 Pacific. We hope to see you there. And let me be the first to say thank you for coming and watching our talk. Thank you very much; we appreciate it.