Hi everybody, and welcome to the multi-tenancy working group update for KubeCon North America in Detroit. We're super excited to be here today. We're going to do a brief review of all of the tools and capabilities that the multi-tenancy working group has been releasing (somewhat glibly: tips, tricks, tools, and tests) and talk about both where we've gotten with multi-tenancy in Kubernetes through the working group and where we think we're going next.

So, quickly introducing ourselves. I'm Tasha Drew. I'm one of the co-chairs of the working group for multi-tenancy in the Kubernetes upstream community, and my day job is Senior Director of xLabs at VMware.

I'm Ryan Bezutrak. I'm a Principal Engineer at Twilio, and I mostly spend my time with HNC.

Hey folks, this is Jim Bugwadia, co-founder and CEO at Nirmata. I'm also a co-chair of the group, I work with the Policy working group as well, and I'm a maintainer of Kyverno.

Hi, this is Fei Guo. I'm currently working with Microsoft on the Azure AKS team, working on various Kubernetes-related projects.

And I'm Adrian Ludwin, original author and maintainer of HNC, the hierarchical namespace controller, and a software engineer on GKE at Google.

So, the goal of the multi-tenancy working group in the Kubernetes community: working groups are set up as time-bounded groups that draw members from several different special interest groups and aim to solve a specific problem or challenge for the Kubernetes community. In this case, when the working group began, we were really looking at how people were going to achieve multi-tenancy in Kubernetes. The goals of the working group were to define the models of multi-tenancy that Kubernetes will support, discuss and execute on any remaining work needed to support those models, and create conformance tests that prove these models can be built and used in production environments.

Toward that, we have worked on a project that was originally called virtual cluster and is now a Cluster API provider called Nested. We've also had the hierarchical namespace controller, HNC (it sounds like "agency", but it's the letters H-N-C), and the multi-tenancy benchmarks project. If you would like to check out any of this code, we're on GitHub; you can see the link on this page, kubernetes-sigs/multi-tenancy. We have a Google group where you can see all of the regular communications on our mailing list; that's also how you get an invitation to our regular meetings and to all of our documentation. And we have a Slack channel in the Kubernetes Slack that folks are welcome to join. We have regularly scheduled meetings every other Tuesday at 11 a.m., and again, you get those calendar invites by joining the mailing group.

So I'm going to talk a little bit about one of our most recent efforts as a working group. As we said, part of our goals was to define the tenancy models. What we found as we started talking to people is that everybody had a good idea of what a tenant was; the only problem was that all of those ideas were different. After a fair amount of work, we were able to group them into two large categories. Either you had multiple teams, each owning one or two microservices, for example, that wanted to share the same cluster,
or you had some kind of SaaS model where you were running many copies of the same application. So we started calling these multi-team tenancy and multi-customer tenancy. Even those were a little too restrictive; we found, of course, that people could mix and match them in various ways. But what we have done now is codify these ways of talking about tenancy into documentation that went into the official Kubernetes docs earlier this summer. As we said, one of the original goals of this group was to categorize the different types of tenancy, and we feel this captures the vast majority of the tenancy models we see.

When we started, we also found people talking about things like hard versus soft multi-tenancy. Again, this was very clear to any one user, but everybody had a slightly different idea of what it meant. So we've broken this down along three axes: control plane isolation, data plane isolation, and other considerations. None of these is binary; it's not hard versus soft. They're all spectrums where you can have degrees of hardness or degrees of softness, and they mainly break down along those three axes. Most of the people in this group, plus a couple of folks from AWS and a few other contributors, helped us deliver this, so getting it checked in was one of our achievements over the summer.

So I'm going to go very briefly now, and with Jim's help we'll quickly outline what those isolation techniques are. On this slide we can talk a little bit about control plane isolation. Control plane isolation covers basically anything that you access using kubectl, anything that lives on the API server itself. This is how most of us interact with Kubernetes, at least when we're trying to tell it to do something. There are a lot of built-in constructs in Kubernetes that help with tenants at the control plane level. The most obvious one is namespaces, but there are others, such as access controls like RBAC, and quotas to make sure that no one user or tenant is using more than their fair share of the cluster's resources.

Again, isolation in the control plane is on a spectrum. You can use exactly what Kubernetes gives you out of the box, which is okay, but not a ton. If you want to completely isolate control planes, you can use projects such as virtual cluster or CAPI Nested, which we'll talk about later. In between, you've got things like HNC, which lets you group namespaces into hierarchies (hence the name hierarchical namespace controller) and share things like access controls, and soon quotas, across different tenants. For many people, this is what multi-tenancy means, but what we discovered the more we looked into it is that this is only part of the story.
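To make those built-in control plane constructs concrete, here is a minimal sketch (not from the talk) of a tenant namespace, an RBAC binding that scopes access to it, and a quota that caps its share of the cluster. Names such as team-a are purely illustrative.

```yaml
# A namespace dedicated to one tenant.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# RBAC: give the tenant's group admin rights inside its namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-admins
  namespace: team-a
subjects:
- kind: Group
  name: team-a
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin            # built-in role, bound only within team-a
  apiGroup: rbac.authorization.k8s.io
---
# Quota: cap the tenant's share of cluster resources.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    pods: "50"
```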
So Jim will now talk about the other parts of the story.

Yes. In Kubernetes, what's referred to as the data plane is really the worker nodes, the pods, and the applications that run on them. Just like the isolation techniques Adrian covered for the control plane, there are several different resources and configuration options you can leverage to start isolating the data plane.

The first one worth noting here is network policies, of course. Having some default set of network policies, so that tenants can't see or reach traffic from other tenants and there's strong isolation between them, is important. Similarly, on the storage side, follow the best-practice configurations the docs go into in detail, making sure PVs and PVCs are configured so that one tenant can't, either inadvertently or through other mechanisms, claim volumes belonging to other tenants. Beyond that, there is also container sandboxing, which is a way of creating stronger isolation at the container runtime level, through tools like gVisor and other solutions such as Kata Containers. And finally, we're seeing patterns in the community where some large deployments use node isolation for specific tenants, through labeling, placement, and other mechanisms Kubernetes itself provides. All of these combined, and again, as Adrian emphasized, it's a spectrum; you can choose which ones make sense for your deployment and come up with the right data plane isolation techniques.

Beyond that, there are a few other things Kubernetes provides which are important to consider and leverage as well. One, of course, is noisy neighbors. If this is teams sharing the same cluster, maybe you have a bit more control over this within an enterprise, but if your use case is multiple customers, you want to make sure there's API priority and fairness across those customers and across the different types of tenants that you have. Similarly, you can configure quality of service for your workloads; Kubernetes gives you a set of QoS classes for this, which you can leverage across different tenant types within your tenancy models. And finally, one interesting thing to call out: in a cluster you have shared services, and DNS is a good example of a shared service that's necessary for runtime operations. You have to think about whether you can manage one shared DNS service or whether you want DNS per tenant, so that tenants can't do lookups or other things across tenant boundaries. All of these constructs are available; the Kubernetes docs go into much more detail on this in the multi-tenancy section, and there are links to other areas where you can find more.

So given the control plane and data plane isolation techniques, and these other considerations, that's how you would configure multi-tenancy.
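As a rough illustration of a couple of the data plane techniques just described, a sketch follows: a default-deny network policy that keeps cross-tenant traffic out, a RuntimeClass for gVisor sandboxing, and a pod that opts into both the sandbox and tenant-labeled nodes. It assumes a tenant namespace named team-a, the runsc handler installed on the nodes, and a tenant label on those nodes; all names here are illustrative.

```yaml
# Default deny: pods in team-a only accept traffic from within team-a.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-tenant
  namespace: team-a
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector: {}      # allow traffic only from pods in this namespace
---
# Container sandboxing: run untrusted tenant workloads under gVisor.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc             # requires the gVisor runtime on the worker nodes
---
# A tenant pod opting into the sandboxed runtime and pinned to tenant nodes.
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-workload
  namespace: team-a
spec:
  runtimeClassName: gvisor
  nodeSelector:
    tenant: team-a         # node isolation via labels, as mentioned above
  containers:
  - name: app
    image: nginx           # placeholder image
```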
But going back to the use cases, Ryan is going to describe how end users can actually consume these different tenancy models.

Right. As we've talked about today, over the last few years we've identified three main types of tenancy models for Kubernetes. The first is namespace as a service, and it's just that: teams are generally given a namespace, and only a namespace, to control, which means CRDs and other operators are shared on the cluster, and that has pros and cons. This is generally the easiest lift for dev teams to start consuming, since they don't need to learn as much Kubernetes as they might with the other models, but it has tradeoffs: more operational toil for platform teams, versioning of operators, et cetera.

The next level from that is cluster as a service, which is what many teams, I think, broadly start with. Each team gets its own cluster and has much broader control, but as a dev team you are either responsible for a lot more of the operational work, or you have a centralized SRE team managing a lot of clusters for everyone. And a happy medium between the two is control plane as a service. An example here would be virtual cluster, where each team has its own control plane, but the workloads are shared across many worker nodes with isolation. From an operational perspective, you have a smaller set of nodes to manage, and the control planes are generally virtualized, running as Kubernetes pods or something to that effect. Those are the three tenancy models we've outlined in our blog post and in our docs.

With that, we're going to move to a panel discussion, asking the other team members about these different models and the projects we've created that help you implement them. Our first question goes to Fei, who is a core contributor to virtual cluster and Cluster API Provider Nested. Now that virtual cluster has merged with Cluster API Provider Nested, how do the projects cooperate? What new features can be leveraged now that it's in Cluster API? Can you briefly discuss the differences and what's going on there? Yeah, you're unmuted.

Thanks, Ryan. I can share a little bit about the background. The virtual cluster project originally had a CRD called ClusterVersion, which is used to do the tenant control plane management. But that CRD is pretty simple: it basically works with a built-in, in-memory etcd, and there is no good way of doing upgrades. Because of that, we looked at whether we could leverage the existing, much larger Cluster API (CAPI) community for cluster lifecycle management. CAPN (Cluster API Provider Nested) provides purely the tenant control plane, without any workload nodes included, so we think it's a good fit for cases like virtual cluster where you don't need dedicated node resources for the tenant control plane. That's the reason we think the two projects fit well together, and we put them in one repo for easier development and management.

From the feature perspective, for virtual cluster itself, as it has matured over the past years our major focus has been on bug fixes and test coverage. We do have several enhancements, especially around extensibility for features people want. One example is tenant-scoped quota management. We support this in a slightly different way: we don't define an API for this capability. Instead, similar to how the Kubernetes scheduler does it, we have an extension point in our virtual cluster synchronization controller, so people can add their own checks when we do pod provisioning. By having this, people can develop their own quota check mechanisms using their own CRDs, and that solution can be integrated with virtual cluster, so it works together with customized quota checks for the virtual cluster.
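For the control-plane-as-a-service model, requesting a tenant control plane in the virtual cluster subproject is declarative. The sketch below is only indicative: the API group, version, and field names (tenancy.x-k8s.io/v1alpha1, clusterVersionName) are taken from older virtual cluster releases and may differ in the current cluster-api-provider-nested repo, so treat them as assumptions and check the project docs.

```yaml
# Sketch only: request a tenant control plane with virtual cluster.
# The API group and fields may differ between releases; verify against
# the cluster-api-provider-nested repository before using.
apiVersion: tenancy.x-k8s.io/v1alpha1
kind: VirtualCluster
metadata:
  name: tenant-a
  namespace: tenant-a-admin
spec:
  clusterVersionName: cv-sample   # refers to a ClusterVersion template object
```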
From the CAPN perspective, we are working toward switching to the Cluster API style control plane provisioning model. The challenge is that we're trying to reuse most of the code of the Cluster API control plane deployment implementation, but in CAPN we have to mock out the machine provisioning infrastructure, because the virtual control plane doesn't need a dedicated node. That requires a certain amount of implementation effort, and we're still working on it. So we welcome anyone interested in this area to work together with us and make it happen. That's pretty much it for virtual cluster. Back to you, Ryan.

Awesome, thank you. Moving on to Adrian. Huge milestone this year: HNC has moved to version 1.0. So naturally, what's coming next, and what are we still missing?

Yeah, so we hit 1.0 earlier this year, and we've been working on 1.1. It's a little slow going because, as is always the case with open source, a lot of us are volunteers. The big feature we've got planned for 1.1 is hierarchical resource quotas. These are a drop-in replacement for regular resource quotas, but they don't apply to only one namespace; they apply to a whole tree of namespaces. That's the biggest feature. We have also had a couple of contributors join up, so the other big feature going into 1.1 is what we call the allow-propagate mode. Usually what HNC does is copy policy objects from parents to children: you put an RBAC permission in the parent, and it gets copied to the children. In the past we've allowed the concept of exceptions, meaning an object gets copied to all of the children except one. People wanted, basically, to invert that: by default, don't copy any of these objects unless I specifically ask for them to be propagated. So that's a new feature that will be going in, and that's probably all that's going to make it into 1.1.

Beyond that, the list of future features we want to add is actually getting fairly short. We'll be looking at things like startup latency improvements and stability improvements; not that there are that many issues, but you can hit some corner cases, especially on startup. The only other thing we've been asked for from time to time is the ability to propagate objects that don't actually exist in the parent. Let's say you want an RBAC object in all of the children, but not in the parent; we don't have a way to do that. That's pretty much the only feature request we've seen over the last little while that we haven't started working on yet.

Beyond that, it does look like the project is stabilizing, which is good. HNC was not designed to be a project that would constantly grow and flower. It was supposed to plug a fairly narrow problem, which is that it's hard to manage groups of namespaces, with the most minimal construct we could add in a solid and well-contained way. We're getting pretty close to that with 1.1, and I don't anticipate a ton of changes after that. Of course, as more people use it, we might discover we were wrong, but most of the features we've implemented over the past year are things people have been asking for over the past two years. We think we have a pretty good handle on what people want, and with that it will be getting pretty complete. So I think we're getting close to declaring success on HNC.
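For a feel of how HNC is used, here is a sketch, assuming HNC is installed and a parent namespace team-a already exists: a subnamespace anchor that creates a child namespace under team-a, and a hierarchical resource quota of the kind planned for 1.1, which applies to the whole subtree rather than a single namespace. The kinds shown are from the hnc.x-k8s.io/v1alpha2 API; exact fields may vary by release, and the names are illustrative.

```yaml
# Create a child namespace "team-a-dev" under the parent "team-a".
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: team-a-dev
  namespace: team-a        # the parent; HNC creates the child namespace
---
# Hierarchical resource quota (the 1.1 feature discussed above):
# limits apply to team-a and all of its descendant namespaces combined.
apiVersion: hnc.x-k8s.io/v1alpha2
kind: HierarchicalResourceQuota
metadata:
  name: team-a-hrq
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
```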
Awesome. And to Fei's point as well, anybody willing to volunteer or contribute, reach out on Slack. Great. Moving on to Jim. Jim, you're also a big player in the Policy working group and one of the founders of Kyverno. Can you describe how Kyverno can help people implement multi-tenancy?

Absolutely, yes. We talked about maybe eight to twelve different types of resource configurations that are available in Kubernetes for both control plane and data plane isolation, plus some of the other things like API priority and fairness. The simple way to think about it is that policy engines can be used to enforce, or even automate, those configurations. Kyverno has features for mutation as well as generation of resources, and you can validate for proper configurations. So using a policy engine in conjunction with HNC and with CAPI gives you a lot of flexibility and power, because now you can enable self-service for your tenants, if they're internal tenants dealing directly with Kubernetes and kubectl, while still having the right guardrails in place and automating all the required configurations.

Awesome. Yeah, I'm a big fan of Kyverno and I use it heavily, so I appreciate it. Great. Thanks, Jim.
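To show what Jim means by automating tenant configuration with a policy engine, here is a sketch modeled on Kyverno's well-known sample policies: a generate rule that drops a default-deny NetworkPolicy into every newly created namespace, so each tenant starts with network isolation. The policy name is illustrative.

```yaml
# Kyverno policy: whenever a namespace is created, generate a default-deny
# NetworkPolicy inside it so every tenant starts isolated by default.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-networkpolicy
spec:
  rules:
  - name: default-deny
    match:
      any:
      - resources:
          kinds:
          - Namespace
    generate:
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      name: default-deny
      namespace: "{{request.object.metadata.name}}"
      synchronize: true
      data:
        spec:
          podSelector: {}
          policyTypes:
          - Ingress
          - Egress
```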
And Tasha, over to you. As you mentioned, we've been working on these projects for almost four years, and we've done a lot. What's next for the multi-tenancy working group?

Yeah, great question. When we all got together, and I think most of the people on this Zoom were there in Seattle when we met in a tiny office to talk about multi-tenancy in Kubernetes and the working group, it was really interesting, because at that time we had a lot of questions coming to us from various parts of the Kubernetes community: are we going to need to change the API mechanics of Kubernetes? Are we going to need to do a major refactor, or really re-architect the way Kubernetes itself works, in order to meet what users of Kubernetes are telling us they need from a multi-tenancy perspective? That was really what the working group was created for: to dig into that problem and figure out what we needed to do, what potential re-architecture might need to happen, and how critical that would be.

Over time, as we put a project plan together and started to think about these elements of namespace as a service, control plane as a service, and clusters as a service, and started to build out these projects, with the virtual cluster project that has now become a Cluster API provider and the hierarchical namespace controller, and now an entire SIG devoted just to automating the delivery of multiple clusters and managing that fleet, it became really apparent that we didn't need to re-architect Kubernetes. Kubernetes fundamentals got people where they needed to go from a multi-tenancy perspective, basically by adding this tooling. We've seen other startups and other projects create very similar tooling, following the same paradigms of namespace as a service, control plane as a service, or cluster as a service. So it seems like the working group answered the question: we added tooling to the community to handle these different use cases.

The major re-architecture we were wondering about at the time turns out not to be required for people to use Kubernetes and get the level of isolation they need to do the jobs they're trying to do. And with the benchmarks project that Jim spearheaded, we now also have multi-tenancy benchmarks that folks can run against their clusters to check: have I correctly set up this cluster for multi-tenancy? Have I left any gaping holes anywhere? In addition to that, the working group also, just in the past quarter or two, came together to finally add official documentation to the Kubernetes docs saying: this is what multi-tenancy is, and these are the tools you have to address it.

So it seems like the working group has really achieved what we originally set out to do. As we hit these big milestones with the projects we originally incubated and have now graduated, it may be time for us to start thinking about winding down. Working groups in general in Kubernetes upstream are not intended to stick around forever; they're really intended to answer a question. We're starting to feel like we've answered this question fairly conclusively, and unless people really jump up and say, no, there's more, here's something we need to come together and work on, it may be time for us to say, yay, we won, and go tackle something new.

Awesome. Thanks, Tasha. And with that, that is our talk for Detroit 2022. We hope to see you on Slack. Thanks, everyone. Thank you.