Welcome. My name's Joe Betz, and today I'm going to be talking about cloud provider integrations with Kubernetes and the extensibility features around that.

A couple of things about myself. I'm an etcd maintainer, and I've been working at Google. I've been involved in Kubernetes for about four years, mostly in API machinery. I've contributed to bringing custom resource definitions to GA, bringing admission webhooks to GA, and more recently to developing the server-side apply feature, which is slated to go GA in the upcoming 1.22 release. More recently, I've gotten involved in SIG Cloud Provider, and I'm going to be talking about that quite a bit today.

I've got a lot to cover. I'm going to start by talking a little bit about the cloud provider extraction project, which is a fairly large initiative in the Kubernetes community. I'm then going to switch gears and focus on Kubernetes extensibility features in general, and then work our way down to the more cloud-provider-specific extensibility features. From there, we'll dig into how those features can be used to build cloud provider integrations and help move the cloud provider extraction project forward. I'll then finish up by talking a little bit about how this makes Kubernetes better, both for Kubernetes developers and for the ecosystem at large.

So let's get started, with the cloud provider extraction project. This is a project led by SIG Cloud Provider, and it really comes from their core mission statement, which includes the goal of evolving the Kubernetes ecosystem in a way that is neutral to all cloud providers. To get a sense of what that means, let's have a look at the code.

On the left here, we have a list of the cloud providers that are in the main Kubernetes source tree. These cloud providers are directly compiled into the main Kubernetes binaries and are deeply integrated with the Kubernetes code base. On the right, we have the out-of-tree cloud providers. These are cloud providers that exist in their own separate source repos, are built into their own binaries, and only interact with the main Kubernetes binaries through extension points. You might have already noticed that there's some overlap: some of these cloud providers are both in-tree and out-of-tree. There's a reason for this. There is currently a migration going on to move all in-tree cloud providers out of tree, so these are the sources and destinations of that migration. When this migration completes, we will be able to delete all of this code out of the main Kubernetes code base. I haven't listed all of the Kubernetes cloud provider implementations here. There are many other third-party provider implementations that are really well built; I've limited my list to the ones that exist in Kubernetes community-owned source repositories.

Let's talk a little bit about the benefits of this. One benefit is obvious: we're going to delete a lot of code out of Kubernetes, and that's going to result in a more stable and maintainable core. A lot of that deletion is actually dependencies. The cloud provider implementations pull in a lot of complex dependencies, and we're going to be able to take all of that out of Kubernetes. That results in easier development, of course, but also in nice things like smaller binaries. And of course, this also helps us achieve that goal of having a cloud-provider-neutral ecosystem.
And it does it in a nice way, because what we're doing is up-leveling the extensibility of Kubernetes so that anything the in-tree providers can do today will be available to all cloud providers. Lastly, this is a big benefit for developers in general. If you are developing cloud provider integrations, you're going to be doing that in your own source repository. You're going to have your own release process with your own cadence. You're not going to be locked into the main Kubernetes release process, which is fairly infrequent.

And we're seeing the benefits of this happening. We're seeing increased activity in extension point development. We're seeing increased activity around cloud controller manager infrastructure. The ability to build a cloud controller is getting easier because we're up-leveling that as part of the migration, and all third-party cloud providers will benefit from that infrastructure as well. I'll talk about that more as we get towards the end of the talk.

This project has been going on for over two years in open source. Just recently we hit one key milestone: in the 1.21 release, a moratorium was placed that restricts any cloud provider feature development from happening in-tree. That's a pretty strong incentive for these migrations to be completed. And then in the 1.24 release, which should be roughly a year from now, all of the in-tree cloud provider code is slated to be deleted, which brings this whole migration to a close.

I'm going to switch topics now and talk a little bit about Kubernetes extensibility. To warm up, we're going to cover some of the core extensibility features of Kubernetes, dive into some more advanced ones, and then really get down into the cloud-provider-specific capabilities that are available. We're going to talk a lot about things you would only really see if you were a vendor or a cloud provider adding functionality to Kubernetes. So let's get started.

This almost goes without saying, but controllers really are the original Kubernetes extensibility feature. The idea in Kubernetes is that you have this central component in your control plane that doesn't actually have any logic in it, and all of the logic is pushed out into processes that communicate with it via a common API. That's a very Kubernetes-specific idea, and it's really shaped the Kubernetes API. The Kubernetes API is a complete API: if you can use that API, you can do anything that can be done in Kubernetes, because everything we do in Kubernetes is done through it. Even things like scheduling of pods and garbage collection are done through the API. This means it's a really powerful extension point. If you write your own controller, you can do pretty much anything that can be done. And the way that you set up controllers is really well thought out, too. You can run a controller right there in the cluster: you just tell Kubernetes to run your controller by providing a container image, and it will start it up for you and manage it. You can also choose to run it in the control plane if you want to manage it that way. There's a lot of flexibility.

The complement to controllers, in some ways, is custom resource definitions. I really like a phrase I hear a lot, which is that Kubernetes is all about the API. It's a really powerful, complete API, and the idea that you can add more first-class objects to it is really powerful. That's what CRDs allow you to do. And when you combine a CRD with a controller, you get things like the operator pattern, and that's a huge part of what the Kubernetes ecosystem is: you can customize Kubernetes to do almost anything. Let's look at the mechanism used. For a custom resource definition, what you do is add an object to the API that says what your new resource type is going to be. You specify its format and how it works, and then you can just start creating those objects. We're going to see these kinds of patterns again and again as we dig deeper into the extensibility features of Kubernetes.
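To make that concrete, here's a minimal sketch of what that object can look like. The group, kind, and schema are hypothetical placeholders, not anything from the talk:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # Name must be <plural>.<group>
  name: widgets.example.com
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
    singular: widget
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              size:
                type: integer
```

Once this object is created, the API server starts serving the new resource, and you can create Widget objects just like any built-in type.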
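And the controller half of the pattern can run in the cluster like any other workload, just as described above. A rough sketch, assuming a hypothetical controller image and a service account already bound to the right RBAC permissions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: widget-controller
  namespace: widget-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: widget-controller
  template:
    metadata:
      labels:
        app: widget-controller
    spec:
      # Service account bound to RBAC rules that allow watching
      # and updating Widget objects via the Kubernetes API.
      serviceAccountName: widget-controller
      containers:
      - name: controller
        image: example.com/widget-controller:v0.1.0  # hypothetical image
```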
So let's go a layer deeper: admission webhooks. Dynamic admission control is the idea that you can intercept a request coming into the API server and do whatever you want with it. You can reject it, you can modify it, things like that. It's a pretty powerful concept with lots of different applications. You can add security checks. You can add additional validation to a CRD. You can make it so that every time a certain kind of pod is created, some sidecar is added to it. There are a lot of different things you can do with this.

But it's more advanced, and there are more sharp edges around it. Each webhook is part of the critical serving path of every request that reaches the API server, and with that comes a lot of responsibility. If your webhook fails, what's going to happen? You have to choose. If you choose to have your webhook fail open, you're saying that if the process serving my webhook can't be reached by the API server, the request should just be accepted, which means my webhook wasn't run. If I fail closed, then what I'm telling the API server is that if my webhook process can't be reached, the whole request should fail. That's good in terms of security, since we're not introducing a security hole, but we are potentially causing the entire control plane to go unavailable if my webhook becomes unavailable. So there's a lot of responsibility in webhook development. Also, because webhooks are part of that critical serving path, their latency really matters. If you add a bunch of mutating webhooks that are slow, you're going to add a lot of latency. In fact, you can reach a point where the kube-apiserver times out the webhook call and the request fails. So you have to keep your webhooks fast.

Adding a webhook is a relatively well-defined process, and it's a lot like adding a controller. You can just run it in the cluster using existing Kubernetes constructs, and then register with the kube-apiserver where it is, and it will talk to it. And that's really easy to do, too: you just add a Kubernetes object that describes your webhook and you're all set. So you're seeing some patterns here, right? Running binaries as dedicated processes, and then configuring how the system communicates with them, as a way of doing extensions.
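Here's a sketch of what that registration object can look like, with a hypothetical webhook service; note how the failure policy and timeout tradeoffs discussed above are spelled out right in the object:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-pod-policy
webhooks:
- name: pods.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail   # fail closed; use Ignore to fail open
  timeoutSeconds: 5     # keep webhooks fast; slow webhooks add request latency
  clientConfig:
    service:
      namespace: example-system
      name: example-webhook
      path: /validate
    # caBundle: <base64 CA bundle used to verify the webhook's serving cert>
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
```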
The next extension point I want to talk about is a little less commonly seen, and it's called the aggregated API server pattern. What you do here is tell the API server that for some API endpoints, for some URL path, it shouldn't actually serve them itself, but should instead forward the request to some other system. You get to decide where that system is, and you can control exactly how it works. You can have it still act as though it is an API server and use Kubernetes object formats, or you can have it do something completely different. It's entirely up to you. It's a really powerful extension point and can do a lot of different things. The way you use it is, again, fairly similar: you run your process either in the cluster or in the control plane, and then you configure the API server to tell it where your extension server is.

The next couple of extension points we're going to talk about are a little more low-level. Typically you would be a vendor or a cloud provider developer adding your system to Kubernetes. The first one is the Container Storage Interface, or CSI. This allows you to extend Kubernetes to support vendor- or cloud-provider-specific volume implementations. The way this one works is that you need to add a driver binary, the node plugin, to every node, and the kubelet is going to talk to that. One way you can do that is through a DaemonSet: you get a container running on every node, and this works just fine, so you can deploy your volume support onto the cluster that way. The way this extension point works is that your plugin, when it starts, registers itself with the kubelet by sending a request to a socket at a well-defined location. Once that happens, it tells the kubelet where its own socket is, and the kubelet then starts making requests to that socket for volume-related operations. So now you have a communication flow and everything's hooked together.
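That DaemonSet deployment typically pairs your driver with a small registrar sidecar that handles the socket registration just described. A trimmed sketch, assuming a hypothetical driver name and image (the registrar is the standard SIG Storage sidecar; paths and versions here are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-csi-node
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: example-csi-node
  template:
    metadata:
      labels:
        app: example-csi-node
    spec:
      containers:
      # Registrar sidecar: announces the driver's socket to the kubelet
      # through the well-known plugin registration directory.
      - name: node-driver-registrar
        image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.2.0
        args:
        - --csi-address=/csi/csi.sock
        - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.example.com/csi.sock
        volumeMounts:
        - name: plugin-dir
          mountPath: /csi
        - name: registration-dir
          mountPath: /registration
      # The driver itself, serving the CSI gRPC API on the shared socket.
      - name: csi-driver
        image: example.com/csi-driver:v0.1.0  # hypothetical driver image
        volumeMounts:
        - name: plugin-dir
          mountPath: /csi
      volumes:
      - name: plugin-dir
        hostPath:
          path: /var/lib/kubelet/plugins/csi.example.com
          type: DirectoryOrCreate
      - name: registration-dir
        hostPath:
          path: /var/lib/kubelet/plugins_registry
          type: Directory
```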
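And circling back to the aggregated API server pattern for a moment: telling the API server where your extension server lives is, once again, just another Kubernetes object. A minimal sketch with a hypothetical group and service:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.metrics.example.com
spec:
  group: metrics.example.com
  version: v1alpha1
  groupPriorityMinimum: 100
  versionPriority: 10
  # Requests for this group/version are proxied to this in-cluster service.
  service:
    namespace: example-system
    name: example-apiserver
    port: 443
  # caBundle: <base64 CA bundle used to verify the extension server>
```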
The Container Network Interface, or CNI, is somewhat similar. Here again we're adding a process to the node. This one, though, you have to configure at kubelet startup: you provide flags to the kubelet telling it that it's going to be using CNI, where the config for that is, and where the binary is. Then, when the kubelet needs to set up some networking, it execs this binary as a sub-process and communicates with it via standard in and standard out. This works pretty well, because typically you don't need to run this process for very long. It might configure some iptables rules and then shut down, and you're done. It tends not to be running very often, so executing it as a sub-process ends up being fairly low-cost.

The last of these container interfaces is the Container Runtime Interface, or CRI. This was introduced back when Docker was still the only way that Kubernetes could run containers, as a way of allowing alternate container runtimes. The way this one works is that you provide flags to the kubelet on startup telling it what sockets to communicate with for the container runtime and for image service management, and the kubelet then just sends requests to those sockets. It expects that service to be running whenever the kubelet is running, so you cannot run this service with a DaemonSet. You really do need to set it up the same way the kubelet is set up. Part of the problem here is that if you try to use a DaemonSet, what a DaemonSet does is pull an image and then start that container, but what you're trying to set up here is the infrastructure for pulling images and running containers. That's a bit of a chicken-and-egg problem. So typically, if you provision a node by using systemd to start the kubelet, you would probably also use systemd to start the CRI service that you need.

So now let's talk about some of the newer extensibility features that are available. We're going to cover three different areas: networking between the control plane and the cluster, container image credentials, and the more general problem of how you run cloud-specific controllers.

First, networking. One limitation of Kubernetes is that it is not secure to run the control plane and have it communicate with a cluster across an untrusted network. That is not something that is securely supported by Kubernetes, and it's generally problematic because you would have to somehow open up ports to every node in your cluster on an untrusted network so that the control plane can communicate with them. Even just managing those firewall rules would be a real nightmare. So this extension feature, which is called API server egress, is a way that you can have the API server talk to a proxy, which is then responsible for talking to your cluster. You can basically introduce a single communication channel from your control plane to a cluster, and then you can go ahead and secure that as you need. The way the extension feature works is that when you start the kube-apiserver, you provide a configuration file, and that configuration file is a Kubernetes object that says what the redirect rules are. You can redirect traffic to the cluster so that the control plane can talk to all the nodes, and you can also redirect traffic to other components of the control plane. For example, the kube-apiserver needs to talk to etcd, and you can proxy that communication as well.

There's a pretty good de facto implementation of this extension point called the Konnectivity server. It uses mTLS over TCP for everything, so it's pretty standard. It's easy to set up a single firewall rule, and you get that one connection. It's bi-directional, so both your control plane talking to your nodes and your nodes talking back to your control plane can go through that same system, and everything along that whole communication chain has mTLS, with certs all the way across.

Container image credentials are an interesting problem. If you're building a cloud provider, your users typically have a container registry which might be private. For example, you might have an account with a cloud provider, and as part of that account you have your own container registry which is private to you; you can upload images to it and do everything you want to do. But you also need the credentials for that container registry to pull those images onto your nodes. For a really nice turnkey cloud provider implementation of Kubernetes, that should probably all be integrated together, and that's what this extension point allows you to do. You can tell the kubelet where it can load credential providers from, and then you can provide a binary that does whatever you need. This uses that exec sub-process pattern again: whenever the kubelet needs credentials, it starts a sub-process, talks to it through standard in and standard out, and shuts it down when it's done. It doesn't need new credentials that frequently, so this works out pretty well. You configure it by giving the kubelet a list of image patterns and where it looks up credentials for each of those, and once you've got that all set up, you should be good. A good example: in GCP, every VM has a metadata endpoint, and those can be loaded up with credentials, so the credential provider can just grab the credentials from that metadata endpoint and hand them back to Kubernetes. In a different environment you might get credentials from somewhere else; there might be some service you load them from or some other way of looking them up. You can do any of those with this extension point.
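As a concrete sketch of that configuration, here's roughly what the kubelet's credential provider config looked like in its alpha form. The provider name, binary, and registry patterns are hypothetical, and the file is handed to the kubelet via its image credential provider flags:

```yaml
apiVersion: kubelet.config.k8s.io/v1alpha1
kind: CredentialProviderConfig
providers:
# The kubelet execs a binary with this name from its configured
# credential provider directory whenever an image matches.
- name: example-credential-provider
  matchImages:
  - "*.registry.example.com"
  - "registry.example.com/*"
  defaultCacheDuration: "12h"   # how long returned credentials are cached
  apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
  args:
  - get-credentials
```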
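And going back to the control plane networking feature for a moment: the configuration file handed to the kube-apiserver is an EgressSelectorConfiguration object. A sketch of its beta-era shape, assuming a Konnectivity server listening on a local Unix domain socket:

```yaml
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
# Control-plane-to-node traffic goes through the Konnectivity proxy,
# reached here over a local Unix domain socket.
- name: cluster
  connection:
    proxyProtocol: GRPC
    transport:
      uds:
        udsName: /etc/kubernetes/konnectivity-server/konnectivity-server.socket
# Traffic to etcd can stay direct, or be proxied as well.
- name: etcd
  connection:
    proxyProtocol: Direct
```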
Finally, the last thing we're going to talk about is cloud-provider-specific controllers. In the main Kubernetes code base, in the in-tree implementation, there's a bunch of controllers that do cloud-provider-specific work. That includes node-related controllers, since creating and deleting nodes often means creating or deleting VMs. It can involve volumes, because there are a lot of cloud-specific storage systems. It involves routes and services, where load balancers come in. And lastly, it can involve IP address management, where IP ranges need to be allocated or provisioned for various things.

All of those controllers in the in-tree implementation were activated by setting flags on the kube-controller-manager. You would say which cloud provider you were using, provide some additional config, and the right set of controllers would be run. In the out-of-tree architecture, the kube-controller-manager is not going to have anything cloud-specific in it, so when you run it, these controllers will not be run. Instead, every cloud provider is going to have its own cloud controller manager that it provides as a separate binary. When you start that, you'll provide a bunch of cloud-provider-specific flags, and it's going to be responsible for running those controllers.

This is a pretty big change. Right now, if you're running any of those in-tree cloud providers, you're only running one controller manager. By the end of this migration, there are going to be two controller managers running. In order to achieve that, a couple of things have happened. One is that there have been a bunch of improvements around controller manager infrastructure, which make it easier to define and author new controller managers and wire up all the flags you need. And, very importantly, HA support for migrating controllers has been added. If you are running a Kubernetes cluster that is mission-critical, you may be using an HA control plane, which means you're running multiple replicas of every component of the control plane: the API server, your scheduler, your controller manager, everything is probably running in triplicate. If one of them goes down, your control plane is still up. And if you need to do a rolling upgrade, you can upgrade the entire control plane without any downtime.

We need a way to do the cloud controller migration, moving those cloud-specific controllers from the in-tree architecture to the out-of-tree architecture, without any downtime. This is a little tricky to do, and I'm not going to get into all the details, but there's a really interesting solution being rolled out that uses multiple leader election locks. Basically, what happens is the kube-controller-manager claims two locks: one for all the cloud-provider-agnostic controllers, and another for the cloud-provider-specific controllers. Then, in a subsequent release, the cloud controller manager starts to run, and it tries to claim just the lease for the cloud-specific controllers. And once you release a version of the kube-controller-manager that doesn't have cloud controllers in it anymore, only the cloud controller manager is able to claim that lease. So you get this series of operations where you hand off ownership of those controllers from one system to another, but at any given point in time you keep the same guarantees around controller management: only one instance of each controller is actively running at any given time, but there are still backups that can take over if the active one shuts down.
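For a sense of what this looks like in practice, here's a sketch of the alpha-era leader migration configuration that the two controller managers can be pointed at during the handoff. The exact controller names and API version may differ by release, so treat this as illustrative:

```yaml
apiVersion: controllermanager.config.k8s.io/v1alpha1
kind: LeaderMigrationConfiguration
# Name of the extra lock used during the migration.
leaderName: cloud-provider-extraction-migration
resourceLock: leases
# Which component should own each cloud-specific controller
# at this stage of the rollout.
controllerLeaders:
- name: route
  component: kube-controller-manager
- name: service
  component: kube-controller-manager
- name: cloud-node-lifecycle
  component: kube-controller-manager
```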
The last thing I'll mention about controllers is that if you are doing cloud provider development, something like Kubebuilder is a really good option. We're doing a lot of improvements around the core controller manager infrastructure, but if you are building a lot of new controllers, a system like Kubebuilder is a really good way to do that. Our team has quite a few engineers that work very closely with Kubebuilder, and it's a really solid system.

That's it. The thing I'll end with, for everyone here, is that when we get to the point where we've really moved all of this cloud-specific code out of Kubernetes, we have this cloud-neutral Kubernetes core. It has less code in it, less cloud-specific stuff in it, but more extension points. It's a more versatile, more tightly knit system, and I really think that is going to help with the longevity and health of the Kubernetes code base. I think it's going to build an increasingly vibrant ecosystem, and I'm really excited to see what other extensibility points people find are needed. If you do find that you have a gap, please come talk to SIG Cloud Provider; we're very interested in hearing from you. To get involved, you can come to our Slack channel or our mailing list, and we meet regularly every Wednesday at 1 PM Pacific time. There are a lot more details at the link here to our SIG page. Thank you, and I will open it up for questions.