Everybody, welcome back to the rolling technology track. Right now we'll welcome Mikhail Fedosin and Michael McCoon for a Kubernetes deep dive on Cloud Controller Managers. Mikhail and Michael, the stage is all yours.

All right, thank you very much, Selby. Welcome, everybody. My name is Michael McCoon, and I'm joined by Mikhail Fedosin today. We are engineers at Red Hat, where we work on the OpenShift Cloud Infrastructure Team, and we're going to talk to you about Cloud Controller Managers. So, take it away, Mikhail.

Okay, hello, everyone. Let's start our talk by defining what the Cloud Controller Manager, or CCM for short, is and why it's needed. In a nutshell, the CCM is an optional component of Kubernetes that is responsible for basic interaction with the cloud platform where the system is installed. In a sense, this component came about as a result of the evolution of the Kubernetes architecture, to allow the system to be deployed on different cloud platforms. It should be noted that historically this problem was partially solved with the so-called in-tree cloud providers. But that approach has a number of limitations that prevented it from being used as a universal solution, and therefore it has been deprecated; in the medium term, CCMs will be the only way to interact with cloud platforms. Each cloud platform should implement its own CCM, which is based on the cloud provider interface. The goal of having the interface is to ensure that all CCMs in the ecosystem integrate with Kubernetes in a consistent and extendable way.

So let's take a closer look at what the CCM is. I took this picture of the architecture from the official Kubernetes website. Generally speaking, the CCM is a collection of Kubernetes controllers that embed cloud-specific control loops. There are at least four controllers: node, service, route, and node lifecycle. We'll talk about them later in this talk, but in fact these controllers were originally part of the Kube Controller Manager, or KCM for short, and they are now deprecated there; the community decided to combine them into a separate component instead. We will discuss the reasons that led to this in a minute. For now, we can note that the CCM is just a detached part of the KCM that contains the cloud-platform-specific controllers. The way these controllers work has not changed: they monitor their resources and try to make the status match the spec, which is the standard way of doing things in Kubernetes. Okay, next slide, please.

Now let's talk about how the out-of-tree cloud provider approach used by CCMs is better than the legacy approach using in-tree cloud providers, and also why adding new in-tree cloud providers was banned in Kubernetes. To begin with, what are in-tree cloud providers? These are libraries, or Go modules if you want, that were used in the kubelet, the kube-apiserver, and the Kube Controller Manager. The kubelet needed them for cluster node initialization, like setting the region the node is deployed into and the resources it has available, such as CPU, memory, and so on. The KCM, as mentioned before, hosted the controllers. And the kube-apiserver needed them for the now deprecated persistent volume label admission controller. The in-tree cloud providers were part of the Kubernetes code base and supported only five platforms: AWS, Azure, GCP, OpenStack, and vSphere. The CCM, on the other hand, is an independent component and supports only one cloud platform at a time.
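To make the cloud provider interface mentioned above a bit more concrete, here is a rough Go sketch of its shape. It approximates the Interface type from the k8s.io/cloud-provider module, with the sub-interfaces stubbed out so the sketch stands alone; consult that repository for the authoritative definition.

```go
// Approximate shape of the interface every CCM implements, based on the
// Interface type in the k8s.io/cloud-provider module; see that repo for
// the authoritative definition. Sub-interfaces are stubbed here so the
// sketch compiles on its own.
package cloudprovidersketch

import "io"

// Factory builds a provider from the optional --cloud-config file.
type Factory func(config io.Reader) (Interface, error)

// Interface is what each platform (AWS, Azure, OpenStack, ...) implements.
// A provider returns false for any feature it does not support, and the
// corresponding controller in the CCM simply stays idle.
type Interface interface {
	Initialize(clientBuilder ControllerClientBuilder, stop <-chan struct{})
	LoadBalancer() (LoadBalancer, bool) // backs the service controller
	Instances() (Instances, bool)       // backs the node and node lifecycle controllers
	InstancesV2() (InstancesV2, bool)   // newer replacement for Instances
	Zones() (Zones, bool)
	Clusters() (Clusters, bool)
	Routes() (Routes, bool) // backs the route controller
	ProviderName() string
	HasClusterID() bool
}

// Stubs standing in for the real sub-interfaces, each of which defines a
// full method set in the upstream module.
type (
	ControllerClientBuilder interface{}
	LoadBalancer            interface{}
	Instances               interface{}
	InstancesV2             interface{}
	Zones                   interface{}
	Clusters                interface{}
	Routes                  interface{}
)
```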
From the developer's point of view, it's a compiled binary packed in a container image that is installed as a standard Kubernetes application, with either a Deployment or a DaemonSet plus the rest of the required resources, and it interacts with the system through the Kubernetes API.

Okay, now let's discuss what is wrong with the legacy approach. I think it's an obvious point that in-tree cloud providers have their own dependencies on the relevant libraries to interact with their platforms, and this causes a huge number of potential problems as the number of platforms increases: various version conflicts, sometimes a lack of backward compatibility, growth of non-core, platform-specific code in the code base, and so on. Eventually it was decided that Kubernetes would have no dependencies on third-party libraries for cloud platforms. Now, for example, the Azure Cloud Controller Manager only has a dependency on the Azure SDK and nothing related to AWS or OpenStack, and it's the same for the other cloud controller managers.

The next question is the development lifecycle. It's quite obvious that cloud providers are not logically dependent on Kubernetes, but with the legacy approach they had to make releases at the same time as the Kubernetes core, which was extremely inconvenient. The problem was also complicated by the libraries for interacting with the platforms, which have their own release cycles as well; developers of the in-tree cloud providers had to decide whether to use a new or an old version of a library. For CCMs, the development cycles are completely independent from Kubernetes, which makes our lives much easier.

Another point is that all in-tree cloud providers and their dependencies had to be released under an open source license compatible with Apache 2 in order to be included in the Kubernetes repository, and for some platforms this presented an artificial limitation. An example might be IBM Cloud: although they have now made it open source, originally they released their Cloud Controller Manager closed and for private use only, which is impossible with in-tree cloud providers. So, all in all, these are the main reasons that prompted the Kubernetes developers to move to CCMs. Okay, next slide, please.

Now let's take a look at how the architecture of Kubernetes has changed since the switch to CCMs. As I said a moment ago, the in-tree cloud providers were used by three components: the kubelet, the kube-apiserver, and the Kube Controller Manager. For the KCM it's simple: it just doesn't launch the corresponding controllers in favor of the CCM. It's the same for the kube-apiserver, which doesn't start the persistent volume label admission controller. And that's actually their default behavior now; to start these controllers in the KCM and the kube-apiserver, you have to explicitly pass the --cloud-provider=<platform name> option at their startup. With the kubelet, everything is a bit more interesting and complicated. As you know, the kubelet used in-tree cloud providers to correctly register nodes in the system: it queried the cloud platform API for platform-specific data such as flavor and region, and then created the node resource in the system. Without in-tree cloud providers it can't do that, because the kubelet doesn't know how to interact with the platform anymore. But the CCM can't do it either, because it starts later, when nodes are already created. For this purpose, Kubernetes uses a two-step initialization of nodes. To enable this mode in the kubelet, you have to specify --cloud-provider=external at startup.
At the very beginning, as the first step, the kubelet creates an empty node resource without adding any information from the cloud platform to it. In doing so, it sets a taint with a NoSchedule effect that prohibits running any workloads on this node. In the next step, the CCM is activated. It must contain a toleration which allows it to bypass the taint; in other words, it will be the only pod on a node that hasn't yet been initialized. Once started, the CCM requests information from the platform by communicating with the platform API, adds it to the corresponding node resource, and finally removes the taint, thereby allowing workloads to run on this node. As a result, we have a working cluster. A separate topic here is the Container Storage Interface, or CSI, but we'll have another slide on it and talk about it a little bit later. Now, the next slide, please.

Okay, let's see what controllers were taken from the Kube Controller Manager and are part of the CCM now. First of all, there is the node controller. Its task is to poll the platform API and annotate and label the node resource with cloud-specific information, such as flavor or availability zone. Then there is the service controller, which maintains services. Services integrate with cloud infrastructure components such as managed load balancers, IP addresses, network packet filtering, and target health checking, and the service controller interacts with the cloud platform's API to set up load balancers and other infrastructure components for a service resource that requires them. Next, we have the route controller. The route controller is responsible for configuring routes in the cloud appropriately, so that containers on different nodes in the Kubernetes cluster can communicate with each other. Depending on the cloud provider, the route controller might also allocate blocks of IP addresses for the pod network. And finally, we have the node lifecycle controller. It verifies the node's health. In case a node becomes unresponsive, this controller checks with the cloud platform API to see if the server has been deactivated, deleted, or terminated. If the node has been deleted from the cloud, the controller deletes the node resource from the Kubernetes cluster as well. Okay, next slide, please.

I also want to say a few words about the Container Storage Interface. This topic is not directly related to the topic of this talk, but we cannot ignore it. It's because the in-tree modules also contained volume provisioners for the cloud providers, and external cloud providers do not support this feature, because it's not a controller. So in Kubernetes they introduced the Container Storage Interface, which allows users to attach volume drivers for their specific platform. Currently lots of different CSI drivers are available, and I'm sure that you can find one for your needs. The whole concept of CSI is similar to what we have with cloud controller managers: Kubernetes just provides an interface, and people develop an implementation of this interface separately from the core system. It allows decoupling platform-specific code, but it also means that instead of having volume support out of the box in Kubernetes, these drivers have to be installed and configured additionally. Okay, Michael.

All right, thanks, Mikhail. So now that we know a little bit about what cloud controller managers are, let's take a look into how to use them in your cluster and some of the issues you might run into.
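As a small, concrete illustration of the two-step initialization just described, here is a rough Go sketch of the taint the kubelet applies when started with --cloud-provider=external and the kind of toleration the CCM pod (or anything else that must run before initialization, such as networking pods) carries to bypass it. The taint key and effect are the upstream ones for external cloud providers; the rest is illustrative.

```go
// Sketch of the uninitialized taint and a matching toleration, using the
// Kubernetes API types. How the toleration is attached to your CCM pod
// spec will depend on your own manifests.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Applied by the kubelet at registration time when it runs with
	// --cloud-provider=external; removed by the CCM once the node has
	// been initialized from the platform API.
	uninitialized := corev1.Taint{
		Key:    "node.cloudprovider.kubernetes.io/uninitialized",
		Value:  "true",
		Effect: corev1.TaintEffectNoSchedule,
	}

	// Carried by the CCM pod (and, typically, networking pods) so it can
	// be scheduled onto nodes that still have the taint.
	toleration := corev1.Toleration{
		Key:      uninitialized.Key,
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}

	fmt.Printf("taint %s=%s:%s, tolerated via operator %s\n",
		uninitialized.Key, uninitialized.Value, uninitialized.Effect, toleration.Operator)
}
```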
So the first thing I want to talk about is the external cloud provider flag. Mikhail mentioned this, but this is something that you're going to have to specify as you set these up in the current timeframe. One of the things to know is that on your kubelet, you're going to have to specify --cloud-provider=external. In the past, you probably would have set this to whatever provider you were using, AWS or OpenStack or something like that; this will now tell the kubelet not to use its internal routines and to rely on the external CCM for that. Now, the docs recommend that for the API server and for the kube controller manager, the KCM, you not set the cloud provider flag. But in practice, you can actually set these to --cloud-provider=external currently and they will work in an external manner. There are some deep details here, and it gets into, at least for the API server, how some of the older code pathways are still operational when you set that into place. But these will be deprecated eventually when we migrate past this initial phase, probably in the Kubernetes 1.25 range. The other thing is that there is a common pattern to see on the new cloud controller managers: you might expect that if you were running the OpenStack cloud controller manager, you shouldn't have to set the cloud provider flag. But because of the common ancestry of the framework for a lot of these CCMs, you'll see that setting the cloud provider is usually something people do, even though you don't have to.

Then another thing to be aware of is the uninitialized taint. Mikhail mentioned this a little bit, but it is new with the external cloud providers. When you set the kubelet to use the external cloud provider, it will set the uninitialized taint on that node with the NoSchedule effect. You'll have to be aware of that and make sure that any pods that need to run before the CCM, perhaps networking pods, tolerate that uninitialized taint.

There's another piece of this as well, and that is the DisableCloudProviders feature gate. Now, this doesn't need to be used in concert with the --cloud-provider=external flag, but you can use it to disable the old functionality so that you can ensure your cluster is running in external cloud provider mode. By setting the DisableCloudProviders feature gate, the kubelet and the kube controller manager will take different code paths and completely turn off the old in-tree code. This was introduced in 1.22, so it's pretty new, and it will probably be in use until 1.25, when all of this becomes the default and there are no old KCM cloud controllers running.

Another part of this, as Mikhail mentioned, is that these are binaries that you're going to run separately from the kubelet and the API server and whatnot. So you're going to have to think about how you will deploy these into your cluster. You are responsible for deploying them; they will not be deployed automatically for you. If you're using an existing core provider, meaning one of the providers that already had in-tree controllers, it's a little bit easier, because there's a binary already provided in the main Kubernetes repository that essentially wraps all of the old in-tree providers as out-of-tree providers, and there's an example manifest in the core documentation showing how to deploy it. Another question you're going to come across is whether you want to deploy these using a DaemonSet or a Deployment.
This will depend a lot on your topology and how you're creating your clusters. The default recommendation is just to use a DaemonSet, but there are some cases where you might want to use a Deployment, depending on how you want to organize the CCMs being deployed, perhaps the ordering that you want to have them in, and also how they respond to other services that look for DaemonSets or Deployments. You only need to deploy the CCMs on the control plane nodes; they don't need to be on every node, only the control plane. You will need a service account and cluster roles for these: the CCMs will need to look at nodes, services, routes, depending on which ones they are, so you'll have to set all that up. Don't forget the uninitialized taint, because it will come back to bite you if you have things that you're expecting to run before the node is ready. And the other side of that is the not-ready taint. This is something we've already experienced, but that not-ready taint will most likely need to be tolerated by your CCMs so that they can run while the networking is coming up for the kubelet. And then you'll also need to manage credentials for these cloud providers. It's very similar to the way you're managing credentials now: you'll use the --cloud-config flag and pass some local path within the container, usually a mounted secret or a config map. The contents of that will depend on which provider you're using and what that provider's specific information is, but it'll be largely similar to what you've been doing already.

So now I'm going to show a couple of examples of how we run the cloud controller managers on OpenShift, and this is just a brief snippet of the deployment we use to put these into the cluster. What we're looking at here is the AWS CCM. This is a very simple cloud controller: we just specify what cloud provider we're using, we tell it to use the service account credentials, because with AWS this is all plumbed through for us, and then we tell it the namespace we want for the leader election, because by default that will be kube-system, and on OpenShift we use a different namespace for this. So this is pretty straightforward and not too complicated to deploy. But let's look at an example of one that's a little more complex. This is the Azure cloud controller manager, and you can see first off that we've got two separate processes running here. This is just the way that the Azure folks have decided to break out their controllers, and that's perfectly acceptable. They have the main cloud controller manager and then they have their cloud node manager. You can see that there are different options here, and for the cloud controller manager it's a little more complex than setting up the Amazon one: we specify that the cloud provider is Azure, this time we're passing in a cloud config, we tell it what controllers we want to start, and we pass in some networking information and whatnot. So this just gives you an idea that you will have to have different solutions, or different deployment methodologies, depending on what type of CCM you're running.

Now, as Mikhail mentioned, volume controllers and CSI don't really fit into this, and I just want to mention again that the CCMs do not implement any of the volume controllers; the CSI out-of-tree migration is a completely separate process, with its own docs and its own enhancements, and if that's something that's on your critical path, I would recommend looking into those documentation sources.
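Pulling the deployment details above together (control-plane scheduling, the taint tolerations, a service account, and --cloud-config from a mounted secret), here is a rough Go sketch of what such a manifest amounts to, built with the Kubernetes API types. Every name in it (namespace, image, binary path, secret) is hypothetical, and real CCMs ship their own reference manifests that should take precedence.

```go
// Rough sketch of the pieces a CCM deployment typically needs. All names
// are hypothetical; adapt to the CCM you are actually deploying.
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func exampleCCMDeployment() *appsv1.Deployment {
	replicas := int32(2)
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-cloud-controller-manager", // hypothetical
			Namespace: "example-cloud-system",             // hypothetical
		},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "example-ccm"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "example-ccm"}},
				Spec: corev1.PodSpec{
					// Needs RBAC for nodes, services, events, and so on.
					ServiceAccountName: "cloud-controller-manager",
					// Run only on control plane nodes.
					NodeSelector: map[string]string{"node-role.kubernetes.io/control-plane": ""},
					Tolerations: []corev1.Toleration{
						// Must run before the node is initialized by the CCM itself.
						{Key: "node.cloudprovider.kubernetes.io/uninitialized", Value: "true", Effect: corev1.TaintEffectNoSchedule},
						// And while networking is still coming up.
						{Key: "node.kubernetes.io/not-ready", Operator: corev1.TolerationOpExists, Effect: corev1.TaintEffectNoSchedule},
					},
					Containers: []corev1.Container{{
						Name:  "cloud-controller-manager",
						Image: "example.registry/example-ccm:latest", // hypothetical
						Command: []string{
							"/bin/example-cloud-controller-manager", // hypothetical binary path
							"--cloud-provider=example",
							"--cloud-config=/etc/cloud/cloud.conf",
							"--leader-elect=true",
						},
						VolumeMounts: []corev1.VolumeMount{{Name: "cloud-config", MountPath: "/etc/cloud"}},
					}},
					Volumes: []corev1.Volume{{
						Name: "cloud-config",
						VolumeSource: corev1.VolumeSource{
							Secret: &corev1.SecretVolumeSource{SecretName: "example-cloud-config"}, // hypothetical
						},
					}},
				},
			},
		},
	}
}

func main() {
	d := exampleCCMDeployment()
	fmt.Printf("would create Deployment %s/%s\n", d.Namespace, d.Name)
}
```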
Now, another concern here is: how do I migrate my high-availability cluster that might already be running, where I might have a control plane running in some sort of replicated state? How do I keep uptime and switch from the KCM to the CCMs? This will need to be done if you're doing a live upgrade. First of all, you should already be using leader election, and you should already be familiar with that mechanism, because the CCMs as you deploy them will need leader election too, as there could be multiple replicas deployed depending on how large your control plane is. The second piece is that there is a leader migration mechanism that has been added to the KCM and the CCMs, and this is to help you make the transition from KCM to CCM easier: it allows the two to coordinate and do that handover. There's also a configuration element to this that you can use to change the mappings between the controllers, so if you've changed the names from what would be considered the defaults, you can use that mapping to change how your binaries are recognized by the leader migration. And of course, this is a flag that you have to enable on your KCM and CCMs so that they're ready to do this transition. There are a bunch more steps involved in this migration, and I would really point to the documentation here, because it clearly spells out what you need to do and what needs to be set up in order to do it.

So, perhaps you want to implement your own cloud controller manager. Maybe you have some custom cloud behavior you want to implement, or maybe you're setting up your own infrastructure provider and you have a bespoke platform. This is not as complicated as it might sound. There is a cloud provider library in the Kubernetes org on GitHub, in the cloud-provider repo, and it defines all the interfaces that you'll need to satisfy in order for your controllers to run there. It's kind of a multi-tiered interface, where the top level gives you things like load balancers and instances and whatnot, and as you descend into them you get more detail. It also contains a sample reference provider, so if you're looking for an idea of how to get your binary started, that is there as well. Now, for the sake of reference, you can go back into the main Kubernetes repo, and in the cmd/cloud-controller-manager directory you'll see the binary that was made to wrap the in-tree providers. This was one of the first steps towards making this migration: the community wrote a binary that would just extract the in-tree providers into their own binary that could be run separately. You do need to specify the cloud provider with this binary, because although it is separate, it contains all of the providers from the in-tree code inside of it. And it's also really useful to look at just to see how they're structured and how you might want to write yours. And then, as always, for a great reference, if you go into the Kubernetes org and look for the cloud-provider-* repositories, you'll see all the cloud providers that exist there, and you can see how others have implemented this for their cloud. A great point of reference.
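For a feel of what satisfying that interface involves, here is a minimal sketch of a custom provider registering itself with the framework. It assumes the k8s.io/cloud-provider module roughly as of the 1.22/1.23 era; the provider name "examplecloud" is hypothetical, and every optional feature is reported as unsupported so the matching controllers stay idle. The sample provider in the cloud-provider repo remains the authoritative starting point.

```go
// Minimal sketch of a custom provider plugging into the cloud provider
// framework. Method names follow the k8s.io/cloud-provider Interface as of
// roughly the 1.22/1.23 era; check the repo for the current definition.
package examplecloud

import (
	"io"

	cloudprovider "k8s.io/cloud-provider"
)

const providerName = "examplecloud" // hypothetical

func init() {
	// Makes the provider selectable via --cloud-provider=examplecloud
	// when linked into a cloud-controller-manager binary.
	cloudprovider.RegisterCloudProvider(providerName, func(config io.Reader) (cloudprovider.Interface, error) {
		// config is the file passed with --cloud-config, if any.
		return &cloud{}, nil
	})
}

type cloud struct{}

func (c *cloud) Initialize(b cloudprovider.ControllerClientBuilder, stop <-chan struct{}) {}
func (c *cloud) ProviderName() string { return providerName }
func (c *cloud) HasClusterID() bool   { return false }

// Returning false marks a feature as unsupported; a real provider would
// return an implementation backed by its platform API.
func (c *cloud) LoadBalancer() (cloudprovider.LoadBalancer, bool) { return nil, false }
func (c *cloud) Instances() (cloudprovider.Instances, bool)       { return nil, false }
func (c *cloud) InstancesV2() (cloudprovider.InstancesV2, bool)   { return nil, false }
func (c *cloud) Zones() (cloudprovider.Zones, bool)               { return nil, false }
func (c *cloud) Clusters() (cloudprovider.Clusters, bool)         { return nil, false }
func (c *cloud) Routes() (cloudprovider.Routes, bool)             { return nil, false }
```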
So another thing that might come up, and this is a little bit of shameless self-promotion from our team, is that you might want to automate the deployment of your CCMs. Creating operators is a great thing to do in Kubernetes, and these are controllers whose deployment you may want to automate. What we did for OpenShift was create an operator that can detect the platform you're on and then deploy the proper CCMs based on what it finds. If you're only working in single-platform environments, you probably don't need this type of automation, because you're always going to be deploying to the same platform, but it can be really useful in situations where you might have multi-platform deployments. And if you want a reference, you can find it in the OpenShift org on GitHub; this is the link to it.

So lastly, let's talk a little bit about debugging before we wrap up here, because inevitably you will run into problems. The first thing I would say is: be mindful of networking. Networking can really bite you here because of the way the new uninitialized taint gets applied to the nodes; your networking components need to tolerate it so that they can come up before the CCM pod does, and the CCM then has access to the network to do things like check for load balancers and nodes and whatnot. So really be on top of your networking. Also, as with most things in Kubernetes, the logs are going to be your friend here; you're going to want to look at the logs for your CCMs to get a rich output of what's happening with any errors. Then double-check your deployment options, for the kubelet, for the KCM, as well as for the CCM: depending on how you're setting these things up, you're going to need to make sure that you've got those cloud provider flags set, and, if you're running the KCM-to-CCM migration, that you've got the leader migration flags set as well. You will most likely need a way to interact with the nodes that you're working on, whether that's SSH, some sort of cloud provider console, or maybe a keyboard plugged directly into a rack of servers; whatever it is, you might need to get into those machines to see what's happening at a layer outside of Kubernetes, for example whether systemd units are running, whether networking is running, those kinds of things. And then lastly, know your infrastructure provider. This is the lifeline between Kubernetes and the infrastructure, and you really need to understand how the infrastructure provider works and where you might run into pitfalls.

So with that said, we'd like to thank you, and here are a few references that will probably be helpful on your journey. There are a couple of links to the main Kubernetes documentation about cloud controller managers, the third one is a blog post that does a great job of explaining this, and the last one is the cloud provider reference. And if you'd like to stay in touch, Mikhail's and my emails are here; please reach out, share your woes, tell us your successes, ask us questions, we love it all. So with that, I guess we have a few minutes for questions. Thanks, everybody. I'm not seeing any questions in the chat, so I'm guessing that we've answered everything you wanted to know.