Welcome, everyone. This time we're looking at remote control planes: how they work, and the components that make them possible, such as Konnectivity. We'll cover what this is, why you might want something like it, and then how it's actually done. I'm Jussi from Mirantis, working mostly with our k0s Kubernetes distro. Hi, my name is Mod Kazem and I'm working at Kubermatic, mainly on cluster provisioning, cluster networking and lifecycle. Glad to be here. All right. So as said, the outline for our session focuses first on what we actually mean when we talk about a concept called a remote control plane. We'll have a look at a couple of different use cases: where would you actually want to do something like this? Then, what are the building blocks and concepts in Kubernetes and other components that let you actually pull this off? And of course we'll have a bit of a history lesson too, what has been there and what is there now in Kubernetes, and a couple of real-world integration examples. As always, we're standing on the shoulders of giants here. The two of us are not really the inventors of this stuff; we're more like happy users of what the community is building. The main reason we wanted to give this talk is that there's very little documentation and knowledge on this topic, so we want to raise awareness that you can actually do something like this. Whether it makes sense for your use case or not, that's a different discussion. So basically, kudos to all the original inventors and the people that have been working on the different KEPs in the past to make this actually happen. All right. Okay. So as Jussi already said, we're not going to dive into the whole technical background of how we achieve things. Usually a Kubernetes cluster just looks like this.
There's a master, or control plane, node and there are worker nodes, where you have your API server and all the other control plane components deployed on the same node, and your worker nodes just keep talking to it. This was working, and is still working. As we said, we have a bidirectional connection between the worker nodes and the control plane nodes. It's unrestricted, so they can talk freely whether that's a requirement or not. It's fast, since there aren't many restrictions in the way. Reliable, of course, and secure. And it's often node-local, or at least everything is on the same layer-2 network, and things just work. Everything is okay. Now, what this whole talk is about is the remote Kubernetes control plane, where the control plane components run somewhere completely different, probably a different data center or a node outside of that subnet, and those setups are quite hard to tweak and play with. It's really not that easy to get your ping-pong connection going between the worker nodes and the master node, or the control plane components; you need a lot of configuration and tweaking. And with that come quite a few challenges. The connection is probably one-directional, since you will have some NAT sitting between your control plane components and the worker nodes. It could be very restricted, because you might be hooking into a different network, so there will, or I assume there will, be firewall rules and load balancers in between before you get a smooth connection. Slow, of course: latency, as we said, because the control plane could be in a different data center or a different subnet. It could be here in the US while your worker nodes are in Europe. Not sure if that's a wise idea, but who knows. Hashtag edge. Less reliable too, of course, since network partitioning can happen between your workers and your master node.
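The usual way around that one-directional NAT constraint is to reverse who dials whom: the side behind the NAT opens a single long-lived outbound connection, and the control plane side then pushes its requests back over that same connection. Here is a minimal sketch of the idea in Go; plain TCP on localhost stands in for the real thing (which would be gRPC over TLS), and all names are illustrative, not taken from any real component.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"strings"
)

// demo shows the reverse-tunnel idea: the "agent" (imagine it sitting
// behind NAT) dials OUT to the "server", and the server later pushes a
// request back over that same long-lived connection. Only an outbound
// connection from the agent's network is ever needed.
func demo() string {
	// Server side: listen for agents dialing in.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// Agent side: dial out through the (imaginary) NAT, then answer
	// requests arriving over the established connection.
	go func() {
		conn, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return
		}
		defer conn.Close()
		req, _ := bufio.NewReader(conn).ReadString('\n')
		// A real agent would now dial the local kubelet; we fake it.
		fmt.Fprintf(conn, "logs for %s", strings.TrimSpace(req))
	}()

	// Server accepts the agent's outbound connection...
	tunnel, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	defer tunnel.Close()
	// ...and reuses it in the reverse direction, server -> agent.
	fmt.Fprintln(tunnel, "pod-abc")
	reply, _ := io.ReadAll(tunnel) // agent closes its end after answering
	return string(reply)
}

func main() {
	fmt.Println(demo())
}
```

The direction of the initial dial is the whole trick: firewalls and NAT generally allow outbound connections, so no inbound hole into the worker network is needed.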
Probably insecure, because there's a lot of plumbing between your worker nodes and your master node components. And of course, the thing nobody likes: there's the possibility that you're doing all of this over the internet, and if not, you're probably going to do a lot of tunneling to reach that machine. Exactly. So what kind of use cases are we looking at, where you would want to go for a remote control plane instead of just being a happy user of the local cluster we spoke about in the beginning? To begin with, you have of course the trust, or so to say, segmentation aspect, where the control plane components sit in a different zone or data center than your worker nodes. Next, since you need to do a lot of configuration and plumbing anyway, this can also be a way to isolate your control plane components away from your worker nodes, so whatever happens there, you're still safe in a way. There's also the human error angle, I mean less human error: when the control plane components live somewhere other than where the real work is carried out on the worker nodes, normal users wouldn't be able, or shouldn't be able, to get access to your control plane components that easily. In addition to that, as I mentioned earlier, there are the Kubernetes edge use cases, where the workloads or apps you run sit on a very, very remote cluster, meaning it could be in someone else's house, someone else's garage, a car, you name it.
And last but not least there's hybrid cloud, the thing that hopefully everybody is now aware of, where you might have some connections to cloud providers but you still run your own physical data centers somewhere, and you'd like to achieve a hybrid model where some things run on multi-cloud and other things in your data centers, which gives you a lot of flexibility. And that's exactly what we're looking at in this image. As you can see, on the left side you have your control plane components deployed somewhere, and your workloads are completely isolated and could be running really far away from where those control planes are. They could even be co-located, meaning you could have one node, or one cluster, where the control plane components of several different clusters live in the same place. Of course, you then have to take care of namespace isolation and things like that. And if you are building a multi-tenancy application and you'd like to get a grip on your control plane components and clusters, you can come up with a very nice architecture where you can operate and run hundreds or even thousands of clusters, based on how your initial architecture scales.
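To make the co-location idea above a bit more concrete: one control plane per tenant can live in its own namespace of the hosting cluster, as ordinary Deployments and Services. This is only a rough sketch under that assumption; the names, image version and flags are illustrative, and real setups generate far more configuration than this.

```yaml
# Hypothetical sketch: the control plane of tenant cluster "abc" runs as
# ordinary pods in a dedicated namespace of the hosting cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-abc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-apiserver
  namespace: cluster-abc
spec:
  replicas: 2                      # scale this control plane independently
  selector:
    matchLabels: {app: kube-apiserver, cluster: abc}
  template:
    metadata:
      labels: {app: kube-apiserver, cluster: abc}
    spec:
      containers:
        - name: kube-apiserver
          image: registry.k8s.io/kube-apiserver:v1.28.0
          command:
            - kube-apiserver
            - --etcd-servers=https://etcd.cluster-abc.svc:2379
---
# The remote workers reach it through a single load-balanced endpoint.
apiVersion: v1
kind: Service
metadata:
  name: apiserver
  namespace: cluster-abc
spec:
  type: LoadBalancer
  selector: {app: kube-apiserver, cluster: abc}
  ports:
    - port: 6443
      targetPort: 6443
```

With RBAC and network policies scoped per namespace, tenants never see each other's control planes, which is what makes the hundreds-of-clusters scaling story workable.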
Of course, with that you get HA, because what you have in that isolated place is your game, your playground, so you can really scale up and down based on demand. And you get the very same control plane experience across multiple clusters, meaning not much context is tied to any single cluster: you treat every control plane the same and leave whatever has to run on those clusters to the users. And if you need some orchestration for these control plane components and they all exist in the same place, with your controllers for example in one place, you can operate on them easily without, I hope, any slowness or latency, because you're probably talking to the same endpoint. And eventually it's Kubernetes as a service: you don't need to make your users think a lot about the control planes. Kubernetes is not that easy, as you probably know already, and networking is not that easy, so you need to get around those bottlenecks and obstacles as much as you can; otherwise, well, have fun. All right, back to Jussi. All right, thanks. So why is all this a challenge? I can run my API server, scheduler, all the control plane components on one box, and then I can have a worker that connects to the API, right? What's the challenge there? Well, the API server actually has to talk to the in-cluster components in many different situations. When you do something like kubectl logs xyz, the API server actually calls the kubelet, which then talks to the container runtime. So the API server needs to be able to talk to the kubelet, but remember the previous pictures: we had NAT layers or firewalls in between, so the API server just cannot do that in those cases. There are a few different use cases where the API server expects to be able to communicate with the cluster network: service IPs and pod IPs. When you do kubectl proxy, the API server actually proxies the connection to the pod or service IP. Admission controllers: the API server needs to be able to call admission webhooks, which live on the cluster network. So it's really this API-server-to-cluster communication that is the main challenge with these remote control planes. It's not the part where you run the control plane itself; that's super easy, or well, it's still Kubernetes. Looking back in time, which is always fun, there actually was something built into the API server for this specific use case: the API server used to support SSH tunnels, but that was deprecated in version 1.9 for various reasons. After that, there have been people like us building the same or similar capabilities with various custom solutions, like OpenVPN tunnels or other networking gimmicks, let's put it that way. But then, luckily, as I mentioned, we're standing on the shoulders of giants here: KEP 1281 was born in spring 2019, and that's really the foundation of the actual solution and the architectural building blocks we can use nowadays to punch through the firewalls, network segments and all that stuff. So basically KEP 1281 lays out the architectural components and concepts for how we can route these API-server-to-cluster calls, in the different use cases, via something that actually makes them work. One of the concepts in that KEP is the egress selector on the API server. It's essentially a config flag plus, of course, some YAML. The egress selector on the API server side acts as a sort of routing mechanism: when the API server wants to talk to the cluster, or to some other external component, how do we route that call so it can actually reach the cluster, whether it's a pod or a kubelet or
whatever it is. As you see in the YAML config example, you usually have the routing or proxying component running kind of as a sidecar to the API server: the API server talks to a local UNIX socket, and something behind that UNIX socket then routes the call towards the cluster. There are various types you can configure for the egress selection. The type called "cluster" covers the cases where the API server wants to talk to pods, wants to get logs from kubelets, when you do proxy stuff and those sorts of use cases. "etcd", well, that's kind of obvious: when the API server wants to talk to etcd, you can route even that through this egress selection mechanism. And with "controlplane" you can configure how the API server talks to admission webhooks and admission controllers. So, essentially, as our title says, it's Konnectivity: the Konnectivity component was born through that KEP. The KEP, like many other KEPs in Kubernetes, lays the architectural foundations and defines the interfaces for how things work, and it's the same here: the egress selection, and how the API server talks to the UNIX socket, that's the interface, and you can have basically anything behind it. But Konnectivity is one of the SIG projects, and it's the sort of reference implementation for this whole thing. As far as I know it's actually the only implementation, but still. And I wouldn't want to write one on my own, because it's actually quite complex if you look at the whole thing. All right, so how does it actually work, and how do we punch through the firewalls and NAT layers? When we use Konnectivity, it's divided into two components: a Konnectivity server and a Konnectivity agent. The Konnectivity server acts as the routing logic, kind of as a sidekick to the API server. The agent runs within the cluster on the worker nodes, whether as a DaemonSet or as a deployment; there are options. The agent opens up a connection to the Konnectivity server, and now we have a gRPC tunnel that we can use. When the API server wants to, say, get the logs of a pod, it talks to the local UNIX socket, and behind that socket we have the Konnectivity server. The Konnectivity server knows: okay, I have this agent connected to me with an open gRPC tunnel, so we use that tunnel to call the agent, basically saying hey, we want to connect to this specific kubelet on this IP address (it doesn't actually say anything about logs). In a nutshell, it's pretty much like reverse tunnels; I bet many of you have used those in the past for fancy tricks. And all of this runs over gRPC, so it's secure, since it's on top of TLS, and in my first impressions it was actually even quite fast. When we built this I was thinking: it's reverse tunnels on top of TLS on top of gRPC, oh god, it must be slow as hell, but actually it's pretty slick. Now for some real-world examples. As you've already seen from what Jussi showed, the stack is quite interesting on the networking side. At Kubermatic we leverage the way Konnectivity does things, and what you see here is a true multi-tenant architecture that we run every day. On the right side, these blue boxes are user clusters; those are simply Kubernetes clusters that only run workloads. In these clusters there is no node marked as a master node. If you end up with a 3-node cluster and you do the usual kubectl get nodes, you will only see 3 nodes, and none of them is marked as master, because none of them are masters. What we do instead is on the left side, where we have what we call seed clusters, and in that specific cluster we compact and co-locate the control plane components
namespaced, with everything you need. It is completely isolated from what the regular user sees; there is no way they could communicate with or see what's going on in that control plane namespace, because it's on a different cluster, and of course there are a lot of RBAC rules and a lot of network isolation. So if you run kubectl against the cluster on the right side, you won't be able to see what the pods for the control plane components look like; they're simply not there. You only see what matters to you: your applications, your workloads. And this is exactly what we do: we leverage Konnectivity. On the left side, on the seed cluster, we inject the Konnectivity server alongside the API server, and there is a Konnectivity agent running on the right side, on the user cluster side. What happens when you create a cluster is that certain controllers take care of putting the control plane in place; once all the components are there, we spawn the machines. The worker machines only get what they need, and then when the Konnectivity agent receives a request, it finds the right destination, as Jussi already mentioned, and the connection between the kubelet on the right side and the API server on the left side is there. And what is quite interesting about this: you don't have to run the Konnectivity agent on every node. The nodes know about each other, they observe each other, and they can forward or route the traffic from one node to another, and you get a lot of observability with it. So this is exactly how our, so to say, multi-tenant slash co-located control plane setup works. Again, on the right side are user clusters, or clusters in general; there are no control plane nodes there, they're simply not there when you try to list them. And this is a very real-world scenario. As Jussi already mentioned, in the past we had to fiddle with a lot of different applications, for example an OpenVPN server, where you needed to manage this and manage that, and run an OpenVPN client on the right side. It was a whole mess for us. We really just wanted a reliable connection between the API server, or the control plane components in general, and the worker nodes, and call it a day. It should of course be reliable, secure and fast, and this is what we've experienced so far with Konnectivity, and that's why we like it. On the left side, as you can see, there were something like 35 containers that were only there for connectivity; you needed all those containers just to have connections between your worker nodes and the control plane components, which was too much for us. On the lower left you see the OpenVPN client that eventually connects to the OpenVPN server. OpenVPN is nice, but if you run it at that scale, heavily automated, it can be really hard to debug and know what's going on, not to mention all the iptables rules we had to introduce to make it work. On the right side you see the happy path where we use Konnectivity: we have one Konnectivity server injected into our control plane components up on top, and on the lower part you see we only have two agent pods. Those are not DaemonSets; they don't have to run on each node, so you don't have to worry about having a hundred nodes and taking care of an agent running on every one of them. That's not the case here. And all of that is done beautifully: the connectivity between your API server, or control plane components in general, and your worker nodes is as smooth and secure as possible. All right, and there are a lot of similarities in what I've been working on, the k0s distro. In our case we really built Konnectivity into the picture from day one, and for us one of the main driving use
cases actually was not network segregation or anything like that, but the question of who can deploy stuff on master nodes. If you take your traditional kubeadm-set-up cluster, where you have actual master nodes and worker nodes, how do you control whether Joe can deploy stuff on the master nodes? There's no RBAC for that. Of course you can throw in some OPA Gatekeeper or whatnot, but should you have to do that? In my opinion, no. That's why we landed on having Konnectivity in the picture, and as an added bonus we got the ability to have this real network segregation, which enables some super interesting deployment scenarios: you run your control plane in location X, the workers are somewhere deep down in your bunker, and it's enough that they can just call out to the control plane; everything works as in your traditional kubeadm cluster. That's super cool. We actually manage the agents as a DaemonSet, to also have high availability for the connections. Well, of course there are no solutions without challenges. As I mentioned, the reason we're giving this talk is the lack of documentation and general knowledge that you can even do something like this. The HA setup for the Konnectivity servers and agents is actually a bit tricky: if you have multiple agents and multiple servers, each agent has to have a connection to every server, otherwise it's not HA, so it's an M-times-N issue. That's a bit of a tricky topic. Debugging is also hard, because there's now something sitting in the middle of the call, so you can't easily do things like tcpdump, and when the tunnels are not working as expected, you can't do something like kubectl logs. The errors you see on your CLI are super weird; you're not going to be able to figure out that the problem is actually somewhere in Konnectivity unless you've seen those errors too many times, as I have. It's like shooting in the dark, so to say. I also encourage everyone who's interested in this to join the SIG that maintains it; there is a true lack of contributors. We're trying to do our best, but it takes time to land PRs and fixes and all that, so join in and help all of us. All right, there are a couple of minutes for questions, and I guess we can hang around in the hallway after the talk. Thank you for sharing your knowledge. Is it possible to apply this principle to distribute the controllers of the root cluster? I mean, in this scenario the root cluster is the single point of failure; can we use Konnectivity to extend this idea to the controllers as well? Thank you. We can maybe quickly go back to the slide, but let's try. What we have over there, what we call the root cluster, is actually a component that keeps things synced down from upstream to downstream, to those seed clusters. If you are not able to speak with the root cluster for some reason, maybe it's not available, and eventually it's one cluster, as you mentioned, but the real active work, the heavy lifters, isn't in the root cluster; it's in the seed clusters. The seed cluster is where your components are co-located. Usually you have seed clusters across multiple data centers, maybe multiple cloud providers as well. Normally, when you access the root cluster, you get to the seeds through a lot of authentication and authorization. If you lose access to the root cluster, you can simply connect to the seed cluster directly, because in the end a seed is a Kubernetes cluster with its own kubeconfig, and instead of going from upstream to downstream, you can just hook into it directly, downstream, and see what's going on over there. So that's the model that we think makes sense. So, starting with the OpenVPN implementation and looking towards
the Konnectivity implementation: is there a migration path there that you were able to achieve or look at, or was it tear down the old clusters and the new ones just get Konnectivity? I think it's pretty much tear down and build again. I mean, there's no real Kubernetes configuration on the API server where you can say, okay, you have an OpenVPN tunnel here, so you have to build that yourself, and you have to rotate your machines at some point. Yeah. This is pretty awesome. What are the strategies in place, or the thoughts, on how to make this easier for dynamic environments, things like API server pods changing, or bouncing around and getting new IPs that you need to configure with all the agents? All right, so that involves something like an external load balancer, external from Kubernetes' point of view. In the normal case you configure the agents to talk to a load balancer, and you tell the agents, say, that there are three Konnectivity servers running. What happens on the agents, and this is why I said HA is a bit tricky, is that they connect to the load balancer and open as many connections as it takes for the load balancer to hand them a connection to each distinct server. So it's kind of like brute-forcing the HA connections. Gotcha, okay. And one other question: are there plans to do this bidirectionally as well, things like going from a Kubernetes service inside the cluster outwards? Is that planned, or where does that stand? There's a long-standing open PR on doing the tunnel the other way around, so that the agent opens up the API port on the worker node and then goes through the same tunnel. It's in the works, but as mentioned, we can all help to make that land, and help in testing it and all that. We actually have a question here. Okay, one last question; we are running out of time, and as I said, I can hang out in the hallway if anyone has more questions or wants to chat. Thanks. Actually, which SIG — sorry? — which SIG, special interest group, would we engage with? It's the subproject called apiserver-network-proxy. Okay, something I should have had. And then the real question: what breaks with this? Things like service meshes and Submariner and other funky network stuff: do those still work, or do I limit myself by doing this? I don't think you're limiting yourself at all. In the k0s use cases we've seen, what usually breaks is the HA connections: the load balancer having the wrong configuration for sticky sessions or something, and those sorts of things, which are a pain to debug. Yeah, same here for us. As I said, you could run hundreds of clusters, or more; it depends on your main infrastructure, since eventually it's the underlay and overlay infrastructure that scales. But mainly what we saw is that the Konnectivity agent and server run smoothly as long as they're satisfied with the resources they have, and I don't think they need a lot of resources. As we said, in some cases you really need a lot of redundancy, like a DaemonSet running on all the nodes, and in other cases you only need, for example, one deployment and you call it a day. And with that, as I said, we are running this in production; this is not some kind of PoC or demo, and so far it has been working: multiple clusters, multi-cloud slash bare metal. All right, thanks everyone for joining, and hopefully you learned something new.