Okay, I think we have quorum and we can start. I got the thumbs up in the back. Great.

So before we get very far along, I have to give you an update. My co-speaker, Michael McCune, couldn't make it at the very last minute, to be honest. He got COVID and is doing the responsible thing and staying home. His topic as a co-speaker was going to be more about the security posture of running the Konnectivity proxy in a Kubernetes cluster, and my material was going to be more about the experience I had in GKE running Konnectivity at that scale. So we've tweaked the slides around to focus mainly on my content. If anybody feels strongly bait-and-switched, I apologize.

So, getting started. Here's the structure of this talk. I'm going to start with a generic overview of the problems solved by the Konnectivity proxy and how it works. And then, as a maintainer, and this is a maintainer-track talk, I'll go into my personal experience in GKE using the proxy.

Okay, so just for starters: Konnectivity is a TCP proxy for the kube-apiserver, running in the control plane, to talk to cluster resources on worker nodes. This extension is associated with SIG API Machinery and SIG Cloud Provider. It uses a server-and-agent topology. It's a forward proxy; you'll see that later. And it's a solution specific to the API server; it's not meant to be a generic proxy.

I also want to mention that this presentation will try to avoid repeating too much material from previous KubeCons. In 2019, some of the designers and implementers of Konnectivity gave a nice presentation on its inner workings and implementation details, so I won't go into that depth here. And just last year, a couple of presenters gave a nice talk about some interesting, rich use cases they had found for Konnectivity; "remote control planes" was the term they used. So I won't talk too much about the variety of use cases either.

So, getting started. The diagram you see illustrates the typical divide of a Kubernetes cluster. On the left, you've got the control plane components. On the right, you've got one or more worker nodes running cluster resources. Most control plane traffic is client traffic destined for the API server. Even most control plane components, like the scheduler, controller manager, et cetera, are API server clients. etcd is the primary exception to this rule. In a cluster with separate control plane and cluster networks, it's common to firewall off everything in the control plane except the API server and make it available to the cluster, typically behind a load balancer.

Now, the arrows from the kubelet and the pod in the diagram: some primary examples of node-to-control-plane traffic include node registration, performed by the kubelet, or a pod-based workload calling the Kubernetes API. But some core Kubernetes features require that the API server have network access to the cluster. Significantly, webhooks and aggregated API servers are core Kubernetes features that are commonly run in the cluster, on workers. So this illustration shows the other kind of arrow, flowing outward from the API server. It's common to have local authentication and authorization endpoints, as well as local webhooks, in the control plane, but it's also common for the API server to call out to webhooks, aggregated API servers, and also to kubelets, pods, and services in the cluster.
By default, when the API server wants to communicate with a cluster resource, it directly dials the relevant IP address. This default behavior assumes that cluster resources are routable from the control plane to worker nodes. Kubernetes clusters with a network split cannot assume that cluster resources are routable like this. That network split is the problem the Konnectivity proxy solves. And just as a side note, some of you may notice that you can have webhooks in both the control plane and on workers; we might get to that later.

So, the Konnectivity proxy. This design introduces the Konnectivity server, running in the control plane and made available to the cluster. In the case of multiple control plane nodes, the Konnectivity servers are assumed to be placed behind a load balancer. A Konnectivity agent is introduced into the cluster as a system pod, and on startup it immediately initializes a long-lived gRPC connection to the server. The gRPC connection is secured with mTLS, and the server authenticates a given agent via the API server's TokenReview API.

Finally, once the Konnectivity server and agent are installed and initialized, the API server can be restarted with new configuration that enables it to use the proxy. The API server has a private config called the egress selector configuration. When it is used, the API server looks at the kind of resource it is accessing, and according to where that resource is located, a proxy-aware network dialer is used. It is also possible to use the proxy for traffic to etcd or to the control plane, but in this presentation we're focusing on the cluster.

All the black lines in the diagram are normal TCP connections. The dotted lines are proxy data connections and control packets that are multiplexed over bidirectional gRPC streaming. The two orange arrows are the gRPC bidirectional streams brokered by the Konnectivity service. As a concrete example, let's say a user types kubectl port-forward to a pod. The API server asks the proxy system to dial a new TCP proxy connection. Control packets arrive at the agent, which opens a normal TCP connection to the pod for the final segment. The API server and the proxy components then support a full-duplex, end-to-end TCP connection until the user closes it. And you can try it out: this entire solution is fully open source, and there are instructions at the link to play with it.

So just briefly, I'll touch on the topic of security, which was going to be elmiko's topic. There is a historical class of security issues, like the one shown, that basically involve a confused API server thinking it's dialing a cluster resource when it's actually not. When the API server runs with any cluster proxy implementation configured, traffic is effectively constrained to the appropriate trust segment, namely the cluster. Even though the known issues are patched, routing control-plane-to-node traffic through a forward proxy improves the security posture against possible future issues in the same class.

So now to the topic of running Konnectivity on GKE. First, why did GKE need the Konnectivity proxy? Historically, for a class of GKE clusters that we might call public clusters, we used SSH tunnels, which were deprecated in open source Kubernetes way back in version 1.9 and ultimately removed entirely in 1.22. So GKE needed a replacement for such clusters before then.
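To make the egress selector idea above a bit more concrete, here is a minimal Go sketch of the "pick a dialer based on where the traffic is going" decision. This is not the actual kube-apiserver code; the function name, socket path, and target address are hypothetical placeholders, and the real configuration lives in the API server's egress selector configuration file.

```go
// Illustrative sketch only: mirrors the idea behind the egress selector.
// Traffic for the "cluster" egress type is dialed through a local proxy
// endpoint (e.g. the Konnectivity server), while other traffic dials directly.
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// DialFunc matches the shape of a network dialer.
type DialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// newEgressDialer is a hypothetical helper: it returns a dialer for the given
// egress type. proxyUDSPath is a placeholder socket path.
func newEgressDialer(egressType, proxyUDSPath string) DialFunc {
	direct := (&net.Dialer{Timeout: 10 * time.Second}).DialContext

	switch egressType {
	case "cluster":
		// Cluster-bound traffic: dial the proxy, not the target. In the real
		// implementation this opens a gRPC or HTTP CONNECT tunnel to the
		// Konnectivity server; here we only show the routing decision.
		return func(ctx context.Context, network, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", proxyUDSPath)
		}
	default:
		// "controlplane" and "etcd" traffic can keep dialing directly.
		return direct
	}
}

func main() {
	dial := newEgressDialer("cluster", "/var/run/konnectivity/proxy.socket")
	_, err := dial(context.Background(), "tcp", "10.0.0.12:10250")
	fmt.Println("dial attempted, err:", err) // expected to fail outside a real cluster
}
```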
More recently, clusters with private nodes have historically used VPC peering directly between the control plane and the worker nodes, but there's been a class of problems around that which is solved by a newer cluster topology that uses Google Private Service Connect. GKE is unifying its clusters around that, and the link is to a blog post that explains it in further detail.

So how does GKE run Konnectivity? First of all, GKE uses gRPC mode. There is an alternate HTTP Connect mode, but we chose gRPC because it has the promise of better multiplexing and performance. One note on that: the mode choice only affects the first tunnel segment; it does not affect the tunnel segment between the server and the agent. Another interesting difference is that we run the agent via a Deployment rather than as a DaemonSet like the example manifests. It's a more efficient use of resources, not just CPU and memory on the worker nodes, but also pod IP space; it would be too expensive to use an unnecessary amount of pod IP space. There's also a scalability limit that remains in Konnectivity: in GKE, kube-apiserver can handle up to around 15,000 worker nodes all registering, but the Konnectivity server does not have the same scalability and can't handle more than a much smaller number of agents initializing that gRPC connection. Running a smaller number of agents lets us avoid that limit.

At the scale of the GKE fleet, everything has to be automated. All cluster maintenance events that enable Konnectivity as part of a migration do so in two steps: one step to provision the tunnel portions, and a subsequent step to enable the API server to use it. This is really critical, because if the tunnel is not fully ready, due to a variety of factors, we don't want to take the second step, which could break some control plane functions.

So what have been some challenges of running it? First of all, in 2021, when we first launched, it was clear that it's within the user's control to break it, which is actually pretty noteworthy because the apparent defect is in the cluster's control plane function. It's not always obvious that a kubectl logs failure to dial was due to something the user did on the worker network. In 2022, after the SSH-tunnel-based clusters had been migrated to Konnectivity, we did encounter some new and uniquely painful issues. In one case, a thrashing cluster workload exercised control plane egress at such a high rate that the Konnectivity components became overloaded, and it had a cascading effect. In another case, a cluster got into a deadlocked state due to a broken user-installed webhook; I think it happened to be a Gatekeeper webhook, but that's not important. The critical point is that it matched all API groups and it was broken at some point. Note the circular dependency: the control plane needs a working tunnel to access the user webhook, but the tunnel is never able to initialize because the agent pod fails the admission webhook. So it's a deadlocked cluster. As noted, these are fairly difficult and painful failure modes to debug. And binary version skew, particularly between the API server and the Konnectivity binaries, is an extra headwind.
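One of the operational points above was the two-step enablement: don't point the API server at the tunnel until the tunnel is demonstrably healthy. Here is a hedged Go sketch of that gating idea, under assumptions: the health URL and port are stand-ins, not a documented Konnectivity endpoint, and the helper name is invented for illustration.

```go
// Sketch of a two-step rollout gate: step one provisions the tunnel, step two
// reconfigures the API server to use it, and step two only proceeds once the
// tunnel reports ready.
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// waitForTunnelReady polls a (hypothetical) health endpoint until it returns
// 200 OK or the timeout elapses.
func waitForTunnelReady(healthURL string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(healthURL)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil // safe to proceed to step two
			}
		}
		time.Sleep(5 * time.Second)
	}
	return errors.New("tunnel never became ready; do not enable the egress selector yet")
}

func main() {
	// Step 1 (not shown): deploy the Konnectivity server and agents.
	// Step 2: only flip the API server's egress configuration once this passes.
	if err := waitForTunnelReady("http://127.0.0.1:8092/healthz", 2*time.Minute); err != nil {
		fmt.Println("aborting step two:", err)
		return
	}
	fmt.Println("tunnel ready; proceed with API server reconfiguration")
}
```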
So what did we do? In 2022, we had to invest further in restabilization. We also took a hard look at the possibility of replacing the reference implementation parts with a different design using the same API server extension points, but in the end we were able to harden the existing Konnectivity service to meet our reliability expectations. In 2023, we have seen the fruits of those efforts, and Konnectivity adoption is increasing as part of the Private Service Connect unification migration.

So what's next? There's an active open source community that meets bi-weekly, focused on Konnectivity, both maintenance and the path to GA. There are some interesting unsolved problems. On the Konnectivity side, for example, there's a proposal to simplify the proxy connection identifiers, but it would be a significant protocol change. In the broader control plane egress space, there is an open question of how best to support finer-grained use cases, for example identifying subsets of the cluster to dial. See the links and resources in the PDF if you're curious to learn more or want to get involved. And since we haven't used all that much time, we have plenty of time for Q&A.

Please, let me go back to a diagram. So this diagram shows kubectl, which is assumed to be external to the cluster. Yeah, so it would be going over an assumed publicly accessible, but secured, endpoint to the cluster's control plane. In GKE, it depends on which configuration you've chosen. It is the same kind of client, though, as further back in this example, where that pod is calling the Kubernetes API as an in-cluster client. In a fully private control plane, you would only support that pod use case, and you would not necessarily support this direct kubectl client.

Oh, I see. Well, partly to answer that question, the general answer I would give is no. In the Konnectivity proxy enhancement community, there has been discussion of supporting routing traffic in the reverse direction, from outside the control plane into the control plane. But that was never in the original scope of the enhancement, and that work, which even had a prototype underway, was never completed and merged. So at this time, Konnectivity is only a proxy for the API server to use to route traffic to one of those three canonical locations: etcd, the control plane, or the cluster. Yeah, no problem. Any other questions?

I was wondering if you could share: you mentioned that one of the challenges you faced is customers making a firewall rule, or making their worker nodes unschedulable, or not allowing the Konnectivity agent to schedule on their worker nodes. How do you solve that?

Yeah, good question. So let's see, back here. This is an example of that. If a user has deliberately tight network egress deny rules, or the right kind of taints on all worker nodes, and there are a few other mechanisms too, they can prevent the Google-managed Konnectivity agent system pod from running. Part of the answer is that this is still a remaining headwind in GKE, but for one thing we have user guides that discourage the kinds of things that could break a system pod. And we have network firewall rule troubleshooting guides that mention that, by default, network egress is allowed, but if you lock down egress by default and only allow the kube-apiserver on 443, you need to also allow the Konnectivity server on port 8132 for your cluster to be fully healthy. Thanks, makes sense.
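To illustrate the locked-down-egress failure mode just described, here is a small, hedged Go sketch of the kind of reachability check a troubleshooting guide implies: from a worker node or a debug pod, confirm that egress is allowed both to the API server on 443 and to the Konnectivity server on 8132 (the GKE port mentioned above). The IP address is a placeholder, not a real endpoint.

```go
// Minimal reachability check: verify TCP egress to the control plane's API
// server and Konnectivity server endpoints from inside the worker network.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	endpoints := []string{
		"203.0.113.10:443",  // placeholder: kube-apiserver endpoint
		"203.0.113.10:8132", // placeholder: Konnectivity server endpoint
	}
	for _, ep := range endpoints {
		conn, err := net.DialTimeout("tcp", ep, 5*time.Second)
		if err != nil {
			fmt.Printf("%s unreachable: %v\n", ep, err)
			continue
		}
		conn.Close()
		fmt.Printf("%s reachable\n", ep)
	}
}
```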
Yeah, the shorter answer is no, Konnectivity is not aware of a great number of things about the cluster that it could be aware of, such as data plane network policies. In fact, I glossed over it, but there is an assumption in the default Konnectivity implementation and design that a given Konnectivity agent will be able to dial anywhere else in the cluster, which may not always be true. There is a configuration of Konnectivity where, if you run the agent as a DaemonSet and the agent can provide some hints, like which IP address ranges it serves, then there are Konnectivity server configurations where a routing strategy can have the proxy server pick the exact agent running locally where the dial target is. But in the default, out-of-the-box configuration there is an extra interesting hop from the agent to the pod, and that might be across nodes in the cluster. In terms of policy, the API server is really the only decision maker in this entire design. Everything downstream from the API server is purely a low-level network tunnel: TCP over gRPC streaming.

Yeah, sure. The scalability limit. Some of the tuning parameters involved: I think the Konnectivity agent has a keepalive, and there's also a part of the protocol that the agent performs in order to support the number of Konnectivity servers increasing and decreasing. That could happen if you wanted to change the number of control plane nodes, either for horizontal scalability or to go from a single-node to a fault-tolerant control plane. The agent supports that dynamic server count change by continuing to try to connect through the load balancer until it has connected to the full count of Konnectivity servers. It learns the count to connect to, so it knows when to back off, via response headers as part of that initialization. But even when it catches up to the expected number, it still polls on a slower cadence, just in case the number increases. So depending on the tuning of that poll loop, you can run into what I think is an easily solved problem: the Konnectivity server is itself a Kubernetes client for that TokenReview API call, and the default client throttling doesn't apply any sort of pushback. So if you hammer it with enough agents rapidly enough, you can basically get queuing there, where a more GA-ready implementation would load-shed or push back somehow. It just hasn't been done.

Frankly, in GKE we did the analysis to see how much Konnectivity infrastructure we need to run and how it relates to the size of the cluster. One interesting data point was that the clusters in the entire fleet with the highest use of control plane egress were one-, two-, or three-node clusters. What was happening is they just had broken controllers busy-looping, so the scalability had nothing to do with the size of the cluster. We basically just scale enough agents to meet any use case we had. We allow no more than 100, because in our manual testing we saw that at 100, for the tuning parameters we had used, that's where things started to queue up. But for most clusters the number is a lot less.

Unix domain sockets? I think we do use a Unix socket. There are two details to how the API server can communicate with the Konnectivity server. One choice is whether to use a Unix domain socket or a local TLS connection, and we used Unix sockets instead of the other.
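As a hedged illustration of that local transport choice (not the actual API server or Konnectivity code): two processes on the same control plane host can talk over a Unix domain socket instead of a local TLS listener. The socket path in this toy Go program is an arbitrary placeholder.

```go
// Stand up a Unix domain socket listener and dial it from the same process,
// showing the local-socket transport option in miniature.
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

func main() {
	sock := filepath.Join(os.TempDir(), "konnectivity-demo.socket")
	os.Remove(sock) // clean up any stale socket file

	ln, err := net.Listen("unix", sock)
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// "Server" side: accept one connection and write a line.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		fmt.Fprintln(conn, "hello from the server side of the socket")
	}()

	// "Client" side: dial the socket and read the line back.
	conn, err := net.Dial("unix", sock)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 64)
	n, _ := conn.Read(buf)
	fmt.Printf("client read: %s", buf[:n])
}
```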
And then there's a protocol choice between gRPC and HTTP Connect. The advantage of HTTP Connect is that it's an open standard, and you could use that egress selector configuration mode if you wanted to completely replace the reference implementation with your own implementation. Using gRPC mode more or less locks you into implementing the specific Konnectivity proto message API. You're welcome. I love questions. More questions?

Thanks for the talk. I'm Tomer with Sony. So what's going to be the next step, or what's missing for Konnectivity? Because it does seem to be pretty much done to me. Maybe I missed it.

Interesting. You know, what's next? If you go watch the 2019 talk, there's further detail on what more advanced features could come next. I sort of agree with you. From my perspective, we should just focus on really locking down what appears to me to be a stable, basic dial flow and then call that GA. But in the community, part of the path to GA that we're discussing is that we need a better regression test matrix that tests across version skew. Right now, all of our testing is basically unit testing within a repo. We don't currently have regression tests of, say, a kube-apiserver with an older compiled-in Konnectivity client library against a newer Konnectivity server. And there's also the Konnectivity agent, so you've got three different binaries that you could have version skew across, and a test matrix combination across all three. So there's that.

Another gap: I think I already mentioned there's a desire to simplify the protocol, specifically what a proxy connection identifier is. Right now there's kind of a two-step identifier: pending dials have one distinct identifier, and established connections have a different one. There's a proposal to use the same identifier for the whole lifetime of the dial flow, which I would personally like to see.

Another gap we would like to close on the path to GA: the gRPC connection here on the diagram, from the agent to the server, is truly multiplexed today. One agent gRPC connection multiplexes an arbitrary number of proxied TCP connections. But the gRPC connection from the API server to the Konnectivity server is actually single use, which is very expensive. My understanding is that during prototyping they had a version that was truly multiplexed, but unfortunately when HTTP Connect mode was added they couldn't get both working, and due to some changes related to making that abstraction, they reduced the gRPC code path to single use. So today's implementation is much less efficient than it could be, although I have to say, in my experience in GKE, that's never really been a problem.

Okay, thank you. If there are no more questions, then thanks for your time. And I hope this was useful.
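For anyone curious what the HTTP Connect mode contrasted with gRPC above looks like at the wire level, here is a small, hedged Go sketch of dialing through a generic CONNECT-style forward proxy. It is not the Konnectivity wire protocol itself, and the proxy and target addresses are placeholders.

```go
// Establish a raw TCP tunnel to a target via an HTTP CONNECT forward proxy.
package main

import (
	"bufio"
	"fmt"
	"net"
	"net/http"
	"time"
)

func dialViaConnect(proxyAddr, targetAddr string) (net.Conn, error) {
	conn, err := net.DialTimeout("tcp", proxyAddr, 5*time.Second)
	if err != nil {
		return nil, err
	}
	// Ask the proxy to open a tunnel to the target.
	fmt.Fprintf(conn, "CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n", targetAddr, targetAddr)

	resp, err := http.ReadResponse(bufio.NewReader(conn), &http.Request{Method: http.MethodConnect})
	if err != nil {
		conn.Close()
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		conn.Close()
		return nil, fmt.Errorf("proxy refused CONNECT: %s", resp.Status)
	}
	return conn, nil // the connection is now a tunnel to targetAddr
}

func main() {
	_, err := dialViaConnect("127.0.0.1:8090", "10.0.0.12:10250")
	fmt.Println("tunnel attempt finished, err:", err) // expected to fail without a real proxy
}
```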