Greetings, I'm Sean McCord, an engineer at Talos Systems, and today we will be talking about crossing the boundary: running hybrid Kubernetes clusters with WireGuard.

The whole idea behind Kubernetes is to abstract away the infrastructure and focus on your applications. This works brilliantly for deployments which comprise only a single physical location, but when you want to span out to multiple sites or multiple networks, suddenly everything falls apart. Why should you have to care where your infrastructure is, when the infrastructure itself is being abstracted away? Even cloud providers don't solve this problem. Why can't you span a Kubernetes cluster across regions? Some cloud providers don't even let you span across availability zones within the same region.

Almost everyone wants a multi-location deployment for some reason, be it disaster recovery or high availability, performance or localization, perhaps mixing on-demand high-cost resources with fixed low-cost infrastructure. Or you may have point-of-use or point-of-source requirements, for instance cash registers, data collection hardware, or physical storage, or you may simply want to avoid vendor lock-in.

Some examples from our user base. A voice systems provider with high volumes and low margins needs to get the greatest value for their core workloads, which mandates bare-metal infrastructure. However, they also need to be able to scale quickly out to the cloud when there is a sudden rush, such as during emergencies or popular events. A large retailer needs to manage local compute resources for point-of-sale equipment, but their core applications and database run in the cloud. They want to have management-free resources in the store, controlled entirely by Kubernetes in the cloud. A large public transportation company has a number of mandates which require them to use specific compute resources and store rider data in specific places, and they have a large number of point-of-use display systems which all need to be tied together. Everything is built on Kubernetes, but they don't want to have to manage multiple hundreds of clusters when each site has only a very few nodes.

So while there is a lot of demand for this freedom, the current off-the-shelf solution set falls short. Now, this isn't to say that it can't be done. Ultimately, the requirement for most CNI plugins is that each node needs free and direct communication to each other node. This can be achieved in a number of ways. Full native IPv6? Good luck with the universality of this option, particularly in cloud environments, and make sure your firewall works. All nodes with direct public IPv4 addresses? Only if you have an unnatural number of free public IPv4 addresses. VPN solutions? These are frequently difficult or expensive in the cloud, and they may require extra tooling or maybe even extra hardware. Mostly, these require some amount of external coordination, which is difficult or expensive to achieve universally in many environments.

Ideally, each node should be able to securely and directly communicate with each other node, regardless of where it is, regardless of what network it's on, and regardless of the external features available to its location. This sounds like a job for WireGuard. Like IPsec before it, WireGuard seeks to provide secure transport between two endpoints. Unlike IPsec, it does this with much more standard tooling, is vastly easier to use, and is even higher performance, all while offering better security and more modern encryption algorithms.
Most importantly, WireGuard is an efficient protocol for full-mesh systems. While traditional VPNs scale poorly to full-mesh deployments due to their overhead, both administrative and operational, WireGuard uses a very simple and efficient mechanism for managing a large number of direct peers.

So if WireGuard is so great, why isn't everyone using it already? The main difficulty with WireGuard is key distribution and peer discovery. In a highly dynamic system like Kubernetes, where nodes should be treated as cattle and not as pets, individually collecting keys and coordinating communication between nodes is cumbersome. There are offerings, such as Tailscale, which make this easy for desktop and mobile devices, but the market for this on the server side is as yet fairly small. There are a number of new WireGuard-native CNIs popping up, but none is quite ready for general use, and mostly they rely on externally (usually manually) driven systems for coordination.

Talos Systems is building Talos, the Kubernetes OS, as well as a great deal of tooling to automate and manage large and disparate sets of compute resources on bare metal, on premises, and in the cloud. Our core product is an extremely lightweight, read-only, image-based Linux operating system highly optimized for running Kubernetes. It has no shell and no SSH, its operation is entirely defined by a static manifest, and it is managed solely by API. If you want to manage your own Kubernetes clusters, you want to be running those clusters on Talos.

We hear constantly about the need for hybrid and distributed Kubernetes deployments, and we have been developing a solution to this coordination and key exchange problem. As of the upcoming release, version 0.12, we include support for automatic, full-mesh WireGuard deployments requiring no additional external tooling or configuration. Users need only set a single configuration flag, and all communication between their nodes will be transparently and automatically encrypted and transported over WireGuard. We call this system KubeSpan.

The method we have developed is also entirely impact-free. We do not manipulate iptables. We do not manipulate the main routing table. We don't interfere with Kubernetes' view of the node IP address in any way. We simply interact with the kernel's netfilter and core routing systems to redirect traffic to nodes and pods through the WireGuard interface. As with all the code we write, the entirety of this system is open source, and it comes under the permissive Mozilla Public License.

The key pieces of information needed for WireGuard generally are the public key of the host you wish to connect to, and the IP address and port (the endpoint) of that host. The latter is really only required of one side of the pair: once traffic is received, that information is known and updated by WireGuard automatically and internally. For Kubernetes, though, this is not quite sufficient: Kubernetes also needs to know which traffic goes to which WireGuard peers.
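To make those pieces concrete, here is a minimal sketch, in Go with the wgctrl library, of what handing that information to the kernel looks like for a single peer. This is an illustration, not KubeSpan's actual implementation: the interface name, addresses, and the generated stand-in key are placeholders, and in practice the peer's public key, endpoint, and allowed ranges come from the discovery mechanisms described next.

```go
// Sketch: configure one WireGuard peer with the pieces discussed above, namely
// its public key, one candidate endpoint, and the node/pod ranges that should
// be routed to (and accepted from) it. All values are illustrative.
package main

import (
	"log"
	"net"
	"time"

	"golang.zx2c4.com/wireguard/wgctrl"
	"golang.zx2c4.com/wireguard/wgctrl/wgtypes"
)

func main() {
	client, err := wgctrl.New()
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// In reality the peer's public key is discovered; generate a stand-in
	// here so the example is self-contained.
	priv, err := wgtypes.GeneratePrivateKey()
	if err != nil {
		log.Fatal(err)
	}
	peerKey := priv.PublicKey()

	_, nodeNet, _ := net.ParseCIDR("192.168.2.10/32") // peer's node address
	_, podNet, _ := net.ParseCIDR("10.244.1.0/24")    // peer's pod CIDR

	keepalive := 25 * time.Second
	peer := wgtypes.PeerConfig{
		PublicKey: peerKey,
		// One candidate endpoint; WireGuard holds only one at a time.
		Endpoint:                    &net.UDPAddr{IP: net.ParseIP("203.0.113.7"), Port: 51820},
		PersistentKeepaliveInterval: &keepalive,
		ReplaceAllowedIPs:           true,
		AllowedIPs:                  []net.IPNet{*nodeNet, *podNet},
	}

	// Apply the peer to an existing WireGuard interface (here, "kubespan").
	err = client.ConfigureDevice("kubespan", wgtypes.Config{
		Peers: []wgtypes.PeerConfig{peer},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

Note that AllowedIPs is doing double duty: it is both the set of destinations routed to this peer and the set of sources accepted from it, which is exactly the "which traffic goes to which peer" mapping that has to be kept up to date as the cluster changes.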
Because this information may be dynamic, we need a way to constantly update it and keep it in sync across all the nodes. If we otherwise have a functional connection to Kubernetes, this is fairly easy: we can just keep that information in Kubernetes. But we also have to have some way of discovering it in the first place. In our solution, we use a multi-tiered approach to gathering this information. Each tier can operate independently, but the amalgamation of the tiers produces a more robust set of connection criteria. For this discussion, we'll point out two of these tiers: an external service and a Kubernetes-based system.

For the external service, we maintain a public discovery service by which members of your cluster can use a common, unique key to coordinate the most basic connection information needed to get the WireGuard link up: the public key and the set of possible endpoints. While we offer this as a public service, the same code is open source and may be run on your own internal equipment, and we offer a simple configuration mechanism by which you can select a non-default discovery service.

The Kubernetes-based system utilizes annotations on Kubernetes nodes which describe each node's public key and local addresses. On top of this, we also route pod subnets. This is often, maybe even usually, taken care of by the CNI, but there are many situations where the CNI fails to be able to do this itself, especially across networks. So we also scrape the Kubernetes Node resource to discover each node's pod CIDRs.

One of the difficulties in communicating across networks is that there is often not a single address and port which can identify a connection for each node in the system. For instance, a node sitting in the same network might see its peer as 192.168.2.10, but a node across the internet may see the same node as 2001:db8::1ef1:10. We need a way to handle any number of addresses and ports, and also a mechanism to try each of them, because WireGuard only allows us to select one at a time. For our implementation, then, we have built a controller which continuously discovers and rotates through these IP and port pairs until a connection is established, and it starts trying again if that connection is ever lost.
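As an illustration of the Kubernetes-based tier, a controller with a working connection to the API server can read most of this information straight from the Node objects. The sketch below, again in Go, uses client-go and is hypothetical: the annotation key is a placeholder, not the one Talos actually uses. It simply lists, for each node, the annotation carrying its WireGuard public key, its addresses as Kubernetes sees them, and its pod CIDRs; these are the raw candidates the rotation controller then works through.

```go
// Sketch: gather per-node WireGuard discovery data from the Kubernetes API.
// The annotation key below is hypothetical, used only for illustration.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes we are running inside the cluster
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}

	for _, node := range nodes.Items {
		// Hypothetical annotation holding the node's WireGuard public key.
		pubKey := node.Annotations["example.com/wireguard-public-key"]
		fmt.Printf("node %s: public key %q\n", node.Name, pubKey)

		// Node addresses as Kubernetes sees them; each is a candidate endpoint.
		for _, addr := range node.Status.Addresses {
			fmt.Printf("  address (%s): %s\n", addr.Type, addr.Address)
		}

		// Pod CIDRs assigned to the node; these also need to be routed over
		// WireGuard when the CNI cannot carry them across networks itself.
		for _, cidr := range node.Spec.PodCIDRs {
			fmt.Printf("  pod CIDR: %s\n", cidr)
		}
	}
}
```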
After we have established a WireGuard connection, our work is not yet done. We still have to make sure that the right packets get sent to the WireGuard interface. WireGuard supplies a convenient facility for tagging packets which come from it, which is great. But in our case, we also need to allow traffic which neither comes from WireGuard nor is destined for another Kubernetes node to flow through the normal traffic mechanisms. Unlike corporate or privacy-oriented VPNs, we need to allow general internet traffic to flow normally. Also, as our cluster grows, this set of in-cluster IP addresses can become quite large and quite dynamic, which would be very cumbersome and slow to maintain in iptables. Luckily, the kernel supplies a convenient mechanism for defining an arbitrarily large set of IP addresses: IP sets. Talos collects all of the IPs and subnets which are considered in-cluster and maintains them in the kernel as an IP set.

So now that we have an IP set defined, we need to tell the kernel how to use it. The traditional way of doing this would be to use iptables. However, there's a big problem with iptables: it is a common namespace into which any number of other pieces of software may dump things. We have no surety that what we add there will not be wiped out by something else, be it Kubernetes itself, the CNI, or some workload application; be rendered unusable by higher-priority rules; or just generally cause trouble and conflicts. So instead, we use a three-pronged system which is both more foundational and less centralized.

nftables offers a separately namespaced, decentralized way of marking packets for later processing based on IP sets. Instead of a common set of well-known tables, nftables uses hooks into the kernel's netfilter system, which are less vulnerable to being usurped, bypassed, or becoming a source of interference than iptables, but which are ultimately handled by the same underlying netfilter machinery in the kernel. Our nftables table is where we store the IP sets. Any packet which enters the system, whether forwarded from inside Kubernetes or generated by the host itself, is compared against a hash table of this IP set. If it matches, it is marked for later processing by our next stage. This is a high-performance system which exists fully in the kernel, so it scales well to hundreds of nodes.

The next stage is the kernel's routing policy rules. These are defined as a common, ordered list of operations for the whole operating system, but they are intended to be tightly constrained and are rarely used by applications in any case. The rules we add are very simple: if a packet is marked by our nftables stage, it is sent to an alternate routing table.

This leads us to our third and final stage of packet wrangling. We have a custom routing table with two routes: send all IPv4 traffic to the WireGuard interface, and send all IPv6 traffic to the WireGuard interface.

So, in summary: we mark packets destined for Kubernetes applications or Kubernetes nodes, send those marked packets to a special routing table, and then send anything which hits that routing table through the WireGuard interface. This gives us an isolated, resilient, tolerant, and non-invasive way to route Kubernetes traffic safely, automatically, and transparently through WireGuard across almost any set of network topologies.
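Expressed as the equivalent command-line operations, the three stages look roughly like the sketch below, shown here as a small Go program shelling out to nft and ip. This illustrates the mechanism rather than reproducing Talos' actual rules: the table and set names, the firewall mark, the routing table number, and the interface name are all placeholder values.

```go
// Sketch of the three packet-wrangling stages: mark in-cluster destinations
// with nftables, send marked packets to an alternate routing table, and have
// that table route everything via the WireGuard interface. Values are
// illustrative; run as root with the "kubespan" interface already up.
package main

import (
	"log"
	"os/exec"
)

func run(name string, args ...string) {
	if out, err := exec.Command(name, args...).CombinedOutput(); err != nil {
		log.Fatalf("%s %v: %v\n%s", name, args, err, out)
	}
}

func main() {
	// Stage 1: an nftables set of in-cluster IPs, plus rules that mark any
	// forwarded or locally generated packet whose destination is in the set.
	run("nft", "add", "table", "inet", "kubespan")
	run("nft", "add", "set", "inet", "kubespan", "targets4",
		"{ type ipv4_addr; flags interval; }")
	run("nft", "add", "chain", "inet", "kubespan", "prerouting",
		"{ type filter hook prerouting priority -150; }")
	run("nft", "add", "chain", "inet", "kubespan", "output",
		"{ type route hook output priority -150; }")
	// Skip traffic that already arrived over the tunnel to avoid re-marking it.
	run("nft", "add", "rule", "inet", "kubespan", "prerouting",
		"iifname != \"kubespan\" ip daddr @targets4 meta mark set 0x51820")
	run("nft", "add", "rule", "inet", "kubespan", "output",
		"ip daddr @targets4 meta mark set 0x51820")
	// The set is then kept in sync as nodes and pod CIDRs come and go.
	run("nft", "add", "element", "inet", "kubespan", "targets4",
		"{ 192.168.2.10, 10.244.1.0/24 }")

	// Stage 2: a routing policy rule sending marked packets to table 180.
	run("ip", "rule", "add", "fwmark", "0x51820", "table", "180")

	// Stage 3: the alternate table routes everything out the WireGuard link.
	// (An equivalent "ip -6 rule" / "ip -6 route" pair covers IPv6.)
	run("ip", "route", "add", "default", "dev", "kubespan", "table", "180")
}
```

A real implementation also has to exclude WireGuard's own encrypted traffic, for example by checking its firewall mark, so that the tunnel's packets headed to peer node addresses are not themselves redirected back into the tunnel.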
With KubeSpan, we can easily handle a huge set of hybrid and multi-location Kubernetes solutions. Taking a look at our example scenarios from the beginning of this talk, we can easily see how WireGuard and Kubernetes solve their use cases.

The voice provider can run their core infrastructure in their data center, processing on their fixed-cost assets most of the time and using no cloud resources at all. If they are wise, they will have used Sidero, our bare-metal resource management system, which gives them a powerful Cluster API management plane and fully automated network-boot servers. But in any case, their normal workload runs entirely within their own hardware. When a high-traffic event occurs, say one of their bank customers needs to notify a large number of its customers in a short period of time, they can add in additional resources from AWS. They have a simple auto-scaling group defined there which adds nodes to the cluster very quickly, especially since they're using Talos, which installs and boots into Kubernetes faster than just about anything else out there. When each of those nodes comes up, it discovers the WireGuard connection information for every other node in the cluster, and those nodes in turn discover the new node. Everyone connects to everyone else, and even though some of the nodes are in AWS and some are in the private data center, everyone talks as if they're in the same place.

The large retailer has their control plane, database, and most of their resources in GCP. At each store, they have a number of point-of-sale terminals and back-office workstations, which are run by a few local servers for redundancy. The workloads on those local servers are all managed by the control plane in the cloud, but if their spotty local internet goes down, they can still operate, because those servers don't need a constant connection to the cloud control plane. They simply cache the new data and reconcile when the internet comes back up, receiving any new workload orders at the same time. Even better, if the network doesn't come back up quickly, they can connect their router to an LTE modem and not have to change anything else. The nodes just connect up however they can and continue working.

The public transport organization has their Kubernetes control plane running in an approved cloud provider, but all of their rider data is stored on physical machines at a data center inside their authority's borders. Each of their transportation hubs has several display boards which regularly receive updates of the times and locations of their buses and trains. They even have sensors and detectors connected over public Wi-Fi and dispersed across a large number of locations. And all of this is coordinated by a single Kubernetes cluster. All of it communicates securely with the other resources in the cluster, even over public Wi-Fi and common network links. WireGuard secures everything. All regulations are followed, but they are able to leverage the best resources for each component across a wide variety of locations.

There are myriad other use cases for using WireGuard to tie Kubernetes together, and we are building solutions for them. Join us, explore our products and tools, and let us help you build the systems you want to build in the way you want to build them. We have the public room Talos on Matrix.org, or, if you prefer proprietary communications platforms, join our Slack. We have a great deal of documentation, examples, tutorials, and other information on our website. Pull our code from GitHub, give us a try, and reach out. Thanks for your time.