So thanks for coming to the talk today. I'll just start with a little bit of background about myself and then take you through what we're going to talk about today. So I work for Tigera. Tigera is the company behind Project Calico, which is the most widely deployed CNI for Kubernetes. It powers about 2 million nodes daily in about 166 countries. Tigera is based in the US, but we have an office in Cork in Ireland as well, which is where I'm from. So it's been a nice treat to come to the sunshine in Valencia from Ireland. I've probably had too many Aguas de Valencia this week, but it's been a great trip. This is my first time speaking at KubeCon. It's actually my first time speaking in front of a group this big, so please just bear with me as we go through the talk. My own background is in engineering, software development. I'm still technically involved in all of the projects that we're doing at Tigera, contributing, but I'm not a security researcher, so I'm not going to go deep into the ciphers and the type of cryptography that's being used by WireGuard. But we will talk about some of that as we go through it. OK, so the goals today: we're going to talk a little bit about encryption as it stands in Kubernetes, what some of the popular options are that people currently use, and why Calico decided to use WireGuard. We'll recap a little bit about what Calico does. We'll talk about WireGuard and how it works, and then how the two mesh together to create a fully encrypted cluster. We'll finish off talking about some gotchas. We didn't get everything right when we implemented this feature. There are still things that we want to do in the future, and we're looking for contributors to help us do that. And there are some gaps as well that we can talk about. Really, the aim of the talk today is to present this as an alternative option to mTLS with a service mesh, or to IPsec. One is not better or worse than the other.
There are pros and cons for different encryption options in Kubernetes, but this is to present a different option, especially if you're already using Calico as your CNI. So, the story with encryption in Kubernetes today, what we see when we talk to our community, our users: the number one use case for encryption is actually compliance. Whether it's PCI compliance or HIPAA, that's always at the top of the topics when we talk to our community. We also see a lot of our users talking about zero trust in general. So when you have plain-text requests flying around your Kubernetes cluster, node to node, potentially in a shared environment like the public cloud, it can make you nervous if you have a zero-trust approach to security. So that is a topic that comes up a lot with our users when they're talking to us about encryption. And there's a recent NSA report from last year about hardening Kubernetes, and that recommends encrypting your data in transit. So that's what we're talking about today: data in transit. It's not data at rest, which is a whole other topic in Kubernetes. By default, when you deploy Kubernetes and you deploy your workloads, when those services are communicating with each other, there's no encryption. So everything is plain text. Some common options that people reach for: mutual TLS, which is very popular; that's encryption at TCP and above. Normally this comes with deploying a service mesh, either Linkerd or Istio, et cetera. And there are so many different options now with service meshes; they all offer an mTLS solution. Another option is IPsec, which is obviously an older, kernel-based technology, and there are a couple of different solutions there. There are other options too, and a lot of our users roll their own certificate or TLS implementations, often embedded in the applications themselves. So lots of different approaches.
So just to talk about mTLS a little bit more in terms of service mesh: this gives you client-server authentication, mutual authentication, so both the client and the server are authenticated together. With mTLS in a service mesh, they provide certificate management and certificate rotation, and they manage the lifecycle of those certificates for you. Different service meshes then provide slightly different options when you're deploying encryption on top of them. For example, Istio allows you to encrypt a corner of your cluster, just a specific part of your cluster via namespaces, which is a very useful technique. Linkerd automatically enables encryption when you deploy it, so it's not an option; you get encrypted traffic by default, service to service, when you deploy Linkerd. So what if you don't use a service mesh? That's where we were when thinking about this feature: how we can leverage what Calico already does for you as a CNI, especially at the data plane level, and how we can provide a different encryption option, an alternative to IPsec, at that lower level in the TCP/IP stack. So what we'll do is recap a little bit here about what Calico does. Just a show of hands: who uses Calico as their CNI? That's about half the people. That's pretty good. So it's very battle-hardened. It's been around for a very long time. It originally started as a Python project for OpenStack many, many years ago, before pivoting to Golang and becoming a CNI plug-in for Kubernetes. As I said, it powers about 2 million nodes daily that we know of, across 166 countries. So the purpose of mentioning that is to show how battle-hardened it is. We can leverage some of what Calico already does to give us some goodness when we're integrating with WireGuard. One of the strongest points of integration is how it manages the data plane, and this is where we're going to interact with WireGuard. It supports Linux and eBPF data planes, and we want WireGuard to work with both of these options.
So you can literally swap out your data plane option. You can go from a traditional Linux iptables-based data plane to eBPF and back again in a deployed, running cluster using Calico. And we want WireGuard to work with both of these technologies. We want it to be easy to configure. WireGuard itself is extremely easy to configure, and Calico is easy to configure. So I'll just let you take in this diagram for a second. This is a kind of cut-down architecture diagram of a part of Calico. As you can see at the top, interesting things are changing in Kubernetes, and they're getting stored in Calico's datastore, the things that we care about. And then down in the data plane, you have a couple of Calico components. Here I've just highlighted a couple of them: Felix, which runs in Calico Node, and the CNI plugin, which interacts with the container runtime. These data plane components are responsible for programming the kernel, programming the networking rules. So you can see here on the bottom right the two important takeaways. Calico maintains the rules and configuration on each node. What it does is it gets a bunch of updates; it has a calculation graph, which very smartly determines the right order to apply the updates on the data plane, and then it keeps that in sync. So that's the main purpose and the main goal of the data plane components in Calico. And this is battle-hardened. The important thing is to apply these updates in the right order without interrupting traffic. So again, a takeaway from this: Calico is good at programming primitives in the kernel related to networking. So how does that relate to WireGuard? Why is that important when we started looking at WireGuard? So WireGuard is a relatively new technology. I think most people have at least heard of WireGuard; I know some people here are using WireGuard, from those I've spoken to already. Some of its key characteristics derive from how small it is.
So it's only 3,000 lines of code. It's a very, very opinionated technology. That makes it very easy to audit that code; it makes it easy to read that code and audit it for security purposes. It is positioned as an alternative to IPsec at Layer 3. One of the key characteristics is that it's simple, so it's mostly transparent, operationally simple. When you deploy and configure WireGuard, it transparently handles symmetric key exchange and everything else for you: heartbeats and everything else. There's no key management involved. It kind of mirrors the way SSH works with keys: you get a private key and a public key. And it's quite opinionated that way, so it takes no responsibility for how you get a public key from one peer, one node, to another peer, another node. That's your responsibility, just like it is with SSH keys. Another way it's very opinionated is around the cryptography it has chosen: for example, ChaCha20, and Curve25519 for its Diffie-Hellman. These are important, and the reason why is, well, they're state of the art, so they're future-proof for now. But it's not trying to be everything for everyone, like IPsec, which is why it's so lean. When you're using WireGuard, you are using, for the most part, your existing Linux utilities. You're using ip route, you're using routing tables, you're using network interfaces, and you're using your standard Linux utilities to set that up, which is quite important for what we want. And there's also a fantastic tool called wg, which is for configuring the WireGuard portion of that, and we'll talk about that in a second. It's been part of the Linux kernel for a while now, 5.6 upwards. Okay, so a little bit more detail here about WireGuard. As I mentioned before, it works by adding a network interface, just your standard networking interface, and you can configure that in the normal way to create that interface.
And then you can add and remove routes to send traffic to that interface using route or ip route. So, very standard primitives. As I mentioned before, we have the wg tool, and I'll show what that WireGuard configuration looks like in a second, and how it applies to pods and nodes in the cluster. Essentially what you get after you configure WireGuard is an overlay: an encrypted and encapsulated tunnel between two peers. On either side of this are the network interfaces, wg0 and wg0. These two need to swap public keys initially, and then they need to be configured with a list of IP addresses that are allowed down the tunnel. So what's interesting here is that WireGuard associates the tunnel IP addresses, these allowed-IP lists, with the public key, which is one-to-one with a peer. So you might start to see some similarities here between peers and nodes, and between allowed IPs and workload IPs, pod IPs, which is a really neat mapping for what we want. So another takeaway here, which is similar to Calico: WireGuard is really good at, and designed up front for, working with networking primitives and Linux utilities, the same way that Calico is. So as I mentioned, you might have spotted that there's a natural mapping between peers and nodes in Kubernetes, and these allowed IPs, which would be a list of workload pods, for example. Okay, so we'll dive a little bit deeper into Calico plus WireGuard, and what the WireGuard configuration looks like and how it applies to nodes and pods. So what you're looking at here on the top: you've got three nodes in the cluster. A pod on node A wants to send traffic to a pod on node B, and it wants to do that over the WireGuard tunnel, and wants it to be encrypted. And on the bottom left here, you have a representation of a couple of pods on this node B in the middle. Node B has an IP address on the network, and each pod has an IP address as well.
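To make those primitives concrete: setting up one end of a WireGuard tunnel by hand looks roughly like this. This is just an illustrative sketch (the interface name, addresses, and keys are invented, and it needs root plus a WireGuard-capable kernel), not what Calico runs verbatim:

```shell
# Create the WireGuard interface and give it a tunnel address
ip link add dev wg0 type wireguard
ip addr add 10.0.0.1/24 dev wg0

# Generate this node's keypair (the private key never leaves the node)
wg genkey | tee privatekey | wg pubkey > publickey

# Attach the private key, then add a peer: its public key, where to
# reach it, and which IPs are allowed down the tunnel
wg set wg0 private-key ./privatekey
wg set wg0 peer <PEER_PUBLIC_KEY> endpoint 192.0.2.10:51820 \
    allowed-ips 10.0.0.2/32

# Bring the interface up; routes towards allowed IPs use plain ip route
ip link set wg0 up
```

The point is that every step is a standard Linux networking primitive, which is exactly what Calico's data plane components already know how to program.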
So in this case, what you're looking at, this wall of text on the right-hand side, is configuration. This is a WireGuard configuration; this is what you configure using the wg utility. The first part of that is the configuration for node A. It has an interface, a wireguard.cali interface, a network interface on node A. It has a public key and a private key. The private key is hidden, and it never leaves the node; you never share it at all. It's UDP, so it's listening on port 51820, and it has a firewall mark, which is used for routing to make sure we don't get stuck in a loop when we shove packets down the WireGuard tunnel and then have them exit the node at eth0. The other two parts of the configuration are peers. The middle one is node B, and the bottom one is node C. So you can see here the middle peer is reachable on endpoint 10.240.0.64, which is the IP address of node B. And then it has a list of allowed IPs, which represent the host-networked pods, which is the first IP, and then the workload pods, which are the other two IPs. So you can imagine, as you scale this out and add more and more pods, more and more nodes, this list is going to grow and grow, and this configuration needs to be present on every node in your cluster to form that encrypted mesh. And that's what Calico does. Calico is programming and reprogramming this WireGuard configuration, the routing rules, the IP rules, over and over again: every time it listens and gets an update from the relevant things that are changing in Kubernetes, it reprograms the data plane. So that's all automatic, and Calico is really, really good at that. So if you walk through the steps: there's a packet that's meant for a pod on node B. What happens is WireGuard asks, okay, what peer is that? It looks up its configuration and it sees that that IP address, that pod, is represented by that middle peer's public key, which I'm not going to try to read out.
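Reconstructing that wall of text, a configuration of this shape (as the wg utility would display it) looks roughly like the following. The keys and pod addresses here are invented placeholders; only the node B endpoint and the port come from the talk:

```ini
[Interface]
# Node A's own interface: the private key stays on the node
PrivateKey = (hidden)
ListenPort = 51820
FwMark = 0x100000

[Peer]
# Node B: reachable at its node IP; allowed IPs cover its
# host-networked pod plus its workload pods
PublicKey = <base64-public-key-of-node-B>
Endpoint = 10.240.0.64:51820
AllowedIPs = 10.240.0.64/32, 192.168.1.10/32, 192.168.1.11/32

[Peer]
# Node C, same shape
PublicKey = <base64-public-key-of-node-C>
Endpoint = 10.240.0.65:51820
AllowedIPs = 10.240.0.65/32, 192.168.2.10/32
```

As nodes and pods are added, a `[Peer]` section and its AllowedIPs list grow on every node, which is the reprogramming work Calico automates.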
So it encrypts the entire IP packet, using the session key negotiated with that peer's public key. And then it looks up the remote endpoint for that peer, which is 10.240.0.64. It encapsulates it in UDP and sends it out over the tunnel. And then the opposite happens on the other side, where the packet is decapsulated and decrypted and then passed to the pod. Okay, so in summary, it's a perfect marriage. Calico maintains an eventually consistent data plane. Calico and WireGuard both like programming with Linux primitives. And WireGuard's peer and allowed-IP concepts map very nicely to nodes and pods in Kubernetes. So as I mentioned before, as we're scaling this out, one of the challenges is that as more nodes come online and more pods are spun up and deployed, they need to be added as peers, first of all. So each new node is added as a peer to the WireGuard configuration, and then the public key for that peer becomes part of the node manifest itself. So if you want to verify that a node is configured with WireGuard, you can simply get that node using kubectl and it'll have a field, which I'll show in a second, which is the WireGuard public key, and you can freely share that. That doesn't need to be kept secret or private in any way, similar to an SSH public key. The pod IPs that are deployed on top of that node need to become part of the allowed-IP list in every other peer in the mesh, in the cluster. The end result of doing that is an encrypted mesh. This all happens in a few seconds; it is eventually consistent, but it only takes a couple of seconds across a medium-sized cluster. So all pod-to-pod traffic is now flowing through WireGuard tunnels between all the nodes in your cluster. So this is a slightly different view. This is taken from a blog post I wrote recently about WireGuard and AKS.
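The peer lookup and allowed-IP matching described above can be sketched in Python. This is a simplified model of what the kernel does, with made-up peers and addresses; WireGuard itself implements this as a longest-prefix match over the allowed-IP lists:

```python
import ipaddress

# Each peer: public key -> (UDP endpoint, allowed IP networks)
peers = {
    "pubkey-node-b": ("10.240.0.64:51820",
                      ["10.240.0.64/32", "192.168.1.0/26"]),
    "pubkey-node-c": ("10.240.0.65:51820",
                      ["10.240.0.65/32", "192.168.2.0/26"]),
}

def lookup_peer(dst_ip: str):
    """Return (public_key, endpoint) of the peer whose allowed IPs
    cover dst_ip, or None if no peer matches (the packet is dropped)."""
    dst = ipaddress.ip_address(dst_ip)
    best = None  # track the longest matching prefix across all peers
    for pubkey, (endpoint, allowed) in peers.items():
        for net in map(ipaddress.ip_network, allowed):
            if dst in net and (best is None or net.prefixlen > best[2]):
                best = (pubkey, endpoint, net.prefixlen)
    return (best[0], best[1]) if best else None

# A packet for a workload pod on node B is encrypted for node B's key
# and encapsulated in UDP towards node B's endpoint:
print(lookup_peer("192.168.1.10"))  # ('pubkey-node-b', '10.240.0.64:51820')
```

Calico's job, then, is simply keeping this peer table in sync with the nodes and pod IPs in the cluster.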
So it gets quite tricky to support this type of model as you look at AKS and EKS and those types of managed platforms, where the networking is quite different. So we do have support for EKS and AKS, but this is kind of a different way to look at it, hopefully an interesting way to look at it. Green is unencrypted and red is encrypted. So in this example (this is actually a GIF originally, but it's not moving), the network packet will leave pod A. It will appear at the other side of the virtual ethernet pair, which is cali1234. So it leaves the pod network namespace down to the host network namespace. It gets redirected via routing rules in the kernel to the WireGuard device, where it gets encrypted and encapsulated and sent out over eth0 across the network, where it again gets routed to the WireGuard device, decapsulated and decrypted, and then passed up the veth pair to the right pod in the pod network namespace. So that's the way it normally flows, but you might see a couple of interesting things on this diagram. Well, there's one really interesting thing. It's unencrypted as it leaves the pod, and it's unencrypted until it reaches the WireGuard device. And if you want to reach another pod on the same node, it's not going to go through the WireGuard device. So in that case, when it hits the routing rules, there'll be a local route; it will not hit that WireGuard device, and instead it'll be redirected to the other veth pair for that pod on the same node. So imagine a U-shape of green, unencrypted traffic. So pod-to-pod traffic on the same node is unencrypted in this approach. It's come up a few times. It's not easy to fix; it's not easy to support this, but we are looking at it. You could move the WireGuard device into each pod, for example, but you'd be fundamentally changing the way some of our network policies are handled in Calico at that point. So it's not an easy problem to fix.
Ultimately, where we ended up in terms of ease of configurability and setting this up: you can enable WireGuard with a single command. You patch the FelixConfiguration CRD and you enable WireGuard. This deploys and enables WireGuard across your entire cluster. And similarly, you can disable it with a single command as well. As I mentioned before, you fetch the node; when you get the node description, you'll have a WireGuard public key, which indicates that it's been successfully set up on that node. This is very, very useful, but we could do a lot more here as well, of course, in terms of preflight checks and things like that. I think we hit the goal of ease of configuration. Okay, so in terms of performance, there is a trade-off to consider, and these diagrams are a little bit out of date, but I think they're still relevant. We're using an open benchmarking tool, and we're going to publish some new results soon, including comparisons to mTLS and service mesh solutions. These are comparisons to three other CNIs that are all using IPsec: Calico using WireGuard, versus three anonymous CNIs using IPsec. And they're anonymous because some of them have since changed implementation to also support WireGuard, so I don't want to badmouth them in any way by showing out-of-date performance metrics. The first one here is node CPU, so this is average node CPU usage as we're running the HTTP benchmarks. And as you can see, there's about a 30% increase in CPU usage when you're using WireGuard, and that's all of the encryption that's happening. So that's quite significant, and it's definitely something to be very cognizant of if you're interested in using WireGuard with Calico or other CNIs. But on the flip side, the throughput is six times higher using WireGuard than using IPsec. That's the diagram on the right-hand side. This is the bandwidth in terms of megabits per second.
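For reference, the single enable command and the verification step look roughly like this. This is from memory of the Calico documentation, so treat the exact flag spelling as an approximation; the node name is a placeholder:

```shell
# Enable WireGuard cluster-wide by patching the default FelixConfiguration
calicoctl patch felixconfiguration default --type='merge' \
  -p '{"spec": {"wireguardEnabled": true}}'

# Verify: a successfully configured node reports its WireGuard public key
calicoctl get node <NODE-NAME> -o yaml
# ...
# status:
#   wireguardPublicKey: <base64 key>
```

Disabling is the same patch with `"wireguardEnabled": false`.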
So higher is better in this case. So it's a trade-off; like everything else in software engineering, you have to weigh up the cost of throughput versus CPU usage, resource usage, and determine which one is more important for you. Okay, so in terms of some gotchas and some future work that we're looking at, and definitely looking for help with: we'd love some ideas and contributions. We've had a lot of issues raised on GitHub around WireGuard, which is fantastic; it means people are trying this and exploring the feature. As I mentioned, pod-to-pod traffic on the same host being unencrypted does come up from time to time, and it's something that we have some ideas around, but again, it's not an easy problem to fix. More preflight checks is another topic that comes up. When you enable WireGuard using that single command, you don't really know if it's worked across all of your nodes or not. There's an assumption here that WireGuard is actually installed in the operating system running on all the nodes in your cluster. So if WireGuard is not available on every node, let's say it's installed on 50% of the nodes in your cluster, you'll still be able to enable WireGuard. Traffic will flow across all of your nodes, but it won't go down a WireGuard tunnel between a node that has WireGuard and a node that doesn't. So what that means is that in portions of your cluster, traffic could be flowing unencrypted, and you might be unaware of that fact if you expected that enable command to tell you that it's not working. So we have a little bit more work to do there around preflight checks: is WireGuard installed on, and supported by, all of the kernels running on the nodes, and has it been successfully set up across the cluster?
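A preflight check of that kind could be sketched as follows, assuming each node object reports its WireGuard public key in a status field once it's set up. The field names here mirror the Calico node resource but are illustrative, and the data is made up:

```python
def unencrypted_nodes(nodes):
    """Given a list of node dicts, return the names of nodes that never
    reported a WireGuard public key, i.e. nodes to and from which
    traffic may still flow in plain text."""
    return [
        n["name"]
        for n in nodes
        if not n.get("status", {}).get("wireguardPublicKey")
    ]

# Illustrative data: one node set up, one whose kernel lacks WireGuard
nodes = [
    {"name": "node-a", "status": {"wireguardPublicKey": "abc...="}},
    {"name": "node-b", "status": {}},
]
print(unencrypted_nodes(nodes))  # ['node-b']
```

An empty result would be the signal that encryption is truly in effect across the whole cluster.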
We want to show that to people in the command, through an API, so that they can be sure that encryption is turned on and truly being used across the entire cluster. Skipping past race conditions for a second: the flip side of that is, similarly, we'd like to give more fine-grained control. Right now it's all-or-nothing: you turn on encryption for your entire cluster. You can go through node by node and disable it on specific nodes with a command if you want to. But we'd like to do something like what Istio is doing with namespace-based encryption, and offer policy-based encryption, where you could say: this corner of my cluster I want encrypted, and we'll keep the rest unencrypted. So that's something we're definitely going to work on. IPv6 support: WireGuard supports IPv6, but some of our code is IPv4-specific, unfortunately. So we have a little bit of work to do there, and it's not quite as simple as just enabling IPv6 support; there are some quirks there as well that we need to figure out. We've had some race conditions around cleaning up WireGuard routing rules, in terms of the speed of enabling and disabling WireGuard. So if you're quite trigger-happy with enabling and disabling WireGuard, there have been one or two race conditions, which we've been solving as they come up while trying to test and find more. So we've been pretty aggressive at trying to hunt down those race conditions and fix them. So that's it; that's the end of my talk. There are a few links here for related material. And Tigera and all the open source Project Calico folks are at the Tigera booth, S24. So come over and say hello to us if you want to ask any more questions afterwards. Thank you. We have some minutes for Q&A. For the folks who decide to run for lunch, please be quiet. Thank you.
When I activate WireGuard, will the MTU configuration of the veth interfaces, the Calico interfaces, change to take into account that new overhead?

The MTU overhead?

No, will the MTU change to adapt to the new overhead?

I can't hear you, sorry; there's just some background noise.

Basically, when I activate WireGuard over an existing configuration, will this activation change my MTU configuration for the Calico interfaces automatically?

It will, yeah, it will.

Thank you.

Anyone else?

You said that prechecks are missing. I'm wondering what will happen if somebody kills the process on one of the nodes. Is it still working, or will it drop the encryption?

So there are certain parts of the data plane configuration, the WireGuard data plane configuration, that if you went in and changed manually, Felix, the data plane component, would overwrite and set the world back to the way it should be. In terms of killing off WireGuard itself: WireGuard runs in the kernel. But I think if you did try to kill it, you possibly could end up with unencrypted traffic on that node. Felix doesn't monitor that it's running; it's just responsible for the routing rules and configuration. So it's not going to monitor the process itself. That's a good point, yeah.

Thank you for the wonderful talk. I have this question. You said that the biggest motivation for end-to-end encryption is compliance. But with the AKS and EKS implementations, we see that just the traffic between two pods, the source pod and the destination pod, is encrypted. Would you extend this to host-to-host?

So that's the way it used to be. The question there is: for AKS and these managed environments, a portion of the host-network pod-to-pod traffic could be unencrypted. That's the way it used to work. We have a feature called host-to-host encryption that we rolled out for AKS and EKS only, and that encrypts everything.
So including traffic to the management APIs, to the master node. So everything is encrypted, host-to-host, kind of like IPsec. You don't really have a choice; it's not just pod-to-pod. And we rolled that out on AKS and EKS recently, specifically for that reason. That is covered in the blog post I referenced.

Thank you. Anyone else, questions?

First of all, thanks for the talk. You said that you automatically generate the public keys and private keys for every node, and that you can get these public keys if you do a get on the node. Do I as the administrator have to spread these keys to all nodes, or is that automatic? And if it's automatic, how do you prevent keys from being injected? So, another public key getting injected that you don't actually want in there.

Sure. So this is very, very similar to how you would manage your SSH keys. In terms of the first part of your question, Calico looks after all of that for you, so you don't have to manually distribute those public keys and keep them up to date in your cluster. Calico does that part for you, using the Calico datastore as the main backing store for public keys. If you inject a public key, it's not going to work with the private key. So I think that might help with the second part of your question, if I understand it correctly.

The idea was that I inject the public key of an additional node, something that doesn't actually belong there, and, as the attacker, I have the private key for it.

I see what you mean. So that's fine, you can do that. You'll also need to program the allowed-IP list and add a new peer to the WireGuard configuration on every node in the cluster if you want those nodes to talk to your bad node. We don't maintain a list of good nodes versus bad nodes. So potentially, if you did that, you'd have to get onto every node. You'd have to program the WireGuard configuration to know about your bad node.
You'd have to add the pod IPs from your bad node to the allowed-IP lists in all of the WireGuard configuration. And what would happen is that Felix would just overwrite that, because it's not stored in the datastore. So even if you went and did that manually, it would be overwritten fairly quickly by Felix.

Okay, thanks. And does Felix do any key rotation?

No, Felix doesn't do any key rotation.

All right, last question is over here.

Hi, thanks for the talk. I wanted to know if Calico supports a mesh outside the cluster. For example, you've got another cluster in another region and you want to peer them. Some service meshes support it; does Calico support it?

We don't support that right now. I should have added that to my future list; it has come up before, actually. So: can you extend the encryption outside of the cluster, to some external service, another cluster, or your client, maybe your developer laptop? That's something that people are definitely interested in. There are open source companies working specifically on that end-to-end encryption using WireGuard. And it has come up from one of our users, one of our big users. So it's something we will have to look at. You could do it manually for now, but again, the risk is that what you change in the configuration will be overridden by Felix. So, not supported right now.

Okay, thank you.

Okay, thank you, everyone. Thank you, Peter, and have a nice lunch. Thanks, guys.