Welcome. Our talk is called "CNI or Service Mesh? Comparing Security Policies Across Providers." There are a couple of different ways you can enforce policy in your clusters, and we're going to talk about how they fit together, and some of the ways they can go sideways. This is my colleague Christine, who is in developer relations at Google. And this is Rob; he is an instructor at SuperOrbital. We have a lot of slides today and we're going to be going fast, so just go to the PDF afterwards. Don't try to take pictures, really; they will be a nightmare.

A little bit about what we're going to cover today. We're going to talk about what a CNI is, what a service mesh is, go back to the basics of what and where and how policy is enforced, some security gotchas you should probably be aware of, mitigations and how the field is evolving, and lastly, what you can do about it.

The concepts we're discussing today broadly apply to a lot of service meshes and CNIs in the CNCF ecosystem, but today we're going to be talking about Cilium and Istio, because they're the big fish in the CNCF pond. They are in the top ten CNCF projects by commits, contributors, comments, and issues. And both of them have announced large changes that are going to drastically change the fundamental ways that policy is enforced. But before I tell you about those changes, Rob's going to take you through some of the basics.

All right, a quick review. Apologies if this is very remedial for you, but don't worry, like Christine said, we're going to go fast. The Kubernetes networking model requires that every pod can talk to every other pod directly, without NAT. But pods can't just do that on their own; something has to set it up, and that something is the CNI. CNI is the Container Network Interface, which is basically a thing that lets your container runtime delegate network configuration to some other component.

What kind of changes could a CNI perform? Well, in the context of one pod talking to another pod, the most commonly requested changes are those that do the initial setup of networking. We're not talking about plugins implementing networking on their own. CNI plugins don't do that; they stand on the shoulders of giants and just configure your kernel to do the networking for you. Now, CNI is just an interface. You need a plugin to actually make the changes. A plugin is just something that implements the interface, and it is what answers the request when the call comes in to make changes to the network configuration.

So, some examples. One of the first was the bridge plugin. This was a reference implementation from the containernetworking GitHub org; you can go have a look at it. It just gets you basic connectivity between pods using virtual Ethernet (veth) devices and a Linux bridge. There's no support for network policy whatsoever. There's another popular plugin called Calico. It also does the virtual-interface thing, it doesn't bother with the bridges, but it does add support for network policy, and it does so with iptables. We're going to use a lot of wizards in this talk, so keep your eyes out for the wizards. Another big plugin out there is Weave. It relies on the in-kernel Open vSwitch implementation for its data plane, and it also has network policy support, again with iptables.
And the big one, Cilium, as we know, uses eBPF for its connectivity. It uses a technology called XDP, the eXpress Data Path, that lets it short-circuit much of the network stack for a huge speed improvement. And that is also where it does its network policy implementation: network policy is implemented by eBPF programs in Cilium. So most of the popular CNI plugins out there are going to set up connectivity between pods, they're going to have some kind of support for network policy, and you can basically think of them as a cloud native SDN.

Okay, so that's the CNI side. Let's talk service mesh. You have microservices, good job. You want some standard features across all of your microservices. Let's say you want observability: you go out, write some code, and export Prometheus metrics, for example. Now you want to add security, so you write some more code. You have authorization, you have authentication. You bake them into your client libraries, you bake them into your service frameworks. You want reliability, so you write some more code: you add active health checking, you add automatic retries for failures, you add backoff to protect yourself from the thundering herd. And then one day at stand-up, when nobody even asked, somebody starts talking about code duplication. Something, something, let's refactor everything out into a nice standalone client library. Well, everything is going well with this approach, right up until the moment some idiot wants to add a cool new language to your stack. Or a weird one, whatever, I don't judge. So then somebody has a bright idea: let's move the whole library out of the process entirely and bake it into the infrastructure. That way you get all the features you want, no matter what is in your disgusting tech stack, you sicko. The most popular service meshes offer observability, identity, encryption, access control, load balancing, and a bunch of other cool features.

For a concrete example of how this works, we'll use our same two pods who want to communicate with each other. All that common logic we just talked about lives inside a container in the pod: the sidecar proxy. So when this pod wants to talk to this other pod, the mesh will have already used iptables to transparently redirect all the traffic through the sidecar proxy, so it can do all the things we just talked about. Now, because a service mesh is not a CNI plugin, it has nothing to do with giving pods interfaces or configuring connectivity. It assumes you have already done that, and it just uses whatever connectivity is already built. If we recall the bridge plugin example and try to mix that into a visualization with a service mesh, it would look a little more like this. This is a very busy diagram. So for simplicity, from here on out the rest of our diagrams are going to assume that container networking and iptables redirection are already configured, because we're going to show you a whole bunch more stuff.

In this setup, our CNI could be using eBPF to enforce network policy, or it could be using iptables, but whatever it's using, the CNI plugin's policy enforcement happens here, in the kernel, while service mesh policy enforcement happens here, at the sidecar. So this brings us to a place where we now have two different policy enforcement points in the network: one wizard living in the sidecar, one wizard living in the kernel.
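To make the "CNI is just an interface" point from earlier concrete, here is a minimal sketch of what a plugin entry point looks like when built on the upstream Go helper library from the containernetworking org. The function bodies are stubs and the exact helper signatures have varied between library versions, so treat this as illustrative rather than a working plugin:

```go
package main

import (
	"github.com/containernetworking/cni/pkg/skel"
	"github.com/containernetworking/cni/pkg/version"
)

// cmdAdd is invoked by the container runtime when a pod sandbox is created.
// A real plugin would create a veth pair here, move one end into the pod's
// network namespace (args.Netns), name it args.IfName (usually eth0), assign
// an IP, and print a structured CNI result to stdout.
func cmdAdd(args *skel.CmdArgs) error {
	// args.StdinData carries the JSON network configuration.
	return nil
}

// cmdDel is invoked when the pod goes away: tear down whatever cmdAdd built.
func cmdDel(args *skel.CmdArgs) error { return nil }

// cmdCheck lets the runtime ask "is the networking you set up still healthy?"
func cmdCheck(args *skel.CmdArgs) error { return nil }

func main() {
	// Hand control to the CNI plumbing. The runtime invokes this binary with
	// the operation (ADD/DEL/CHECK) in the CNI_COMMAND environment variable.
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, version.All, "toy-plugin v0.1")
}
```

The interesting part is what's missing: nothing in this contract says anything about policy, which is why policy support is entirely up to the individual plugin.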
Now, I am compelled by CNCF bylaw to inform you that no large language models were harmed in the creation of this image. Okay, wizard number one, the CNI: what can you actually enforce? Calico, Cilium, and the other big CNIs often offer their own "NetworkPolicy, but better" CRDs with fancier stuff in them. I'm going to talk about just the vanilla NetworkPolicy that comes with all of Kubernetes, to keep things simple. Basically, you can apply policy to a broad range of traffic types, and then you can make a block-or-permit decision on that traffic based on some characteristics of the traffic itself: things like "don't allow traffic going to this CIDR," and so forth.

So we look at our basic setup. Let's say the pod on the left is in a namespace called frontend, and the pod on the right is in a namespace called api. And let's say we have a network policy that says pods in the frontend namespace are permitted to connect to pods in the api namespace. Well, iptables and eBPF don't know anything about Kubernetes namespaces. So instead, the CNI plugin implements something called a network policy controller, and that component queries the Kubernetes API and keeps tabs on all the pods and all the namespaces so it can keep track of their IP addresses. Once it has those IP addresses, it's going to be either updating iptables rules or updating an eBPF map that implements that same logic. So if a connection comes out of this pod in the frontend namespace, it can be checked against that table and dropped or permitted accordingly.

Hopefully not too tricky. But I have good news for you: we have a contrived scenario, because don't we all love a contrived scenario when we're talking about the ways people get into your clusters. Let's say an attacker has popped a shell in a vulnerable pod in some other namespace, one that is not allowed to talk to your api namespace. Now, we know that our CNI plugin is keeping tabs on pod IP addresses to update those iptables rules or eBPF maps or whatever it's using. Let's say a pod in a permitted namespace goes away for some reason, and gets recreated with a new IP address. This kind of thing happens all the time; that's why we run Kubernetes. So now the clock starts. If there is enough churn in the compromised deployment, this leaves a narrow window of opportunity for a connection from a compromised pod to make it past the check before our cloud native SDN catches up and cuts the cord. Now, this is not a new idea. This type of race condition has been around as long as software-defined networks have been around. The only thing that's changed is that now it's in Kubernetes, and so we need to consider it when we're thinking about our clusters. If you want details, there's a blog post here that will show you the specifics. I will talk about mitigation, but not yet, because I'm going to switch gears and compare to service mesh. So, what can you enforce with a service mesh? Wizard number two.
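For reference, the frontend-to-api policy just described might look like the following, written with the upstream Kubernetes Go types rather than YAML. A minimal sketch: it assumes the standard kubernetes.io/metadata.name namespace label, and it selects every pod in the api namespace:

```go
package main

import (
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// "Pods in the frontend namespace are permitted to connect to pods in the
// api namespace", as a vanilla NetworkPolicy. The object lives in the
// target (api) namespace and applies to every pod there.
var allowFromFrontend = networkingv1.NetworkPolicy{
	ObjectMeta: metav1.ObjectMeta{Name: "allow-from-frontend", Namespace: "api"},
	Spec: networkingv1.NetworkPolicySpec{
		PodSelector: metav1.LabelSelector{}, // empty selector = all pods in "api"
		PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
		Ingress: []networkingv1.NetworkPolicyIngressRule{{
			From: []networkingv1.NetworkPolicyPeer{{
				// Match the source namespace by its well-known name label.
				NamespaceSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{
						"kubernetes.io/metadata.name": "frontend",
					},
				},
			}},
		}},
	},
}

func main() {}
```

Notice there is no IP address anywhere in this object. The plugin's network policy controller is what resolves these selectors into the pod IPs that iptables or eBPF actually match on, and that translation step is exactly where the race condition described above lives.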
Again, you can apply policy to a whole range of network traffic, and you can conditionally block or permit that traffic based on characteristics of the traffic itself. But because the service mesh lives at layer seven, the range of characteristics you can use to make a block-or-permit decision is broader, because it can inspect the traffic and look inside at the HTTP headers and so forth. So you can say "don't allow GETs, but do allow POSTs," or something like that. I'm going to focus on these two conditions for the moment, because the way they are implemented leads to an interesting situation.

So how are the namespace or service account conditions enforced? Look back at our diagram: we know service mesh enforcement happens at the sidecar. We have the same scenario. We've got a pod in the frontend namespace, a pod in the api namespace, and a policy which permits pods in frontend to talk to pods in api. Great. Now, just like iptables, just like eBPF, sidecar proxies also know nothing about Kubernetes namespaces. Instead, our service mesh relies on mTLS to enforce policy on source namespace or service account. When a sidecar knows it's going to talk to another sidecar in the mesh, it wraps the outbound connection in mutual TLS, regardless of what it is. And the namespace the request is coming from is encoded in the signed certificate used to establish that mTLS connection. So when the request arrives at the destination, the receiving sidecar does not need a list of IP addresses to verify that the request is coming from a permitted namespace. It just needs to look at the contents of the certificate and verify that the certificate was signed with a key that it trusts, and then it will believe that the connection came from where it says it came from.

Two key takeaways. Unlike in CNI land, in the service mesh a policy like this does not prevent TCP connections between pods. And anyone who can present this client certificate will be trusted as though they're coming from the namespace and service account that are in the certificate, regardless of where they actually come from. Okay, I see eyebrows being raised. Y'all see where I'm going with this. That's good.

The good news is that those client certificates are well protected. They live in memory in the sidecar proxy and never get written anywhere. Even if somebody popped a shell in your pod, they'd have their work cut out for them trying to extract one from process memory. So if we assume that part is safe, great. Where do these certificates come from? Well, all service meshes implement a controller component, and one of the controller component's jobs is to mint new certificates. When a pod with a sidecar starts up, it contacts that controller and says, can I please have a certificate? And to identify itself to the controller, it offers up a Kubernetes service account token. This is how identity is bootstrapped in this environment. Now, that token contains some claims, and those claims include the service account and the namespace that the pod is running in. The controller uses those claims when it mints the certificate.

Good news, everyone: I have another contrived scenario for you. Let's say an attacker has popped a shell in a vulnerable pod that, again, is not allowed to talk to your api namespace.
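To see those claims for yourself, you can decode a token's payload. A minimal sketch with no signature verification, since we only want to peek inside: the file path is the standard in-pod mount, and the exact claim layout varies by Kubernetes version and token type, so the expected output shown in the comments is illustrative:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"strings"
)

func main() {
	// The token every pod gets by default, and the thing the sidecar offers
	// to the mesh controller to prove who it is.
	raw, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		panic(err)
	}
	// A JWT is header.payload.signature; the claims live in the middle part.
	parts := strings.Split(strings.TrimSpace(string(raw)), ".")
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		panic(err)
	}
	// Expect something roughly like:
	//   "kubernetes.io": {"namespace": "frontend",
	//                     "serviceaccount": {"name": "legit-client", ...}}
	// These are the claims the controller copies into the minted certificate.
	fmt.Println(string(payload))
}
```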
If they can find a vulnerability in this pod that gives them a way to steal the service account token from this sidecar, maybe through a directory traversal vulnerability, maybe through server-side request forgery, maybe they just bat their eyelashes and say pretty please. However it's done, they can then use that stolen token to talk to the service mesh controller and request a client certificate, and then use that client certificate to make a fully encrypted and verified connection to the pod you thought was unreachable. There's a repo right here that shows how to do this; you can check it out for step-by-step details.

So there are two ways to deal with this problem. Solution number one is to follow the Unix philosophy: a tool should do one thing and do it well, and you should be able to use your tools together. You know, right tool, right job. Use your service mesh to enforce layer seven policy at the sidecar; use your CNI to enforce layer four policy in the kernel. Then an attacker with a stolen client cert will be stopped by your CNI plugin, because they're coming from the wrong IP address, while an attacker with a stolen IP address will be blocked by the sidecar proxy, because they haven't got the right mTLS client cert. And this is not a hot take. This is, in fact, exactly what Istio has recommended doing for years. Defense in depth is not a new idea. Now, that is one way to approach this problem, but there is a more interesting approach to problems like this, and that is what Christine is going to tell you about.

All right, so the second approach is the evolution of the projects themselves within the ecosystem; we're kind of coming back full circle. We all know that the CNCF landscape is continually changing. I blink my eyes and there's another new project, and I'm like, oh, another one. And you know what, they're being supported by brilliant folks in the open source community. Some of these projects have decided to expand their capabilities to provide better options for their users. So, tying it all back: Istio itself made a large announcement last year, introducing ambient mesh, a new data plane mode that separates the L7 and the L4 layers.

So what is ambient mesh? There are more detailed talks on the specifics, and there are blog posts online, but a rough overview of the security aspect we care about is that you now have an L4 secure overlay layer that gives you mTLS between your applications, along with service-to-service authorization policies for connectivity. And this is done with the ztunnel. Pardon my thick Canadian accent. So what does this look like? If we recall, the sidecar model sits right next to your application pod, literally two peas in a pod. Feedback from the community has shown that maybe this is a bit intrusive in some scenarios, and some people just want to layer on mTLS before trying out more service mesh capabilities. So instead of having to wait for a two-of-two on your pod's containers to be ready, or restarting your deployment to inject the sidecar, you now have a ztunnel per node, which intercepts the traffic before it leaves or enters the node. It layers mTLS onto your cluster's traffic so you know it's encrypted. And I do want to address that even though the slides here say iptables, there are investigations being done into using eBPF instead of iptables to redirect traffic.
So you can see that full request at the bottom there. And for example, we have a CNCF sandbox project, I think as of December, called Merbridge, which implements that eBPF redirection with Istio and Linkerd. And if you want richer L7 authorization policies, you can do that at the L7 processing layer, which is handled by the waypoint proxy and is configurable with the Gateway API. The waypoint proxies are regular pods that can be auto-scaled like any other Kubernetes deployment. Ambient mesh also uses HBONE, the HTTP-Based Overlay Network Environment; say that five times fast. The connection is established with mTLS, and it's based on the identity of the workloads that are communicating with each other. There was a much more thorough talk on this presented earlier, and surprisingly the CNCF is so on top of it that they've already uploaded the YouTube videos, so please check that out if you want more depth on the security side.

And on the flip side, Cilium has also announced Cilium Service Mesh, pushing upward toward the L7 world. Cilium Service Mesh is a sidecarless service mesh, still leveraging eBPF to bypass the network stack, so it's really great for performance. But if for some reason eBPF can't handle the request that's coming in, it falls back to the Cilium agent that is running as a DaemonSet on your node. The Cilium agent runs an Envoy proxy by default, which will intercept the traffic on your behalf. There are certain L7 traffic management tasks that can't be handled within the kernel, but that goes beyond what we care about here, which is security. There is also a lot of work being done on the Cilium Service Mesh side of things, one piece being the mTLS investigation. I was talking to Nick yesterday, who was saying they're still looking for feedback, so show some TLC and get into the discussion; I have linked the issue there.

So I want to be clear that there's still a lot of work being done on both projects. Ambient mesh is still experimental, and the design for Cilium's mTLS is open for feedback. Kudos to both projects for being so receptive to the open source community and actively looking for that feedback; that's so healthy. And this brings us to L7 policy support. Istio has been saying "use a CNI and an L7 service mesh together" for a long time, and Cilium now has the ability to do L7, so you can see that they're kind of pushing up against each other. Both projects are still evolving; or rather, the projects are on this path to converging.

With all that being said, what are some of the takeaways? We're going to have a large array of options to choose from in the future. I'm not here to tell you what your engineering needs are or what the future holds for your teams. You might still want a sidecar model for more isolation, and it's still needed in some scenarios; sometimes you're going to want something more generic, like a proxy per node. I don't think the sidecar model will go away; instead, companies will have to choose for themselves based on their needs. There is a link at the bottom to a talk by Liz Rice about the trade-offs in the sidecar versus sidecarless debate, so it's still a topic under discussion. And there's also something to note: there's a complexity cost for your engineering teams, and those are people. There's the engineering cost of onboarding, the maintenance, the risk, the blast radius. And again, this all depends on your specific use case and your team's needs.
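Since both projects are converging on L7 policy, it helps to see what one of those richer rules looks like. Below is a sketch of the "allow GETs from frontend" idea from earlier, written with Istio's Go API types. The field names follow the v1beta1 API as best I recall them, so treat them as illustrative; in sidecar mode a policy like this is enforced at the sidecar, in ambient mode at the waypoint proxy:

```go
package main

import (
	securityapi "istio.io/api/security/v1beta1"
	securityclient "istio.io/client-go/pkg/apis/security/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Allow GET requests from workloads in the frontend namespace to workloads
// in the api namespace. Once a workload is covered by an ALLOW policy,
// anything not matched by some rule is denied.
var allowFrontendGets = securityclient.AuthorizationPolicy{
	ObjectMeta: metav1.ObjectMeta{Name: "allow-frontend-gets", Namespace: "api"},
	Spec: securityapi.AuthorizationPolicy{
		Action: securityapi.AuthorizationPolicy_ALLOW,
		Rules: []*securityapi.Rule{{
			// "From" is evaluated against the peer identity carried in the
			// mTLS client certificate, not against source IP addresses.
			From: []*securityapi.Rule_From{{
				Source: &securityapi.Source{Namespaces: []string{"frontend"}},
			}},
			// "To" is an L7 condition: only the HTTP GET method is allowed.
			To: []*securityapi.Rule_To{{
				Operation: &securityapi.Operation{Methods: []string{"GET"}},
			}},
		}},
	},
}

func main() {}
```

The namespace condition here is exactly the certificate-based check described earlier, which is why a stolen client certificate satisfies it.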
Okay, so let's take a small tangent on eBPF, because we mentioned it in our CFP and we were like, oh, we've gotta talk about it. So there are a few slides here. eBPF, if you're not familiar, lets you run programs in your kernel. They're constrained for safety, with tight guardrails so you can't just go off and do anything you want, and they're checked by a verifier before they're loaded into your kernel. So at the CNI level, you can see that it's a very beneficial tool: it provides L3 observability, routing, and network policy. There is a good talk, or a good slideshow, that goes into more detail. I am by no means an eBPF expert. I wish I was, but I think that is some big-brain energy beyond my scope of knowledge. But the main benefit is mostly around performance, from what I understand. And this sarcastic tweet made me laugh a little bit about the highlights of the excitement around eBPF last year. I mean, it's no SBOM, but it's still pretty exciting. It's a cool technology, but it's not a silver bullet. Don't you ever notice that there's actually never a silver bullet, but every vendor wants to sell you a single pane of glass?

So, takeaways: use defense in depth and know your tools. There was a good talk yesterday, again already uploaded to YouTube, and it has this diagram of the Swiss cheese model, which I love because I love cheese. And you can see that you might as well have as many layers as you can, with safety padded on everywhere. So what can you do about it? Get involved. Given that Istio's ambient mode is still experimental, try it out, give some feedback. Check out the Cilium mTLS design proposal; all the links are at the bottom there. And I hope you're ready to get involved with all your favorite CNCF projects. All of them, right? So again, show some TLC and some support to these projects. They need security enthusiasts like yourselves in the crowd to make them stronger and more resilient in the future. Again, my name is Christine.

And I'm Rob, and I am a big fan of a Python teacher named Ray Hettinger, who at the end of all of his talks polls the audience by asking: could I see, by show of hands, anyone who did not learn something new today? For everyone playing at home, I am the only hand up. Excellent work. That means you all picked the right room to be in. At this point I will mention that my company, SuperOrbital, offers Kubernetes training. We teach service mesh, we teach advanced Kubernetes controller development, and we are pretty chill about it. So if you have juniors on your team, if you have folks who need to get skilled up fast, send them to me. I train wizards. Thank you. Leave feedback or not, no peer pressure, it's fine.

We have a couple of moments for questions. If anyone has questions? And if there aren't questions, I could show a demo of stealing an mTLS client certificate, if you want to see that. You guys want to see it happen? Yeah? All right. Oh, this will be tricky, I've got to switch back to mirroring. This was us goofing around doing a... okay. Just quickly resetting my environment. This cluster has been running for a while; I've been working on this test for a while. I have long thought this was possible, but I had never actually figured out a way to do it until just this last week. So hopefully the Wi-Fi gods will be appeased and I will actually be able to get access to my cluster. It's not looking good. Cannot reach my cluster. No, that's the last handshake. Oh, there we go. Okay, great, great, great, great.
It's just real slow. Everybody stop tweeting. So here goes. What I have are two namespaces: a namespace called secure and a namespace called attack. In the secure namespace, I have a service and I have some pods. In this namespace, the legit client pod is using a service account that is allowed to talk to the victim service. In the attack namespace, there is a pod that doesn't even have a sidecar; notice there is only one of one containers in this pod. So I'm going to run this exercise. What's happening is it is now generating a new RSA key pair, and it is stealing the service account token from the legit client pod; it just copied that over there. Then I make a gRPC call to Istiod using the certificate signing request that I just generated and the stolen service account token from the legit client. Istiod returns an mTLS client certificate to me. I then look inside the client certificate to check that the key is the same as the one I asked for, and it is. I then copy the private key and the client certificate into the attack pod, and the attack pod is now ready to spoof that request.

So I'm going to exec into that pod. Okay, from in here, if I curl the victim service, which is the victim in the namespace secure, I expect to see connection reset by peer. I am sending a cleartext request to a pod that is demanding mTLS, so it just hangs up. However, here I have my client certificate and my key. So if I do a curl, pass in the cert, pass in the key, and remind curl to use HTTPS, despite the fact that I am actually going to communicate on port 80 and the service running inside that victim pod is just an httpbin binary, I can hit the headers endpoint. And what you see is a 200 response back from that pod I should not be able to talk to. What is happening here is that the receiving sidecar, by default, when it unpacks the mTLS connection, examines the certificate, gets the identity out, and puts it into a response header, so that you as a service mesh user can verify who made the request. This is the requester here, and this is the identity of the receiver here. So you can see that even though this request came from a pod called attack in a namespace called attack, it appears to be coming from the secure namespace, using the legit client service account. It worked. The demo gods are happy. Cool. Thanks.

Yes. Yes. They already did, so, yeah. So what he's talking about is that you can use the service account token API to ask for what's called a bound token. It's a JWT that has an audience claim inside it, and what the audience claim means is that the token should only be accepted by services in the audience: this token is only meant for, for example, Istiod, or this token is only meant to be given to, for example, the Kubernetes API. Istiod actually does have bound service account tokens turned on by default, and so the token that lives in the sidecar proxy is bound to only be allowed to talk to Citadel (Citadel is the component inside Istiod that signs certificates). However, I have stolen that token, so it doesn't matter that it's bound. Oh, okay. Okay, okay. This is something I'm not aware of. So how does that work? Oh, cool. That's very cool. Yeah, I'm going to just repeat that for people who are watching on YouTube later.
There's work being done to allow projection of an mTLS certificate into the pod, so that it doesn't have to use a token. In fact, I think somebody did a talk about that today. I was busy working on this talk, so I didn't get to see it, but I do want to check that out. And to be clear, the fact that service account tokens are sensitive is not a hot take; everybody knows this. I just think this is an abuse of a stolen service account token that is not commonly done. That's a good point; I actually haven't looked at that. Oh no, actually, not this token, because it's bound, yeah. Yes, that's right, yeah. That's very cool, thank you. I have to put my hand down now; I learned a new thing today. Cool, thanks everybody.
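For anyone studying the demo afterward, the final curl invocation is roughly equivalent to this Go client. A sketch under assumptions: the cert and key file names are hypothetical stand-ins for the material minted by Istiod in the demo, victim.secure is ordinary Kubernetes service DNS, and certificate verification is disabled the same way the demo's curl flags did it:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// The key pair Istiod minted for the stolen service account token.
	cert, err := tls.LoadX509KeyPair("client-cert.pem", "client-key.pem")
	if err != nil {
		panic(err)
	}
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				Certificates: []tls.Certificate{cert},
				// The sidecar presents a mesh-CA certificate that no system
				// trust store knows about, so skip verification as the demo
				// did with curl. Don't do this outside a demo.
				InsecureSkipVerify: true,
			},
		},
	}
	// HTTPS to port 80: the victim's sidecar terminates mTLS there and
	// forwards plain HTTP to the httpbin container behind it.
	resp, err := client.Get("https://victim.secure:80/headers")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	// httpbin's /headers endpoint echoes the request headers it received,
	// which is where the spoofed spiffe:// identity shows up in the demo.
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```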