Thank you all for joining me here. I think this is the last session before alcohol, right? I think. I'm going to talk very fast. OK, so I'm going to be talking about service mesh, specifically two open source projects that implement a sidecarless version of service mesh. My name is Christian. I'm a global field CTO at a company called solo.io. I've been involved in open source for a long time, and in service mesh for the last seven years, I guess, since early 2017. I've been involved with Kubernetes since before it was 1.0. I worked at Red Hat before this and was very involved in messaging and integration and all kinds of stuff before that. So distributed systems is my background, specifically application-layer, application-level distributed systems.

Service mesh is part of the cloud native networking journey, and it does provide a lot of value. We've been working at Solo, and even before that, with organizations that adopt this technology for reasons like security, compliance, zero-trust mandates, and multi-cloud initiatives, and for capabilities like some we'll talk about today: mTLS and mutual authentication, observability, and traffic control. You deploy into these public clouds, you want to ship new APIs and new services, and you need to control traffic: split traffic, do blue-green and canary releases, implement things like locality-aware and zone-aware affinity. The mesh helps at the application layer to do that.

But we can't start talking about service mesh, and some of the architectural trade-offs we'll look at in these two projects, without looking at a little bit of history. Linkerd was the first modern service mesh as we know it today, and it was actually also the first sidecarless service mesh. Linkerd 1.x, to be specific, was implemented as the Finagle application libraries wrapped up and deployed in a container. In Kubernetes, what this looked like is you deployed it as a daemon set. Again, this is Linkerd 1.x, not the current iteration of Linkerd, which is 2.x, 2.15 at this point. But in the original architecture, it was deployed as a daemon set. Applications sort of opted in to routing their traffic through the proxy by using things like the HTTP proxy environment variable. Traffic would go to the proxy, and the proxy would then do things like timeouts, retries, circuit breaking, service discovery, TLS, that kind of stuff.

Now, this proxy was built in Scala and ran on the JVM, so there were some drawbacks and challenges to this architecture and implementation that are specific to that: it was hard to size, the JVM footprint and garbage collection overhead were too big, and you wouldn't want to use it as a sidecar. But one of the bigger problems, one that drove both the Linkerd community to rethink how they implement Linkerd and modern service meshes in general, was this idea of a shared, multi-tenant, per-node layer seven proxy. What that means is each node has its own proxy that handles all of the capabilities of the service mesh for the applications on it. Running on Kubernetes, you're not going to know ahead of time which applications, which containers, which API services are going to be scheduled on each node, so the proxy kind of has to handle everything. That makes it very difficult to size and to understand what the resource usage will be. You're not sure what configuration each application will hand it, and it can vary wildly.
It makes it very difficult for the proxy to provide any quality of service, prevent starvation, and all this other stuff — the typical noisy-neighbor problems you would expect in computing. Other things like upgrades, or taking down that proxy, would effectively take the node offline, because the networking for everything on it goes down. And jamming all the security credentials and key material into that one proxy made it a target. These are pretty well-known limitations of running a basically unbounded, layer seven, per-node proxy.

So what we did to offset that was go to the other extreme. We built other proxies — Envoy is one, the Linkerd community built a different one — and deployed them very close to the application. There were benefits to this. Ideally it's a little more transparent than what Linkerd or others were doing with the HTTP proxy environment variable. It became part of the application lifecycle: it was a sidecar, another container deployed in the pod, and when the pod started up or went down, the proxy went up and down with it. It was single tenant. We didn't try to share multi-tenant concerns, so each application could have its own configuration without worrying about what another application might do, and it avoided a lot of these noisy-neighbor problems. We were able to assign cryptographic identity to the workloads and have the sidecar handle all of that for us without sharing a bunch of keys with a central component. So those are some of the benefits. I put an asterisk next to the transparency bullet because it's not exactly as transparent as we want. But you could argue that the sidecar pattern, before Kubernetes formally introduced sidecar support recently, was sort of a necessary evil, a necessary point-in-time implementation, to get the value out of application-level networking that we needed. API gateways, API management, all that stuff wasn't cutting it for these Kubernetes-based workloads that were coming and going and scaling across on-prem and public cloud and so on. So sidecars were what we used.

But it doesn't matter what technology decision you make, there are always going to be trade-offs, pros and cons. In the sidecar world, before there was formal support in Kubernetes like I said, we had to deal with container race conditions, and the lifecycle wasn't handled appropriately. Things like cron jobs or Jobs would keep running because the sidecar would continue to run. The apps kind of needed to be aware of these proxies being injected; you have to use sidecar injection and change the YAML and all that stuff. Some security-conscious organizations said, we don't want the private key and secret material co-located with the app: the app, whether it's Java or Ruby or Go or whatever, becomes the biggest target, and if it gets compromised you get access to the key material, so we want to separate that out. And there are things like upgrades: since you're bolting a piece of infrastructure into the application, the applications have to be aware when you're doing upgrades, whereas ideally you would upgrade, cycle, and patch your infrastructure separately from the application. So the sidecars got us to where we are now. They're not without their challenges. I think there's still work being done to improve them, and I don't think they'll ever go away, but they can be improved.
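Just to make that injection piece concrete: in Istio's sidecar mode, the opt-in is typically a namespace label, and a mutating webhook then rewrites new pod specs to add the proxy container. A minimal sketch, with a placeholder namespace name:

```yaml
# Opting a namespace into sidecar injection (Istio's classic sidecar mode).
# The injection webhook then adds the istio-proxy container to new pods here.
apiVersion: v1
kind: Namespace
metadata:
  name: shop                # placeholder namespace
  labels:
    istio-injection: enabled
```

This is exactly the kind of "apps need to know about the proxy" coupling the sidecarless approaches try to remove.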
The folks at Isovalent also covered some of these challenges in a blog that they published, I don't know, a couple of years ago, I think. They pointed out some of the drawbacks of the sidecar, touched on the performance implications, and then also introduced a new concept, eBPF, an interesting and powerful way to extend the kernel and maybe start to push some of these things into the kernel. The only thing is, eBPF can't do, or shouldn't do, a lot of the things that modern layer seven service mesh proxies do. We've covered this before — we had sessions at KubeCon EU last year that talk about where eBPF is appropriate, where it's not, what it can't do, and which service mesh capabilities should live in a proxy versus which can be done in the kernel. What this boils down to is that it's not just a choice between the sidecar approach and the shared per-node approach; there's a gradient in between. The real question is: where does that layer seven proxy run? Because you still need it. You still need to do request-level things: retries, timeouts, circuit breaking, handling JWTs and validation and claims, traffic splitting — all of that needs to be done somewhere, and eBPF is not the appropriate place, so you need a proxy. Now, some parts of the service mesh stack are appropriate to implement in the kernel, and that's why you see, for example, Cilium, which we'll talk about — a CNI that has a lot of that capability — start to build it that way. So let's dig into the Cilium approach to service mesh, and how Istio, an existing, established service mesh, has also started to implement this pattern.

Okay, so first of all, let's start with some of the benefits of going with a sidecarless approach. These benefits were apparent in the earlier incarnations of the sidecarless approach too, but we've since worked out some of the drawbacks of that previous approach, which we'll talk about. These are the benefits, and this is how it overcomes some of the limitations of the sidecar. One of the biggest things is it becomes more transparent: the applications cannot opt out, right? The applications look like they're just talking to the network, and somewhere along the path in the network, we apply the service mesh capabilities. You remove some of these annoying things like container race conditions, and in certain areas you can improve performance and optimize the routing because it's not tied to the application anymore. Each individual implementation also has its own benefits, which we'll look at here.

So the first is Cilium. How many people are familiar with Cilium? Can you raise your hand, please? I'd say at least half the room. Cilium is an open source CNI. It's built its network policy and routing data plane on eBPF, which lets you extend the kernel, bypass some of the iptables and netfilter machinery, and be very specific and very fine-grained about the behavior you want in the kernel, in this case with respect to networking. I mentioned network policy: layer three, layer four network policy, and we'll look at some of the layer seven stuff it can do. It can act as a replacement for kube-proxy, and because of those layer three and layer four capabilities, it lays the foundation for what a service mesh could be on top of Cilium.
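To make that concrete, here's roughly what turning on these pieces looks like in Cilium's Helm values. This is a sketch, not a complete install, and the exact keys and defaults shift a bit between Cilium releases, so treat it as illustrative:

```yaml
# Sketch of Cilium Helm values enabling the service-mesh-adjacent features
# discussed in this talk; check the docs for your Cilium version.
kubeProxyReplacement: true      # eBPF service load balancing instead of kube-proxy
l7Proxy: true                   # per-node Envoy for layer 7 policy and visibility
gatewayAPI:
  enabled: true                 # Gateway API implementation for ingress
encryption:
  enabled: true
  type: wireguard               # transparent node-to-node encryption (or ipsec)
hubble:
  enabled: true
  relay:
    enabled: true               # flow observability exported to Hubble
authentication:
  mutual:
    spire:
      enabled: true             # SPIRE-backed mutual authentication (beta)
```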
In the Cilium implementation, there's a Gateway API implementation for ingress, for allowing traffic into the cluster, which can then be handled by the network policy. There's an implementation of mutual authentication, which we'll talk about. I think I mentioned in the beginning that security, mTLS, mutual authentication is one of the biggest capabilities, one of the biggest features people turn to a service mesh for, so the Cilium community has implemented that. The CiliumNetworkPolicy resource is the main way to configure some of these layer seven policies and service mesh capabilities, and then you can also configure Envoy directly, and we'll talk about what that looks like.

Istio, on the other hand, started as a service mesh, back in 2017 or so, as a sidecar-based service mesh built on Envoy, implementing workload identity with SPIFFE, with the normal observability, logging, and tracing capabilities you would expect, including mTLS. Like I said, and I'll keep reinforcing it: mTLS, mutual authentication, security, zero-trust networking — these are the main reasons people come to a service mesh in the first place. Now, Istio is not super opinionated about the layer three and layer four underneath it, so it can run on any CNI, and when we look at the sidecarless mode of Istio, it keeps that trait. It doesn't matter what CNI it's running on — it can run on Cilium, it actually runs really nicely on Cilium — and it implements mTLS and layer seven authorization policies, and it tries to be as backward compatible and interoperable with the sidecar approach as possible. Gateway API is also supported in Istio, and has been for a little while, same as Cilium, and it will be promoted to production ready in the next version, Istio 1.22. If you look at the Istio documentation, it's going to say beta. Beta is the wrong word in the Istio community — I don't know why we don't change that — but just about every feature in Istio is marked beta, and when something gets to beta, it's safe to use in production.

The sidecarless approach and architecture that we're going to look at today is through the lens of five different bullet points. Depending on time, I'm going to try to get through all of it; there's a lot here in each of these sections. I will post the slides online and make them available — I'll share them on my social media or something and through the KubeCon event as well. We'll look at the control plane for sure, we'll have time for that. We'll look at the data plane for sure, and we'll look at mutual authentication. The last two, again depending on time, we'll try to get through.

Okay, so the control plane architecture in Cilium is actually fairly straightforward. There's a component called the Cilium agent that lives on each worker node in Kubernetes, and that agent is responsible for a couple of things. First, watching the state of the rest of the cluster by communicating with the Kubernetes API. Based on what gets scheduled, and as pods start up, it recognizes that there's a pod to get started, creates its networking, creates resources like the CiliumEndpoint and CiliumIdentity and so on, and shares that back with the Kube API server. Each agent runs independently: it watches the Kube API and programs its local data plane. They're not trying to communicate and share anything between each other, so if one of them fails, it fails independently without taking the others down with it. The control plane API consists of the Kubernetes Gateway API, like I mentioned, and the CiliumNetworkPolicy, which is responsible for configuring things like layer seven authorizations — path-based, header-based, "hey, you can talk to this other service, but you cannot," et cetera.
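Here's a rough sketch of what one of those layer seven rules looks like in a CiliumNetworkPolicy — the app names, port, and path are placeholders:

```yaml
# Sketch: only allow the frontend to call GET /api/v1/products* on the backend.
# Anything else from the frontend, or traffic from other workloads, is denied.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-get-products
spec:
  endpointSelector:
    matchLabels:
      app: productcatalog        # placeholder workload receiving traffic
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend            # placeholder caller
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:                    # the presence of L7 rules sends this traffic through Envoy
        - method: "GET"
          path: "/api/v1/products.*"
```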
Beyond that, I believe Cilium originally planned on having other meshes and their control planes live on top of the Cilium data plane, so they really haven't built out a super feature-rich control plane API of their own, but they do give direct access to Envoy. I say that with caution, because the Gateway API also translates down to Envoy config, and Envoy config can get a little hairy — so if you're doing that, be careful that the two don't conflict, and all kinds of other stuff.

The Istio control plane is slightly different. Each of the data plane components that live on the worker nodes communicates with an intermediary component, the control plane, which is an xDS-based control plane. xDS is the protocol used to deliver updates about endpoints, services, routes, and so on. This component talks directly with the Kubernetes API, watches for changes in the system, and pushes updates out to the data plane. The reason it's kept separate is that the stuff it deals with is fairly sensitive: we want to secure it, harden it, keep it away from the data plane as much as possible, and make it effectively read-only to the data plane. Istio has been around a lot longer as a service mesh, so it has a more full-featured API for its control plane: things like specifying routing, traffic splitting, fault injection, load balancing, and so on, plus authorization policies and handling JWTs and validating them. So there's a bit more there — you don't have to deal directly with Envoy through Istio's control plane, although in the sidecar approach you can break glass and configure Envoy directly. In the ambient, or sidecarless, mode there's a different approach that I'll touch on — and I'll show a small sketch of what one of those Istio policies looks like in a second.

Okay, so that's the state of the world; that's how configuration gets to the data plane. Now let's look at the components in the data plane. In Cilium, like I mentioned, the Cilium agent that lives on each node plays the control plane role — it watches the rest of the cluster state — and it also takes care of the pods starting up on its local node, configuring the eBPF data plane that Cilium implements, which is used for connection handling, load balancing, network policy, and so on: capabilities that need to exist in a service mesh. For the things that live, if you remember that chart, up in the pink or red rows — the layer seven rows — Cilium ends up using an Envoy proxy. When we need to enforce layer seven policy on traffic, or collect layer seven observability for it, that's served and implemented by the Envoy proxy that runs as a daemon set on each of the worker nodes. So you can see that in Cilium we've separated out the handling of layer three and layer four from layer seven: we rely on the Envoy proxy for layer seven, and on eBPF and the CNI capabilities for layer three and layer four. For multi-node, cross-node communication, it's the same approach: if traffic needs layer seven authorization on ingress or egress, it'll pass through the Envoy proxy. I don't depict it here, but if it doesn't need layer seven, it bypasses that completely. Layer seven is a fairly compute-intensive, expensive operation, and we want to avoid it if we can, but a lot of the service mesh capabilities do exist at layer seven, so it's not that easy to avoid.
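Going back to that richer Istio API for a second, here's roughly the kind of request-level policy you'd express there. It's a sketch with placeholder namespace, workload, and service account names:

```yaml
# Sketch: only allow GETs on /api/* from the frontend's identity.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-can-read
  namespace: backend            # placeholder namespace
spec:
  selector:
    matchLabels:
      app: backend              # placeholder workload this policy protects
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/frontend/sa/frontend"]   # SPIFFE-style identity
    to:
    - operation:
        methods: ["GET"]
        paths: ["/api/*"]
```

The point is that the caller is identified by its cryptographic (SPIFFE) identity rather than an IP address, and the control plane translates this into Envoy config for you.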
In Cilium, we can also encrypt the traffic transiting the network between nodes by enabling a feature called transparent encryption, which is delivered by in-kernel WireGuard or IPsec.

In Istio, we also have this separation of layer four and layer seven. This is Istio's ambient mode, the sidecarless mode, and what we do there is run a component called ztunnel as a daemon set, which handles only layer four and mTLS for the workloads on that node. What ends up happening is that when a pod communicates with the outside world with mTLS enabled, for example, the traffic leaves the pod's network namespace already wrapped in mTLS. That traffic can then flow through the rest of the network — the CNI; like I said, Istio supports any CNI — and eventually to its target destination. This is a slightly zoomed-in look at what I mean by that. The ztunnel, like I said, runs as a daemon set and is a layer four proxy, but its ports are mapped directly into each pod's individual network namespace, so when traffic exits the pod, it's already been encapsulated and encrypted with mTLS. The ztunnel knows which certificates to associate with the connection — the ones that represent pod A or pod B or whatever — and knows how to accept or make an mTLS connection with the other side. This is what it looks like in a non-layer-seven mode, just layer four: everything is encrypted as it leaves the pod, goes through the network, terminates at the ztunnel on the other side, and, conversely, is mapped directly into the network namespace of the destination pod.

Now, like I said, we separate layer four and layer seven, so at some point, if we need layer seven capability, we need to enable it somewhere. In ambient, we enable layer seven with a proxy in the network — a waypoint — and that can be deployed per namespace. It's typically running somewhere in the Kubernetes cluster, but the point in ambient is that it doesn't matter where: it's not pinned as a sidecar, it's not pinned as a per-node proxy. It can be scaled independently, as I think I depict here. Where you need layer seven, the ztunnel knows to route to that layer seven proxy, and that proxy can then be scaled independently and sized for the traffic it needs to handle. In the sidecar mode, by contrast, if you wanted to scale up, you'd scale up your workloads and that scales up the proxies — but you might end up with more proxies than you need. With the waypoint approach in ambient, we sort of get the best of both worlds: we don't use a lot of resources, we use just the right amount and right-size it to the traffic load. Now, you do end up taking extra hops, but what we found in testing is that instead of going from one sidecar to another sidecar — both parsing layer seven, and layer seven is pretty expensive computationally — we exchange the two layer seven hops for one, plus a couple of extra layer four hops, and those are actually cheaper than a second round of layer seven processing.
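To give a feel for what this looks like in practice: you opt a namespace into ambient with a label, and a waypoint is just a Kubernetes Gateway resource using the istio-waypoint class. This is a sketch with a placeholder namespace; the exact labels for binding workloads to a waypoint have shifted across recent Istio releases, so check the docs for your version:

```yaml
# Enroll a namespace in ambient mode: ztunnel handles L4 and mTLS for its pods.
apiVersion: v1
kind: Namespace
metadata:
  name: shop                      # placeholder namespace
  labels:
    istio.io/dataplane-mode: ambient
---
# Deploy a waypoint: the independently scalable L7 proxy for that namespace.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: shop
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE               # the tunneling protocol ztunnel uses to reach it
```

Because the waypoint is just a Deployment behind that Gateway, you can scale or resize it on its own, which is the "right-size it to the traffic load" point above.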
Hopefully you now have a good understanding of what the data plane looks like. This next section gets into a bit more detail — I marked this session as intermediate, so hopefully it's not too much — but we're going to look at the way mutual authentication specifically is implemented in both Cilium and Istio. Can you see that okay? It looks small on my laptop, but hopefully on the larger screen it's all right.

Before we can understand how mutual authentication works in Cilium, we kind of need to understand how layer three, layer four network policy gets applied in Cilium. When pod A wants to talk to pod B, that traffic makes it into the kernel, into the eBPF data plane, and what the policy engine does is ask: what IP address is this, and what Cilium identity is it mapped to? I don't have time to explain Cilium identity in depth, but hopefully you're somewhat familiar or can go look it up — it's basically an integer associated with a set of labels. In Cilium, we associate IP addresses, for example the IP addresses of a group of pods, with an identity. What the policy engine does is look up the identity based on the IP address and ask: can this identity talk to that identity? If it can, we let the traffic proceed; if it can't, we drop it.

In the mutual authentication implementation in Cilium, we do the same thing. However, before we get to that part, we ask the eBPF policy engine: has this connection, this traffic, been authenticated? If it has, then we go through the rest of the process I just described. If not, we drop it and trigger another process in user space to go do the authentication. So A is trying to talk to B, and at some point the user-space process — in this case the Cilium agent — will go make an mTLS connection. It uses SPIFFE-based SVIDs, certificates it gets from SPIRE that represent these workloads, and it makes that connection. If the connection succeeds, the authentication part of TLS has happened successfully; it closes the connection and then marks in an eBPF map — the auth cache — that yes, the flows between A and B have been authenticated, with a time-to-live and so on. From there, the eBPF engine can go back and say, oh, this has been authenticated, okay, let's proceed to the next step. And when we proceed to the next step, we again figure out what the Cilium identity is and decide whether or not to let the traffic through.

Now, this is quite a bit different from what Istio or other service meshes do. The idea behind doing it this way is to support multiple different types of protocols, pay the cost of the mTLS handshake once, and then just push things over the wire. For things to actually be encrypted after that, you have to go back and use the transparent encryption Cilium can already do node to node, using WireGuard or IPsec or whatever. The challenge here — and I've written about this and I'll share the link — is that this IP-to-Cilium-identity mapping can get confused. It's a stateful process: the caches on both nodes have to be up to date and agree exactly on the IP-to-identity mapping, and that can go wrong. The Cilium project is working on this — I think they mark mutual authentication as beta — they recognize it and they're working on it, but at this point it's something to be aware of: how it works, and some of the known architectural issues with it.
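For reference, turning this on is a field on the CiliumNetworkPolicy rule itself — a sketch with placeholder labels, and it assumes mutual authentication and SPIRE were enabled at install time (the `authentication.mutual.spire` Helm values from earlier):

```yaml
# Sketch: require the SPIRE-backed mutual authentication handshake
# before the frontend's flows to the backend are allowed through.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-requires-mutual-auth
spec:
  endpointSelector:
    matchLabels:
      app: backend                 # placeholder workload receiving traffic
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend              # placeholder caller
    authentication:
      mode: "required"             # triggers the agent/SPIRE handshake described above
```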
In Istio, we use mTLS for mutual authentication, and like I said, Istio has always used SPIFFE for its identity implementation. It can plug in SPIRE, but SPIFFE is the identity format for the SVIDs, and that's proxied through the ztunnel. The ztunnel actually opens up what's called an HTTP-based overlay network, or HBONE, transport between the two ztunnels, so that if, say, pod A is trying to talk to pod C and they open up ten connections, you don't end up with ten individual mTLS connections — you end up with one, and they get multiplexed over that transport. The ztunnel is responsible for going to get the certificates that represent pod A or pod B or whatever is running on its specific node. So ambient mode uses standard mTLS based on SPIFFE, there's no caching or state that needs to be propagated, and, like I said, it can be combined with the Cilium CNI.

All right, we're running a little bit short on time. Let me do the observability part. Cilium, even before the service mesh components got added, already had a very powerful observability implementation. The layer three and layer four metrics collection — what's happening in the kernel — can be exported to a component called Hubble, and Hubble has a really nice UI for displaying the flows between the various pods. Envoy is used for the layer seven metrics, and all of this can be piped back to your existing observability stack, whether that's Prometheus or Elastic or Datadog or whatever. Istio supports the same kind of thing: you can get metrics out to Prometheus from both the ztunnel component and the layer seven proxies, the layer seven components.

Let's go — we'll get through the gateway one. The last one is around getting traffic into the system and the various traffic control mechanisms in the east-west direction. In Cilium, the layer seven Envoy proxies on each node are used as the ingress mechanism: you specify a Gateway and, using the Gateway API, HTTP routes and all kinds of stuff, and then Cilium's control plane goes and programs the Envoy proxies to handle ingress traffic. Right now, like I said, it shares the same proxies as the east-west mechanism, but I think that's going to be split out into its own proxies so you can handle it separately — I know a lot of folks who use ingress want dedicated ingress proxies, or dedicated ingress nodes, for that. I don't know how well you can see this, but Istio's ambient mesh doesn't do anything differently here from what Istio was doing before: Istio already had an ingress gateway, and it ties in with the mTLS and mutual authentication parts of Istio already.
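Since both projects converge on the Gateway API for this, here's the rough shape of an ingress config. It's a sketch with placeholder names; the gatewayClassName would be `cilium` on a Cilium install and `istio` on an Istio install:

```yaml
# Sketch: expose /api on port 80 and route it to a backend Service.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
spec:
  gatewayClassName: cilium        # or "istio" on an Istio install
  listeners:
  - name: http
    port: 80
    protocol: HTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-route
spec:
  parentRefs:
  - name: web-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    - name: web-api               # placeholder backend Service
      port: 8080
```

The same HTTPRoute shape is what the respective control planes translate into Envoy configuration for the ingress proxies.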
All right, so let's do a quick recap of the service mesh architectures and where each of them is today — they're continuing to evolve, expand, and improve. This is just a recap of the basic service mesh capabilities you'd expect in a mature service mesh — things at layer seven, observability, mTLS or mutual authentication — and where they're implemented, or suggested to be implemented, in each. In the Cilium case, for the routing controls, things like traffic splitting and retries, you're expected to use the Envoy CRDs directly today. I do expect that to evolve and mature, whether by adopting an existing control plane or building out a control plane API for it. Request-level authorizations, though, can already be done nicely in the Cilium network policy. Observing layer seven traffic can be done, and it's pulled from Envoy today in the open source project. And mutual authentication, like we walked through, is implemented using a combination of eBPF and the user-space Cilium agents. Okay, so this shows specifically which component does what.

And then if we look at the architectural recap and compare it with some of the different patterns I pointed out — actually in one of the blogs; we didn't walk through it — the resource utilization and overhead of a node-based proxy system is actually really good, but you trade off feature isolation and end up competing for resources on the particular node. The security granularity and upgrade impact are sort of medium in this architecture: if you take that proxy down, it takes down the entire node, and if the proxy gets a CVE or is somehow attacked, it can impact all the workloads on that node. In Istio's ambient mode, everything is implemented in user space except for the CNI components. Envoy proxies are used for the majority of the request-level and traffic management pieces; mTLS is implemented in the ztunnel, and the layer four pieces — connection handling, metrics, et cetera — are also implemented in ztunnel. In Istio, we've eliminated the sidecars, so we get better resource overhead — it'll just depend on your workload needs. The feature isolation is a lot better because there's no shared tenancy: all those proxies are single tenant. For security granularity, since we're not running layer seven on the individual nodes — I'm running out of time here — we have a much smaller attack surface on each individual node. The upgrade impact is still there: if you take that ztunnel proxy down, you will lose the traffic on that node.

I do want to point out that this is the first service mesh talk of KubeCon, and there are a number of service mesh talks coming up on Friday. I think the first one starts at 11, and Christine Kim from Isovalent will go through how the CNI and the service mesh interact, or intersect, on Cilium, and we also have a good talk on larger-scale deployments with Istio and SPIRE. The last thing I'll point out is that these technologies are composable — they're intended to work really nicely together. There's a cute little acronym for the stack you can build with these technologies: CAKES. But yeah, I appreciate you coming out this afternoon — go get some beers. And if you have any questions, comments, pushback, different opinions, whatever, please reach out to me. Happy to chat offline, or over beers. Thank you.