 All right, sorry about that. Let's get started right now. First of all, thank you all for coming out. We're going to be talking about Istio Ambient and its security properties. My name is Christian Posta. I'm a global field CTO at solo.io. And I've been working on Istio since actually before May 2017. So I've been part of the founding community, been around for a little bit. Mr. John? I'm John Howard. I'm a software engineer at Google. I also work on Istio. I've been working on Istio for about four years now. I'm a member of the Technical Oversight Committee. I'm excited to talk about Ambient Mesh a bit more. Awesome. Before we get started, I want to ask the audience a couple of quick questions. The first is, how many people that are here are using Istio today? Okay. When I first asked this question in the summer of 2017, it was like, blank stares. I think for each one of you that raised your hand, there's probably like a few hundred out there in the open source project that you represent. And I want to thank you all for using Istio and pushing on it and reporting issues and helping make it better. Kind of covering some of those edge cases that we otherwise wouldn't have seen in an enterprise setting. So thank you all for using Istio and your contributions. The second question I have is, how many people have heard of Ambient Mesh or Ambient Mode? Okay. All right. Good. We will leave resources and links at the end for those of you that haven't. And I think in the roadmap session, we'll cover it a little bit more. But this session is specifically going to focus on the security properties of the Ambient Mode and how it compares to what you would see in sidecars and the default mode right now. So the question and the title of the talk is, is Ambient Mesh secure? And the answer is yes. So thank you all for coming. We can take questions. All right, but let's go in a little bit deeper and dig into it a little bit. 
Like I said, this is not an intro to Ambient, but I do want to describe what it is for a second and why we started working on it. Istio Ambient Mesh is a sidecarless data plane option for the Istio service mesh that we know and love. A big part of why we started building this was to simplify how we onboard applications and get workloads into the mesh without having to inject sidecars and make changes to the deployments and so on, and to simplify how we run those workloads and eventually operate Istio on day two: how we do upgrades, CVE patching, and so on. It also solves some of those pesky little corner cases where you're deploying your applications and you see a race condition between the sidecar proxy and your workloads. So by eliminating the sidecar, we can get rid of some of those issues and make it easier to patch and upgrade Istio, because it's all running outside of the application. There are some secondary benefits that we get from Istio Ambient as well: since we're running fewer proxies, we don't have to reserve as many resources for the sidecars, and in certain cases we can improve performance. And I think we're going to talk a little bit more about that and share some numbers around performance in the roadmap session. So those are some of the high-level reasons why we started working on this approach. Like I said, I'll leave material and links at the end for you to dig in more. But we're going to talk about it from the security standpoint, because the way Ambient Mesh has been implemented to achieve those goals is that we separated the data plane into two distinct layers. The first layer focuses on the security aspects of the mesh, and the second layer, which lives on top of this secure overlay, focuses on the Layer 7 capabilities that you would expect out of a service mesh. 
Now, probably the number one reason why people start looking at adopting a service mesh is security: achieving a zero-trust posture for compliance reasons, regulatory reasons, and so on. And so what we've done with Ambient is make it fairly straightforward and easy to onboard the security aspects of the mesh, and that's the secure overlay. As you start to look closer at some of the details, you'll see that this component, the ztunnel, which lives on each of the hosts in a cluster, actually starts to push that functionality a little closer down into the CNI, or into the networking layers. So we remove it from the applications. We don't have to deploy it as a sidecar, and it runs on each of the hosts. And like I mentioned, the Layer 7 capabilities are implemented in the waypoint proxy layer. I'm not going to go into too much detail, but just note that this is an additional layer that goes on top of the secure overlay. For time reasons we're probably not going to cover it too much in this talk, but again, I'm going to leave resources behind. So let's take a look at what a request path or a connection path might look like in the sidecar approach to the data plane versus what we've done now in Ambient. The first thing that you'll probably recognize is that when you inject the sidecar next to your application, the application's traffic needs to somehow be forced through the sidecar so that it can apply the mesh behaviors. In the default mode, the most user-friendly, expedient mode, when you deploy the sidecar we set up some iptables rules that do redirection in the pod so that the traffic from the application container makes it to the sidecar. Now, there are other modes that you can run. You can run a CNI plug-in that will take care of that outside of the pod ahead of time, but generally it's the redirection that happens inside the pod that forces the traffic through the sidecar. 
And when you look at a diagram of how traffic moves from one pod to another across the cluster, you'll see that things like zero trust, mTLS, and authorization policies are enabled at the sidecar level, and so this is what that diagram would look like. Now, in ambient mode, we've removed the sidecars, as I mentioned, and replaced them with a component called the ztunnel. The ztunnel is not a full representation of what you see in the sidecar. It's not a full-blown Layer 7 proxy. The ztunnel just handles opening connections and establishing mutual TLS. So obviously it needs workload certificates, and it maps the workloads to their certificates and then opens the connection across the network to the destination workload, where it terminates at another ztunnel living on that host. The tunnel that we see here is created using an HTTP-based overlay. You might hear the term HBONE; that's the acronym for how we've implemented this. But the ztunnel is a very, very focused, small piece of infrastructure that is just responsible for opening these connections and establishing mutual TLS. Now, the interesting bit in terms of implementation details, which you may have heard about or seen a blog about, is that we didn't use Envoy to implement this. What we found is that writing this as a custom component that solves just this problem keeps the attack surface very small and tight, so we've written this component in Rust, and again, it's written to be very focused on this use case. And so if you compare what you have deployed in a sidecar, and all of the capabilities that come along with it, to what the ztunnel looks like, you'll see that we don't do any of the complex Layer 7 handling. All of the various protocols that you'll see in Envoy, MongoDB and Kafka and Redis, all that stuff, doesn't exist. 
And typically this complex Layer 7 handling, and the complexity that comes with it, is where the opportunity for vulnerabilities shows up. As I mentioned, we want to keep this ztunnel component as small, compact, and focused as possible, and keep the attack surface as small as possible. I can't remember, is that what you're talking about? Yeah, so we're going to do a deep dive into the different attack surfaces and compare and contrast how sidecars and ambient mode handle these and what security properties they do or do not hold. But when talking about security, it's always useful to contextualize: what are our actual attack vectors? What are we trying to protect against? An analogy that I like to use is that you can have a super secure lock, but if you have an open door next to it, it doesn't matter, right? So we actually need to understand where the attack vectors are and what we are trying to protect against. Then we can accurately compare and contrast them. The system boundaries that we typically talk about here are the node, the pod, and then the actual application containers. The node provides one of the strongest boundaries here. Each node is very strongly isolated, right? Nodes can only really connect to other nodes over the network. That's where a lot of Istio's value is: cross-network or cross-node traffic is mTLS encrypted, we can apply policies, et cetera. When we start moving down into the other layers, the lines are a bit more blurred. A pod is generally seen as this isolated unit, but really, containers in Linux and Kubernetes don't provide a super strong security boundary. They do provide some boundaries, but they're not as strong as a node boundary, for example. And within a pod, the actual containers, if you have multiple containers, have an even weaker boundary. So in a typical pod with Istio you would have a sidecar and then the application container. 
They share a network namespace, which means that they are completely on the same network. They can access each other's localhost, et cetera. And while by default they have their own file systems, process namespaces, et cetera, those can actually be merged, and the lines can really get blurred. So the boundaries get weaker as we go deeper into the system, which is important to understand. The other thing is the blast radius of these boundaries. In general, we're looking at a few cases. One is a node attack: if someone malicious gets access to the node, maybe they have root access on the node, what kinds of things can they do? Another one would be if the actual data plane itself, the ztunnel or Envoy, is compromised, whether that's some remote arbitrary code execution vulnerability or some more partial vulnerability, some way they've compromised the data plane. Another one is the application. It's very similar, just with the application instead of the proxy. This could also be things like a supply chain exploit, where they've injected malicious code into the application, you didn't notice, you deployed it to prod, and now it's doing unexpected things. So we're going to go through each of these situations and compare and contrast Ambient and sidecars head to head, and what kind of properties they get. So if we look at a compromised node, like I said, the node provides one of the strongest security boundaries. So the node is a huge target, and compromising it gives you a lot of privileges. This is something you do not want to happen. If you have access to the node, you can view all requests without encryption. You can run tcpdump on the node and see what's going on. Why this applies to sidecars is not super intuitive. I have another slide right after this that's going to do a deep dive into that. If you're root on the node, you can also do anything you want with the proxy. 
You could stop it. You could start your own. You could change the code that's running there. You could start your own pods. You have access to the kubelet, which is a highly privileged component. And because you have access to the kubelet, you could also do anything with the identities running on the node. Now, Kubernetes does scope the privileges of a node to only things on that node. So it's not like you compromise one node and you have control of the entire cluster. But anything on that node is within your purview. So sidecars and Ambient really do not provide much difference in protection here. If your node is compromised, generally everything on that node is also compromised. So specifically with network inspection, if you look at this diagram, it intuitively looks like sidecars are more secure in this aspect. Like, the green line is longer, or the red line is longer on Ambient, so it must be less secure. But in practice this is not really the case, because if you have access to the node and you can run tcpdump on the node, for example, then you also have the privilege to enter the pod and run tcpdump within the pod's network namespace as well. So while intuitively it may seem like there's this boundary, the fact is that a node compromise grants extremely high privilege and the attacker can do a lot of things on the node. So in practice, there's no tangible difference between these two from a security posture standpoint. The next thing would be a compromise of the data plane. The attackers don't have access to the entire node, but they found some sort of exploit in Envoy or ztunnel and they're using that to do malicious things. In general, in this case, they can potentially configure that data plane to do arbitrary things. So they could send requests that you didn't initiate, they could mutate your requests, they could block your requests, whatever. 
This is generally the same in sidecars and Ambient, with one caveat: there are different types of exploits. If you have a complete remote code execution, you can potentially do anything. But like Christian mentioned, the ztunnel itself has much less code in it and does many fewer things. With a partial exploit in Envoy, there is a lot of existing code that you could use to do malicious things, while the ztunnel just doesn't do that many things. So if you have a partial exploit, it may be harder to take advantage of it. The other issue we're looking at is certificates. In a sidecar, the proxy has access to one certificate, for the service account of the pod. And so if the data plane is compromised, the attackers could mint new certificates for that identity, but they don't have access to other identities running in other pods. The issue, though, is that generally, if you're able to compromise Envoy, it's not specific to one application. It's an issue in the data plane itself, which runs in all of your applications. You run the same sidecar everywhere. And so the attack is most likely replayable, such that the set of identities you're able to mint new certificates for is unlikely to remain scoped to one. In Ambient, it's a pretty similar story. The difference, though, is that in Ambient the ztunnel itself is responsible for all the identities on its node. So if you're only able to exploit one single data plane instance, you would have a larger scope of certificates that you could compromise. Again, though, this is scoped to a single node, not the entire cluster. That is, all the pods running on that node at that time. All right, next up would be a compromised application. Like I said, this could be someone remotely exploiting your code, or a supply chain attack where they injected some code that you didn't know would be there. 
So in this case, the application can do completely arbitrary things, depending on what the attack was. There's nothing that a sidecar or Ambient can do about that behavior, right? We're just there to sit at the border of your application. However, with a sidecar, because the application and the proxy are running in the same pod, they're in the same trust boundary, and like I said, that boundary is very blurry. So the application can do a lot of things, like stop the proxy, replace the proxy with its own, or bypass the proxy so its requests won't go through it. In Ambient, this is not really possible, because the ztunnel is in a completely separate trust boundary. That's enforced at a different layer. So the pod doesn't have permission to stop the ztunnel, or to stop redirection to the ztunnel, et cetera. The other thing is that because the boundary for identity in Kubernetes is the pod, while only the sidecar proxy actually needs a certificate to do things, the application implicitly gets permission to mint its own certificates. So it could mint new certificates, just like in the data plane exploit we talked about previously. On the other hand, in Ambient, the application pods do not need any privilege at all to get mTLS certificates. Only the ztunnel and the waypoints, which we'll talk about later, need this privilege. So generally we can remove that privilege from applications. Another case is if a completely separate node or application from yours is compromised. In this case, luckily, both sidecars and Ambient provide the same security guarantees: that application may be doing things that you didn't expect, but we can at least cryptographically verify who it is and apply policies to prevent it from doing things that we don't expect. Finally, there is a compromise of the control plane. This is mostly for completeness. 
The control plane completely programs the data plane's behavior, and so it can make it do arbitrary things, remove all your policies, et cetera. And this applies both to sidecars and Ambient. Finally, we get to the waypoint proxy. Like Christian said, we don't have enough time to do a deep dive into the waypoint. But at a high level, the waypoint proxy is doing end-to-end mTLS from the original client, through the ztunnel, through the waypoint, back to the ztunnel, and into the application, and it's within the same trust boundary as the server that deploys it. So that's a very, very high-level summary, but we also feel the waypoint proxy is as secure as sidecars. If you want to learn more, there's a link here at the bottom, and I think also at the end, that gives a deeper discussion of the waypoint proxy. Awesome, thank you, John. And so, wrapping up here in the last few minutes that we have, hopefully it's come across that across a lot of these different attack modes, when comparing the blast radius between the sidecar and Istio ambient mode, ambient mode is as secure as the sidecar approach. However, there is actually one big difference, and I mentioned it in the beginning. I'm curious how many people picked up on it. Look at a deployment in, let's say, a single cluster across multiple nodes. With sidecars, you have the proxies deployed with all the workloads, and you have the control plane. Compare that to a similar diagram of how Istio Ambient is deployed. What's one big difference between those two diagrams? Shout it out. Fewer resources, yes? What else? Fewer attack vectors? The data plane is not running with the applications, right? And as I mentioned in the beginning, the motivating reason for building Ambient was to improve operations and to improve our ability to upgrade and patch the data plane and the service mesh. 
And that's extremely important. Being able to quickly patch vulnerabilities when they're found, without impacting the applications, is hugely important for maintaining your security posture. So there is a big difference between Ambient and sidecars, and ease of operations is that difference. So, we've run out of time. I want to leave some additional resources and how to reach us. We will be around here for the rest of Istio Day, so we're happy to take questions, and I want to thank you all for coming out and listening to this first session.
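
The certificate scoping described in the talk, where a compromised sidecar can reach one pod's identity while a compromised ztunnel can reach every identity on its node, can be sketched as a small model. This is only an illustration under stated assumptions: the pod, node, and service account names are made up, and the functions model the reasoning from the talk rather than any Istio API.

```python
# Hypothetical cluster layout: (pod, node, service_account).
# All names are invented for illustration; nothing here calls Istio.
PODS = [
    ("frontend-1", "node-a", "frontend"),
    ("cart-1", "node-a", "cart"),
    ("cart-2", "node-b", "cart"),
    ("payments-1", "node-b", "payments"),
    ("search-1", "node-c", "search"),
]


def sidecar_exposure(pods, compromised_pod):
    """Identities reachable from ONE compromised sidecar:
    only the service account of the pod it runs in."""
    return {sa for pod, _node, sa in pods if pod == compromised_pod}


def ztunnel_exposure(pods, compromised_node):
    """Identities reachable from ONE compromised ztunnel:
    every service account with a pod on that node."""
    return {sa for _pod, node, sa in pods if node == compromised_node}


def replayed_sidecar_exposure(pods):
    """The caveat from the talk: the same sidecar runs in every pod, so an
    exploit that works once is likely replayable against all of them."""
    return {sa for _pod, _node, sa in pods}


if __name__ == "__main__":
    print("one sidecar (cart-1):", sorted(sidecar_exposure(PODS, "cart-1")))
    print("one ztunnel (node-a):", sorted(ztunnel_exposure(PODS, "node-a")))
    print("sidecar exploit replayed:", sorted(replayed_sidecar_exposure(PODS)))
```

In this toy layout, a single compromised ztunnel exposes more identities than a single compromised sidecar, but its scope still stops at the node boundary, which is the trade-off described in the talk.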