Yeah, I'm Aaron, and this is my colleague Nina, and we're going to be giving the talk on developing a mental model of Istio. At our organization there are a number of people who are contributors to Istio or leaders in the project; we are definitely in the user camp, so this talk chronicles our journey toward understanding how ambient Istio works and how to interact with it. So let's get started. In this talk we'll describe how we learned the differences in Istio's API. Ambient is not a drop-in replacement: there are differences in how the API is used in sidecar mode versus ambient mode, and this was difficult for us to wrap our heads around. It helped to dig a little deeper and understand what's happening under the hood, so that we could develop a mental model of exactly why we're seeing what we're seeing and how to use ambient effectively. Hopefully you'll get that out of this talk. I'm going to start by thinking about some of the abstractions of Kubernetes, and we'll focus on the Service abstraction, since it's the most relevant for a service mesh. These abstractions can be implemented in a number of ways under the hood, and we'll use that to motivate our journey. We start off with a Service in plain Kubernetes. Even here there are many different ways to implement it, but this is one I think we're all familiar with. What I'm doing here is actually breaking the abstraction a little bit. How many of you have programmed in the Go programming language? Okay. When I was learning how interfaces and slices work, the way my mind works, I had to think of them as the structures they are under the hood: an interface being a pointer to data plus a pointer to type information.
I don't really care about the details, but just knowing that explained some of the weird behaviors you can see in Go. This is similar: knowing what's under the hood can explain a lot. So here we go with plain old Kubernetes. When you have a Service, you typically interact with it via a URL, written according to some convention, and the idea is that your application hits this URL and Kubernetes routes it correctly. A Service is an abstraction over a number of pods: you don't care how many pods there are or what's happening to them, you just want to reach one of them, and Kubernetes delivers. Under the hood, for example, it works like this. You write a URL that follows a convention: the service name, dot namespace, dot svc, dot cluster.local, or whatever the cluster domain is. Your application makes a DNS request, and Kubernetes has a DNS server that answers it and returns a fake IP address, called a VIP or virtual IP. Nothing in the cluster actually has this address, no pod and no node; it's just a value Kubernetes picked in order to associate it with the Service. So you get this fake IP back, then your container makes a request, and this is where the magic happens under the hood. Another component, kube-proxy, configures the Linux kernel to take that fake IP address and do something interesting with it in this case.
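As a rough sketch of the lookup chain just described (name, then VIP, then a real pod), here is a toy Python model. It is not how CoreDNS or kube-proxy actually work: the class names, the VIP `10.96.0.42`, and the pod IPs are all hypothetical, and round-robin stands in for kube-proxy's backend choice only to keep the output deterministic.

```python
from __future__ import annotations

import itertools


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the conventional in-cluster DNS name for a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


class ClusterDNS:
    """Toy cluster DNS server: maps a Service name to its virtual IP (VIP)."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}

    def add(self, fqdn: str, vip: str) -> None:
        self.records[fqdn] = vip

    def resolve(self, fqdn: str) -> str:
        return self.records[fqdn]


class KubeProxy:
    """Toy stand-in for the kernel rules kube-proxy programs: VIP -> a real pod IP.

    Round-robin keeps this example deterministic; kube-proxy in iptables mode
    actually picks a backend at random.
    """

    def __init__(self) -> None:
        self.rules: dict[str, itertools.cycle] = {}

    def program(self, vip: str, pod_ips: list[str]) -> None:
        self.rules[vip] = itertools.cycle(pod_ips)

    def forward(self, vip: str) -> str:
        return next(self.rules[vip])


# Usage: the app only ever sees the name and the VIP; the pod IP is chosen underneath.
dns, proxy = ClusterDNS(), KubeProxy()
name = service_fqdn("ratings", "bookinfo")
dns.add(name, "10.96.0.42")  # hypothetical VIP: no pod or node owns this address
proxy.program("10.96.0.42", ["10.244.1.5", "10.244.2.7"])  # hypothetical pod IPs
vip = dns.resolve(name)
print(name, vip, [proxy.forward(vip) for _ in range(3)])
```

The point of the design is the indirection: the application never learns a pod IP, so pods can come and go without the caller noticing.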
We load balance it across a number of pods. There's some complexity under the hood, but just having a general high-level idea of what's happening there is helpful. Now let's introduce Istio with sidecars into the mix. The picture looks fairly similar, but there are some key differences. Number one, our pods grew: we have our application container, but we also have a few other containers, including the pilot-agent and Envoy. They serve similar roles, though. You make a request to the URL, which hasn't changed, but this time you go to Istio's DNS; you can enable this DNS capture in Istio. This is Istio's notion of what DNS should be, so it can override what Kubernetes thinks, and that's actually rather powerful. But since we're just looking at the Service abstraction here, it doesn't; let's say it returns the same virtual IP. Then the same kind of magic happens under the hood with iptables rules, and the traffic goes somewhere else: instead of going through kube-proxy's load balancing, it goes to an Envoy instance. This is the sidecar in the pod, and that Envoy instance does a similar thing: it load balances the request and picks a pod to send it to. A key difference is that it doesn't send the traffic unaltered; it wraps it in an mTLS connection, a secure connection to the other pod. So the traffic is encrypted, it reaches the Envoy sidecar in the other pod, it's decrypted there, and once decrypted it's sent to the destination container. We've added some value here, but it's fundamentally the same pattern and it uses the same abstraction, so it's not all that different.
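A minimal sketch of the sidecar path, assuming a toy `MtlsFrame` in place of real TLS: the client sidecar does the load balancing and the wrapping, and the server sidecar does the unwrapping. The function names and the SPIFFE-style identities are hypothetical illustrations, not Istio's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class MtlsFrame:
    """Toy stand-in for a request wrapped in mutual TLS between two sidecars.

    No cryptography happens here; the point is *where* wrapping and unwrapping occur.
    """
    src_identity: str
    dst_identity: str
    payload: str


def client_sidecar(payload: str, src_identity: str, endpoints: List[Tuple[str, str]],
                   pick: Callable[[List[Tuple[str, str]]], Tuple[str, str]]) -> Tuple[str, MtlsFrame]:
    """Client-side Envoy: pick a destination pod (load balancing), then wrap in mTLS."""
    dst_pod, dst_identity = pick(endpoints)
    return dst_pod, MtlsFrame(src_identity, dst_identity, payload)


def server_sidecar(frame: MtlsFrame, my_identity: str) -> str:
    """Server-side Envoy: terminate mTLS and hand plaintext to the app container."""
    if frame.dst_identity != my_identity:
        raise ValueError("frame was not addressed to this workload's identity")
    return frame.payload


# Usage with hypothetical SPIFFE-style identities for two Bookinfo workloads.
RATINGS_ID = "spiffe://cluster.local/ns/bookinfo/sa/bookinfo-ratings"
PRODUCTPAGE_ID = "spiffe://cluster.local/ns/bookinfo/sa/bookinfo-productpage"
endpoints = [("ratings-v1-pod", RATINGS_ID)]
pod, frame = client_sidecar("GET /ratings/0", PRODUCTPAGE_ID, endpoints, lambda eps: eps[0])
print(pod, "->", server_sidecar(frame, RATINGS_ID))
```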
Okay. But now we can push the boundaries of that abstraction. Let's say we have the URL linuxfoundation.org/baz, and whenever a container in our mesh makes a request to that URL, we don't want it to go through whatever the public DNS says, through an ingress gateway, or what have you. We want to short-circuit that and send it straight to the pod that will answer it; we just want it to stay in the mesh and be simple. That's something you can do in Istio, and it's a very similar path with a different URL: the request goes to DNS, which maybe returns a different virtual IP, but ultimately Istio does the same thing as before and the request reaches its intended destination. This is where we start to break out of the Service abstraction, because that's not something you can easily do with the Service resource in Kubernetes. Then we can add more. Let's say there are 20 pods that implement /baz, and another path, /foo, has a different set of pods, and we want to route traffic to different sets of pods based on the path. That's a form of routing, and it definitely takes us out of the comfort zone of the Kubernetes Service abstraction. This is where we get into Istio's API itself. These are some selected resources in Istio's API; in this talk we're going to focus on VirtualService and AuthorizationPolicy. But when thinking about Istio's API, it is very helpful to consider where the API is being implemented.
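The path-based split can be pictured as a small routing table. This is a hypothetical sketch: the hosts, prefixes, and pod names are invented, and first-match-wins ordering is used because that is how we understand Istio to evaluate the HTTP routes of a VirtualService.

```python
from __future__ import annotations

from typing import List, Optional, Tuple

# Hypothetical routing table for the example above: (host, path prefix, pod pool).
# Rules are checked in order and the first match wins.
ROUTES: List[Tuple[str, str, List[str]]] = [
    ("linuxfoundation.org", "/baz", ["baz-pod-1", "baz-pod-2"]),
    ("linuxfoundation.org", "/foo", ["foo-pod-1"]),
]


def route(host: str, path: str) -> Optional[List[str]]:
    """Return the pod pool of the first rule whose host matches and whose prefix starts the path."""
    for rule_host, prefix, pods in ROUTES:
        if rule_host == host and path.startswith(prefix):
            return pods
    return None  # no rule matched: fall back to ordinary Service-style handling


print(route("linuxfoundation.org", "/baz"))
print(route("linuxfoundation.org", "/foo/bar"))
print(route("linuxfoundation.org", "/other"))
```

A plain Kubernetes Service has no notion of a path, which is exactly why this kind of table needs a proxy that understands HTTP.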
Which proxy is it happening in? On one side, VirtualService, DestinationRule, and ServiceEntry all take effect in the Envoy instance of the pod making the request: the client, the consumer, or the client sidecar (various terminologies are used). So a routing decision based on the path, or maybe a canary rollout or blue-green testing where you shift some percentage of traffic, happens in a sidecar-based mesh on the requester's side, in its Envoy instance, and that Envoy is doing something more complex than plain old load balancing. On the destination pod's side, the server or producer sidecar, the APIs PeerAuthentication, AuthorizationPolicy, and RequestAuthentication inherently happen on the server side. We'll be focusing on AuthorizationPolicy: it makes sense for a pod's sidecar to determine who gets in.
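The client/server split above can be written down as a lookup table. This sketch encodes only the simplified mental model from the talk, not every detail of Istio; the pod names are hypothetical, and `istio-proxy` is the name of the injected sidecar container.

```python
# Where each Istio resource kind takes effect in a sidecar-based mesh, following the
# client-side / server-side split described above (a simplification).
SIDECAR_ENFORCEMENT = {
    "VirtualService": "client",
    "DestinationRule": "client",
    "ServiceEntry": "client",
    "PeerAuthentication": "server",
    "AuthorizationPolicy": "server",
    "RequestAuthentication": "server",
}


def acting_sidecar(kind: str, client_pod: str, server_pod: str) -> str:
    """Name the sidecar container that acts on a resource of `kind` for one client -> server call."""
    side = SIDECAR_ENFORCEMENT[kind]
    pod = client_pod if side == "client" else server_pod
    return f"{pod}/istio-proxy"  # istio-proxy is the injected sidecar container


for kind in SIDECAR_ENFORCEMENT:
    print(f"{kind:22} -> {acting_sidecar(kind, 'productpage-pod', 'ratings-pod')}")
```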
That's something you would not delegate to the requester. With sidecar-based Istio, this is your mental model of the API and where it's implemented, and all is good. Ambient changes things a little. In sidecar mode, the Envoy proxy does both L4 processing, which is just IP-based, and L7 processing. Ambient Istio splits that apart. It has a secure overlay layer implemented through ztunnels, which are basically responsible for point-to-point security and not much else. I think that's where the name ambient came from: the ztunnels are just around, magic happens under the hood, and you get secure point-to-point communication between pods in the mesh. Layer 7 is where things really start to differ. Layer 7 is where you apply policy that requires knowing about HTTP: what an HTTP method is, what a path is, and so on. Routing is an example. But let's go back to our Service abstraction. Here is our Service abstraction in ambient, and fundamentally it doesn't look all that different from the others. The pods shrank, since we no longer have sidecars in them, but we have DNS.
We have a virtual IP. We have some iptables rules, or something similar, directing these fake IPs to a ztunnel, and the ztunnel is what does the load balancing and encryption. It uses a slightly different spin on mTLS called HBONE, an HTTP-based overlay network, and I don't remember what the E is, so let's just say it's "excellent". Basically, it's tunneling the traffic, still mTLS-encrypted, to a corresponding ztunnel and then to the application. There are some differences here: in ambient, ztunnels are a DaemonSet, so one exists on each node. But as far as our mental model is concerned, we don't care all that much. The ztunnel could be a sidecar for all we care; where it runs doesn't matter. It's responsible for providing point-to-point secure communication, it does not do any L7 processing, and we really don't need to think about it much, at least as far as our mental model is concerned. When we think about L7, that's when things really start to differ. The sidecar model is up in that corner: one pod communicates with another, and the client has the VirtualService, the DestinationRule, and so on. With ambient, you have to explicitly opt into L7, and furthermore that occurs only on the server side; there is no sidecar on the client side. If you look at the traffic, it goes from your pod to a ztunnel, then, if the destination has a waypoint, through that waypoint, where policy is enforced, and then on to the pod. So look at the API that's implemented in the waypoint.
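The ambient path can be sketched the same way. In this toy function the node and waypoint names are hypothetical, and the hop through the destination's ztunnel after the waypoint reflects our understanding of the HBONE data path rather than anything shown on the slide.

```python
from __future__ import annotations

from typing import List, Optional


def ambient_hops(src_node: str, dst_node: str, dst_waypoint: Optional[str] = None) -> List[str]:
    """Simplified path of one request in ambient mode.

    ztunnel runs as a DaemonSet, so each pod uses the ztunnel on its own node. L7
    processing happens only if the *destination* has opted into a waypoint; the
    client side never gets an L7 proxy. If both pods share a node, the same ztunnel
    shows up twice (outbound leg, then inbound leg).
    """
    hops = [f"ztunnel@{src_node}"]
    if dst_waypoint is not None:
        hops.append(f"waypoint:{dst_waypoint}")  # L7 policy (authz, routing) applied here
    hops += [f"ztunnel@{dst_node}", "app"]
    return hops


print(ambient_hops("node-1", "node-2"))
print(ambient_hops("node-1", "node-2", dst_waypoint="bookinfo-ratings"))
```

Note that nothing in the path depends on the client opting in: whether a waypoint appears is decided entirely by the destination.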
We see AuthorizationPolicy, PeerAuthentication, and RequestAuthentication, everything that was previously on the server side, but now we also see VirtualService and DestinationRule on the server side. They're not exactly the same, though; they are scoped differently. In ambient, a waypoint is scoped to a service account or a namespace. So for a given app: this is my waypoint, I deployed it, it is my security umbrella, it's in my namespace, I control it, I'm the only one who controls its policy, and all traffic to me has to go through my waypoint. That does place some limits on what you can do with VirtualService or DestinationRule. Here's the analogy that really worked for me when I was trying to figure this out. Imagine if some of the patterns we see in a service mesh happened in the outside world. Obviously they don't, but let's think about it this way. We have Lars and Anna here. The top scenario, Lars's, is similar to the sidecar world in Istio. In that world, Lars's computer would need to know things like routing rules: if you're going to cncf.org/project/spire, it would need to know the possible endpoints, and maybe other policy, like a canary rollout in progress, so it would need to send 90% of its traffic here and 10% there. All the routing decisions are made on the client side. That can't really work in the real world, because at internet scale the clients would need to implement almost infinite policy, and there would be a lot of trust thrown around: you would need to believe that the clients are actually enforcing this policy. So this works in a constrained environment like a mesh, but not in the real world.
The real world is more like the lower picture, Anna's. She follows a URL, it goes through an ingress, and that ingress knows the various policies, routing rules, and so on. It knows that the /spire endpoint load balances between two endpoints, that it's in a canary rollout, and that 90% of the traffic goes here. That analogy is powerful for understanding what happens in ambient. In the sidecar-based API, you can create a VirtualService and say, "I want you to route your traffic this way," where "you" is any other pod in the mesh, in any other namespace. So I can tell you what to do. In the ambient way, if you opt into a waypoint, it is basically an ingress: it's something you control, you create policy in it, any traffic that comes to you goes through your waypoint, and you are solely responsible for the policy in that waypoint. There are various aspects of the existing Istio API that don't work in that model. exportTo, for example, is a field on VirtualService.
It's the one that effectively tells other namespaces, "hey, you have to route it this way." For me at least, this was the key analogy or insight, and it explains a number of things. It also explains why Istio decided to use the Gateway API for waypoints, because that's basically what you're doing: you're creating an ingress, a gateway, to a service, so it's a really good fit. In general, given a service mesh and its API, it helps to ask where the API is implemented: find the proxies and figure out whether each piece is client side or server side. Once you go through that in your head, you can develop a mental model of how things work, whether for sidecar Istio, for ambient, for something like Linkerd, or for something like Cilium Mesh when that's out. For us at least, that type of thinking was very illuminating for understanding how to use a mesh's API and why Istio's API differs a little between sidecar and ambient. If this is a little cerebral, Nina is going to make it concrete with some examples that show these differences in action. We have a couple of local VMs running, and we'll give a little demo. Yeah, so hopefully the demo gods are with me today. Let me reload. I have a repo, if you want to follow along or try it on your own later, which walks through the examples I'm going to do today, and if everything falls apart we also have a recording. So without further ado, let's get started.
Like Aaron mentioned, we have two VMs, each running a kind cluster. On the left I have my sidecar cluster, which runs Istio in sidecar mode. The example I'm going to use is the classic Istio one, Bookinfo. I have everything in Bookinfo deployed, and everything has a sidecar: the istio-proxy container has already been injected for everything in Bookinfo. Looking here, I also have istiod, my control plane, running. If we look at cluster 2, we have ambient installed. The first thing you might notice in ambient mode is that there are no sidecars: the only things I have are my curl container and the actual Bookinfo apps. In this example we also have two nodes, and because ztunnel runs as a DaemonSet, each node has its own ztunnel; Bookinfo has been split across the nodes like that. Now it's demo time, so hopefully this works. I scripted it up so I don't make any butterfinger mistakes, but again, you can look at the steps in the GitHub repo. Is that font too small? Let's make it a little bigger. Okay, cool. What I'm going to do is apply the same policies in both clusters and see what happens. The first thing we're going to do is apply a layer 4 authorization policy. The L4 authorization policy we're applying is completely identical in both clusters, and it only allows productpage to reach ratings. Let's apply that now. For the first run, in parallel, we're going to send a request from productpage to ratings. This should go through, because the L4 policy allows it. We run that in sidecar and in ambient, and we get the same response. Now we're going to do the same thing but curl from a pod that doesn't have permissions, right?
In this case we're going from reviews, which isn't allowed, to ratings. Let's see what happens. In sidecar mode we get an "RBAC: access denied" 403. In ambient mode we get "command terminated with exit code 56". So what's happening here? If we go back to our ambient cluster, you'll notice we don't have any waypoints right now, so all of this access policy is being enforced at the ztunnel level. That's why you get a different response in sidecar versus ambient. Okay, now let's go up to layer 7. The policy we're going to apply is again completely identical in both clusters. We match on the ratings app, and we add a header match: if the "Istio is cool" header is present and we're coming from productpage, we allow the traffic; if not, we block it. Let's apply it. Now let's send some traffic. Remember, right now in ambient we don't have a waypoint, so there's no way to actually enforce that this header is present. In sidecar mode that doesn't matter, because the sidecar is injected into everything. If we run the sidecar request first, going from productpage to ratings with the header, it passes. When we go from productpage with the incorrect header, it should be blocked, and we see "RBAC: access denied" again. But in ambient, going from productpage to ratings with no waypoint, we're blocked. Before, without the header rule, we were allowing that traffic; now, because the ztunnel can't enforce the rule, the traffic is blocked. You can see the same thing in a similar experiment going from reviews to ratings: you're still blocked there as well. The last thing we want to check in the sidecar case is whether we can still go from reviews, that is, whether the principals are still being enforced.
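A toy model of why the two clusters answer differently, under stated assumptions: principals are shortened to workload names, the header name `x-demo` and its value are hypothetical stand-ins for the demo's header, and the fail-closed handling of L7 rules by the ztunnel is modeled on the behavior the demo showed, not on ztunnel's source. Curl's exit code 56 means it failed while receiving data, which is what a reset connection looks like from the client.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Set

DENIED_403 = "403 RBAC: access denied"
RESET = "connection reset (curl exit code 56)"
OK = "200 OK"


@dataclass
class AllowPolicy:
    """Toy ALLOW policy on ratings: who may call it, plus optional L7 header requirements."""
    principals: Set[str]                                            # L4: checkable from mTLS identity
    required_headers: Dict[str, str] = field(default_factory=dict)  # L7: needs HTTP parsing


def sidecar_verdict(policy: AllowPolicy, principal: str, headers: Dict[str, str]) -> str:
    """The server sidecar (Envoy) parses HTTP, so it can check everything and answer 403."""
    ok = principal in policy.principals and all(
        headers.get(k) == v for k, v in policy.required_headers.items())
    return OK if ok else DENIED_403


def ztunnel_verdict(policy: AllowPolicy, principal: str, headers: Dict[str, str]) -> str:
    """ztunnel with no waypoint: it knows identities but never looks at HTTP headers.

    The headers argument is deliberately ignored. A denial is a dropped connection,
    not an HTTP response, and an L7 requirement that cannot be evaluated is treated
    as unmet (fail closed), matching what the demo showed.
    """
    if policy.required_headers:
        return RESET
    return OK if principal in policy.principals else RESET


l4 = AllowPolicy({"productpage"})
l7 = AllowPolicy({"productpage"}, {"x-demo": "istio-is-cool"})  # hypothetical header
headers = {"x-demo": "istio-is-cool"}
for label, policy in [("L4 policy", l4), ("L7 policy", l7)]:
    for src in ["productpage", "reviews"]:
        print(f"{label}, from {src}: sidecar={sidecar_verdict(policy, src, headers)!r}, "
              f"ambient/no waypoint={ztunnel_verdict(policy, src, headers)!r}")
```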
We go from reviews to ratings with the correct header, and this is also denied. Now let's fix the ambient case. First, we have to apply our gateway: we're going to use the Gateway API to create a waypoint. The waypoint is going to be the policy enforcement point that enforces our L7 authorization policy. Let me apply that. You'll notice that the one I'm using in this example is per service account: I select the bookinfo-ratings service account and scope the waypoint to it. Let's first check that the waypoint came up. Cool, it looks like our waypoint has been running since five seconds ago. If we go back to k9s and look at the YAML, we can see that in our authorization policy we now select the waypoint and apply the L7 authorization policy there, instead of using the app: ratings pod labels. So we use this Istio gateway label to apply the policy. This is the policy we're going to apply; again, as I mentioned, we're switching the label here. Let's apply it and send some traffic. We'll run the same three tests as before. First, productpage to ratings with the correct header: this should go through, and we get our ratings response. Now the same thing, productpage to ratings with the bad header, "not Istio", and we get "RBAC: access denied". Notice that now that the traffic goes through the waypoint, we actually get the 403 response, not the exit error we were getting earlier. Okay, last test.
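Switching the selector is what moves the L7 policy onto the waypoint. The sketch below models Kubernetes-style `matchLabels` selection; the workload names are hypothetical, and the `istio.io/gateway-name` label key is our understanding of the 1.18-era ambient docs, so check it against your Istio version.

```python
from __future__ import annotations

from typing import Dict, List

# Hypothetical workloads in the ambient cluster and their labels. The waypoint label
# key is an assumption based on the 1.18-era ambient docs; verify it for your version.
WORKLOADS: Dict[str, Dict[str, str]] = {
    "ratings-v1": {"app": "ratings", "version": "v1"},
    "bookinfo-ratings-waypoint": {"istio.io/gateway-name": "bookinfo-ratings"},
}

L7_CAPABLE = {"bookinfo-ratings-waypoint"}  # in ambient, only the waypoint parses HTTP


def selects(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Kubernetes-style matchLabels: every key/value pair must be present on the workload."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def enforcers(match_labels: Dict[str, str]) -> List[str]:
    """Workloads a policy with this selector binds to, marked as L7-capable or L4-only."""
    return [f"{name} ({'L7' if name in L7_CAPABLE else 'L4 only'})"
            for name, labels in WORKLOADS.items() if selects(match_labels, labels)]


print(enforcers({"app": "ratings"}))                              # the old pod-label selector
print(enforcers({"istio.io/gateway-name": "bookinfo-ratings"}))  # the selector switched to the waypoint
```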
Let's go from reviews to ratings with the correct header, just to sanity-check that nothing broke, and we're denied again. The next thing I'm going to show is how authorization policy and an L7 routing policy work together. I'm going to create a VirtualService, and again it's the same VirtualService in both the sidecar case and the ambient case, because the API hasn't changed. We create a VirtualService for ratings that adds fault injection when we hit ratings. Let's apply that. We already have the ratings waypoint, and the waypoint is what enforces it in the ambient case. Now let's send some traffic. First, the same test again, productpage to ratings with the correct header, and in both cases we get the correct response. Now we do the same thing but go from reviews to ratings, with the correct header too, though that doesn't really matter here. What do you think is going to happen in the sidecar case? How many people think we'll get "fault filter abort"? How many think we'll get the RBAC denial? Okay, let's try it. In the sidecar case we get "fault filter abort", but in the ambient case we get "RBAC: access denied". The reason is what Aaron explained: instead of the client-side sidecar doing the routing or applying fault injection policies, we now have a waypoint scoped to the service account. You no longer have a client sidecar that applies the fault injection before the request reaches the server-side sidecar; all of that is now applied at the waypoint.
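The ordering difference can be modeled as filter chains where the first rejection wins. This is a sketch with assumptions: the fault rule is assumed to match only requests from reviews (productpage requests succeeded in the demo), and the waypoint is assumed to check authorization before fault injection, which is what the demo's "RBAC: access denied" result implies.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
Filter = Callable[[Request], Optional[str]]
OK = "200 OK"


def fault_abort(req: Request) -> Optional[str]:
    # Hypothetical VirtualService fault rule: abort only requests coming from reviews.
    return "fault filter abort" if req["source"] == "reviews" else None


def rbac(req: Request) -> Optional[str]:
    # The demo's authorization policy: only productpage may call ratings.
    return None if req["source"] == "productpage" else "RBAC: access denied"


def run_chain(filters: List[Filter], req: Request) -> str:
    """Apply filters in order; the first one that rejects decides the response."""
    for f in filters:
        verdict = f(req)
        if verdict is not None:
            return verdict
    return OK


def sidecar_mode(req: Request) -> str:
    # The VirtualService runs in the caller's sidecar, before the request leaves the pod;
    # the AuthorizationPolicy runs later, in the ratings sidecar.
    client_verdict = run_chain([fault_abort], req)
    return client_verdict if client_verdict != OK else run_chain([rbac], req)


def ambient_mode(req: Request) -> str:
    # Both resources live in the ratings waypoint; authorization is checked first.
    return run_chain([rbac, fault_abort], req)


for src in ["productpage", "reviews"]:
    req = {"source": src, "destination": "ratings"}
    print(f"{src:12} sidecar={sidecar_mode(req)!r:24} ambient={ambient_mode(req)!r}")
```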
That's why you get different responses in the two cases. The last example we're going to run through is a traffic shift. We apply a new VirtualService to reviews that sends 90% of traffic to v1 and 10% to v2. Let's apply that. The other thing we need for this to work is a DestinationRule, which selects the subsets we care about on the service. It's the same DestinationRule in both cases, so let's apply that. The extra thing we have to do in the ambient case is create a waypoint for reviews, because the last waypoint we created was scoped only to the ratings service account, so we need a new one for reviews. Same idea as before: we use the service account here and create a waypoint for reviews. Let's apply that and double-check that the waypoint actually got created. It looks like we now have a new waypoint for reviews running. Now let's actually send some traffic. I have Prometheus port-forwarded on 9090 for ambient and on 9091 for the sidecar case. Let's go to Prometheus and copy the query over: we're going to look at istio_requests_total going to reviews v1 versus the total. When I execute this, it might take a couple of minutes. As we saw in the policy, 10% should go to v2 and 90% to v1. If I execute that (it's hard to see, let me zoom in), about 90% is going to v1 and 10% to v2. In ambient it's the same: about 90% to v1 and about 10% to v2. That's all I had to show demo-wise. I encourage you to try it out on your own.
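The traffic shift combines two pieces: DestinationRule subsets that map names to pod labels, and VirtualService weights across those subsets. The hypothetical simulation below (pod names and seed invented) computes the same ratio the `istio_requests_total` query in Prometheus reports; Python's `random.choices` stands in for the proxy's weighted selection and is not Envoy's algorithm.

```python
import random
from collections import Counter

# Hypothetical DestinationRule-style subsets (name -> pod label selector) and
# VirtualService-style weights, mirroring the 90/10 shift in the demo.
SUBSETS = {"v1": {"version": "v1"}, "v2": {"version": "v2"}}
REVIEWS_PODS = {
    "reviews-v1-abc": {"app": "reviews", "version": "v1"},
    "reviews-v2-def": {"app": "reviews", "version": "v2"},
}
WEIGHTS = {"v1": 90, "v2": 10}


def pods_in_subset(subset: str) -> list:
    """Resolve a subset to the pods whose labels match its selector."""
    selector = SUBSETS[subset]
    return [name for name, labels in REVIEWS_PODS.items()
            if all(labels.get(k) == v for k, v in selector.items())]


def pick(rng: random.Random) -> tuple:
    """Choose a subset by weight, then a pod within that subset."""
    subset = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()), k=1)[0]
    return subset, rng.choice(pods_in_subset(subset))


# Send 10,000 simulated requests and compute the ratio the Prometheus query shows:
# requests that reached v1 divided by all requests.
rng = random.Random(42)
counts = Counter(pick(rng)[0] for _ in range(10_000))
share_v1 = counts["v1"] / sum(counts.values())
print(f"v1: {share_v1:.1%}, v2: {1 - share_v1:.1%}")
```

The split is statistical, so a short run will land near 90/10 rather than exactly on it, just as in the Prometheus numbers.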
This is based on the getting started guide for Istio ambient, uses the 1.18 alpha version, and has instructions on how to set everything up, including a local kind cluster of your own. So without further ado, let's go back to the slides. I think all we have left is the QR code for feedback. We're also promoting our panel: if you're interested in asking more questions, there's a panel coming up about the future of service mesh, so feel free to attend that too. I think we have some time for questions, right? Great. Are there any questions? Question there. Is there some logistical necessity for mics? Oh, there are mics right there, standing in the aisle. Okay: in the first ztunnel slide you showed iptables rules, so presumably you're configuring NAT entries for the services there and redirecting the traffic to the ztunnel or the waypoint? Yeah, right. It's the ztunnel that knows whether a waypoint exists and whether it needs to send traffic to the waypoint. But even iptables is an implementation detail, because there are different ways to implement it: you could do that part in eBPF, for example; iptables is just, I think, the first implementation that's there. Does this also work with another CNI? So I guess the question is whether traffic could go to the waypoint if you have another CNI or something in there.
The thing is, the ztunnel is specifically the component that has this logic in it: it knows whether there is a waypoint and whether traffic needs to be directed to it. If something else existed that had such logic, then in theory that component could send traffic to the waypoint instead. But as ambient stands right now, the ztunnel is the one thing that knows that logic and knows whether it needs to send a request to another ztunnel or to a waypoint. There are actually a number of opportunities for implementing it differently or putting other twists on it, but it's still early days, and what I described is how it exists now. I think we have a few more minutes. Just a request to go back to the slide with the GitHub link. Yeah. I have a question: how do you choose whether to scope the waypoint to a service account or to the namespace? Ah, yes. As it stands now, you're allowed to choose either, and it depends on the level of granularity you want. In some cases there's really no distinction: you have one service account per namespace, so a namespace waypoint would be sufficient. It's for the cases where you have a namespace that, for whatever reason, contains pods from a number of service accounts.
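The ztunnel's decision and the namespace versus service-account choice can be sketched together. Everything here is hypothetical: the registry layout, the names, and especially the rule that a service-account waypoint wins over a namespace-wide one, which is an assumption of this sketch and not documented behavior.

```python
from __future__ import annotations

from typing import Dict, Optional, Tuple

# Hypothetical waypoint registry keyed by (namespace, service account). A key whose
# service account is None stands for a namespace-wide waypoint.
Registry = Dict[Tuple[str, Optional[str]], str]


def waypoint_for(registry: Registry, namespace: str, service_account: str) -> Optional[str]:
    """Find the waypoint guarding a destination workload, if any.

    Preferring a service-account waypoint over a namespace-wide one is an
    assumption of this sketch, not a documented Istio rule.
    """
    return registry.get((namespace, service_account)) or registry.get((namespace, None))


def next_hop(registry: Registry, namespace: str, service_account: str, dst_node: str) -> str:
    """What the source ztunnel decides: detour through a waypoint, or go straight to the destination's ztunnel."""
    waypoint = waypoint_for(registry, namespace, service_account)
    return f"waypoint:{waypoint}" if waypoint else f"ztunnel@{dst_node}"


per_sa: Registry = {("bookinfo", "bookinfo-ratings"): "bookinfo-ratings"}
per_ns: Registry = {("bookinfo", None): "bookinfo"}

print(next_hop(per_sa, "bookinfo", "bookinfo-ratings", "node-2"))
print(next_hop(per_sa, "bookinfo", "bookinfo-reviews", "node-2"))
print(next_hop(per_ns, "bookinfo", "bookinfo-reviews", "node-2"))
```

With a per-service-account registry, reviews traffic bypasses any waypoint; with one namespace-wide waypoint, ratings and reviews both go through it, which is the granularity trade-off under discussion.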
That's just an additional tool for providing the appropriate granularity. The underlying concept is that, if you think of the waypoint as a kind of ingress, you're at the mercy of the granularity defined for that waypoint, and the tools you have right now are namespace or service account; who knows where it will go in the future. But say you chose the namespace, you have multiple service accounts, and you want to enforce policies between those service accounts. Can you still do that? Well, you would need to deploy the waypoints with service-account granularity for that. I don't recall whether any of our workshops have that as an example. Actually, they do with Bookinfo: we have an example with Bookinfo and different service accounts for productpage, ratings, and reviews. Let's stay in touch, because at our booth we had an Instruqt workshop that I think showed that scenario, where you have multiple apps with different service accounts in the same namespace and enforce policy between them. Okay. But if you deployed a waypoint per service account, for every single service account, wouldn't it be the same as sidecars, since you'd have the same number of proxies? Not necessarily. It's decoupled: if each service account has just one pod behind it, then yes, it reduces to something very similar to sidecars. But suppose you have 20 pods behind that service. Imagine the ability to scale the waypoint independently: maybe one waypoint serves 50 pods just fine, or maybe in your use case you're heavily L7-biased, so you have to configure it
so that maybe five waypoints serve a pool of 50 pods. You have the ability to dial in the ratio of waypoints to pods, which can vary a lot depending on the situation. And in the example I gave, you could also have scoped it to the Bookinfo namespace, because the virtual services and authorization policies are what do the routing and apply the authorization, and in that example you're applying them to ratings. So I could have shown the same example, just created a gateway for the namespace, and it would have worked. It just depends on how you want to scale, right? Okay, thanks. Hi, is there still time to ask a question? Yeah. Thanks for the talk; first of all, I really enjoyed it. My question is about sidecar mode: usually if you have some kind of batch processing in Kubernetes, like a Job, you have to wait for the sidecar to come up before you can actually do your thing and make requests. Is that something I still have to care about with the ambient mesh, or is it so fast that I don't? So what happens is that in ambient you label namespaces. With sidecars, you need to wait for the sidecar to start before it can start enforcing policy. In ambient, because the ztunnels and the waypoints already exist, it's a very fast loop: you label a namespace for ambient, the CNI is constantly watching for pods that appear and disappear, and basically at the point a pod comes into existence and gets its networking set up, traffic is redirected to the ztunnel, which is already up, or to waypoints if the ztunnel needs to route to waypoints.
That's all already there. It's really just that initial step that sets up the capture, whether that's iptables rules or whatever else implements it, and that's really fast. There's no pod you need to wait for; you're just waiting for the redirection to the ztunnel to be set up. Thanks. Do we still have time? Yeah. Thanks for the talk, I really appreciate it; it gave me good insight into Istio. My question is: when would I want to use ambient mode versus sidecar mode? So far I only see fewer pods and less memory usage. Are there any other benefits? You should definitely attend the panel later today, but yes, resource usage is a major one. What's that? Migrating, yeah. Also, in sidecar mode you inject everything with the sidecar, whereas here you can add L7 as you need it and add things to the mesh more gradually. If you just care about basic L4 observability and mTLS, you can have no L7 policies in place, get pretty good performance and lower memory consumption, and not have to inject sidecars everywhere. There are also benefits to migrating slowly: you add the pieces as you need them. We're heading into a break, so if you want to cluster together here, we'll have time to answer questions, but I think the session is officially over.