and how we make the routing rules, and how traffic actually gets routed. So, this is how the mesh actually works. What we have here, as I showed earlier, is an Envoy in every pod; traffic enters through the ingress controller, which is also running Envoy, and traffic exits through a dedicated egress Envoy as well. The idea behind that is that you can apply policies at those edges too. (Sorry, the laptop is still trying to connect; it says network connectivity issues.) Okay, so as I was saying, the role of the egress Envoy is that all traffic that exits the Kubernetes cluster to access external services, services you might want to consume from outside the cluster, is also subject to rate limits, policy enforcement, and so on. That's the idea behind directing all such traffic through an egress Envoy.

Traffic within the cluster gets fully encrypted with the help of the Istio Auth module, which installs a certificate per service; we then set up mutual TLS authentication between every pair of services, and the certificates are rotated automatically and periodically. All of this happens behind the scenes. What you see is HTTP/2 or HTTP/1.1 communication between services, carried over mutual TLS if you turn Istio Auth on.

Now, how does this work in practice? We have an init container that installs a set of iptables rules that trap all traffic entering and leaving the pod and redirect it to a specific port, which is where Envoy is actually listening. Once traffic gets to Envoy, Envoy decides where it should actually go and processes it as either a TCP request or an HTTP request, and so forth. In the course of this, when Envoy receives a request, it goes to the Mixer and checks whether the request is actually allowed to enter the service; if it's not, the request is dropped, and if it is allowed, the request passes through. That's the overall flow.

In this context, Pilot interacts with the platform, in this case Kubernetes; we're also working on adding support for others like Consul, Eureka, and so on. It gleans information about what the services are, which pods belong to each service, and the various labels attached to the services, and with all of that information we configure Envoy to route traffic accordingly. The way that happens is that we've structured Pilot around a plugin model. There's an abstract service model in Pilot that essentially says: here are the services, here are the pods that belong to each service, and each service and each pod has one or more labels attached to it. We then extract information from the underlying platform and map it onto this model.
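To make the sidecar mechanics just described concrete, here is a minimal sketch of roughly what an injected pod spec looks like; the image names, container names, and arguments are illustrative assumptions rather than the exact Istio manifests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service-a
  labels:
    app: service-a
spec:
  initContainers:
  - name: proxy-init            # hypothetical name; installs the iptables redirect rules
    image: istio/init           # illustrative image
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]      # required to rewrite the pod's iptables
  containers:
  - name: app
    image: example/service-a    # your application, unmodified
  - name: proxy                 # the Envoy sidecar
    image: istio/proxy          # illustrative image
    args: ["proxy", "sidecar"]  # pointed at Pilot to fetch its configuration
```

The NET_ADMIN capability is what lets the init container rewrite the pod's iptables before the application and the sidecar start.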
So in this case, for Kubernetes, we extract the information and map it one-to-one onto our internal service model. Once we have all the information we need, we generate the routing rules and the Envoy configuration accordingly, and then we serve that configuration over dynamic APIs, which I'll talk about shortly. The idea behind this is that we do not want to do any sort of hot reloads. Anyone who has actually done config reloads on a proxy knows the pain of it, so we decided not to do any hot reloads at all. Almost 95% of Envoy's configuration can be reloaded dynamically without restarting anything, so there's no connection interruption at all. Once all the Envoys are set up to talk to Pilot, they fetch their configuration periodically and reload it; that's how we update the Envoys in the mesh.

Somebody asked how long the propagation delay is for new route rules. It basically depends on when an Envoy next polls Pilot for them. With the newer version of Envoy, what we're trying to do is change the whole thing to gRPC-based communication, so that whenever we have new configuration we can push it down to all the Envoys. That's much more responsive, and it gives you much more control over when an Envoy actually sees the new rules.

Now, getting a little deeper into this: the way we do service discovery is that we do not actually do service discovery. We delegate it to Kubernetes, and the idea behind this is that service discovery is part of the platform, typically manifested in the form of DNS. The application doesn't really care which IP address its service discovery returns: it goes and talks to kube-dns, and kube-dns returns a particular IP address, the service's cluster IP. We just need the application to do a DNS resolution and then send actual IP traffic. Once the traffic is on the wire, we can capture it, and from then on we look at the HTTP headers for HTTP traffic; if it's TCP traffic, we look at the source and destination in order to route it accordingly. So that's the role of service discovery.

As for service registration: there are two parts to every service management system, registration and discovery, and registration is also not part of Istio. It's part of the underlying platform. With Kubernetes, the platform ensures that pods are registered as soon as they come up, and it's also responsible for maintaining the liveness of pods and managing them automatically. You could use any other orchestrator as well; that's the idea behind the adapter model we have in Pilot. Today we have a Kubernetes adapter, we're working on a Consul adapter, and you could add a Eureka adapter, or imagine adding an adapter for plain VMs. Yes, the underlying platforms typically have this, or you bring your own registry. Which service? Oh, no, Envoy and the services don't really need to know which registry it is, because we configure Envoy from Pilot; service registration is just how we learn what the services are.
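As a concrete illustration of what the Kubernetes adapter extracts, here's a hedged sketch of one logical service whose pods differ only by labels; the names and label values are invented, but this is the shape of the data that feeds the abstract service model (services, their pods, and the label sets on each):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-b               # one logical service...
spec:
  selector:
    app: service-b              # ...selecting pods of every version
  ports:
  - port: 80
---
# Pods behind the service differ only by labels. Pilot groups them
# by label set, e.g. {app: service-b, version: v1.5} versus
# {app: service-b, version: v2.0-alpha}, and each group later
# becomes a separate Envoy cluster.
apiVersion: v1
kind: Pod
metadata:
  name: service-b-v2-abc12
  labels:
    app: service-b
    version: v2.0-alpha
spec:
  containers:
  - name: app
    image: example/service-b:v2.0-alpha
```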
Can you point Pilot at a cluster from outside, today? Yes, because the way it works today is that we assume Pilot runs within a Kubernetes cluster as a Kubernetes service, and it talks to the API server within that cluster to fetch the information. But I think it's fairly trivial to point Pilot at an external cluster: here's a Kubernetes API server, and here are the credentials.

Okay. So once we have all this information and it's been propagated down to Envoy, the service here is expected to make a standard HTTP call: the code calls service-b.example.com, the name gets resolved, you get an IP, and when the call reaches Envoy we look at the host header and decide how to route the traffic. And since this Envoy has the list of all pods that are actually serving service B, we do the load balancing at the client side: traffic goes directly from the Envoy to the other pod. No, we do not use kube-proxy or the iptables rules that Kubernetes sets up to do any sort of load balancing. Taking control of load balancing is the crux of the system, and everything else falls into place from there: it starts with load balancing, but then you can apply richer traffic rules, policies, and so on.

The other part is that Envoy also has a concept of active health checks, where each Envoy actively polls the other Envoys in the system for health information, and based on whether they're live or not, it makes its load balancing decisions. Now, there's obviously some contention here, because Kubernetes also provides health checks and will automatically deregister unhealthy pods, and then there's Envoy doing its own health checks. The question is: who should you believe, and are both required? The answer is yes, both. It's not a definitive yes, but it does not hurt to have both in place. Kubernetes is a platform-level thing: it detects an unhealthy pod, deregisters it from the Kubernetes API server, Pilot detects the change, generates new configurations, and pushes them down to all the Envoys. On the other hand, since all of these Envoys are doing active health checks, they have first-hand information about which of their dependencies are healthy or unhealthy. So convergence is much faster when you enable health checks on the Envoy side. Eventually the updated configuration gets pushed down, but until that unhealthy pod goes away, the Envoys making the calls already know who's healthy and who's unhealthy. That's the idea behind having health checks in Envoy as well as health checks at the platform level.

Yes, there will definitely be a delay, but that delay is bounded by how quickly Kubernetes recognizes that a particular pod is down. Once that information reaches the Kubernetes API server, Pilot gets a notification, because all we do is set up a watch on the API server; we get notifications any time there's a change in service membership, so we know which pods are down, which pods are coming up, and so on. So there's an eventual-consistency delay in how the information propagates, which is why we also encourage health checking in Envoy.
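For illustration, active health checking in Envoy is configured per upstream cluster. Here's a minimal sketch in the spirit of Envoy's v1 configuration, rendered as YAML for readability; the /healthz path and the timing values are assumptions:

```yaml
clusters:
- name: service-b-v1
  type: sds                   # member IPs come from service discovery
  lb_type: round_robin
  health_check:
    type: http
    path: /healthz            # hypothetical health endpoint on each pod
    interval_ms: 10000        # poll every 10 seconds
    timeout_ms: 2000
    unhealthy_threshold: 2    # stop sending traffic after 2 consecutive failures
    healthy_threshold: 2      # readmit after 2 consecutive successes
```

Hosts that fail the check are taken out of this Envoy's load balancing set immediately, independently of when Kubernetes and Pilot converge on the same fact.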
So essentially, if you have a hundred pods here and ten of them become unhealthy and get deregistered, it might take a while for that information to propagate back to the calling Envoy. At the same time, if this Envoy here is doing periodic health checks, say every 10 milliseconds, then it has first-hand information that of these hundred pods, only ninety are actually alive. So when it has to make an API call, it automatically load balances across the remaining ninety pods and not the ten unhealthy ones. And in the meantime, that information propagates to Kubernetes, comes back to Pilot, and Pilot pushes the updated configuration.

Is it centralized? Well, no, it's localized; each Envoy does its own polling. Right now it's an all-to-all arrangement, where each Envoy polls the Envoys of its dependencies: whatever is in its load balancer configuration, it polls all of them to keep track. The Envoy folks are also working on a different version of the health check API with delegation, where some Envoys are designated as responsible for checking a set of other Envoys, to avoid the all-to-all cross traffic.

Okay. So there are a few key concepts that come up in the traffic management aspects of Pilot, and one of the most important ones is label-based routing. This is something that's not specific to Kubernetes or Mesos or most other platforms. The idea is that you can write a rule like this example: for traffic going to this destination service and originating from this particular source service, route 99% of requests to pods that have these two labels, and route 1% of requests to pods that have those two labels.

You can approximate a version of this using, say, Deployments in Kubernetes, where you do a rolling update to roll out a new version of a service. The problem there is that it's an in-place upgrade, not a side-by-side configuration: you literally replace pods with pods carrying the newer labels, so that when the iptables machinery on the caller side sends requests, they get statistically load balanced across the pods. The problem is that if you want a 99-to-1 split, you need on the order of a hundred pods, because the split follows the pod count. Whereas in this case, since we control the load balancing, and this is what I meant by load balancing that isn't left to the platform, we get to pick what percentage of traffic is sent to pod one versus pod two. You can have just two pods and decide to split traffic 99/1, 90/10, or 50/50, whichever way you want. So you get fine-grained control over traffic.

Now, the thing to remember here is that in this case, service B is just one single Kubernetes service with a bunch of pods; some of them have the v1.5 labels and some have the v2.0-alpha label. What we do internally in Pilot is group pods based on their label sets: okay, these are the pods that share the first two labels, so we create one cluster from them.
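To make that rule concrete, here's a hedged sketch modeled loosely on Istio's early route-rule format; the service names and label values are invented for the example:

```yaml
type: route-rule
name: service-b-canary
spec:
  destination: service-b.default.svc.cluster.local
  match:
    source: service-a.default.svc.cluster.local  # only calls originating from service A
  route:
  - tags:
      version: v1.5         # 99% of requests stay on the current version
    weight: 99
  - tags:
      version: v2.0-alpha   # 1% trickles to the canary
    weight: 1
```

The rule never mentions pod IPs: it names label sets, and Pilot resolves each label set to the matching group of pods.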
A cluster in the Envoy sense, that is; it's not a Kubernetes cluster. Then we get another cluster based on the other label set, and we set up the Envoy configurations such that whenever traffic has to go to a given label set, it goes to the corresponding cluster. I'll show a full example of this in the next slide, but that's the high-level concept, yes.

Oh, you mean from each caller? It's actually on a per-pod basis: from every caller Envoy we apply the 99/1 split. So yes, but over a span of a few hundred requests it works out to the same thing.

To give you a little more detail on this (actually, I can't see the next slide), there's one more thing we do, which is the ability to look at the HTTP headers as well. Because Envoy proxies at the HTTP level, we can look at the content of the request and route traffic to the appropriate pods based on it.

Now, the way we do this is as follows. The key to understanding Pilot is really to understand the Envoy configuration, because that's where everything starts. How many of you are familiar with NGINX configuration, or HAProxy? Okay, so most people have some idea of NGINX configuration. If you look at an Envoy configuration, a listener is basically the equivalent of a server in NGINX: you define a listener on a particular port, and for that listener you define the SSL certificates and so on and so forth. Within each listener, you have one or more of what you'd call server blocks in NGINX: just as you have an HTTP block or a TCP block there, in exactly the same fashion here you have an HTTP proxy configuration, a TCP proxy configuration, and there are also Redis and Mongo configurations and so on. And the upstream in NGINX, where you define the different upstream servers, is the equivalent of a cluster in Envoy. So that gives you the high-level mapping. Now, within each cluster you have a bunch of other configuration: you define the load balancing policy, the SSL certificates used to talk to the upstream cluster, and other things like circuit breakers and so on. And as in an NGINX upstream, each cluster has a whole bunch of IP addresses, which correspond to the pods upstream; those are part of the cluster.

Now, the dynamic-reload part of Envoy configuration is this. The whole Envoy configuration is a bit of JSON, one giant JSON file, and each of these blocks can be dynamically loaded by a different service. The listener configuration can be loaded by a listener discovery service; I know, it's very literally named, it's called LDS, and it lets listeners be loaded dynamically. The clusters, together, can be loaded via a cluster discovery service, CDS. And within each cluster, the list of upstream IPs for that cluster can in turn be loaded by a service discovery service, SDS. The reason to have both a cluster discovery service and a service discovery service is that the list of upstream nodes for a cluster can change much more frequently and dynamically, whereas the set of upstream clusters doesn't change that often.
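Here's a skeletal sketch of that overall structure, loosely in the spirit of Envoy's v1 configuration (which is actually JSON; this is YAML-flavored, and the exact field names and the pilot cluster name are assumptions):

```yaml
# One giant config; each block below can instead be fetched from a
# discovery service (LDS / RDS / CDS / SDS) served by Pilot.
listeners:                          # ~ "server" blocks in NGINX
- address: tcp://0.0.0.0:5000
  filters:
  - name: http_connection_manager   # ~ an NGINX http block; TCP/Redis/Mongo filters also exist
    config:
      rds:                          # route configuration fetched dynamically (RDS)
        cluster: pilot              # assumed name for the Pilot-backed management cluster
        route_config_name: "80"
cluster_manager:
  clusters:                         # ~ "upstream" blocks in NGINX
  - name: service-b-v1
    type: sds                       # member IPs fetched via SDS
    lb_type: round_robin
  cds:                              # the cluster list itself fetched via CDS
    cluster: { name: pilot }
  sds:                              # per-cluster pod IPs fetched via SDS
    cluster: { name: pilot }
```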
And similarly for the listeners themselves: you don't add and remove a hundred different servers every second, whereas you might do exactly that with the upstream IP addresses. Now, within a listener, the HTTP proxy configuration, the routes, is much more dynamic than the listener itself, so that route configuration can be loaded by something called the route discovery service, RDS. It dynamically loads the conditions on which you match a request, like match on these HTTP headers and then route to this cluster, or the traffic splits, for example, the traffic that comes to this particular route, /foo, should be split 90/10 across cluster A and cluster B. That's the kind of configuration that can be loaded dynamically with RDS. Any questions on this? Yes sir. Yes, this is all the proxy configuration for traffic exiting the pod. The upstream IPs are the target IPs, the backend IP addresses. Yes, outbound.

So now, this is the configuration for Envoy. Kubernetes throws in a few complications as well, which is that you can have ten different services listening on the same port; Kubernetes differentiates them by the domain name of the service, and each service has its own cluster IP address. For HTTP, we can definitely differentiate upstream services by looking at the host header; the HTTP host header is mandatory, and it indicates the service you actually want to reach. For TCP, if you look at the cluster IP address with which a service is being reached, you can differentiate the destination it's going to. So you could have three different services on port 5506, and depending on the target IP address a particular connection is going to, you can decide to route to the appropriate upstream cluster.

But then, what happens within a pod? How do we capture traffic from within a pod and start routing it? Look at it a different way: if we didn't have the iptables-based traffic capture, we could just run an Envoy and make it listen on all possible ports of the other services. But then what happens if the app in that pod is itself listening on port 5506 while also accessing other services on port 5506? The two start contending for the port; you get a messy conflict. It's not even possible to do that. So this is where we added something called a virtual listener versus a real listener. We have one physical Envoy listener actually listening on a particular port, port 5000. If you go and check in any pod today, you'll find there's only one Envoy listener, on port 5000; Envoy is not listening on any other port. But when we do the traffic redirection with iptables, the kernel preserves the actual destination the connection was headed to, say port 5506 at address 1.1.1.1. You can retrieve that information through a special getsockopt call, the SO_ORIGINAL_DST socket option, by which you can obtain the original destination the connection was actually going to.
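A hedged sketch of that physical/virtual listener arrangement, again loosely following Envoy's v1 configuration; the ports and cluster names follow the running example, and the exact fields are assumptions:

```yaml
listeners:
- address: tcp://0.0.0.0:5000   # the one real listener iptables redirects everything to
  use_original_dst: true        # recover the original destination (SO_ORIGINAL_DST)
                                # and hand the connection to the matching virtual listener
  filters: []
- address: tcp://0.0.0.0:5506   # virtual listener: never bound to a real socket
  bind_to_port: false
  filters:
  - name: http_connection_manager
    config:
      route_config:
        virtual_hosts:
        - name: service-b
          domains: ["service-b.example.com"]
          routes:
          - prefix: /
            weighted_clusters:  # the 90/10-style split described above
              clusters:
              - { name: service-b-v1, weight: 90 }
              - { name: service-b-v2, weight: 10 }
```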
So we use that original-destination information: we multiplex and pass the connection on to one of the virtual listeners, and from there the traffic goes on to get processed in the usual way. This is how we capture all traffic in bulk, and it's the key behind the whole transparent proxying that happens in Istio: capture traffic as-is, then decide which listener to hand it to.

The destination IP? No, no. The client doesn't even know Envoy is there; it simply sends to service-b.example.com:5506. The traffic gets trapped by iptables and redirected to port 5000. When it arrives on port 5000, Envoy extracts the actual destination, oh, this was headed to 1.1.1.1 on port 5506, and hands it to the listener for 5506. Once it's in that listener, based on the actual source and destination IP addresses and so on, it gets directed to the appropriate upstream cluster. All of this happens at egress from the application, where traffic leaves the application for another pod. When an Envoy receives an inbound connection, once again all traffic coming into the pod is also trapped, so it again enters port 5000, but at that point it's a direct pass-through to the application behind it; we don't have to do much multiplexing. But yes, for simplicity, the same mechanism runs on both the outbound and the inbound path.

Yes, the hundred requests go to different clusters. So it's not on a per-listener basis, it's at the cluster level, and I'll show in the next slide how that happens. Once a connection is received, Envoy essentially makes a weighted random choice and decides, okay, this one goes to cluster two, and spreads traffic across the clusters that way.

Does it rewrite anything? No, it does not, because when traffic leaves Envoy it's standard IP traffic, from the original pod to the original destination. Yes, it's still transparent. As for TLS: once we receive the traffic and hand it off to the proper listener, we do the TLS termination based on the SAN field of the certificate, where the service's name is tied to its service account in Kubernetes. That's how we do that. And no, if you send HTTPS traffic directly from your pod yourself, we treat it as TCP traffic and just pass it through.

Which is why rate limiting in Istio is activated where traffic enters an Envoy: all the Mixer functionality gets activated on the inbound side. So essentially, whenever traffic enters an Envoy as inbound traffic, the Envoy consults the Mixer, and the Mixer can decide to do global rate limiting based on signals from all the other Envoys, this is how much traffic this particular service is receiving overall, and then decide to drop or accept the traffic. That said, there's also another form of, I wouldn't say rate limiting, but throttling, which is the circuit breaker configuration. It says things like: there can be a maximum of 1024 concurrent connections to this specific cluster, and a maximum of 1024 requests per connection, and so on. That is handled by Envoy locally.
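For illustration, circuit breaking and connection caps are configured per cluster in Envoy. A minimal sketch along those lines, using the 1024 figures from the talk; the exact field layout is an assumption:

```yaml
clusters:
- name: service-b-v1
  type: sds
  lb_type: round_robin
  max_requests_per_connection: 1024   # cap on requests multiplexed onto one connection
  circuit_breakers:
    default:
      max_connections: 1024           # aggregate cap across all upstream endpoints
      max_pending_requests: 1024      # queued requests before load is shed
      max_retries: 3                  # concurrent retries allowed
```

Because these limits are enforced by the calling Envoy itself, they need no coordination with other proxies, unlike global rate limiting.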
So when the client-side Envoy is making requests, it will not allow more than 1024 connections to the upstream cluster, and that's an aggregate of all the connections across all the endpoints. Oh, the global rate limiting? Well, it depends on the scale of the global rate limit; I believe the Lyft folks already run a global rate limit service. So yes, there is a scalability issue, which is why things like global rate limiting make the most sense at the API management edge, where you're metering things and charging people: the first million API calls are free, the next million are three bucks, and so on. At that point, since you're charging money, you really do want to meter it accurately. But within the cluster, yes, there's definitely a scalability problem, and there are ways to mitigate it by caching; if you want strong consistency, though, there's going to be a scalability cost. If you're okay with eventual consistency, with exceeding the rate limits by a small amount, then you can cache and serve slightly stale information. Any other questions?

Yes. Yes, that's not part of this. API management is completely different; it's left to an API management layer that's built on top of the Mixer, or it's a separate component entirely, bring your own API management tool that passes traffic into the internal cluster. If you want to build your own API management, you build your own set of tools on top of the Mixer, and that's where you'd impose the global rate limit. So imagine this is your ingress controller: all your API management sits on top of the Mixer and calls into it. So for example, all traffic enters here, you do your API management on top of the Mixer, and that rate limit is applied only at the ingress and not at all the Envoys. Yes, yes.

Yeah, all of the headers are just passed through; whatever comes in goes through, nothing gets stripped. The only things touched are probably the X-Forwarded-For headers and the connection upgrade headers: if it's a WebSocket connection, the upgrade headers are passed along, and if it's not, they're stripped. There's a handful of standard proxy-specific headers that every proxy manipulates, but other than that, everything is passed through; you don't have to configure anything.

Logically, yes. Logically, you could imagine that everything ends up going to the Mixer, but there's a whole bunch of caching involved. The checks are based on a fairly standard set of attributes: for every request, Envoy collects around ten attributes, passes them to the Mixer, and the Mixer decides yes or no. Yes. Any other questions? If not, I'm done. Okay. Thank you, folks.