Yeah, so let me introduce myself first. My name is Tanveer Gill and I am the CTO at a startup called FluxNinja. Today I'll be talking with you about observability-driven load management, a set of techniques that operators use to ensure fault tolerance and reliability of their services. So let's get started.

Let's start off by talking about the base capabilities that you get in Istio today. The capabilities of Istio are centered around three pillars. The first one is security: basic encryption between services, mutual TLS between services, and some basic access control, so use cases such as microsegmentation, making sure that services only talk to the services they're intended to talk to. The second pillar is centered around traffic management. These are basic network settings like timeouts, retries, and load balancing algorithms, plus passive health checks like outlier detection and circuit breaking, and some CI/CD-oriented features like canary deployments, A/B tests, and so on. The third bucket is observability. Istio sits at an interesting vantage point where it witnesses all this service-to-service chatter, so it's able to give unprecedented visibility into that chatter, and that too without modifying your application code.

Even with these great base capabilities, once you take any application into production there's still a dearth of features around flow control. What I mean by that is: whenever you take your apps into production, you need basic rate limiting at the gateway. If you have a shared or multi-tenant service used by various entities or users, you want to make sure there's fair access between the users, and that it isn't the case that a subset of users is monopolizing the majority of resources. You need some basic protection at that layer.

Another thing is that the capacity of your services is finite, because it's expensive to over-provision services. So you want to make sure that if your services get overloaded, it doesn't lead to cascading failures, and that the goodput, the useful throughput of your service, remains high despite any kind of overload coming into the picture.

The next thing is external limits. Whenever you're developing an app, you're dependent on some third-party systems: open APIs, managed services like DynamoDB, or, if it's an AI app, a provider like OpenAI. You want to work backwards from those limits. First of all, you want to be a good citizen and stay within those limits, because otherwise those third-party APIs might penalize you; GitHub, for instance, penalizes you if you go over the limit too often. And for both of these use cases, protection as well as working with external limits, you want to make sure you're using those quotas effectively. Any app has its own set of priorities based on the business use case; not all requests are equal, and certain requests are more important. So you need some kind of scheduling when you're close to these limits, to make sure that high-importance requests are given a better chance of getting accepted.
So yeah, flow control is a must when you're running at scale, even at moderate scale to be honest. You need API rate limits to ensure fairness and prevent abuse. You need adaptive limits per service to prevent overload. What I mean by that is: the bottlenecks in your app are typically in the heaviest APIs or the data-intensive portions of your workload, usually databases or big data systems. So you want to take the saturation signal or queue signal from those systems and use it to throttle: detect the problem at the bottleneck, and reject excess load as far upstream as possible, as early as possible, to reduce wasted work. And again, you need prioritization: once you start throttling, you want the most important requests at the head of the queue. The third one is what we talked about with third-party APIs: you want to eliminate the guesswork. Say you hit an API rate limit with OpenAI and get a 429; then what do you do? When do you retry? Especially with shared quotas, when you're using the same key across a lot of distributed workloads, how do you eliminate that guesswork and go as fast as possible while also prioritizing the requests?

Keeping that in mind, I'm going to introduce an open source project called Aperture and how it can help with flow control in an Istio service mesh and even in other environments. Here's a quick overview of the Aperture project. It's an open source, observability-driven flow control system. It has a programmable, declarative policy language that lets you express these flow control policies as control circuit graphs; these are like signal processing graphs. And it's an observability-driven system: essentially it completes the feedback loop, taking observability signals from the infrastructure and using them to do flow control. Interestingly, it works on top of your existing stack. If you're using a service mesh such as Istio, it's a drop-in addition, not a replacement: augmented functionality you get on top of Istio. If you're using API gateways like NGINX or Kong, we have Lua plugins for those projects. And if you're using none of those, if you're just running plain HTTP servers, we have SDKs that can either do middleware insertion, or you can pretty much write code to do this kind of flow control within your application logic itself.

So that's a brief introduction; now let's delve a little deeper into what Aperture brings to Istio. It brings in these observability-driven flow control capabilities. There's a Rego-based request classification layer that lets you look at labels in your traffic, define criteria for deriving new labels, or even go deep into your payload, for instance into your GraphQL APIs, and extract labels from there. Interestingly, these labels can be used for both visibility and control. It also provides a distributed rate limiting implementation. Think of it as Istio's rate limiting service, although we're not using the rate limiting service; we're using the authorization API. But it gives you that rate limiting capability, so you can rate-limit users in a distributed fashion and in a very scalable way; a quick sketch of what such a rule could look like follows below.
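To make the classification-plus-rate-limit idea concrete, here's a rough sketch of a Rego-based label rule feeding a per-user token bucket. To be clear, this is an illustrative shape only: the service name, header, and YAML field names are assumptions rather than the exact Aperture schema, so treat it as a sketch and check the project docs for the real format.

```yaml
# Illustrative sketch only -- field names are assumptions, not the exact Aperture schema.
# Idea: derive a "user" flow label from a request header with a Rego rule,
# then key a distributed token bucket on that label.
classifiers:
  - selector:
      service: checkout.default.svc.cluster.local   # hypothetical service
    rules:
      user:
        rego:
          query: data.example.user
          source: |
            package example
            # ext_authz-style input: pull the user id out of a request header
            user := input.attributes.request.http.headers["x-user-id"]
rate_limiter:
  label_key: user        # one token bucket per distinct "user" label value
  bucket_capacity: 40    # burst allowance per user
  fill_amount: 20        # tokens refilled per interval
  interval: 1s           # i.e., a sustained rate of 20 requests/second
```

Note that the same derived label would also show up in the telemetry, which is the point made above about labels serving both visibility and control.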
The second capability is adaptive limits and request prioritization. This builds upon the circuit-based policy implementation, essentially letting you take any signal from the infrastructure that measures the bottleneck. It could be a queue size, thread counts, or latencies; any signal that reliably indicates overload buildup in your system. That signal can then be used to adaptively limit API calls closer to the entry point. So that's the adaptive limits feature.

The third one is quota limits. Working with third-party APIs, or even between teams, you might have quotas. Microservices teams sometimes give each other quotas so that one team doesn't overwhelm a particular service. So it could be between services within a larger organization, or it could be a third party like DynamoDB, OpenAI, or GitHub. You want to work with those limits and, again, ensure prioritization of requests, so that the quota is used in a prioritized fashion.

And the last one is telemetry. You get some telemetry with Istio as well, but the telemetry you can get with Aperture is even more surgical, because it's tied to the Rego-based classification rules. You can actually go deep into your payload to get fine-grained metrics, and even get latency histograms; if you want percentile-style metrics, it's geared towards those kinds of use cases.

Next, let's look at how the whole Istio-plus-Aperture stack works together. The Aperture agent sits next to the Envoy proxy, so it's like a sidecar to the sidecar, and it integrates through the authorization API I talked about earlier; we'll look at that in more detail on the next slide. Essentially there are a couple of integration points. One is the authorization API, which lets us intercept calls coming into the Envoy proxy. Aperture can be installed at the various vantage points where the Envoy (Istio) proxy is installed. At the ingress, the Aperture agent can help you achieve things like rate limits: any call that goes into the Envoy proxy gets forwarded to the Aperture agent, which lets you implement rate limits. Within services, it lets you build functionality such as adaptive limits. It's the same insertion mode, running next to your Envoy proxy, and in this example we're looking at a database; a typical bottleneck in a distributed app is usually a big data system or a database. So we take saturation signals from those bottleneck services and use them to adaptively limit traffic going into your service, and then also prioritize that traffic. And there's another use case of Aperture in an Istio service mesh, on the egress: calls going outside your cluster to some third party. For instance, this could be OpenAI, and you want to work backwards from that limit. You want to model that limit internally using token buckets, you want to be a good citizen and stay within the limit, and once you're close to the limit you want to prioritize traffic, so that the OpenAI quota is used in an intelligent fashion and high-priority requests get to use it first.

So yeah, these are the new capabilities we're adding on top of your existing Istio service mesh. Let's look at the insertion in a little more detail. This is the Envoy proxy, the Istio proxy. It gets requests from outside; that's step one.
And here's how the authorization API works. By the way, this is Envoy's external authorization (ext_authz) API, the same mechanism that projects like OPA use for authorization use cases; for Aperture we're using the same glue, but for flow control. The way it works is: any time a request comes in, the Envoy proxy forwards it to the Aperture agent for a yes-or-no answer, either to admit it or to drop it, and we also attach some data for telemetry purposes. If it's a no, the Envoy proxy won't forward the request to the service; it will just send a response back, a 503 Service Unavailable. The response can actually be configured through the policy in the Aperture agent, but by default it's a 503, or, if it's a rate limit, it could be a 429.

The second point of integration is the access log. This is needed for telemetry: a bunch of the telemetry we're doing, looking at things like latencies, even estimating tokens dynamically, and so on. So we get this access log stream from Envoy, which is the observability leg, essentially completing the feedback loop here. And if the call gets accepted, it's just forwarded to the service as usual, and the response goes back unmodified to the original caller. So that's basically the insertion.

As for configuration, we use a CRD called EnvoyFilter. EnvoyFilter is a construct in Istio configuration that lets you write these kinds of interceptor rules that get inserted into the Envoy proxy. This EnvoyFilter basically describes a couple of things: the external authorization (ext_authz) config with the address of the local agent running next to Envoy, and also the access logging address. And this EnvoyFilter can be applied selectively; it doesn't need to be pervasive. Since the agent runs as a DaemonSet you can apply it pervasively, or, if you want to be more surgical, you can use a workload selector; workload selectors in Istio let you be more surgical with this insertion.
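Here's a hedged sketch of what that EnvoyFilter could look like. The EnvoyFilter CRD and the ext_authz filter are real Istio/Envoy constructs, but the agent address, port, workload labels, and timeout below are illustrative assumptions; the Aperture install docs generate the actual manifest, and the access-log patch is omitted for brevity.

```yaml
# Sketch of the ext_authz insertion described above (assumed addresses/ports).
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: aperture-ext-authz
  namespace: istio-system
spec:
  workloadSelector:              # optional: surgical insertion instead of pervasive
    labels:
      app: checkout              # hypothetical workload label
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
              subFilter:
                name: envoy.filters.http.router
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.ext_authz
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
            transport_api_version: V3
            grpc_service:
              google_grpc:
                target_uri: "localhost:8080"   # local Aperture agent (assumed port)
                stat_prefix: aperture
              timeout: 0.25s     # max time a request may wait (queue) in Aperture
            failure_mode_allow: true           # fail open if the agent is unreachable
```

The timeout here is the same knob mentioned later in the talk: it bounds how long a request can sit queued in Aperture before it's dropped.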
Okay, let's move forward and talk a little bit about Aperture's architecture next. At the heart of Aperture is a control loop. The way it works is: the Aperture agents have an inbuilt OpenTelemetry collector that lets them gather metrics both locally and from external technologies like databases. That telemetry is written to Prometheus. Then there's another component called the Aperture controller, which can either be hosted by you or hosted in our commercial solution called Aperture Cloud; everything above the Aperture agent is hosted in Aperture Cloud, or, if you're using the open source version, it can all be hosted by you. The job of the Aperture controller is to run these policy circuits, which are evaluated periodically. It periodically fetches metrics using PromQL queries and then runs a signal processing circuit, which results in some adjustments that are written down into etcd. etcd is used both for sending these adjustments down to the Aperture agents and for propagating configuration: if you install new policies, the same mechanism is used to propagate those policies to the Aperture agents. So far so good; the Aperture control loop is kind of the heart of the system.

Now let's move on to more functionality inside the Aperture agent. The first functionality I want to talk about is distributed token buckets. These are distributed counters; essentially we have a distributed hash table. The Aperture agents form a peer-to-peer group between the agents belonging to each agent group, and this peer-to-peer network is used for maintaining the distributed counters. They're used for use cases such as rate limits: say you want to limit incoming requests by user, then there will be a bucket for each user, and it's a token bucket, so unlike window-based counters this is pretty accurate and very smooth. The same technology is also used for quota limits; if you're working with third-party APIs, it's the same token bucket mechanism. There's a leader for each token bucket, and there's a lookup on the leader; it can be either a synchronous call or a lazy-sync kind of call. So this is basically how counters work in Aperture. An alternative technology typically used for this is Redis, but this is a higher-performance implementation than Redis: Redis is a single service that can become a bottleneck, whereas this is a peer-to-peer architecture that is more scalable and dynamically shards the keys. If an agent goes away, it will dynamically reshard the keys, and if a new agent comes up, it will reshard the keys accordingly. So that's the distributed token bucket.

Moving on, the next technology I want to talk about in the Aperture agent is the scheduler. Just like operating systems have schedulers for processes, with nice values and such, we have this kind of network-based scheduler. The algorithm we use here is called weighted fair queuing, which is used in packet-switched networks and is also used here. Each request essentially gets a priority, or a weight, and based on that weight the algorithm makes sure that utilization is in the ratio of those weights. So if you say paid users get a priority of 200 and trial users get a priority of 100, that's a 2:1 ratio: paid users get double the priority. If both are seeing similar request rates, then on average paid users will see twice the acceptance rate. This comes into play when you're under overload or close to quota limits; that's when the scheduling kicks in, and maybe you want to de-prioritize something like a background task, or free tier users in this case, say 10 or 20 times less than paid users. Those requests will be the first ones to get dropped, or the first ones at the tail; they'll essentially sit at the tail of the queue. So requests come in, get classified into these different categories, and then the way they get dequeued is based on this weighted fair queuing schedule. And by the way, the way requests get dropped is based on the timeout that is configured: when you do this ext_authz integration, you define a timeout value, and that's the amount of time you're willing to queue the request within Aperture. So far so good.
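As a concrete shape for those priorities, here's a hedged sketch of workload definitions for the weighted fair queuing scheduler. The label names and exact field layout are illustrative assumptions, but the weight math is as just described: with equal demand, a 200-priority workload is admitted at roughly twice the rate of a 100-priority one.

```yaml
# Illustrative scheduler workloads -- field names are assumptions, not the exact schema.
scheduler:
  workloads:
    - label_matcher:
        match_labels:
          user_type: paid      # hypothetical flow label from the classifier
      parameters:
        priority: 200          # ~2x the acceptance rate of trial under overload
    - label_matcher:
        match_labels:
          user_type: trial
      parameters:
        priority: 100
    - label_matcher:
        match_labels:
          user_type: free
      parameters:
        priority: 10           # de-prioritized: queued at the tail, dropped first
```

When there's no overload, everything is admitted immediately; the weights only start to matter once requests begin queuing.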
We've pretty much covered the core tech pieces in Aperture, so let's move on and see how Aperture works in practice. I'm going to show a demo application with Aperture running in action. In the test setup we have a PostgreSQL database and an HTTP service that makes a request to PostgreSQL whenever there's an incoming request, all glued together through Istio. There's a k6-based load generator; k6 is a project that lets you simulate users and traffic. So k6 is making requests to the service, going through Envoy. The traffic mix is guest users and subscribers; subscribers are higher priority, and the breakup of traffic is 50-50: 50% subscribers, 50% guests. The traffic pattern is like a sinusoid, or more of a square wave, where it starts with 10 users for one minute and then goes to 100 users. Once it goes to 100 users, the service makes a lot of calls to PostgreSQL and PostgreSQL's queue starts to fill up. We're monitoring the number of active connections in PostgreSQL: if active connection usage gets near 100%, there will be queue buildup and latency will spike for all users. So we want to make sure that PostgreSQL connection usage stays within bounds. That's why we're monitoring PostgreSQL using the Aperture agent: it has an OTel pipeline that scrapes metrics from PostgreSQL, and that signal is then used in the algorithm to dynamically throttle API requests coming into the service, and also to prioritize traffic.

Here is the policy that does all this. The first part of the policy defines an OTel pipeline: the address of the PostgreSQL instance and the collection interval. This is how we get metrics into our system, into the Prometheus database. The second part of the policy defines the criteria, the algorithm for throttling. A setpoint of 40 is defined, which means that once 40% of PostgreSQL's connections are in use, we want to start throttling. And the throttling policy is very simple: it's a progressive, AIMD-style (additive-increase, multiplicative-decrease) policy. We do a 20% decrease in load any time we detect an overload, and if we're not in overload, meaning less than 40% of connections are in use, we do a 5% increase progressively every 10 seconds. A very simple policy, but it works pretty well. So yeah, this is the policy; that's the algorithm, a 5% increase every 10 seconds. Down here are the selectors that identify requests for scheduling. It's like a workload selector, though not just a workload selector; it also looks into traffic labels, basically identifying the service where we want the scheduler to act. And down here we've defined priorities: we're saying guest users have a priority of 50 and subscribers have 250.

So let's see how the metrics look once we have the policy in place. On the left of the graph you see before the policy was deployed, so this is before Aperture: nearly 100% connection usage when the traffic ramped up. After Aperture is installed, the overload is prevented and we stop seeing errors; earlier we were seeing "too many clients already" errors from the service, and those stopped. But we also start getting prioritization of traffic, as shown in the graphs here. The incoming token rate is higher than the accepted rate, which means there's some queue buildup happening because of the throttling. And we see a higher acceptance rate for subscribers versus guest users, and conversely a higher rejection rate for guest users and a lower rejection rate for subscribers. So this is the priorities in action: roughly five times the acceptance rate for subscribers versus guest users. This number is almost exactly five times, matching the 250-versus-50 priorities; that's the weighted fair queuing algorithm in action.
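Before wrapping up the demo, here's a hedged reconstruction of the policy just walked through. The numbers (setpoint 40, 20% decrease, 5% increase every 10 seconds, priorities 50 and 250) come straight from the demo; the YAML structure, metric name, and addresses are illustrative assumptions rather than the exact blueprint schema.

```yaml
# Illustrative reconstruction of the demo policy -- schema and names are assumptions.
otel_pipeline:                       # part 1: scrape PostgreSQL metrics into Prometheus
  receivers:
    postgresql:
      endpoint: postgresql.demo.svc.cluster.local:5432   # hypothetical address
      collection_interval: 10s
load_scheduling:                     # part 2: the AIMD-style throttling algorithm
  overload_signal: postgresql_active_connections_percent # assumed metric name
  setpoint: 40                       # throttle once 40% of connections are in use
  load_multiplier_decrease: 0.20     # shed 20% of load whenever overload is detected
  load_multiplier_increase: 0.05     # recover 5% every period while healthy
  period: 10s
  selector:
    service: demo-service.demo.svc.cluster.local         # hypothetical service
  scheduler:                         # part 3: priorities for the WFQ scheduler
    workloads:
      - label_matcher:
          match_labels:
            user_type: guest
        parameters:
          priority: 50
      - label_matcher:
          match_labels:
            user_type: subscriber
        parameters:
          priority: 250
```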
So that is it, guys; let's move on to Q&A. I'd encourage you to check out the Aperture project on GitHub, and even try out Aperture Cloud, which is the commercial solution that runs the controller as a managed service and also gives you rich analytics. I'm open to questions now.

All right, I'm looking at the Q&A section. There's a question from Mitch Connors: is it possible to detect overload without dedicated metrics like the Postgres queue, perhaps using Istio's layer 7 metrics? Absolutely. We do have policies that can look at things like latencies. We haven't used Istio's metrics, but there's nothing stopping us from doing that: Istio's metrics are exposed as Prometheus-compliant metrics, so you can write an OTel pipeline to scrape them. But you can also use Aperture's built-in FluxMeter; there's a component called FluxMeter that lets you create histograms that are even more detailed than Istio's layer 7 metrics. You can go down into, say, a GraphQL payload and look at labels inside it to get these layer 7 latency metrics, percentile metrics, and so on.

So that was the only question; I'll check the chat as well. Any other questions, anyone? That's all the slides I had, so maybe I can show you a little bit more, if you're interested, of how circuit-based policies look. Essentially we're designing these signal processing graphs, but by the way, we've made it pretty simple for anyone to use this product: you don't really have to design these yourself. We have high-level blueprints you can use out of the box; just fill in some high-level YAML and get started. We have these kinds of recipes under load scheduling. For instance, someone was asking whether we need specialized metrics like PostgreSQL's; you can instead look at things like latencies and use that as the feedback for throttling requests. This is a quick view of how the entire stack works, from the classification stage to scheduling. This example doesn't show Istio; essentially a service could be talking directly to the Aperture agent, or going through Istio, or going through Kong, but the same kind of interface works for all of these technologies. And the interesting thing we've done is with Rego. Rego is typically used for authorization use cases, but we use Rego here for more than that: for flow control, observability, and so on. The same set of labels can be used both for observability and control; that's an interesting innovation we did here. And this is the architecture of Aperture in open source, just the stuff we talked about: there's a control circuit running in the controller which is periodically computed, and the decisions are propagated through etcd.

I'll just go back to the Q&A section to see if there are more questions for me. Okay, no questions so far. All right, I think we can end the session if there are no more questions.