Right. So the concept of hedged requests seems quite fancy when we look at it in theoretical terms, right? Are there any practical examples of it that we can refer to? So after the paper came out, a lot of libraries and companies went ahead and implemented this in open source software that we may already use. One is Twitter's RPC and cross-service framework called Finagle. Finagle actually exposes this concept of hedged requests; they call it backup requests, but they are essentially implementing hedged requests. Then Envoy Proxy, which is a very common service proxy and also plugs into the Istio service mesh, has this concept of hedged requests, which you can implement at the proxy level and control via the control plane. Linkerd, which is another popular service mesh framework, doesn't have it in the current publicly available version, but there are feature requests raised on Linkerd to add the concept of hedged requests. And Spring's Reactor library has configurations and APIs through which you can model hedged requests in your HTTP clients, okay?

Right. So the next topic the paper moves on to, before we get into the second technique called tied requests, is that a lot of variability also comes from the time a request waits in a server's queue. It could be a TCP server's queue, an HTTP web server's queue, or anything else, right? And they refer to another paper, by Mitzenmacher, which gives a lot of supporting data. The claim it makes is that allowing a client to choose between two servers based on queue lengths at enqueue time exponentially improves load balancing performance over a uniform random scheme. Essentially, what it says is that before a client submits a request to a downstream system, if it has the option of choosing between two servers and we allow it to pick the server with fewer requests in its queue, then using this strategy the overall latency you see at a system level shows an exponential performance improvement. I'd encourage people to go ahead and read that paper, right?

Now, if any of you have configured software-based load balancers like ALB or ELB from AWS, they give you the capability of configuring strategies such as least outstanding requests, i.e. the least number of requests still pending a response. If any of you have looked at that strategy, it tries to mimic the strategy this paper is talking about, right?

So does this align with the answer I was trying to give earlier? Before choking the network, you're also trying to see on which particular servers your requests are getting choked, so you might want to reach other servers in the system, trying to reduce the chance of requests choking on particular servers. Yeah. So I was talking only about the hedged request part, not really the second one. Right, right, right.
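To make the hedged-request idea discussed above concrete, here is a minimal sketch in Go. This is only an illustration, not taken from any of the libraries mentioned: the client fires the request at one replica, and if no response arrives within a hedge delay (say, the observed 95th-percentile latency), it fires a second copy at another replica and takes whichever answers first. The replica URLs and the 50ms hedge delay are placeholder values.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// hedgedGet issues a GET against primary and, if no answer arrives within
// hedgeDelay, also against backup, returning the first successful response.
func hedgedGet(ctx context.Context, primary, backup string, hedgeDelay time.Duration) ([]byte, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // returning cancels (abandons) the slower in-flight copy

	type result struct {
		body []byte
		err  error
	}
	results := make(chan result, 2) // buffered so goroutines never block

	fetch := func(url string) {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			results <- result{nil, err}
			return
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			results <- result{nil, err}
			return
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		results <- result{body, err}
	}

	go fetch(primary)
	pending := 1

	select {
	case r := <-results:
		pending--
		if r.err == nil {
			return r.body, nil
		}
		// primary failed fast: hedge immediately
	case <-time.After(hedgeDelay):
		// primary is slow: send the hedged copy while it is still in flight
	}
	go fetch(backup)
	pending++

	var lastErr error
	for ; pending > 0; pending-- {
		r := <-results
		if r.err == nil {
			return r.body, nil // first success wins; the other copy is cancelled
		}
		lastErr = r.err
	}
	return nil, lastErr
}

func main() {
	body, err := hedgedGet(context.Background(),
		"http://replica-a.internal/item/42", // hypothetical replica endpoints
		"http://replica-b.internal/item/42",
		50*time.Millisecond) // hedge after roughly the p95 latency
	fmt.Println(len(body), err)
}
```

A common refinement, as the paper suggests, is to pick the hedge delay from the observed latency distribution (e.g. the 95th percentile) so that the extra load stays in the low single-digit percent range.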
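The least-outstanding-requests idea mentioned above (the power-of-two-choices selection) can also be sketched from the client side. Again, this is only an illustrative example under assumed names, not code from any load balancer: the client keeps its own count of in-flight requests per replica, picks two replicas at random, and sends to the one with the smaller count, so no server-side queue probing is needed.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// Balancer tracks, per replica, how many requests this client has sent
// that have not yet received a response.
type Balancer struct {
	mu       sync.Mutex
	replicas []string
	inFlight map[string]int
}

func NewBalancer(replicas []string) *Balancer {
	return &Balancer{replicas: replicas, inFlight: make(map[string]int)}
}

// Pick chooses two distinct replicas at random and returns the one with
// fewer outstanding requests, incrementing its counter. The caller must
// call Done when the response (or an error) comes back.
func (b *Balancer) Pick() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	i := rand.Intn(len(b.replicas))
	j := rand.Intn(len(b.replicas) - 1)
	if j >= i {
		j++ // ensure the two sampled candidates are distinct
	}
	a, c := b.replicas[i], b.replicas[j]
	chosen := a
	if b.inFlight[c] < b.inFlight[a] {
		chosen = c
	}
	b.inFlight[chosen]++
	return chosen
}

// Done marks one outstanding request to the given replica as finished.
func (b *Balancer) Done(replica string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.inFlight[replica]--
}

func main() {
	lb := NewBalancer([]string{"s1", "s2", "s3", "s4"}) // hypothetical replica names
	r := lb.Pick()
	fmt.Println("sending to", r)
	// ... issue the request to r, then:
	lb.Done(r)
}
```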
So one question I'll try to answer here is something someone asked in a previous session I did on this: are we saying that the client should be able to probe the queue of the downstream server? Not necessarily, right? Again, I tend to bring back the point that we are talking about the law of large numbers; we are talking about systems operating at thousands of requests per second. Essentially, the requests a server is getting can be taken as roughly equally distributed among all the clients sending requests to it. So the client need not probe the server to detect the length of the server's queue. The client can instead track how many requests it has directed to that particular server and is still waiting on responses for. And this is like the ELB least-outstanding-requests strategy I was talking about earlier. Okay.

Right, so this sets the context for the second technique, which is tied requests. What it says is that instead of choosing one server, why can't we enqueue a request on multiple servers? And when we enqueue the request on a server, we also send the identity of the other server as part of the request; this is where the keyword "tying" comes into the picture. As soon as one server picks up the request from its processing queue, it can send a cancellation request to the other server to which the request was sent. That's where the concept of tying comes from, right?

Now, one of the corner cases that comes out of this naturally is: what if both servers pick up the request while the cancellation messages are in transit, or there is a network delay? This can be very common under low-traffic scenarios, when the server queues are empty or not getting choked. In that scenario, the client can introduce a small delay of two times the average network message delay, around one millisecond, or maybe higher in a slower data center, before sending the second tied request. So you introduce that delay before the second copy you enqueue on the second server, after sending the first one. Okay, and it also means you're not choking the system with duplicate requests.

Then they again give empirical data for this technique, applied on a table-service benchmark, essentially an open source style benchmark used for system performance testing, where they're trying to sort petabytes of data, largely integers and sorted strings. What they mention is that when the cluster is mostly idle and there's no other workload coming onto the system, a load test using tied requests brings the 99.9th percentile latency down to almost 61 milliseconds, compared with the baseline where no hedged or tied requests are running, right?
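Here is a rough sketch of what the tied-request flow described above could look like on the client side. This is purely illustrative: the message shape, the enqueueOn/sendCancel helpers, and the delay value are hypothetical. Each copy of the request carries the identity of the other server, the second copy is sent after a stagger of about twice the average one-way network delay, and whichever server dequeues first tells its peer to drop the tied copy.

```go
package main

import (
	"fmt"
	"time"
)

// TiedRequest is what gets placed in a server's queue. PeerID names the other
// replica holding a copy, so whichever server dequeues first can cancel it.
type TiedRequest struct {
	ID      string
	Payload []byte
	PeerID  string // identity of the other server holding the tied copy
}

// enqueueOn stands in for the RPC that places a request in a replica's queue.
func enqueueOn(server string, req TiedRequest) {
	fmt.Printf("enqueued %s on %s (tied to %s)\n", req.ID, server, req.PeerID)
}

// sendTied issues the tied pair: the first copy immediately, the second copy
// after a stagger of ~2x the average one-way message delay (roughly 1ms in a
// modern data center), which makes it unlikely that both servers start
// working on the request at the same time under light load.
func sendTied(id string, payload []byte, serverA, serverB string, avgNetDelay time.Duration) {
	enqueueOn(serverA, TiedRequest{ID: id, Payload: payload, PeerID: serverB})
	time.Sleep(2 * avgNetDelay)
	enqueueOn(serverB, TiedRequest{ID: id, Payload: payload, PeerID: serverA})
}

// On the server side, the worker that dequeues a tied request would first
// tell the peer to drop its copy before processing, e.g.:
//
//   req := dequeue()
//   go sendCancel(req.PeerID, req.ID) // best-effort; a lost cancel just means duplicate work
//   process(req)

func main() {
	sendTied("req-42", []byte("get /item/42"), "server-a", "server-b", 500*time.Microsecond)
}
```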
On another benchmark, with a concurrent workload running at the same time, tied requests helped reduce the 99.9th percentile latency from 160 milliseconds to roughly 108 milliseconds, which is roughly a 30 percent improvement in overall system performance, right?

There's a question from Chris: say the API gateway sends a request to API service 1, and API service 1 sends a request to sub-service 1; now all these requests need to be cancelled, depending on where the tying happens. Correct. So what we're talking about is that these techniques are not something you apply only at the top service level, or at only one layer of services. These techniques are generic, so they can be applied at any level, at any layer in your service graph, essentially. And it also makes sense, the reason being: what is the end objective? The end objective is to optimize the latency of the top-level request entering the service graph, and that essentially depends on the performance of each and every downstream service. So the top-level latency will only be controlled if you're able to control the latency at each and every layer of your service graph. Okay. So yes, your observation is correct; it will have to be cancelled at all the layers.
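As a small illustration of cancellation flowing through every layer of the service graph, here is a sketch (not from the talk) of the common Go pattern of propagating the incoming request's context into downstream calls, so that when the gateway abandons the slower hedged or tied copy, the downstream requests that copy spawned are cancelled as well. The downstream URL and service names are placeholders.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// r.Context() is cancelled when the caller gives up on this request
	// (e.g. the gateway already got an answer from the other tied copy).
	ctx := r.Context()

	// The downstream call inherits that context, so it is cancelled too,
	// and the cancellation keeps cascading further down the call chain.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://sub-service-1.internal/data", nil) // hypothetical downstream service
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(w, resp.Body) // stream the downstream response back up
}

func main() {
	http.HandleFunc("/aggregate", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```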