Right. So we are talking about techniques that are applicable to largely read-only and loosely consistent data sets. A couple of examples come to mind. The first is quoted in the paper itself: the spell-correction service that Google has. Their spelling dictionary is not something that gets refreshed every hour; it probably gets refreshed once a day. Yet you're hitting that spell-correction service with thousands, maybe millions, of requests a second. Another example is a contacts lookup service: you start typing the first few letters of a user's email ID and you see the dropdown of suggestions come in. Your contacts don't get refreshed every day, or every hour, or every half hour, but you're still doing many, many lookups on them. Essentially a write-once, read-millions kind of pattern.

Now, we said earlier that we can either replicate the request or replicate the data. The next two techniques we're going to talk about are about replicating the request. When we get to the coarser-grained techniques that apply at a larger system level, we'll talk about how replication of data can help. So let's jump into the two techniques: the hedged request and the tied request.

What does a hedged request mean? Since we know we are sending requests to multiple downstream microservices in a fan-out fashion, do we really need to send a given request to a single server? That's the fundamental question this technique forces us to ask. Why can't we issue the same request to multiple servers and pick up the response that comes back quickest among all the requests we have fired? What the paper asserts is: send the request to the most appropriate replica, and in case we don't get a response within a threshold, issue the request to another replica. There are two points I'll come back to in a second: the definition of the appropriate replica, and the definition of the threshold. In case we do get a response from a server within the threshold, we send a cancellation request to the other servers in the path. The paper does not explain what it means by "appropriate", which I feel is okay, because that depends from application to application: it could be the server holding the particular shard or key, or a server in the same availability zone, or a server within the same region.
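Here is a minimal sketch of that flow using Python's asyncio. The replica names and the `fetch_from_replica` coroutine are hypothetical stand-ins for a real RPC layer; the point is the shape of the logic: primary request, threshold wait, hedge, then cancellation of the losing copy.

```python
import asyncio
import random

async def fetch_from_replica(replica: str, key: str) -> str:
    # Hypothetical stand-in for an RPC to one replica; simulates
    # variable per-server latency.
    await asyncio.sleep(random.uniform(0.001, 0.050))
    return f"{key} from {replica}"

async def hedged_request(replicas: list[str], key: str,
                         threshold: float) -> str:
    # Send the request to the most appropriate replica first.
    tasks = [asyncio.create_task(fetch_from_replica(replicas[0], key))]
    done, _ = await asyncio.wait(tasks, timeout=threshold)
    if not done:
        # No response within the threshold: hedge to the next replica.
        tasks.append(asyncio.create_task(fetch_from_replica(replicas[1], key)))
        done, _ = await asyncio.wait(tasks,
                                     return_when=asyncio.FIRST_COMPLETED)
    # Take the quickest response and cancel any outstanding copy,
    # which plays the role of the cancellation request described above.
    for task in tasks:
        if task not in done:
            task.cancel()
    return done.pop().result()

print(asyncio.run(hedged_request(["replica-a", "replica-b"], "user:42", 0.010)))
```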
That's where the second point, the threshold, comes in. It is very natural that if you issue multiple requests to process a single top-level request, you amplify the traffic you propagate to your downstream systems by a large degree. So one of the approaches the paper mentions is: rather than blindly firing multiple copies of your request in parallel, you send the request to the most appropriate replica and wait for the 95th-percentile response latency for that particular class of requests from the service you're calling. Only if you don't get a response within that threshold do you fire a second request, or possibly multiple requests, to the secondary or tertiary replicas. Mathematically, what this technique is telling us is that we are doubling or tripling the number of requests for only five percent of the top-level requests. However, the results can be significant; they give empirical data in the paper, which I'll come to in a moment, showing that this technique can bring down top-level response latencies by a big degree.

One point to be cautious about here: this technique mitigates the effect of external interference only. It does not help you mitigate the latency inherent to the request itself. What this means is that if I fire a request that has to fetch 1,000 records, the latency to fetch those 1,000 records is going to stay constant, assuming no external interference; this technique is not going to help you mitigate that. That's something very critical to understand when you're talking about hedged requests.

They go on to show empirical data from a Google BigTable benchmark, where they retrieve 1,000 keys stored across 100 different servers. Sending a hedged request after a 10-millisecond delay reduces the 99.9th-percentile latency for retrieving all 1,000 values from 1.8 seconds to 74 milliseconds, while sending just 2% more requests. At Google scale, when you run a benchmark like this, the law of large numbers comes into play, so these numbers are something you can probably trust. There is definitely a vulnerability in this technique: multiple servers might end up executing the same request, which could lead to a lot of unnecessary computation. In that scenario, the 95th-percentile deferral we talked about helps reduce the impact, and further reduction can come from more aggressive cancellation strategies. But given the benchmark data, it seems that even operating at the 95th-percentile threshold with cancellation requests can produce a fairly good amount of performance improvement: going from 1.8 seconds to 74 milliseconds is roughly a 24x improvement in the top-level response latency.
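As a rough sketch of where that threshold could come from, here is a simple sliding-window tracker that exposes the observed 95th-percentile latency as the hedge delay. This is an illustrative assumption on my part, not something the paper prescribes; a production system would more likely use a streaming quantile estimator than raw samples.

```python
class HedgeThreshold:
    # Tracks recent response latencies for one class of requests and
    # exposes the observed 95th percentile as the hedge delay, so a
    # second copy is fired for only ~5% of requests.
    def __init__(self, window: int = 1000, default: float = 0.010):
        self.window = window
        self.default = default          # used until samples arrive
        self.samples: list[float] = []

    def record(self, latency: float) -> None:
        self.samples.append(latency)
        if len(self.samples) > self.window:
            self.samples.pop(0)         # keep a sliding window

    def hedge_delay(self) -> float:
        if not self.samples:
            return self.default
        ordered = sorted(self.samples)
        # Index of the 95th-percentile sample in the sorted window.
        idx = min(len(ordered) - 1, int(0.95 * len(ordered)))
        return ordered[idx]
```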
I'll take a pause over here. Any questions from the audience?

"So I have one question. When I think about this, you said this technique does not take the request's own circumstances into account, right? Let's imagine a case where the network..."

Let me restate that caveat first: the technique mitigates the effect of external interferences. What it will not help you with is the latency of the request itself. As an example, if you have to fetch 1,000 records, the time it takes to fetch 1,000 records will remain the same; this technique will not help you mitigate that slowness.

"Okay. So my point is: when you're building systems, a lot of times the things we do to improve stability and performance during regular operation actually severely affect system behavior during failures and downtimes. So I want to understand what the cascading effects of doing this could be. Imagine there is network slowness caused by some incident, so every request starts becoming slow. Now, since responses are not coming back within the given time, we start sending second requests, so we're effectively sending 2x requests over the network. The network is already not in good shape, and we're overloading it by a factor of two. That can have cascading effects and bring the whole system down."

Right. So one thing over here: I think that gets answered by the next technique, the concept of tied requests. What it mentions is that you need not fire the second request immediately; you can give a small pause there as well. Just think about it for a while. I'm talking about a very large scale here, thousands of servers, and a network operating across those thousands of servers. Even these small delays that you add in your system can add up to a lot of load alleviation at a data-center scale.
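To illustrate that pause, here is a rough sketch of the client side of a tied request. In the paper's design, the request is enqueued on two servers, each copy tagged with the identity of its peer so that whichever server dequeues its copy first can cancel the other; the names here, including the `enqueue_on` coroutine, are hypothetical stand-ins.

```python
import asyncio
import random

async def enqueue_on(server: str, payload: str, peer: str) -> str:
    # Hypothetical stand-in for enqueueing the request on a server.
    # In the paper's design, the moment a server dequeues its copy it
    # sends a cancellation for the tied copy queued on `peer`.
    await asyncio.sleep(random.uniform(0.001, 0.050))
    return f"processed {payload} on {server}"

async def tied_request(servers: list[str], payload: str,
                       message_delay: float) -> str:
    first = asyncio.create_task(
        enqueue_on(servers[0], payload, peer=servers[1]))
    # The brief pause before the second copy (the paper suggests about
    # twice the average network message delay) keeps this from blindly
    # doubling traffic the moment the network slows down.
    await asyncio.sleep(2 * message_delay)
    second = asyncio.create_task(
        enqueue_on(servers[1], payload, peer=servers[0]))
    done, pending = await asyncio.wait({first, second},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

print(asyncio.run(tied_request(["server-a", "server-b"], "lookup:key7", 0.001)))
```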