Right, so let's move on to the second class of techniques, which the paper calls cross-request long-term adaptations. These techniques operate at a holistic level: we are not trying to optimize each individual request. Largely, they are about how you partition your data rather than how you manage your service invocations or service graphs. If you remember, in the previous slide I mentioned the distinction between request-level and replica-level concerns; the first class of techniques worked at the level of requests, while this class focuses on how you partition and replicate your data.

To set some context: what are the possible causes of latency variation at this larger, system-wide level? The first is load imbalance. Load imbalance can happen because your data partitions are unbalanced. Think of a sharded architecture: one shard can become heavy because a lot more keys and data have landed on it, and automatically the retrieval and query time on that particular shard will be much higher than on the other shards. These problems are centered around data distribution and placement. The second is your service time distribution: one particular service may have a downtime, or degrade for a much longer duration, and for that whole duration all of the systems cascading from it will see a lot more variability.

Let's start with the first class of problems: unbalanced data partitions. A fundamental question for any partitioning technique is: do all the partitions have the same cost? By cost we mean the amount of data a partition holds and the number of requests it has to serve, i.e., the volume and processing handled by a single partition. If all partitions had equal cost, a static assignment of one data partition to one machine would make sense. However, that assumption is not true, and it fails even at a small scale; we are not always talking about Google scale. Unequal partition cost shows up even in much smaller systems. Two things can go wrong. First, the physical machine hosting your partition can itself degrade, and that is not predictable. Second, outliers or hot items can appear. For instance, suppose a cricket World Cup or an IPL match is going on and Dhoni hits a six: the number of search queries for that particular match will suddenly spike, and you have created a hot partition in your system.
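To make that concrete, here is a small, self-contained sketch; the shard count, key names, and traffic numbers are all invented for illustration, not from the paper. It shows how a static hash-based key-to-shard assignment skews the moment one key becomes hot:

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # static assignment: a key always lands on the same shard

def shard_for(key: str) -> int:
    # Stable hash so a given key always maps to the same shard.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

# Normal traffic: queries spread over many distinct keys.
queries = [f"user-{i}" for i in range(10_000)]
# Sudden event ("Dhoni hits a six"): one key dominates the workload.
queries += ["match:ind-vs-aus"] * 10_000

load = Counter(shard_for(q) for q in queries)
for shard in range(NUM_SHARDS):
    print(f"shard {shard}: {load[shard]} requests")
# The shard holding the hot key now serves roughly 10x its peers,
# and no static partition-to-machine assignment can fix that.
```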
I'm not sure if any of you have used DynamoDB or other managed non-relational databases, but you will see that they penalize you very heavily, in terms of cost and throttling, if you end up creating hot partitions in your data. So the fundamental assumption that all partitions cost the same will break even in a moderately sized system. What can we do to alleviate these problems? The concept the paper introduces is called micro-partitioning. Instead of looking at your data and creating very coarse, large partitions, you create many micro-sized partitions: essentially, you create far more data partitions than the number of machines you have for storing that data. They give a reference to how they manage this in Bigtable: every node in a Bigtable cluster can store anywhere between 10 and 100 micro-partitions. This gives them the capability to do much more dynamic assignment of partitions to nodes, and much more load balancing across these partitions. They also give some rough numbers: with 20 partitions per machine, the system can shed load in roughly 5% increments, and recovery takes about 1/20th of the time it otherwise would, because even if I lose one node in my cluster, I lose only about 5% of my capacity, and its partitions can be picked up by many machines in parallel. So this technique helps a lot with recovery rate as well.

The paper ends up referring to a very nice paper on a system called Chord, a scalable peer-to-peer lookup service. Just a quick question: does this sound familiar, this concept of mapping multiple partitions to nodes? Any guesses from the audience? Okay, so a similar technique was also proposed by the Dynamo paper, which came out of Amazon back in 2007. It talks about a similar concept where you map multiple virtual nodes onto your physical nodes, and that gives you much better balance and enables better request routing and replication strategies as well. The common thread between the Tail at Scale paper and the Dynamo paper is that both of them refer to the same Chord paper by Ion Stoica et al., which was one of the first papers to introduce this concept of micro-partitions and mapping multiple partitions to physical nodes.

Coming to practical examples: if any of you have used Cassandra, one of the primary things to be aware of when you're designing a data model for Cassandra is how you model your partition key, because the partition key essentially defines the distribution of your data across the nodes in the ring, that is, in your Cassandra cluster. If your partition key is not designed well, you will end up creating a lot of hot nodes.
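Here is a minimal sketch of that virtual-node idea, in the spirit of Chord/Dynamo; the node names are made up, and the 20 virtual nodes per machine echo the paper's 20-partitions-per-machine example:

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each physical node owns many micro-partitions (virtual nodes) on
    a hash ring, so losing one machine spreads its keys across all the
    survivors in small increments instead of dumping them on one peer."""

    def __init__(self, nodes, vnodes_per_node=20):
        self._ring = sorted(
            (ring_hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vnodes_per_node)
        )

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the next virtual node.
        idx = bisect.bisect(self._ring, (ring_hash(key),)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.node_for("customer:42"))  # some node, stable for this key
```

Reassigning load is then just moving a virtual node's range from one machine to another, which is conceptually what Bigtable does when it shifts micro-partitions between nodes.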
Similar techniques and concepts are defined by HBase, where you have the concept of a row key: depending on how well balanced and distributed your row key is, the data partitions will be mapped evenly, or not, onto the data nodes. Similarly Riak, which is another non-relational key-value database, again has a concept of partition keys. Any questions on this one? Someone says this sounds like Cassandra? Correct, and Cassandra actually took a lot of inspiration from the Dynamo paper; a lot of the concepts in Cassandra are what the Dynamo paper introduced. Any other questions? Anand? "I don't think there are any questions, I think we can move on." Sure.

So we have spoken about partitioning and about mapping data partitions to physical nodes. The obvious extension of this technique is: can we do selective replication? What it essentially means is that if we are able to detect, or probably predict, that some partitions are becoming hot, that is, a lot more traffic is coming onto those partitions, can we create additional replicas of those hot items at runtime? In other words, distribute the load among more replicas by creating more copies of those partitions, so that more nodes are able to participate in serving requests for those particular data shards.

This seems fairly natural and intuitive. However, at least in my experience, when I tried to look up whether there are any existing systems which allow you to dynamically create replicas, I found none. Take HDFS, which is again a very large-scale file system: it has the concept of a replication factor, but the replication factor is constant in HDFS. You define a replication factor of, say, two or three, and HDFS will automatically create that many copies of the data whenever you write or create a new object in the file system. Rebalancing, however, has to be manually triggered; HDFS will not do dynamic rebalancing or dynamic replica creation for you. Similarly, Cassandra does auto-rebalance, redistributing partitions and adjusting replica placement, but that typically happens only when you change the topology, which essentially means you either add more nodes to your Cassandra cluster or remove a node from it. So I personally haven't come across any technology which does dynamic replica creation or dynamic partition creation on its own. Google, being the engineering powerhouse they are, might have designed systems for that, but among open source systems, at least to my knowledge, I haven't come across any that does this dynamic replication and replica creation for data partitions. A hypothetical sketch of what such a system might do is below.
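The paper doesn't prescribe a mechanism here, so the following is a purely hypothetical sketch of what runtime selective replication could look like; the thresholds, names, and the periodic rebalance tick are all my own invention:

```python
import random
from collections import defaultdict

HOT_THRESHOLD = 1000  # invented: requests/interval before a partition is "hot"
MAX_REPLICAS = 5      # invented: cap on replicas per partition

class SelectiveReplicator:
    def __init__(self, all_nodes, base_replicas=2):
        self.all_nodes = list(all_nodes)
        # Every partition starts with a fixed, static replica set.
        self.replicas = defaultdict(lambda: self.all_nodes[:base_replicas])
        self.hits = defaultdict(int)

    def route(self, partition: str) -> str:
        # Spread reads across however many replicas the partition has now.
        self.hits[partition] += 1
        return random.choice(self.replicas[partition])

    def rebalance_tick(self):
        # Called once per monitoring interval by some coordinator.
        for partition, count in self.hits.items():
            replica_set = self.replicas[partition]
            if count > HOT_THRESHOLD and len(replica_set) < MAX_REPLICAS:
                spare = [n for n in self.all_nodes if n not in replica_set]
                if spare:
                    # A real system would also copy the partition's data
                    # to the new node before serving reads from it.
                    self.replicas[partition] = replica_set + [spare[0]]
        self.hits.clear()
```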
So I think we are almost at the end of the paper. One of the other techniques they propose is latency-induced probation. What they say is: can we have intermediate servers which keep a watch on the latency distribution across the whole fleet of servers? Essentially you are watching how each and every server in the system is performing, and if a server starts slowing down, you put it on probation. What probation means is that you take that server out of rotation: instead of the regular traffic that was going to it, you pass it only a limited number of shadow requests, and you watch whether the server has recovered from the performance degradation it saw. If you see that it has recovered, you bring the server back into rotation. That's the concept of probation they mention; a small sketch of it follows below. It is slightly analogous to the concept of circuit breakers, if any of you have used those; circuit breaking addresses a somewhat different class of problems, but the shape is similar.

This is one of the techniques that looks very counterintuitive. The question that immediately comes to mind is: the overall system performance is degrading, and we improve latency by removing a server from rotation? Even I could not wrap my head around this when I read it the first time. But at the scale this paper assumes, thousands if not millions of requests a second, it turns out to be a viable technique.

The next topic the paper takes up is that, for large-scale information retrieval systems, latency is a key quality metric, and they rank it slightly above the quality of the results the system returns. What they essentially say is that returning good results quickly can be better than returning the best results slowly. This is where they bring in the concept of "good enough" results: if I have scanned through a sufficient portion of the search corpus, I need not wait for the scan of the rest of the corpus. Essentially we are trading off completeness of a request against its responsiveness. Don't wait for request processing to complete on all the data shards: if a sufficient number of shards have responded, enough to give a reasonable result to the user, respond immediately rather than making the user wait.
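Here is a minimal sketch of that good-enough cutoff; the 90% sufficiency fraction, the latency budget, and the search_shard stub are assumptions of mine, not numbers from the paper:

```python
import concurrent.futures
import time

SUFFICIENT_FRACTION = 0.9  # invented: 90% of shards counts as "good enough"
DEADLINE_S = 0.1           # invented: overall latency budget for the query

def search_shard(shard_id: int, query: str) -> list:
    ...  # placeholder for the real per-shard search RPC

def good_enough_search(query: str, num_shards: int) -> list:
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_shards)
    pending = {pool.submit(search_shard, s, query) for s in range(num_shards)}
    deadline = time.monotonic() + DEADLINE_S
    done = set()
    # Collect shard responses until enough have arrived or the
    # budget is spent, whichever comes first.
    while pending and len(done) < SUFFICIENT_FRACTION * num_shards:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        finished, pending = concurrent.futures.wait(
            pending, timeout=remaining,
            return_when=concurrent.futures.FIRST_COMPLETED)
        done |= finished
    pool.shutdown(wait=False)  # don't block on the straggler shards
    # Respond with whatever we have: completeness traded for responsiveness.
    return [f.result() for f in done]
```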
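And going back to the probation mechanism for a moment, here is a minimal sketch, assuming invented latency thresholds and a hypothetical send() RPC; the paper describes the idea, not this implementation:

```python
import collections
import random
import statistics

PROBATION_MS = 200  # invented: park a server whose median latency exceeds this
RECOVER_MS = 50     # invented: reinstate once shadow-request latency drops here

def send(server: str, request, shadow: bool = False):
    ...  # placeholder RPC; shadow responses are measured, then discarded

class ProbationRouter:
    def __init__(self, servers, window=50):
        self.active = set(servers)
        self.probation = set()
        self.history = {s: collections.deque(maxlen=window) for s in servers}

    def record(self, server: str, latency_ms: float):
        # Called by the RPC layer for both real and shadow responses.
        self.history[server].append(latency_ms)
        median = statistics.median(self.history[server])
        if server in self.active and median > PROBATION_MS:
            # Take the slow server out of rotation.
            self.active.discard(server)
            self.probation.add(server)
        elif server in self.probation and median < RECOVER_MS:
            # It has recovered on shadow traffic: bring it back.
            self.probation.discard(server)
            self.active.add(server)

    def dispatch(self, request):
        # Real traffic goes to one healthy server...
        send(random.choice(sorted(self.active)), request)
        # ...while probated servers only get shadow copies to measure.
        for server in self.probation:
            send(server, request, shadow=True)
```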
The paper also brings in the concept of canary requests; I'll sketch this below as well. What they say is that canary requests helped them find a lot of untested code paths which could end up crashing or degrading the system, because at Google scale the number of code paths in your application can be in the thousands if not millions. So whenever a new kind of request comes in, you forward it first to only a limited set of leaf servers, and only if those servers all respond successfully do you fan the request out to all the leaves underneath.

Canary requests and canary deployments have become quite popular in the last few years. Istio is one example which lets you configure canary routing in your service mesh easily. AWS's API Gateway also has the concept of canary deployments, where any new deployment is probed and tested on an initial few requests, and only if you get success responses is it brought fully into rotation. Netflix's Zuul API gateway also lets you implement canary requests by writing custom filters and custom routers in your application.

The next section of the paper claims that write operations are not heavily latency-sensitive. Again, this sounds slightly counterintuitive, but if you dig deeper, you realize that on the write path we are not touching anywhere near as many services and replicas as we might on the read path of read-intensive, interactive systems. Even with quorum-based write techniques, the algorithms use either Paxos, or maybe Raft, which is another consensus algorithm that came out more recently, and these algorithms typically touch three to five replicas. We are not talking about touching hundreds or thousands of servers; you touch a limited number of replicas, so the variability effect is much lower than what you see on the read path.
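To see why the write path touches so few machines, here is a toy quorum-write sketch; this is not Paxos or Raft themselves, just the majority-ack pattern they share, with an invented replicate() stub:

```python
import concurrent.futures

REPLICAS = ["r1", "r2", "r3"]    # quorum groups are typically 3-5 replicas
QUORUM = len(REPLICAS) // 2 + 1  # majority: 2 of 3 here

def replicate(replica: str, record: dict) -> bool:
    ...  # placeholder for the real replica-write RPC; True on ack

def quorum_write(record: dict) -> bool:
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(REPLICAS))
    futures = [pool.submit(replicate, r, record) for r in REPLICAS]
    acks = 0
    for f in concurrent.futures.as_completed(futures):
        if f.result():
            acks += 1
        if acks >= QUORUM:
            # Commit after a majority acks: the one slow replica in the
            # group cannot hold the write path hostage.
            pool.shutdown(wait=False)
            return True
    pool.shutdown(wait=False)
    return False
```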
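And going back to canary requests, here is a rough sketch of the root-server behaviour; the canary count and the call_leaf() stub are assumptions of mine, the paper only says one or two leaf servers are tried first:

```python
CANARY_COUNT = 2  # try one or two leaves before fanning out

def call_leaf(leaf: str, request):
    ...  # placeholder for the real leaf-server RPC

def canary_execute(leaves: list, request):
    canaries, rest = leaves[:CANARY_COUNT], leaves[CANARY_COUNT:]
    responses = []
    for leaf in canaries:
        try:
            responses.append(call_leaf(leaf, request))
        except Exception as exc:
            # A canary crashed or errored: refuse to fan out, so one bad
            # code path cannot take down thousands of leaf servers.
            raise RuntimeError(f"canary failed on {leaf}") from exc
    # All canaries survived: fan the request out to the whole fleet.
    responses.extend(call_leaf(leaf, request) for leaf in rest)
    return responses
```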
The paper then talks about hardware trends, which I think we can skip in the interest of time. How are we doing on time? "I think you can go on for a few more minutes; this is the concluding bit, right?" Sure.

So, to conclude: what we have established is that variability is inevitable, so we have to live with it and come up with techniques that mitigate its impact. A lot of these tail-tolerant techniques we have spoken about allow you, first of all, to leverage the very capacity you already deploy for redundancy and fault tolerance, so they can drive better resource utilization while still giving good responsiveness. And lastly, a lot of these techniques are very common design patterns, and they can be baked easily into your client-side or server-side libraries. That's why common infrastructure such as Istio, Envoy Proxy, and Spring Reactor has been able to bake them in, in a very generic way, and we need not write this boilerplate code for every application that we build.

So these are the references; feel free to go through them, and I strongly encourage reading some of these articles. While reading this paper and working on this presentation, I also found that Microsoft has come out with an article, referencing the same tail latency paper, about managing tail latency in a data-center-scale file system under production constraints; it refers to the file system behind their Azure cloud. Great, I think that concludes the presentation from my side. Any questions?

"I don't see any questions. I did want to talk about Cassandra and our real-life experience, but let's first see if there are questions and then move on." There are no questions currently, Anand. "Yeah, so since there are no questions, I have something. One thing I notice is that these are things people do at Google scale. The interesting question is: do you have any experience implementing these kinds of systems at your work, or has anyone else here used these kinds of systems at work?"

Right, so Anand, to be honest, at Capillary these techniques have started coming in and helping us as our scale has increased. Two to three years back we were processing about 80 to 100 million requests a day; now we have reached a point where we are touching close to half a billion requests a day. So, first, we have tuned all of our antivirus scans and batch scans that run in the night. Second, we are already moving onto a service mesh, and at the Envoy proxy level we have started implementing hedged requests, though for now only for a small experimental slice of traffic. Earlier, at InMobi, we also had this concept of request hedging. We were not using any external libraries; the company had a standard internal library that we used to fan a request out to multiple service endpoints, where every service endpoint was actually a collection of multiple servers powering the same system, and there we were talking about serving an ad request within a very tight latency budget.
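Since request hedging came up, here is roughly what that pattern looks like in code; this is a generic illustration with an invented call_backend() stub and hedge delay, not the actual library we used:

```python
import concurrent.futures

HEDGE_DELAY_S = 0.05  # invented: hedge after roughly the p95 latency

def call_backend(endpoint: str, request):
    ...  # placeholder for the real service call

def hedged_call(endpoints: list, request):
    # Needs at least two endpoints: a primary and a hedge target.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    primary = pool.submit(call_backend, endpoints[0], request)
    try:
        return primary.result(timeout=HEDGE_DELAY_S)
    except concurrent.futures.TimeoutError:
        # Primary is slow: fire a duplicate at a second endpoint and
        # take whichever copy of the request finishes first.
        backup = pool.submit(call_backend, endpoints[1], request)
        done, _ = concurrent.futures.wait(
            {primary, backup},
            return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)
```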