Hello everybody, good afternoon. This is Sid Chaudhary here from Yugabyte, and I'm pleased to be talking to you about the topic of my talk: taming cross-region latency in geo-distributed SQL databases. So let's dive right in.

The world of databases can be divided into two parts. First are the OLAP databases, which are meant for analytical processing, and second are the transactional processing, or OLTP, databases. There are some very important differences between the two; let's look at them one by one. OLAP databases are about writing once and reading frequently. They do not have too many concurrent sessions, and as a result they provide weaker ACID guarantees. They do support long-running, ad hoc, interactive-style queries; basically, the queries are not known beforehand, and they involve scanning through large amounts of data with large table scans. The amount of data stored behind these databases is typically at petabyte scale.

OLTP databases, on the contrary, are significantly different. We read and write frequently from OLTP databases because they are the source-of-truth databases serving our applications. These applications create many concurrent sessions and expect a certain strictness around the data that is served back. As a result, these databases have to provide stricter ACID guarantees, especially in the isolation level that is supported; typically it has to be serializable isolation. The queries have to be served with very low latency, so blazing-fast queries, but the way we do so is that the query patterns are very well known: the queries themselves are well known and are coded into the application logic. These queries usually involve point reads and short-range scans, and the data stored by these databases is typically at terabyte scale; it can be in hundreds of terabytes as well. So, for today's talk, we are going to focus on the OLTP category and dive deeper into that.

That brings us to the topic of today's talk, which is distributed SQL. What is distributed SQL? In short, it is a revolutionary OLTP database architecture where we keep the SQL semantics of relational data modeling and strict ACID transactions intact, but add three capabilities that are typically absent in traditional relational databases. First is ultra-resilience: the ability to natively fail over and repair when the infrastructure underneath the database nodes fails. Second is massive scalability: the ability to serve increasing application throughput by simply adding new nodes, also known as horizontal write scalability. Last but not least is the ability to geographically distribute the data across multiple regions or multiple data centers so that our applications can be served with low-latency queries.

Now, why is this distributed SQL architecture so important? The reason is two-fold. It comes first from the needs of today's applications, which are increasingly microservices-based. These applications go through a really high rate of iteration and change, and in that process they need the ability to change their queries without changing the schema, which is the hallmark of relational data modeling. They need strict guarantees around consistency and correctness.
As well, for the select few microservices that serve large amounts of data, they need massive scalability. On the other hand, we are living in the multi-cloud era today, where we have many different cloud providers and all their data centers and regions at our fingertips. In this multi-cloud infrastructure world, we cannot rely on the previous world of specialized infrastructure; we have to rely on commodity infrastructure, which is more failure-prone than ever before, hence the need for ultra-resilience. And since we now have access to hundreds of data centers and regions at our fingertips, we had better start exploiting them by making our SQL databases geo-distributed. The end result is that developers are able to build applications faster, operations engineers are able to scale and manage both day-one and day-two operations easily on the cloud or Kubernetes infrastructure of their choice, and businesses are able to launch new services and new initiatives faster than ever before.

Now that we have looked at the basics of distributed SQL, let's dive into the distributed SQL architecture. At the core of the architecture is a cluster of nodes, and typically it requires a minimum of three nodes, because these databases use what is known as distributed consensus, like Raft or Paxos, and they need an odd number of replicas stored inside the database to avoid split-brain scenarios. Alongside replication, these databases also automatically shard, which is to say partition, data across multiple nodes. Each such shard has one shard leader, marked in orange, and two shard followers, or follower replicas, placed in fault domains different from where the shard leader resides; those are marked as shard follower one and shard follower two in white. As we keep adding nodes to, or removing nodes from, this minimum three-node cluster, data is continuously rebalanced so that we can exploit the capacity we bring into the cluster. The end result of all this hard work is that microservices applications can treat this entire cluster of nodes as a single logical relational database that speaks the well-known SQL language; a short sketch of what that looks like appears below.

With that, I will introduce you to YugabyteDB, which is a distributed SQL database with its own unique characteristics. First and foremost, it speaks the PostgreSQL language, so it is fully compatible with the PostgreSQL drivers and ecosystem. Second, it delivers high performance, both in terms of low latency and in terms of throughput that grows as you add more nodes to the cluster. It is built for the cloud-native and Kubernetes ecosystem, where infrastructure is highly dynamic and highly failure-prone, and as a database it has all the built-in guarantees, including ACID compliance, to ensure that even in such a dynamic environment you never lose data. Last but not least, it is a 100% open source database distributed under the permissive Apache 2.0 license. We are excited that distributed SQL is the future of the relational database architecture, which is so critical to the modern generation of applications we all touch and feel daily in our lives. Previously we would have relied on traditional RDBMSs like MySQL, PostgreSQL, Oracle, or SQL Server; now the world is moving towards the likes of Amazon Aurora, Google Spanner, and, obviously, YugabyteDB.
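To make that "single logical SQL database" idea concrete, here is a minimal sketch, assuming a YugabyteDB YSQL endpoint reachable with any standard PostgreSQL client. The table and values are made up for illustration, and the SPLIT INTO clause is a YugabyteDB-specific extension; sharding and replication happen automatically even without it:

```sql
-- Connect with psql or any PostgreSQL driver to any node of the cluster.
CREATE TABLE orders (
  order_id  bigint PRIMARY KEY,
  customer  text,
  total_usd numeric
) SPLIT INTO 6 TABLETS;  -- YugabyteDB-specific: pre-split into 6 shards (tablets)

INSERT INTO orders VALUES (1, 'acme', 99.50);

-- The read is served by the shard (tablet) leader, wherever it lives in the cluster.
SELECT * FROM orders WHERE order_id = 1;
```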
So, today's focus for the remaining part of the talk is this newer, exciting category of relational databases called distributed SQL.

Now, let's come to the second aspect of the talk, which is geo-distribution: essentially, running a cluster of database nodes across multiple regions or multiple data centers that are geographically far away from each other. Typically, with a traditional RDBMS, what we have done over the years is deploy it in essentially two configurations. First is the active-passive configuration, used primarily for disaster recovery. Even though my application layer runs in multiple regions, the applications are stateless, with no internal state to manage; they rely on the database to manage their state. So we have these applications running in multiple regions, but they all talk to one single-node, monolithic RDBMS in one of the regions where the applications run. As a result, the applications running in the same region get a lower-latency experience, while the applications farther away, in a different region, get a higher-latency experience; that is just the laws of physics involved in moving bytes over the Internet. In order to make this setup amenable to recovery from disasters, we add a passive replica in a different region. We use what is known as asynchronous replication, which essentially takes the write-ahead log of the master database instance and, after the data has been committed, replicates it over to the passive replica in a separate region. The passive replica has no knowledge that this stream is actually coming from a different database instance; to it, it is just another writer. While this sort of gets the job done, there is a penalty, a compromise to be made. We have to manually promote the passive replica if we have a total outage of the primary region and lose the instance, and during this failover we will have some temporary data loss for data that has not yet had a chance to replicate over to the passive replica. This is how we have been using relational databases forever, so this wouldn't be new to anybody.

The second architecture, which is also getting common, is what is known as active-active, or multi-master, where we create two full copies of the database: active master one and active master two hold the same data, as copies of each other, and there are passive replicas present to tolerate the loss of the master in each particular region. We then add bidirectional asynchronous replication. The end result is that if we update the same record on both sides, last-writer-wins semantics apply, and one of those two writes will actually be lost forever. So applications have to take a lot of care not to write to the same records from both sides in such an active-active, multi-master deployment of a traditional RDBMS. Again, these things are very well understood; we have learned to live with them.
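Both of those traditional setups lean on asynchronous replication underneath. For concreteness, here is a minimal sketch of one common way to wire that up with stock PostgreSQL, using logical replication via a publication and a subscription; physical WAL streaming is the other common option, but it is configured outside SQL. The hostnames and object names are placeholders, not anything from this talk:

```sql
-- On the active primary (say, in us-east): publish committed changes.
CREATE PUBLICATION dr_pub FOR ALL TABLES;

-- On the passive replica in another region: subscribe to that change stream.
-- Changes are applied only after they commit on the primary, so a region
-- outage can lose the not-yet-replicated tail of the stream.
CREATE SUBSCRIPTION dr_sub
  CONNECTION 'host=primary.us-east.example dbname=app user=replicator'
  PUBLICATION dr_pub;
```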
But distributed SQL is exciting because it goes to the heart of these problems and starts changing them. First, the entire architecture is built for no data loss, and it supports multiple topologies depending on the kinds of failures we would like to tolerate.

Let's start with the simplest, where there is a single data center or single region with multiple racks or multiple availability zones, and a single cluster is stretched across three such availability zones. As writes come into any of those nodes, they are consistent across the zones: you can immediately write to one zone and read the same data from another. There is obviously no cross-region latency here, because we are in a single region, but there is no region-level failover and repair either. So if we lose the entire region, we are in a tough spot and will have to recover from a full cluster backup, which usually involves some amount of manual processing.

Taking this one step further, we can run distributed SQL in a single cloud but in a multi-region setup, where, say, we go to Amazon, Google Cloud, or Azure, pick three regions from their arsenal, and run a single cluster stretched across all three of them. The beauty here is that it is not only consistent across the regions, it can actually tolerate region-level failures automatically. If we lose, say, region two in the middle of the image, the cluster will automatically heal using the replicas present in region one and region three, and it will continue to take writes as if nothing has happened. That's the power of the distributed SQL architecture.

We can extend the same concept to a multi-cloud topology, where we run the entire cluster across three clouds rather than three regions of one cloud: one region of Amazon, one region of Google, and one region of Azure or on-premises. In this case, we are creating an additional layer of fault tolerance, on the reasoning that the chances of two clouds losing their regions at the same time are extremely low. So this is about the most fault-tolerant scenario you can think of. However, both the second and third topologies come with cross-region latency as a kind of penalty, because in order to commit with distributed consensus, we need to wait for at least one replica, which may be in a faraway region, to also acknowledge the commit. This cross-region latency goes to the heart of the concerns developers have about deploying a geo-distributed SQL architecture. So for the rest of the talk, I'll focus on how to mitigate these cross-region latency problems, true to the title of taming cross-region latency.

What we'll look at is nine different techniques to reduce inter-region communication, categorized into two blocks. One is the basic building blocks: a few constructs that you, as application developers, can use, and these constructs are usually transparent to the application; your data modeling does not need to change to account for them, and that's the beauty of them. Then there are some advanced configurations, where you can get even more mileage, meaning you can reduce even more cross-region communication, but you have to build your data model a little differently, a little more smartly, to exploit the power of distributed SQL. So let's dive right into each one of them.

The first technique is strongly consistent reads, also known as correct reads, without using any quorum in the read path. Since the shard leader always has the latest data that my applications need, we can rely on it to serve that data without depending on any other replicas on the read path.
Note that we have already done the hard work of updating the replicas on the write path, so we want to get the benefit of that hard work on the read path by simply asking the shard leaders to give us the strongly consistent information. Now, in order for this to work, it is important that the distributed SQL database uses not only distributed consensus but also something known as a leader lease, which ensures that for a given shard there can be only one leader at any point in time; that is what allows us to remove cross-region communication and latency for a whole class of queries. If you ask me what a typical workload for this is: a user's password authentication in a SaaS application. Say I'm logging into Salesforce, Marketo, or Zendesk; when I log in, the application has to authenticate the password I'm entering against the password stored in my profile, and that authentication is vital. It has to be served with the lowest latency possible, because if that process takes too long, we risk user dissatisfaction. That particular workload can be served with strongly consistent reads in distributed SQL; since they don't use quorum, the only latency you see is essentially the latency between the client and the database node, with no cross-region latency.

Taking this one step further, we know that on this three-node cluster every node also hosts some follower replicas. The follower replicas follow a notion called timeline consistency, which means the data is always present in them in the same order as on the shard leader; however, it may be slightly stale. So if the workload is okay accepting timeline-consistent follower reads, we can tell the database to give us follower data without ever consulting the shard leader. The typical workload that serves this purpose is product reviews in a retail application. There may be 200 reviews at this point in time for a given product, but the users seeing that product on their mobile phones or web apps wouldn't notice if we showed 199 instead of 200 in the list of reviews we present to them. This is a case where slightly stale data is okay, and we can serve that information from the follower replicas; this also has the added benefit of reducing the load on the shard leaders, which no longer have to serve every kind of query.

Joins powered by follower reads are essentially a combination of technique one and technique two: instead of relying on just the shard leader or just the followers, we take the sum total of that view, go to any node, and say, give me the list of tickets filed by a single user. Again, this will not be 100% up to the moment, but it will be slightly stale, timeline-consistent information that is good enough for most purposes if we actually want to avoid cross-region latencies in distributed SQL architectures.
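To make techniques two and three concrete, here is a minimal sketch of how follower reads are enabled in YugabyteDB. The session parameters are the ones documented for YugabyteDB (follower reads apply only to read-only work); the table, column, and staleness value are illustrative assumptions:

```sql
-- Follower reads only apply to read-only transactions, so mark the session read-only.
SET default_transaction_read_only = on;

-- Allow reads to be served by follower replicas (YugabyteDB-specific setting).
SET yb_read_from_followers = true;

-- Accept data that is at most ~30 seconds stale (YugabyteDB-specific setting).
SET yb_follower_read_staleness_ms = 30000;

-- Served by the nearest replica, leader or follower, with no cross-region hop,
-- at the cost of slightly stale, timeline-consistent results.
SELECT count(*) FROM product_reviews WHERE product_id = 1234;
```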
So we have seen three of the techniques; let's go on to the fourth. In the fourth technique, we tell the database cluster to pin all our shard leaders to a single region instead of distributing them uniformly across all the regions. Why would we do so? We would do so if we know ahead of time that a single region is dominant with respect to our read requests. If we take the loyalty application of an East Coast brand in the US like Dunkin' Donuts, then obviously most of the reads to that loyalty application originate on the East Coast. So if we pin all the shard leaders to the US East region, we are guaranteed that we can serve completely strongly consistent reads, across even fully relational schemas, by simply going to that one preferred region where all the shard leaders are stored. Again, by using the per-shard distributed consensus architecture smartly and by giving hints to the cluster at either query time or cluster creation time, without changing our data modeling, we can get a lot of mileage out of the distributed SQL architecture, to the point where cross-region latency is non-existent.

Now we'll go to the advanced scenarios. One such advanced scenario is known as row-level geo-partitioning. Imagine that we are building the user profile store for Zoom or WebEx web conferencing. Naturally, web conferencing is a global application, with users in almost every country of the world. So how do we model such a user profile? In today's world, without distributed SQL, it's a very complicated problem: you're looking at every data center having its own independent cluster, with some synchronization happening offline to keep the data consistent. In distributed SQL, this gets solved at the core of the database. You can create a single table to store this global customer data, but alongside each customer or user record you also store the country of residence. Then you tell the database that, as incoming rows arrive, it should look at that country-of-residence column in each row and pin the row to the nearest region where the cluster has nodes. The end result is that reads from a particular region never have to travel outside that region. And remember the first technique: strongly consistent reads do not require quorum. The same benefit now applies in the context of geo-partitioning, where a single table is spread across nodes in multiple regions, but each row's data is localized to the region of the node it lives on. A very, very interesting concept.
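For a flavor of what row-level geo-partitioning looks like in practice, here is a minimal sketch using YugabyteDB's tablespace-plus-partitioning approach. The tablespace name, placement values, table, and columns are illustrative assumptions rather than anything from a real deployment:

```sql
-- A tablespace whose replicas all live in one geography (values are illustrative).
CREATE TABLESPACE eu_ts WITH (replica_placement = '{
  "num_replicas": 3,
  "placement_blocks": [
    {"cloud": "aws", "region": "eu-west-1", "zone": "eu-west-1a", "min_num_replicas": 1},
    {"cloud": "aws", "region": "eu-west-1", "zone": "eu-west-1b", "min_num_replicas": 1},
    {"cloud": "aws", "region": "eu-west-1", "zone": "eu-west-1c", "min_num_replicas": 1}
  ]}');

-- One global table, partitioned by the country/geo column stored with each row.
CREATE TABLE user_profiles (
  user_id bigint NOT NULL,
  geo     text   NOT NULL,
  profile jsonb,
  PRIMARY KEY (user_id, geo)
) PARTITION BY LIST (geo);

-- Rows with geo = 'EU' are pinned to the EU tablespace; other geographies get
-- their own partitions and tablespaces in the same pattern.
CREATE TABLE user_profiles_eu PARTITION OF user_profiles
  FOR VALUES IN ('EU') TABLESPACE eu_ts;
```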
Along with row-level geo-partitioning, we can also think about how to keep related tables, or related rows of tables, close to each other, so that when queries come in with parent-child and hierarchical relationships, we don't have to bring the network into the equation and can serve them by looking only at what is on the current node. There are multiple techniques associated with that. Colocated tables are where, at the database level, all the tables share a single shard. Co-partitioned tables are where a group of tables share a shard key and are therefore located on the same set of nodes. Interleaved tables, which is what Google Spanner implements and what is shown on the slide here, use the technique of storing all the child records inside the parent record in a hierarchical manner. In this example, Mark Richards is a singer with two albums, Total Junk and Go, Go, Go; the album Go, Go, Go has two songs, 42 and Nothing Is The Same. This entire set of records is stored in an interleaved manner, that is, in a single storage construct on a single node, so that queries relating singers to their albums and songs can be served more easily. The more involved example is around Catalina Smith, who has an album named Green with the songs Let's Get Back Together, Starting Again, and I Knew You Were Magic inside that Green album. The net result of all three techniques, colocated, co-partitioned, and interleaved, is that related data sits close together, and as a result cross-region communication is avoided.

So far we have seen what can be done with respect to data placement, or replica placement, inside a single cluster. Let's also look at things from the perspective of the client. If we are using a standard SQL client driver, for example a PostgreSQL driver, that driver is not aware of the concept of multiple nodes powering a single database cluster, because standard SQL client drivers were built for monolithic, single-node databases. The common approach here is to add a load balancer, which abstracts away the nodes present in the cluster and gives the client applications a single endpoint to connect to. The issue is that now we, as application developers, have to own the management of the load balancer, as well as keeping it updated as and when we add or remove nodes. So it's not the best value proposition as it stands. What if we had a topology-aware SQL client driver? Since it actually understands the topology of the nodes as they are added or removed, we don't need a load balancer at all; even if we give it the address of a single node, it auto-discovers all the nodes present in the cluster. Again, this makes our life much simpler, because we have taken one infrastructure element out of the application-to-database path; as a result we have reduced our operational burden and also helped lower latency.

There are other approaches as well to using distributed SQL in a geo-distributed fashion. One such approach is read replicas, where we have a single master cluster in one region, spread across multiple availability zones, but in order to serve very remote reads, meaning reads from faraway regions, we create a separate read replica in those regions and make the clients in those regions connect to that read replica first. If they are doing reads and can accept timeline-consistent, slightly stale data, the read replica suffices completely, and there is no reason ever to go to the master cluster, which may be really far away. If writes come in to the read replica, it automatically redirects them back to the master cluster, because that is where writes are taken. This helps reduce latency if the workload allows for that kind of approach; again, product reviews come to mind when it comes to powering geo-distributed reads in faraway regions. It may not work for other scenarios.

The last deployment is actually a two-cluster deployment, not one cluster. In this case, each cluster works as a master cluster in its own right, similar to the traditional RDBMS paradigm we saw before. Each cluster is fault-tolerant and consistent across zones, and each cluster has a full copy of the data that needs to be managed. The additional work is that each cluster asynchronously replicates to the other cluster, using last-writer-wins semantics for conflict resolution, which means we are back to the same problem of data loss on conflict. But we do get the benefit that our writes are never cross-region, and on failure we can always make the applications talk to the cluster that is still alive in a different region.
OK. So I covered nine techniques today for reducing cross-region latency in distributed SQL databases. As a summary, I would like to point out that distributed SQL is an exciting architecture that is increasingly becoming the future of relational databases, and there are a few reasons behind it. It keeps the power of SQL's relational data modeling and distributed ACID transactions intact, because that's what we need to serve our source-of-truth data. It brings in native resilience against failures at the disk, node, zone, region, and cloud level, which are all the more common in the multi-cloud world we live in. It adds the ability to do horizontal write scaling with continuous, automatic data rebalancing; you don't have to write any complex replication logic on your own, you don't have to manage those scripts and run them from time to time, everything is baked into the core of the database. It offers multiple geo-distributed deployment options to pick and choose from, depending on how much latency is acceptable to us versus how much resilience we desire from our infrastructure. And it does all this while ensuring that no data is lost, as long as we stick to the guarantees that are provided, and while giving us a single logical SQL database, which is a powerful notion from an app development standpoint: I don't even need to know whether my SQL application is backed by a three-node cluster or a hundred-node cluster.

Geo-distribution is a deployment paradigm that is very exciting. Previously it was not even possible in the SQL world, other than the very basic active-passive disaster recovery and the active-active multi-master setups I talked about. However, the concern that geo-distributed SQL always implies cross-region latency is not true. There are multiple techniques available to us, every database implements these techniques to some extent or another, and we can use them intelligently to reduce cross-region communication. As we saw, there are two categories of such techniques: the basic building blocks rely heavily on the core per-shard distributed consensus architecture and are completely transparent to the application's data modeling, while the advanced configurations rely on more explicit data modeling changes, such as row-level geo-partitioning, as well as the use of new, smart, topology-aware client drivers.

With that, I have reached the end of my presentation. I recommend the audience review the distributed SQL blog, where we talk about the various concerns I touched upon today. You are welcome to join our community Slack channel, and you are obviously welcome to install YugabyteDB on your own and see for yourself how distributed SQL can make a difference in your app development life cycle. With that, thank you.

Hello everybody, this is Sid here. Now that the presentation is over, I am open for a live Q&A. You can put your question in the "Q&A with speaker" window that you see on your screen. I am also available on the OSS ELC Slack community, where there is a dedicated channel for open source databases. I am hoping that open source and databases are exciting to you; please come and join us on that Slack channel, I would love to engage and discuss your thoughts and concerns. Looks like we don't have questions, Tina; I think we are done here.