Okay, so today's lecture, let me review a few things first. Hopefully everyone has finished project three. If you haven't, it's already past due, so you will either be using your grace days or there could be a penalty. This is actually the last stretch of assignments for the semester: homework five will be released next Monday, November 2nd, and it will be due on December 2nd. And the last project, project four, will actually be released today and will be due on December 5th, right before the final exam. Also, these database talks are coming to the end of the semester; today we have Fluree coming to talk about their cloud-native ledger, or blockchain graph database system, so lots of buzzwords there. All right. So today's lecture and the next two lectures will actually be the last high-level topic of this course. Today and in the next two classes we'll be talking about distributed database systems. I don't know whether Andrew has mentioned this before, but in previous years we have discussed the concept of a parallel database when we talked about query execution earlier in the semester, and there's actually a difference here. When we talk about parallel databases, we may also partition the database into different portions, and different parts of the system are responsible for different things, but we assume that the nodes of a parallel database are physically close to each other, connected by a really fast interconnect that coordinates them, and those nodes fail together: if one node is down, the entire database is down. There's no failover, and we also assume the communication cost is small. In today's lecture, though, we're going to talk about distributed databases, where the different nodes could be on the same machine or could potentially be very, very far away from each other. For example, the nodes or the partitions of the data could come from different regions or even different continents, say in a cloud setting, and they would communicate with each other over the public network, which means the network latency could be really high, and different nodes can fail independently. So in a distributed database setting, if one node is down, we also need a mechanism for the whole system to keep running while we handle that node failure, spin up another node to replace it, and so on. Again, the communication cost can be high, and node failures are something we have to handle in a distributed setting, but you can see that a distributed database is also much more powerful: you can distribute or even replicate your data across different locations, different data centers, or even different continents. If one data center goes down, another data center is still up, and hopefully you have a mechanism to keep the database system running, so the system has much higher availability, and it also gives you much more flexibility in where to place your data. So that's what we're going to talk about today.
So one thing to note here is that all the topics we have talked about so far in this class still apply in a distributed database setting. For example, in a distributed database we will still be doing query optimization and planning, we will still be doing concurrency control, and we will still be doing logging and recovery, et cetera. The only thing is that most of those things become harder, if not a lot harder. For example, if you want to do concurrency control, you not only need to handle the case where different parts of the data may be on different machines that are potentially far away, meaning very high communication cost, but also, while transactions are executing, the node holding a lock on some data may go down. Then that lock would essentially never be released unless you have a mechanism to handle that, to resume, or to spin up another node to replace the failed one, et cetera. So we will still be doing most or all of those things; there are just many more challenges in a distributed environment. Today's agenda is about some of the foundations, the basic concepts of distributed databases. In the class on Wednesday, as well as next Monday, we will talk more about specific algorithms and implementation details. The whole topic of distributed databases could very well be its own semester-long course, so today we are only going to scratch the surface and build some high-level mental models and intuitions, and then the next two classes will go into a bit more detail. Specifically, we are going to talk about the high-level architectures we can choose when building a distributed database and the related design issues, then about methods to partition your data across the different nodes or machines, and lastly a little bit about distributed concurrency control, which we'll get into more next Monday. So that's today's content. First of all, the system architecture: when we say distributed database, we refer to the architecture of a distributed database as the specification of which resources a CPU can directly access. Every system has an architecture, of course, but in a distributed database the architecture specifically refers to which types of resources the system allows a CPU to directly access, say whether the system allows the CPU to directly access the data on a disk. Well, if the disk is on the same machine, then the system can allow that access, but it could also choose not to, and if the disk is on a different machine, then obviously the CPU cannot directly access the data on that disk. These kinds of decisions are what we call the architecture choices of a distributed database, and this choice of architecture affects many of the other implementation details and algorithms we choose when implementing a distributed database. This sounds a little bit abstract, so let me give you some specific examples.
So the first type of distributed database architecture, to be specific here, is called shared everything. This is actually exactly the same as the single-node database we have talked about so far in the semester: essentially everything is on the same machine, and the CPU can directly access both the memory and the disk on that machine. It's shared everything, and that's the architecture a single-node database system uses if we describe it with the terminology of distributed database architectures. Now we get to the scenario where the resources or CPUs can be on different machines. The first architecture choice for a distributed database is called shared memory. In this architecture, as illustrated here, the CPUs can be on different machines and can be far away from each other, but we assume there is a central mechanism or unified channel that allows these CPUs to access a big chunk of memory, either on a single machine or spread across different machines. This mechanism could be, for example, RDMA or InfiniBand, but the idea is that the CPUs don't really know which memory chip is on which machine, or whether it is local to the CPU or not. Access always goes through this unified interface, which hides the details of the memory and deals with memory consistency and coherency for the whole pool, whether it sits in a single location or is distributed across different places, and each node has access to the local disk attached to it. That's called shared memory. Make sense? The next architecture is called shared disk. Think of it as moving that central coordination interface from between CPU and memory down to between memory and disk. In the shared disk architecture, again, different CPUs could be on different machines, different nodes, and every CPU has local access to its own memory. But below the memory there is a centralized interface or unified channel that allows the memory to read data from a shared pool of disk space. That disk could again be on a single machine or distributed across different machines; it doesn't matter. The important thing is that any time a CPU wants to read some data, it first goes to its local memory, and the local memory goes through this unified interface to get the data from the shared pool of disk space. Make sense? We'll get to the details, but this is actually much more common these days, especially in the cloud era. The shared disk could be something like S3 on Amazon, for example: we can have different compute nodes, but at the end of the day they all go through the Amazon S3 service to get the data, regardless of which machine the data is actually on. And the last architecture is called shared nothing. It's also quite common, but it was more common before the 2010s, before the cloud era. Essentially, in this architecture every CPU has its own memory and disk directly attached to the local machine.
So every time the CPU wants to read data, it goes to its local memory and then to its local disk. But any time one machine or one CPU needs to access data on a different disk, it has to go over a higher-level network protocol, sometimes just the public network, and use that communication channel to coordinate access to data residing on different machines and different disks. The important thing is that every CPU has its own memory and disk, and there is no resource sharing below the CPU. Does that make sense? Cool. So let me give you some more specifics about these different architectures. The very first would be shared everything, but that's the single-node architecture we already talked about. In the distributed setting, the first architecture I mentioned is shared memory. Again, the CPUs have access to a common memory address space via an interconnect, and oftentimes this is implemented with RDMA, Remote Direct Memory Access. As the name suggests, that is an interface that allows a CPU to directly access memory residing on a different machine. And like I mentioned here, every processor has a global view of the entire memory: no matter where a memory chip physically resides, every CPU knows what content the system has at each location of the entire memory space and can read or write data at any location in that address space, okay? All the coordination happens in this network layer, and that network layer, or centralized interface, handles all the cache coherency and consistency, et cetera. But one thing to note is that in practice almost no database system implements a distributed architecture this way, and the main reason is that coordinating cache coherency and consistency through this single network interface is actually pretty difficult and the overhead is often high. It's much easier to deal with the issue at a lower layer, or to just use a shared nothing architecture. So in practice, almost nobody uses this. What would be a more common use case for this idea, not necessarily in the database area? High-performance computing. In that world, you often have giant machines with lots and lots of CPUs and a centralized location with a giant chunk of memory, terabytes and more. In that world, it is actually common to have a shared memory address space over that entire big array of memory in your data center and then use MPI to coordinate the communication between different compute nodes. So this idea is more commonly used in the HPC world than in the distributed database world. The next idea is shared disk. Again, like I mentioned, this is actually very, very common, especially in the cloud, and the high-level idea is that each CPU has local access to its memory, where it does all the computation and all the reads and writes, in other words the reads and writes in the buffer pool.
But when a node needs data, it goes to this centralized location through a unified channel, and it also stores data there. This is more common especially in the cloud because it has the appealing property that it allows you to scale storage and compute independently. Because all data access goes through this centralized interface, all the details of how many disks there are and where those disks are located are hidden. If your storage is not enough, you can add more disk space below this network interface, and the compute nodes don't need to know. Similarly, if you realize you don't have enough compute capacity but you already have enough disk, you can simply add more compute nodes, and that doesn't affect your operation on the disk, or almost doesn't, apart from a few corner cases you have to handle. So this is a very appealing property, especially in the cloud, where people want the convenience and flexibility to deploy their applications and scale resources up and down, and the shared disk architecture is very compatible with that goal. Another thing to mention is that even though we have this separation between memory and disk, and all data access goes through the network layer, while the database system is executing queries, and especially transactions, there also need to be direct messages sent between the compute nodes. We will see more details later, but essentially, if you want to do things like concurrency control, you still need a bit of coordination between the compute nodes even though you read all the data from the centralized location, okay? Again, these are just examples of systems that use this architecture. Most distributed database systems after 2010, especially systems built for the cloud, actually prefer this architecture. Let me give you a specific example. Say I have a distributed database with two compute nodes, each including a memory chip and a CPU or processor, and then we have this centralized location that holds all the data. Think of this centralized location as Amazon S3, for example, a data storage interface, and think of the compute nodes as Amazon EC2 instances. In this case, say the application server wants to get the record with ID 101. What this compute node, with its processor and memory, does is go through the unified storage interface to find out which page contains the record with ID 101, read the data, and put that specific page into the buffer pool on that node. Similarly, if another query wants to read the tuple with ID 200, that node will go through this interface, find the page on the shared disk space, get that page back into its buffer pool, and read it out. So again, with this architecture, you can easily scale compute and storage independently.
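To make that read path a bit more concrete, here is a minimal Python sketch of the idea, assuming a made-up `SharedDiskStorage` class standing in for something like S3 and a made-up page directory; none of these names come from a real API.

```python
# Minimal sketch of the shared-disk read path described above.
# SharedDiskStorage is a hypothetical stand-in for a service like Amazon S3;
# the class and method names are invented for illustration only.

class SharedDiskStorage:
    """Unified storage interface; hides which disk or machine holds each page."""
    def __init__(self):
        self.pages = {}          # page_id -> list of records on that page

    def read_page(self, page_id):
        return self.pages[page_id]

class ComputeNode:
    """A compute node (think of an EC2 instance) with its own buffer pool."""
    def __init__(self, storage):
        self.storage = storage
        self.buffer_pool = {}    # page_id -> cached copy of the page

    def get_record(self, record_id, page_directory):
        page_id = page_directory[record_id]        # which page holds this record
        if page_id not in self.buffer_pool:        # miss: fetch over the network
            self.buffer_pool[page_id] = self.storage.read_page(page_id)
        page = self.buffer_pool[page_id]
        return next(r for r in page if r["id"] == record_id)

# Usage: two compute nodes share the same storage but keep separate buffer pools.
storage = SharedDiskStorage()
storage.pages = {"P1": [{"id": 101, "val": "a"}], "P2": [{"id": 200, "val": "b"}]}
directory = {101: "P1", 200: "P2"}

node1, node2 = ComputeNode(storage), ComputeNode(storage)
print(node1.get_record(101, directory))   # node1 pulls page P1 into its buffer pool
print(node2.get_record(200, directory))   # node2 pulls page P2 independently
```

If node1 later updated page P1 while node2 still had a cached copy of it in its buffer pool, the two nodes would need to exchange messages to stay coherent, which is exactly the coordination issue discussed next.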
So say, for example, I realize that I don't have enough compute capacity and I want to add a new compute node. That's very easy: you mostly don't need to deal with anything in the storage space. You just add another compute node with additional CPU as well as memory, and if you now want to get the record with ID 101, you can just go to this shared disk and get that page back. What is a little bit tricky is that at some point you may want to update that record, the tuple with ID 101. You could do that: you go through a compute node, eventually get to the storage layer, and update that page. But the problem is that, like I mentioned earlier, other nodes in the system may have read that tuple from that page as well and may have a local copy of the page in their buffer pool that they operate on, either reading or writing. So if one compute node updates this record, then, just like I mentioned earlier, even though it's a shared disk architecture, there still needs to be message passing between the compute nodes to coordinate: which node has which copy of which page or which record, and when an update happens, how do we handle the concurrency between them, et cetera. This is just the high-level idea to tell you that there still needs to be coordination between different compute nodes; we'll talk about the detailed concurrency control algorithms later. Does this illustration of the shared disk architecture make sense? Okay. Similarly, just as you can scale the compute nodes by adding a new node with a CPU and memory, you can also add storage nodes below the shared disk layer, and again you don't need to touch most of the stuff in the compute layer because all the storage is hidden behind the unified interface of the shared disk networking layer. So it's very easy to scale resources up and down independently. The last architecture is the shared nothing architecture. Like I mentioned before, every instance or node in this architecture has its own CPU, memory, and disk, and it always accesses data locally, either in memory or on disk. When a specific node needs to access data on a different machine, it goes through a higher-level network layer, in many cases just the public network, to get the data remotely. The obvious advantage of this approach is that if a query or transaction only needs to access data on the local disk, it will be very fast, because every CPU has local memory and disk attached to it, and if all the data the query or transaction needs is on the local disk, it can just read everything locally without going through an expensive network protocol. But there are also corresponding disadvantages. The first is that, unlike the shared disk architecture, it is difficult to scale compute and storage independently.
That's because every node has storage, compute, and memory attached to it. It's also a bit more difficult to ensure the consistency and correctness of the data in this shared nothing architecture, because now every node has its own partition of the data; there is no centralized storage location anymore. Every node has its own partition, and every time one node, for example, updates a record, then if other nodes need to read that record, you not only need to coordinate between different compute nodes but also need to coordinate the data itself between nodes, especially when you want to replicate a portion of the data on multiple nodes for availability reasons. So that can be a challenge as well. But because of its efficiency and performance, many systems, especially systems built before 2010, use this architecture, so it is also very commonly used in distributed database systems. Any questions about the shared nothing architecture? Okay, so let me give you an example. In this shared nothing architecture, every node again has memory, CPU, as well as storage. Say we have two nodes: the first node, as illustrated here, or in other words the first partition, holds the data with IDs 1 to 150, and the second node holds the data with IDs 151 to 300. If the application server sends a query that wants to access the tuple with ID 200, that query would just be sent to the second partition. Of course, there needs to be some mechanism to figure out which partition or node the application server should send the query to; we're ignoring that for now, but we'll get to it later in this class. But as the illustration shows, this query goes to the second partition or second node. Similarly, in another example, say a query wants to access two tuples, the first with ID 10 and the second with ID 200. For the tuple with ID 10, the first node can directly access it from its local disk, because it's on the first partition. But ID 200 is not on the first partition that is handling this query, so what typically happens is that there is coordination between the partitions. One common approach is that the first node sends a request to the second node saying, hey, I have a query that needs the content of the tuple with ID 200; the second node reads that record and sends it back to the first node. So there needs to be coordination happening there in this shared nothing architecture, all right?
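Here is a small Python sketch of that two-partition shared-nothing example; the class names and the forwarding logic are invented for illustration, and the call into the owning node simply stands in for a network round trip.

```python
# Sketch of the shared-nothing example above: each node owns a contiguous ID range,
# and a query that touches a non-local ID is forwarded to the owning node.
# All names here are made up for illustration.

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi                  # inclusive ID range this node owns
        self.data = {i: f"tuple-{i}" for i in range(lo, hi + 1)}

    def owns(self, record_id):
        return self.lo <= record_id <= self.hi

class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes

    def get(self, entry_node, record_id):
        # The node that received the query serves local reads directly...
        if entry_node.owns(record_id):
            return entry_node.data[record_id]
        # ...and forwards remote reads to whichever node owns that ID.
        owner = next(n for n in self.nodes if n.owns(record_id))
        return owner.data[record_id]               # stands in for a network round trip

cluster = Cluster([Node(1, 150), Node(151, 300)])
n1, n2 = cluster.nodes
print(cluster.get(n1, 10))    # local read on node 1
print(cluster.get(n1, 200))   # node 1 has to ask node 2 for ID 200
```

In a real system the forwarding would be an RPC between machines, and the entry node would also merge the results from the different owners before answering the application.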
So now say I want to scale the resources up and down. Unlike the shared disk architecture, where I can add only compute nodes or only storage nodes, here I have to add everything, since every node has memory, storage, and compute in it. But one issue we have to deal with here is that if we just create a new node with empty disk space, it's kind of a waste, right? If we still have the data with IDs 1 to 150 on the first node and 151 to 300 on the third node, then the disk space on that new second node is wasted. So typically, in a shared nothing distributed database architecture, every time we add a new node we also need to redistribute the data, moving some of the data from the other nodes to the new node so that the new node also handles some of the workload instead of wasting its compute and storage. For example, we can move 50 records from the first node and 50 records from the third node over to the middle node. The illustration may look simple, I just change the numbers here, but in actuality this data movement can be very expensive and complex, because essentially it is a giant transaction: you delete one big chunk of data from one or a few nodes of your distributed system, and after the deletion you need to ensure that every single record is correctly inserted into the new node. And while this is happening, you also need to ensure that the system keeps handling the other transactions or queries executing at the same time, which may be reading or updating those records, correctly. So this data movement is pretty expensive and complex, and it also limits the flexibility of the system. Any questions so far about these high-level architectures? Okay, so let's talk about a few specifics, especially the specific design issues involved when trying to build a distributed database. A little bit of history first: like many other concepts in the database world, the distributed database is not a new concept either. The very first implementations of distributed database systems date back to 1979, and there are two well-known systems, or proofs of concept, that came up during that era. One is called MUFFIN; it came from UC Berkeley, actually from my advisor's advisor, Michael Stonebraker. And then there's another one, not really a full system, more of a proof of concept, called SDD-1, which also came out around 1979, from a second person called Phil Bernstein, who is the inventor of many of the concurrency control algorithms for distributed database systems. So he also has a famous proof of concept of a distributed database from that era. But then of course, as we've mentioned many times in this class, System R has the original implementation and algorithms behind many of the current implementations and designs of database systems today. Similarly, around 1984 there was a new version of System R called System R* that had distributed database capability, and there was also another academic distributed system from the University of Wisconsin called Gamma, mostly led by a famous professor there, David DeWitt.
By the way, in System R, much of the distributed database work was also led by this person called Mohan, who is the author of the ARIES paper we talked about last class. And lastly, the Turing Award winner Jim Gray was also very involved in the development of an early distributed database system called NonStop SQL, which came from a company called Tandem. I think Tandem, through multiple acquisitions, now belongs to HP, but this is a distributed database system that is actually still running today. Of course, there aren't really new features being added to NonStop SQL anymore, but the system is still used by some applications today, mostly in maintenance mode. So that's the history of many of the early implementations of distributed databases. For today's class, we are going to talk about a few design issues that come up when we want to build a distributed database. At a high level, we'll first talk about three questions. The first is how the application finds the data: how does the application know which node or which partition the data is on? The second is how queries execute if the data they need to access is not on the same machine; whether shared nothing or shared disk, how does the query get the data remotely? And lastly, how does the database system ensure the correctness of the data when executing queries, and especially transactions, across different machines? Beyond these specific questions, one central theme we have to deal with in the distributed database world is that nodes can fail. So we also need to keep in mind that when a failure happens and a node goes down, or in other cases when we need to add a node or replace a node, how do we make those decisions and keep the entire system executing? Those are the challenges and scenarios we have to think about when building a distributed database. Okay, so the first topic I want to discuss is that there is actually a choice in how you assign roles to the different machines in your cluster. At a high level, there are two choices. The first is called homogeneous nodes, which means that every node in your cluster has the same responsibilities: every node handles reads, handles writes, handles transactions, et cetera. When any node goes down, you can create or spin up a new node, and it will have exactly the same responsibilities as every other node in the system. One obvious advantage of this is that it is easier to provision a new node to recover from a failure, because, like I mentioned, if you add a new node, that new node is, from a functionality perspective, exactly the same as the other nodes.
If one node fails and you create a new node, that new node is able to take over exactly the responsibilities of the node that failed. In contrast, a different approach is called heterogeneous nodes, which means that different nodes in your cluster may have different roles. For example, some nodes may handle read-intensive queries and some nodes may handle write-intensive queries; for the nodes handling write-intensive queries, you may even give them a faster disk. And some nodes may just be responsible for handling concurrency control, et cetera. So in this heterogeneous scenario, the advantage is that you can specialize different nodes for different purposes and potentially get better efficiency out of each node for the task it needs to do. But the obvious problem or challenge with this heterogeneous approach is that, depending on what responsibilities a node takes on, when it fails it may be difficult to recover and create a new node that takes over exactly those responsibilities, if the role that particular node was playing is a challenging one. Another thing to note in this heterogeneous scenario is that one physical machine may actually be responsible for a number of virtual nodes. For example, if one physical machine is very powerful, it can be responsible for both reads and writes: there could be two virtual nodes on that physical machine, one responsible for, say, read-only queries and the other responsible for write queries. So you can assign the tasks of different nodes to the same machine as well. Let me give you a more specific example of a heterogeneous architecture. The example I'm showing here is actually a NoSQL system, not really a traditional relational database system: it's MongoDB. MongoDB uses a heterogeneous architecture, and by the way, it's actually a shared nothing architecture; these are orthogonal design choices. Whether you design a system to be heterogeneous or homogeneous, the architecture could be either shared nothing or shared disk. So in this case, as I'm showing you on the right, in the MongoDB architecture the data, the CPU, and the memory sit on nodes called shards. You can think of them as the partitions of the data, or just as the nodes in the earlier examples, same meaning. In MongoDB parlance they are called shards, and here we have four shards or four partitions. But beyond these shards, there are two other types of nodes in this heterogeneous setup: one type is called the router node, and the other is called the config server node. So what are their responsibilities? Say a query is trying to read the tuple with ID 101. In this MongoDB architecture, all queries are actually sent to the router nodes first.
The router is then responsible for routing the queries sent by the application server to the corresponding shards or partitions that contain the data. But to do that, it needs to consult a centralized config server for the locations of the different records. The config server holds the state about which record or which page is on which partition, and the router goes to the config server and reads that state, say that we have four partitions and which tuples are on which partitions. The config server then sends back the information about which partition or shard the router should route this query to, and finally the router routes it to the correct place. So the router is only responsible for directing queries, the config server stores all the state about which data lives on which shards, and the shards are finally responsible for reading the data and executing the query. This is an example of the MongoDB architecture, where different nodes have different responsibilities. All right, make sense? Nice. Okay, so that's the first design decision. Actually, before we talk about the second design decision, an important concept we want to achieve with a distributed database is that the users of the system should not need to know where the data is physically located. It's similar to what we discussed with concurrency control, where we want the database system to handle all the scenarios with dirty reads, dirty writes, system failures, et cetera, so that users don't need to worry about locking the data and can focus only on the core logic of the application. Similarly, in a distributed database we don't want the users to worry about which part of the data is on which machine, or whether the data is partitioned across different machines or even replicated on different machines. We want to handle all of those things for the users automatically, so that when a user writes a query, the user doesn't need to know whether it is going to a single-node database or a distributed database. The system hides all those implementation details, so that users, again, can conveniently store and access their data and focus on the core logic of the application. One important notion related to this is that the distributed database needs to be able to automatically figure out how to partition your data across the machines in the cluster, as well as automatically figure out how to read and write data from those partitions. When I say database partitioning, that's a term more often used in traditional relational database work. In the newer NoSQL database systems, and especially in the MongoDB example, this partitioning is called sharding.
For some reason, and I actually don't know why, that's the term more often used for NoSQL systems, but it's essentially the same thing: sharding, partitioning, same meaning. Essentially, the database system needs to be able to partition the data onto different places, figure out where to access the data, fetch it from the correct shard or partition, and finally aggregate the results and send them back to the user. For the user, it should feel just like writing a query against a single-node database system; you shouldn't know that the data is distributed, okay? So the first partitioning approach, which, as I wrote here, we can call naive table partitioning, is that the database system puts all the data for one specific table on one partition: one machine is responsible for one table, and another machine is responsible for another table. Of course, this is under the assumption that every machine has enough disk space to hold all the data of a particular table. If a table is bigger than the maximum disk space available on a machine, then this simple approach doesn't work. Another thing is that the scenario where this approach works best is when you don't have join queries across different tables, or at least not too many, and when the distribution of queries against the tables is roughly uniform; then this approach can actually work well. Let me give you an example. Say I have two tables, table one and table two. With this naive table partitioning approach, you could just put all the data of table one on the first partition and all the data of table two on the second partition. The ideal query pattern for this is when you're always only looking at data from one particular table at a time, without too many join queries. Because of the limitation that every table must fit in the storage space of a single node, and because of the problems with join queries or queries that are not uniformly distributed across tables, the performance may not be very good, and in practice most systems, as you can imagine, do not use this naive table partitioning approach. But there is one system, MongoDB, that does give you the option to choose this, and they have some use cases for it. I think one use case they came up with for this functionality is a workload with many queries that only append logs to a specific table; you can think of it as a metrics or logging store. If a workload has a bunch of queries that only append logs to the end of one table and never, or rarely, read any data back, then you can imagine a system with a designated shard or partition for that specific table with a better disk, one with a faster write speed.
If you only write to that disk and rarely read back, then a faster disk, or at least a disk that is faster on writes, will probably help in that scenario, and the other transactions or queries can use the other nodes in the system without really being affected by the write traffic here. So that could be one example use case, okay? What would be a more common approach, suitable for more general scenarios, is called horizontal partitioning. In this case, instead of putting each entire table on a specific node, you split the tuples of every single table across different nodes. Of course, you want to be a little bit intelligent about how you split these tuples. For example, you could try to balance the sizes of the partitions of a table across the different partitions or shards so they are roughly equal, or you could try to balance the load of the queries that access the different partitions of a particular table so it is roughly uniform. There are many intelligent decisions you can make with horizontal partitioning, and the two common approaches are called hash partitioning and range partitioning, which I'll give specific examples of in the next few slides. Lastly, one thing to note is that when the database system partitions the data horizontally, it doesn't necessarily mean that the different partitions have to be on different machines. One scenario is physical partitioning, which is very common in shared nothing systems, where the different partitions really are on different machines with their own local memory, compute, and storage. But there can also be logical partitioning, typically seen in a shared disk system: even though you partition the data into different portions or chunks, all the partitions reside in the big shared pool of storage space behind the network, and they could be on the same machine or different machines depending on how the shared disk layer manages that. What you do is assign different compute nodes to be responsible for handling the queries directed to different shards. In that case the sharding or partitioning is only logical: it only determines, at the logical level, which compute node is responsible for queries against which part of the data; the data itself could be on the same physical node or different physical nodes, and it doesn't really matter. Makes sense? Okay, nice. So let me give you an example of horizontal partitioning. Say I have this table with four attributes, and the first thing you need to do is choose a partition key. For example, here we choose the second attribute as the partitioning key. One simple strategy, like I mentioned, is hash partitioning: you apply a hash function over these values, and one simple thing you can do is just mod the hash value by the number of partitions or shards your database system has.
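As a quick sketch of that hash-and-mod routing, here is what it might look like in Python; md5 is just used here as a stable hash function for illustration, and the key values are made up.

```python
# Hash partitioning sketch: route a tuple to a partition by hashing its
# partition key and taking the result modulo the number of partitions.
import hashlib

NUM_PARTITIONS = 4

def partition_for(key):
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

# A point query like  SELECT * FROM t WHERE key = 'abc'  only needs to go to
# partition_for('abc'); a query without the partition key in its WHERE clause
# has to be broadcast to every partition.
for key in ["abc", "def", "ghi", 101, 200]:
    print(key, "-> partition", partition_for(key))
```

Notice that if NUM_PARTITIONS changes from four to five, almost every key maps to a different partition, which is exactly the re-balancing problem discussed next.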
In this case, we have four partitions, so after taking the hash mod four you get the assignments of tuples to partitions, as illustrated here. Assuming a query has a WHERE clause that contains this partition key, you can just hash the value in the WHERE clause, again mod by the total number of partitions or shards, and locate the corresponding machine. That's how you find where the data is in this horizontal partitioning scenario. And as you can probably tell already, in horizontal partitioning it is very, very important to choose the partition key well, because if many of your queries don't have the partitioning key as an attribute in their WHERE clause, then you have to broadcast those queries to all the shards to find out where your data is, and that can be expensive. So this choice is very important. Another challenge we may encounter, at least with this simple hash partitioning scheme, is: what if I want to add a new machine? This is independent of whether I'm doing shared nothing or shared disk, although in this example I'm illustrating more of a shared nothing scenario. If I add a machine, the number of shards or partitions becomes five, and in that case pretty much all of the values, after hashing and modding by the new number of partitions, get a different partition assignment. So if you add this new node and change the equation that assigns the shards, then potentially a lot of assignments change and you have to move data all around your entire cluster to put the partitions of the data back in the correct locations. That is potentially very expensive and also very difficult to coordinate. So that's the challenge here. Again, which partitioning algorithm you use is independent of whether you use shared nothing or shared disk, but that said, this problem is bigger with a shared nothing architecture, because in shared nothing every node has a specific portion of the data physically attached to it that it is responsible for, so there is more data movement to deal with in that architecture. In the shared disk architecture, because everything is managed by the shared disk layer below the network, things can be a little easier; you still have to deal with a similar problem, but the mechanism can be a bit simpler. The approach that addresses this problem is, I personally think, a very interesting idea called consistent hashing.
Some of you may have heard of consistent hashing in your other distributed systems or cloud computing classes. It's a very interesting idea that came out of MIT in the late 1990s. What it allows you to do is make incremental additions or removals of the nodes or machines in your cluster without having to move all the data around and destroy the assignments of everything. So let me give you the high-level idea. Instead of having a function that mods the hash value by the number of nodes, consistent hashing conceptually has a ring that all hash values map onto, and this ring covers the range from zero to one. Think of it as a ring of values with zero at the top, going clockwise, 0.5 at the bottom, and coming back around to one. Each partition is responsible for the hash values in a specific range of this ring. Say we have three locations on the ring that represent the three different partitions; those points are the boundaries of the ranges that the different partitions are responsible for. Every time we have a key that we need to hash, and we want to figure out which partition it belongs to, we first hash the key to a value between zero and one; no matter how many nodes you have, you always hash the key into this same range. Then you just walk from that point to the closest partition location at or after it on the ring. In this case, after hashing key one, it lands somewhere near the top, and walking forward it gets assigned to partition one, because that's the first partition encountered. Similarly, assume you have another key that hashes to a different location on the ring; after walking forward, it is assigned to partition three. So to decide ownership, essentially, for the blue region I've drawn here, if a key lands in this blue region after hashing, it belongs to the first partition, partition one. Similarly, all the keys that land in the next region after hashing belong to partition two, and all the keys that land in the last region belong to partition three. Make sense? So what's interesting is: what if I now want to add a new node? Before, in the old scheme where I mod by the number of nodes, after I add a new node everything can change and I have to move everything around. Now, when I add a new node in the consistent hashing scenario, I just pick a new location on this ring, and typically you would pick a location in the largest gap, but at a high level you just pick a new location for this node.
After you assign the location for the new node, say P4 here, that node becomes responsible for everything from the previous location, P2, up to its own position on the ring. So all you need to do is go to node P3, because originally all the data in this slice belonged to P3, read each record and compute its hash value, and if the hash value lands in the region between P2 and P4, you move that data over, instead of touching many other nodes and moving everything around. So in this case the movement after adding a new node is very localized. Similarly, say you add a new node, node five: you only look at the data originally residing on node one and move the portion of it that lands in the new region belonging to P5 over to the new node. And if you add node six here, you do the same thing. So with consistent hashing, even though you keep adding nodes, you can always look at a region, typically the biggest slice on the ring, move data only from the one node that currently owns that slice to the new node, and localize the movement that way. That is much simpler and much more efficient than moving all the data around, as you would if you simply mod by the number of nodes in your cluster. Does that make sense? Okay, nice. One additional point, if you will, that I want to touch on here, and we'll get to the details next class when we talk about concurrency control, is another interesting property of consistent hashing. In the distributed database world, oftentimes you will be replicating a lot of data as well, because now you have the luxury of putting data on different machines in different data centers and different regions, and in most cases users will choose to replicate a particular partition of the data to several different nodes, so that if one node goes down, you immediately have a copy of the data available on other nodes and can read from those copies. In this consistent hashing scheme, it is very easy to handle that as well. Say we want a replication factor of three. Then, for example, if your key lands in the region owned by P1, instead of only writing to the first partition after the walk forward, you copy the data to the next three locations on the ring, here P1, P6, and P2, three consecutive locations, and every time a query arrives, you can choose to read this data from one of the copies, okay? So if one node goes down, you can still read the data from the other copies, and this is very convenient to handle in consistent hashing as well.
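Here is a minimal Python sketch of the ring just described, covering the clockwise lookup and the replication-factor idea; the hash-to-[0, 1) mapping and the class itself are my own illustration, not any particular system's implementation.

```python
# Consistent hashing sketch: partitions and keys are both hashed onto [0, 1),
# and a key belongs to the first partition at or after it going clockwise.
import bisect
import hashlib

def ring_position(name):
    h = int(hashlib.md5(str(name).encode()).hexdigest(), 16)
    return (h % 10**6) / 10**6          # a point on the ring in [0, 1)

class ConsistentHashRing:
    def __init__(self, partitions):
        self.ring = sorted((ring_position(p), p) for p in partitions)

    def add(self, partition):
        bisect.insort(self.ring, (ring_position(partition), partition))

    def lookup(self, key, replication_factor=1):
        """Walk clockwise from the key's position and return the next
        `replication_factor` partitions on the ring."""
        pos = ring_position(key)
        idx = bisect.bisect_left(self.ring, (pos,))
        owners = []
        for i in range(len(self.ring)):
            owners.append(self.ring[(idx + i) % len(self.ring)][1])
            if len(owners) == replication_factor:
                break
        return owners

ring = ConsistentHashRing(["P1", "P2", "P3"])
print(ring.lookup("key1"))                         # single owner
print(ring.lookup("key1", replication_factor=3))   # owner plus the next two partitions
ring.add("P4")                                     # only keys in one slice change owners
print(ring.lookup("key1", replication_factor=3))
```

Adding P4 only changes the owner of keys that fall in the slice ending at P4's new position, which is why only one existing node has to be scanned and partially moved.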
Here I also want to give you a heads up about a consistency issue that comes up in this replication scenario. Again, we'll talk about it in more detail later, but remember, a week or two ago we talked about the ACID properties, including the consistency property. At that time, I said that consistency actually matters a bit more in a distributed scenario than in a single-node scenario, and this is exactly an example of that consistency issue. Say I have three replicas that all have a copy of this data. Every time I want to update this key, I actually have two different choices. The first choice is that I broadcast the update to all three locations, P1, P6, and P2, and then I wait until all three partitions come back and say, hey, I successfully wrote this record to my partition. Only after everything finishes do you go back to the client and tell the client that the query has committed. This is obviously very consistent; in consistency terminology this is called strong consistency. But the problem is that before, you only needed to write to one partition, and now you need to write to three different partitions and wait for all of them to finish. If there's a straggler, say a network delay on one partition, or one partition's disk just has a slow write, then your whole query or transaction is stalled, so the performance can be limited. The other choice is that every time you want to write to, say, key one, you still issue the write to all three partitions, but you respond to the client as soon as the first partition returns. Say partition P1 is very fast; as soon as P1 returns, you tell the client the query has committed, and you just assume that at some point the writes to P6 and P2 will finish and reflect the correct value. This is called eventual consistency, which, as you can imagine, is a much weaker consistency level than the strong consistency in the first example. It is potentially faster, but it has the issue that before the writes on P6 and P2 finish, the values on P6 and P2 can be outdated: a transaction reading the value from P1 reads the new value, while one reading from P6 or P2 reads the old value. So it's not exactly consistent. Depending on the application, depending on how fast you need to be and how strict your application's transaction semantics are, you may choose either strong or eventual consistency. Again, we'll talk about concurrency control in the distributed scenario later; this is just a heads up related to the ACID property I mentioned earlier. Also, this consistency issue is not specific to consistent hashing: if you use normal hashing or range partitioning, you have to deal with this consistency issue as well. So this is just one example of it.
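To illustrate the difference between the two acknowledgment strategies, here is a toy Python sketch; the `Replica` class and the simulated latencies are made up, and a real system would handle failures, ordering, and read repair in ways this ignores.

```python
# Sketch of the two write-acknowledgment choices described above, for a key that
# is replicated on three partitions. The only point is when the client is told
# that its write has committed.
import random

class Replica:
    def __init__(self, name):
        self.name, self.value = name, None

    def write(self, value):
        self.value = value
        return random.uniform(0.001, 0.050)   # pretend write latency in seconds

replicas = [Replica("P1"), Replica("P6"), Replica("P2")]

def write_strong(value):
    # Strong consistency: send the write everywhere and wait for every replica
    # to acknowledge before telling the client the write committed.
    latencies = [r.write(value) for r in replicas]
    return max(latencies)                     # stalled by the slowest replica

def write_eventual(value):
    # Eventual consistency: acknowledge as soon as the first replica finishes;
    # the remaining writes are assumed to complete in the background, so a read
    # from P6 or P2 in the meantime may still see the old value.
    latencies = [r.write(value) for r in replicas]
    return min(latencies)

print("strong commit after  ", write_strong("v1"))
print("eventual commit after", write_eventual("v2"))
```

The point is only when the client hears "committed": after the slowest replica under strong consistency, after the fastest one under eventual consistency.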
Like I mentioned, consistent hashing is a very good way to handle the partitioning problem in a distributed database, and many systems use it. For example, Amazon DynamoDB was, I think, the first notable system to use it, and they have a paper on that. Later on, a system called Cassandra came out of Facebook that also uses consistent hashing to handle its partitioning, and there are newer startups, whose names I've forgotten, that use this approach as well, all right?

So that's consistent hashing. Now I want to give you a few examples of the logical and physical partitioning I mentioned earlier. Just because you partition the data, it doesn't necessarily mean the data has to live on different machines. Especially in a logical partitioning scenario, which is very common in the shared-disk architecture, different nodes can simply be responsible for handling the queries for different partitions while all the data sits in the shared disk space. Here, for example, I have a shared-disk architecture where the shared storage holds four records, one, two, three, four. Because the storage is managed by, say, Amazon S3 in this shared architecture, I don't really care which machine each of these records is physically on. What I do care about is that different nodes are responsible for different partitions of the data. Here, for example, node one is responsible for IDs one and two, so if the application server sends a query that accesses the tuple with ID one, it goes to node one, which then goes to the shared disk space and fetches the record with ID one through that unified interface. Similarly, a query that accesses ID equal to three goes to the second node and is handled there. Having different nodes responsible for different records is better for load balancing, and it is also better for handling consistency, but at the end of the day everything goes to this shared space, which may or may not put the records on the same disk. Makes sense? Okay.

The opposite example is physical partitioning, which is more common in the shared-nothing scenario, where you actually place the data directly on the disk local to the responsible node. Here, for example, the first node holds IDs one and two and the second node holds IDs three and four. In this case, similar to the earlier example, a request for the tuple with ID equal to one goes to the first node, and a query for ID equal to three goes to the second node, but now you go to the responsible node and access the record locally, on the disk of that machine, which can be faster if a transaction only accesses that record. All right? Okay.

So that's that. One thing you may have noticed is that in all the examples I showed, every transaction only accesses data on a single node, and like I mentioned, in that case a shared-nothing architecture can potentially be better because everything is local. Those kinds of transactions are called single-node transactions, which essentially means that all the data a transaction or query needs to read lives on a disk local to that one node, okay? But in practice, things are often not that ideal.
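Here's a tiny Python sketch contrasting the two, with a hypothetical routing table that maps record IDs to the responsible node; in the shared-disk case the routed node reads from common storage, while in the shared-nothing case it reads its own local slice. The last helper just expresses what "single-node transaction" means under this routing.

```python
# Hypothetical routing table: which node is responsible for which record IDs.
NODE_FOR_ID = {1: "node1", 2: "node1", 3: "node2", 4: "node2"}

# Logical partitioning (shared disk): every node reads the same shared storage;
# the routing table only decides who handles the request.
SHARED_STORAGE = {1: "tuple-1", 2: "tuple-2", 3: "tuple-3", 4: "tuple-4"}

def read_shared_disk(record_id):
    node = NODE_FOR_ID[record_id]
    print(f"{node} handles id={record_id} and fetches it from shared storage")
    return SHARED_STORAGE[record_id]

# Physical partitioning (shared nothing): each node holds only its own slice on
# its local disk, so the responsible node reads locally.
LOCAL_DISK = {
    "node1": {1: "tuple-1", 2: "tuple-2"},
    "node2": {3: "tuple-3", 4: "tuple-4"},
}

def read_shared_nothing(record_id):
    node = NODE_FOR_ID[record_id]
    print(f"{node} handles id={record_id} and reads it from its local disk")
    return LOCAL_DISK[node][record_id]

def is_single_node_txn(record_ids):
    """A transaction is single-node only if every record it touches maps to one node."""
    return len({NODE_FOR_ID[r] for r in record_ids}) == 1

print(read_shared_disk(1))          # routed to node1, data comes from shared storage
print(read_shared_nothing(3))       # routed to node2, data comes from node2's disk
print(is_single_node_txn([1, 2]))   # True: both IDs live on node1
print(is_single_node_txn([1, 3]))   # False: this would be a distributed transaction
```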
Sometimes you just have to read data that is not on your own partition, right? That scenario is called a distributed transaction. Whether it's a single query or multiple queries doesn't really matter: when a transaction needs to access data across different partitions or different nodes in your cluster, it becomes very expensive and requires expensive coordination, okay? That's what we're going to talk about next, the coordination of distributed transactions. The high-level point is that if all transactions were single-node, a distributed database would actually be easy, because all you would need to do is partition the data across machines and figure out which data is on which machine, and you would be done. What makes distributed databases challenging is exactly these distributed transactions, where you need to access data on different machines and a lot of coordination has to happen.

Okay, so at a high level there are two different approaches to handling distributed transactions. One is called the centralized approach. You can think of it as similar to the MongoDB architecture I mentioned earlier, where there is a central config server that holds the information about where data is located. In the centralized approach you have a global server, a traffic cop, that knows the status of all the transactions: which transaction is accessing which data on which node, which transaction holds which locks, and so on. It does all the coordination, the locking and the resolution of all the concurrency issues, and every time a transaction wants to commit, this centralized node, the traffic cop, makes the decision. The contrary approach, kind of obviously, is called the decentralized approach, where the nodes organize among themselves, so every node can be responsible for coordinating whether a transaction should commit or not. At the end of the day, even though it's decentralized, one specific node is still responsible for deciding whether a specific transaction commits; the difference from the centralized approach is that every node has the ability to make that decision.
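As a toy illustration of that difference, here's a hypothetical Python sketch of who gets to be the coordinator in each approach; the node names and the rule for picking a master are made up, but the point is that in the centralized case every commit decision goes through one traffic cop, while in the decentralized case any node can play that role.

```python
TRAFFIC_COP = "config-server"            # single global coordinator (centralized)

def coordinator_centralized(txn_id):
    """Centralized: every transaction's commit decision goes through the same traffic cop."""
    return TRAFFIC_COP

def coordinator_decentralized(txn_id, first_partition_touched):
    """Decentralized: the node the transaction starts on acts as its master/coordinator."""
    return first_partition_touched

print(coordinator_centralized("T7"))            # always 'config-server'
print(coordinator_decentralized("T7", "P3"))    # P3 coordinates T7
print(coordinator_decentralized("T8", "P1"))    # P1 coordinates T8
```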
The very first example of the centralized approach is something called a TP monitor, which dates back to the 70s or 80s. I think at that time they were called teleprocessing monitors, but nowadays people refer to them as transaction processing monitors. At that time there wasn't even a mature implementation of a distributed database system yet, but there were plenty of use cases where people needed that kind of distributed functionality. So what people did at that time, for example for ATMs or airline reservations, where you have different databases holding all the records or the airline tickets and different clients that want to access them at the same time, was to put a sort of middleware, if you will, in between that coordinates the execution of these different transactions against the different database servers. Each database server has its own concurrency control algorithm to make sure that access to that specific database stays ACID; it's protected by its own concurrency control. But clients that access records in different databases, potentially located on different machines, need to go through this centralized transaction processing monitor to resolve which record is where, which transactions need to take locks, and how to handle all the conflicts correctly.

Let me give you an example. Say I have an application server here, and it needs to access data on these three different partitions. It tells this centralized coordinator, hey, this query needs to access data on three different partitions in this cluster of databases. The coordinator maintains a lock table, and it puts locks on these three different partitions of data. Only after that does it send the acknowledgment back to the client, and then the client can go ahead and read and write the contents of these three partitions. In the meantime, depending on whether you are holding read locks or write locks, other transactions may not be able to access the data on these partitions. Again, this is an external system, so with this centralized controller the locking is at the partition level: it has to lock the whole thing. If you did this inside the actual database system, it could be finer grained, at the table level, et cetera. But back in the old days it locked the whole partition, and then the client could access all of those records.

Now, say this transaction or application has finished its queries and wants to commit. What does it do? It goes to every single one of these machines or partitions and asks, hey, have you finished all your operations, have you finished all the writes, am I ready to commit now? If every partition is ready to commit, it comes back to the coordinator and sends the acknowledgment saying, hey, I can safely commit, and the coordinator then releases all the locks in its lock table. That's how it was handled: the first implementation, the very original idea.
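Here's a minimal Python sketch of that flow, assuming a single coordinator with coarse partition-level locks kept in one lock table; the partition and transaction names are made up, and a real monitor would also handle blocking, retries, and node failures rather than just refusing on a conflict.

```python
class Partition:
    def ready_to_commit(self, txn_id):
        # A real database node would check its own writes and log here.
        return True

PARTITIONS = {"P1": Partition(), "P2": Partition(), "P3": Partition()}

class Coordinator:
    def __init__(self):
        self.lock_table = {}                       # partition name -> txn holding the lock

    def acquire(self, txn_id, parts):
        """Lock every partition the transaction declared, or refuse on any conflict."""
        if any(self.lock_table.get(p, txn_id) != txn_id for p in parts):
            return False
        for p in parts:
            self.lock_table[p] = txn_id
        return True

    def commit(self, txn_id, parts):
        """Ask every partition whether it finished its writes; only then release locks."""
        if not all(PARTITIONS[p].ready_to_commit(txn_id) for p in parts):
            return "abort"
        for p in parts:
            del self.lock_table[p]
        return "committed"

coord = Coordinator()
if coord.acquire("T1", ["P1", "P2", "P3"]):
    # ... the application reads and writes the three partitions directly ...
    print(coord.commit("T1", ["P1", "P2", "P3"]))   # 'committed'
```

The commit path mirrors what was just described: ask every partition whether it has finished, and only then release the locks in the lock table.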
There are actually systems still using something similar to this. There's a startup called Transarc that came out of CMU and built a transaction monitor in this fashion; it came out very early but is still used in many systems. Oracle, which has been in the database business for a long time, also has a version of this called Tuxedo, and there's another company, called Omit, offering this kind of service as well. But these are fairly early implementations of distributed transactions.

What happened a little later is that, instead of this simple coordinator setup, where, like I mentioned, the application server has to send lock requests to the coordinator to lock the different partitions and then, once the locks are granted, send individual requests to the partitions itself to get the data back, which is pretty limited because there is still a lot of involvement from the application server, a later optimization or extension of this approach is to build a middleware that hides all of that logic. This is used more commonly in many of the newer systems. Essentially, the application server sends a request to the distributed database without needing to know which data is on which machine; it doesn't send lock requests, and it doesn't send individual requests to the different servers or partitions either. It just sends a single request, as if it were talking to a single-node database, and the middleware maintains all the information about which data is on which partition. The middleware is also responsible for sending the lock requests and maintaining the locks on the different partitions. In this case, again, the three partitions need to be locked, and the middleware puts those locks in its lock table, but there's nothing the application server needs to do anymore; the middleware hides all the details. After that, the middleware does all the reads and updates on the application's behalf, and when the application sends the commit request, the middleware similarly asks each of these partitions whether it's safe to commit and, if yes, comes back and commits the transaction.

So this is an optimized version of the original transaction processing monitor approach that hides most of the details, and it's the approach more commonly used in newer systems. The biggest example is Facebook, which runs the largest MySQL cluster in the world and manages it using this approach. Each MySQL instance is just a normal single-node database system like the ones we talked about earlier in this class, but they built a complex middleware that hides all the details for their MySQL cluster; it's actually the largest in the world. YouTube also has a variant of this, also built on MySQL, called Vitess, which uses this middleware approach as well. And Google, at least at some point, also had a MySQL-based service that used this approach, all right? Makes sense? Cool.
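Here's a minimal sketch of that middleware idea, under the same simplifying assumptions as the coordinator sketch above: the application issues one call with the keys it wants, and the hypothetical Middleware class does the routing, the partition-level locking, and the commit on its behalf.

```python
# A minimal sketch of the middleware approach: the application never sees partitions,
# lock requests, or per-node calls. The routing table, the lock granularity, and the
# trivial "everyone is ready" commit check are all simplifying assumptions.
class Middleware:
    def __init__(self, routing_table):
        self.routing = routing_table          # key -> partition, e.g. {"a": "P1", ...}
        self.lock_table = {}                  # partition -> txn currently holding the lock

    def execute(self, txn_id, keys):
        """Single entry point: route, lock, do the work, and commit for the caller."""
        partitions = {self.routing[k] for k in keys}
        if any(self.lock_table.get(p, txn_id) != txn_id for p in partitions):
            return "abort"                    # a conflicting transaction holds a lock
        for p in partitions:
            self.lock_table[p] = txn_id
        # ... forward the actual reads and writes to each partition on the app's behalf ...
        for p in partitions:                  # every partition reports ready, so release
            del self.lock_table[p]
        return "committed"

mw = Middleware({"a": "P1", "b": "P2", "c": "P3"})
print(mw.execute("T1", keys=["a", "b", "c"]))   # the app just sees 'committed'
```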
The next approach, as illustrated here, is a decentralized coordinator, where each node has the ability to commit a transaction. Let's see how it works. When an application server sends a begin request, it finds a master node for that specific transaction. In this case it could be P1, but it could also be P3 or P4, depending on your load balancing, your routing mechanism, et cetera. You just pick one master node, and that master node is responsible for all the coordination of this specific transaction. Of course, the transaction can still issue requests to other nodes, P4, P3, and so on, but the master node is responsible for recording all of that information: which partitions this transaction has accessed and whether it conflicts with others. When the transaction wants to commit, the master node similarly issues the requests asking whether it is safe to commit, et cetera, and then gets back to the application server. The difference is that in this case, every node in the cluster has the ability to handle and commit transactions. Of course, you now also need coordination between the different nodes, because when every node can commit, they can conflict with each other, which I'll get to in the next class, all right?

I have just two or three more slides to give you a small heads-up about the distributed concurrency control we're going to cover next class. At a high level, when we try to control the concurrency of distributed transactions, many of the ideas we used for single-node transactions still apply: timestamp ordering concurrency control, two-phase locking, et cetera; we'll be applying many of the same mechanisms. But beyond those mechanisms, we need additional steps to handle cases like: the data is on a different machine, so you may need to take a lock on a different machine; or after you acquire a lock on that machine, the machine goes down, and what do you do when the node comes back? Or what if the clocks on different machines are skewed, for example when you're using timestamp ordering concurrency control? Again, two-phase locking, optimistic concurrency control, all of those still apply; there are just additional steps and additional scenarios we need to handle in a distributed setting.

Here's a simple example, and I think it's the last example in this class. Say I have two application servers and two nodes connected by the network. One application wants to read record A on the first node, and the other application wants to read record B on the second node. So far so good: each can lock the corresponding record on the corresponding node. But now we run into a classic deadlock scenario: say the first application also needs to read B, and the second application also needs to read A. Then we have a deadlock. We know how to resolve that in a single-node scenario: essentially we construct the waits-for graph and break the cycle. That seems easy, but in a distributed scenario we need to consider things like, hey, these different locks may live on different machines, so how do we coordinate the different locks? There may also be network delays between these machines; network communication can be costly, so we don't want to send messages back and forth too frequently. And lastly, when a node goes down, we also need to handle that: how do we maintain the information and handle it correctly?
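To make the waits-for idea concrete across machines, here's a minimal Python sketch of one common flavor of distributed deadlock detection, assuming each node periodically ships its local waits-for edges to a single site that unions them and searches for a cycle; the node and transaction names are hypothetical.

```python
# Local waits-for edges observed on each node: (waiting_txn, holding_txn).
local_edges = {
    "node1": [("T1", "T2")],   # on node1, T1 waits for T2 (T2 holds the lock on A)
    "node2": [("T2", "T1")],   # on node2, T2 waits for T1 (T1 holds the lock on B)
}

def has_cycle(edges):
    """Detect a cycle in the global waits-for graph with a depth-first search."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)

    def visit(txn, path):
        if txn in path:
            return True
        return any(visit(nxt, path | {txn}) for nxt in graph.get(txn, []))

    return any(visit(t, set()) for t in graph)

# One site unions the edges shipped from every node and checks for a deadlock.
global_edges = [e for edges in local_edges.values() for e in edges]
print("deadlock detected:", has_cycle(global_edges))   # True: T1 -> T2 -> T1
```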
We'll go into those details next class. So in conclusion, as I've alluded to many times, distributed database systems, especially once you have distributed transactions, are very, very difficult, and in this class we pretty much only scratched the surface and talked about the high-level concepts. We'll go into more detail in the next class and the one after that, but again, distributed transactions, or distributed databases in general, could very well be their own semester-long class, so in this course we focus on the high-level concepts. Next class we'll talk about distributed concurrency control, and we'll also cover replication and the CAP theorem and give you some more real-world examples. All right, thanks, and looking forward to seeing you guys on Wednesday.