Is there an event I'm missing on campus, or what, is it free cake? That's pretty weak. All right. Thank you. All right, for those that are here, let's get through this.

So again, this is just the outline of the remaining things for you guys in the semester. These dates are all available on the website, and the first feedback review for the extra credit will be due this Sunday. I also posted on Piazza last night: if you haven't done this already, please go vote for what database systems you want me to cover on the last day. Yes? The results of what, sorry? Oh, the results — right. No, I have not heard back yet; if they come in, I will announce them on the last day of class. I sent it on Monday, and it takes a week or two or something, so we'll see. Any other questions related to class? Okay, so.

Last class was an introduction to distributed databases, and the main three things we focused on were these. First, what the system architecture of a system looks like: we talked about shared memory, shared disk, and shared nothing, and I said that every distributed database out there is going to be either shared disk or shared nothing. Shared nothing is traditionally the more popular approach people take with distributed databases, but shared disk is becoming more prevalent in cloud architectures. Then we talked about partitioning or sharding — hash partitioning, range partitioning, round robin — which is just a way to take a database and break it up into disjoint subsets that we assign to different nodes. And then we talked a little bit about how we want to do transaction coordination: whether we have a centralized approach that has a global view of what's going on throughout the entire system in the context of what transactions are trying to do, or a decentralized approach where the nodes themselves are responsible for figuring out whether things are allowed to commit or not. All the topics from last class — except maybe transaction coordination — are applicable both to distributed databases that are designed to run transactions and to distributed databases that are designed to run analytics.

So for this class, and then the class next week, we're going to divide it up and talk about specific issues for each of those two classes of workloads, because there are different trade-offs they're going to make that may be good for transactions but not good for analytics, and vice versa.

So again, just as a reminder of what I mean when I say transaction processing (OLTP) versus analytical processing (OLAP) — I think we've covered this a couple of times throughout the semester, but just to reiterate the dichotomy so that everyone's on the same page. In OLTP workloads, we're worrying about operations that are trying to update or read a small amount of data in the database. Using Amazon as an example: when you go to the Amazon website, you add things to your cart, you make purchases for your account, you update your payment information. All those operations, or transactions, are only touching a small amount of data when you invoke those changes.
They're only touching your data. The Amazon database is quite large, but for your transactions — your operations — the amount of data you're touching is small. And essentially the database system is doing the same set of operations over and over again, because you're going through the website's application code: when you click 'add to cart,' that invokes a function in the application code, which then goes through and executes the queries to make those changes.

OLAP workloads are where we start doing analytics to try to extrapolate new information from all the data we've ingested on the OLTP side. Again using Amazon as an example, an analytical workload would be something like trying to figure out: what was the most popular item for Carnegie Mellon students during the month of November when the temperature was above 30 degrees? That's not something you do on the OLTP side, because it's not a transactional thing; this is something you do on the OLAP side. In these workloads the queries run much longer because they're touching more data — they're doing joins, they're doing aggregations — and oftentimes they're one-off queries, because someone is trying to answer a question like 'what's the most bought item for a particular group of people.' They're filling out some dashboard or using an analytical tool to compose the query and firing it off, and the system may never see that query ever again.

So for today's class we're going to focus on the first part, and the next class will talk about OLAP. Again, at a high level, here's what we're going to focus on today in a distributed database. We have the setup from before: some partitioned database, whether it's logical partitioning or physical partitioning — meaning shared nothing or shared disk. It doesn't matter for what we're talking about here today. The scenario we're concerned about is that we have an application server that wants to invoke a transaction. It picks some partition node to be the master one, so it tells that guy, 'hey, I want to execute a transaction,' and then it goes ahead and does a bunch of updates, or reads a bunch of data, on our various partitions. Now, when the transaction completes, it goes to the master guy it started off with and says, 'hey, I want to go ahead and commit.' And assuming this is a decentralized architecture — meaning we don't have that middleware, that TP monitor, coordinating all our transactions — now these nodes have to figure out amongst themselves whether they're allowed to commit this transaction. Last class I was very vague about this step: how to determine whether it's safe to commit, and what it means to say, 'hey, we're all going to go ahead and commit this transaction.' This is what we're primarily going to focus on today — this last step here.
Essentially, what we're trying to do is have all the nodes in our database system agree that we should commit a transaction. And if everyone agrees that we commit this transaction, we go ahead and commit it. We don't want any sort of weird anomaly or weird reversal, where one node says it's going to commit and then all of a sudden it doesn't and the transaction gets rolled back. Once everyone agrees that we're going to commit, then we go ahead and commit.

Now, there's a bunch of issues we have to deal with in order to make this happen correctly and safely. When we were on a shared-everything system — meaning our database system was running on a single box — and we wanted to run, say, a validation protocol for OCC concurrency control, all the participants deciding whether this thing is allowed to commit were running together on a single machine, all in the same memory, possibly. It was really fast for us to figure out whether we were allowed to commit, and if we said commit, then it truly was committed, because we know everything was on that single box. But now, in a distributed environment, we have the issue of: let's say we go ahead and decide to commit, everyone comes back and says we can commit, and then maybe during that time one node goes down. What should happen? All the same ACID properties we talked about before — that we don't want any partial updates persisting in our database — are things we have to account for. So if a node goes down, we've got to deal with that. But what if the node doesn't go down, and instead our commit messages just show up late? Maybe the packet got delayed somehow on the network on the way over. Or — which is probably more common — say our database system is using the JVM, because it's written in Java or Scala, and all of a sudden the JVM decides to do a really expensive garbage collection sweep, and our process pauses. We're going to look like we're unavailable during this GC pass, and then all of a sudden we come back from GC, and now our messages arrive and a second has passed. And then: how many nodes have to agree that we're going to commit a transaction before we decide that it's committed? Should it be all of them? Should it be some of them? These are the things we're going to worry about today.

One important assumption we're going to make for this entire lecture is that the software running on the nodes in our distributed database is our friend — it'll be well behaved, it's not going to screw us over. It's software that we, the database system developers, wrote and deployed under the same administrative domain. So when we ask a node to commit a transaction and it comes back and says, 'yeah, we're going to commit that,' we assume — modulo hardware failures or software bugs — that if a node tells us it's going to commit a transaction, it will commit that transaction. And that's going to simplify, in some ways, how we do our commit protocol. If you assume that the nodes could be bad actors — that they could say 'yeah, we committed that' and then screw you —
— by not actually doing it, then the things we'll talk about today aren't enough. You actually need what's called a Byzantine fault-tolerant protocol, and this is essentially what the blockchain under Bitcoin is. A blockchain is essentially just a distributed database — it's just a log where you append things with transactions — but in that environment, because it's Bitcoin mining or whatever you're trying to do, you assume the participants in your distributed database are not your friends and they could lie to you, so you need a way to deal with that. We are not in that world. Most distributed database systems are not in that world; most of the time you can assume that everybody is going to play along correctly. Most people don't need a blockchain — very few things need a blockchain — so if you think you're building something and you use a blockchain, rethink your life. Okay.

All right, so the things we're talking about today are listed here. I didn't say this last class, so I want to say it now: I'm trying to cover in three lectures what would normally be an entire year of studying distributed databases. We obviously can't cover everything in detail, so I consider my goal here just to expose you to the issues, the problems, the difficulties of building a distributed database — or even just using a distributed database system — so that when you leave CMU and go out into the real world, if you find yourself in a situation where you think you either need to use a distributed database or you want to build one, you'll know what issues you should be thinking about, and you can reason about whether you're doing the right thing. And I'll say up front that most people probably don't need a distributed database. There are obviously some useful cases, but — I can't prove this — I'd say 90% of the world's databases can run on a single box now. You probably should have replication, and once you bring that in it becomes a distributed database, but most of the time you don't need a partitioned database; most workloads can be handled on a single box.

All right, so: let's talk about commit protocols — how to get everyone to agree that we're going to commit. Then how we handle replication, to make sure we have multiple copies of our data so we can always stay online. Then we'll get into the CAP theorem and talk about consistency issues — what kinds of guarantees a distributed database can provide for us given our commit protocol. And then, if we have time at the end, we'll quickly talk about federated databases: the idea of composing disparate databases together to make a single database instance. Okay.

All right, so in that example I showed at the beginning, when we went to commit the transaction, the node had to talk to the other nodes and ask, 'hey, is it safe to commit?' This is what's called an atomic commit protocol. The idea is that we want to get feedback from everyone that participated in our transaction to decide whether it's okay to commit that transaction. And then, if enough nodes — a certain number of nodes above the threshold we define in our protocol — all agree that we should commit this transaction, then we tell everyone we will commit this transaction, and it becomes committed.
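To make the shape of that concrete, here's a minimal sketch of what an atomic commit protocol boils down to. All the names here (Participant, can_commit, and so on) are made up for illustration — this isn't any real system's API:

```python
# Hypothetical sketch of the shape of an atomic commit protocol.
# Real protocols run these steps as network messages with logging.

class Participant:
    def can_commit(self, txn_id: int) -> bool:
        """Phase 1: vote on whether this node can commit the transaction."""
        raise NotImplementedError

    def do_commit(self, txn_id: int) -> None:
        """Phase 2: apply the commit once the group has decided."""
        raise NotImplementedError

    def do_abort(self, txn_id: int) -> None:
        raise NotImplementedError

def atomic_commit(txn_id, participants, threshold) -> bool:
    """Commit iff at least `threshold` participants vote yes.
    For two-phase commit, threshold == len(participants);
    for Paxos, threshold == len(participants) // 2 + 1.
    (This glosses over the second round of messages entirely.)"""
    votes = sum(1 for p in participants if p.can_commit(txn_id))
    if votes >= threshold:
        for p in participants:
            p.do_commit(txn_id)
        return True
    for p in participants:
        p.do_abort(txn_id)
    return False
```

The only real difference between the protocols we'll cover today is what that threshold is and how the rounds of messages are structured.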
There's a bunch of different variants of an atomic commit protocol that you can use. The two we're going to focus on are two-phase commit and Paxos. Two-phase commit is probably the most prevalent one; it goes back to the 1980s. Paxos has certain guarantees that two-phase commit cannot provide — two-phase commit is sort of a degenerate case of it. There's also a three-phase commit, which was actually developed by Mike Stonebraker — the guy behind Postgres — in the 1980s; nobody actually ever does this, there's too much network traffic. There's actually a four-phase commit as well, from Microsoft; they use it in their distributed system called FaRM. They have to do that because they're using RDMA, a special kind of remote memory access — again, we're not going to cover that. Okay. As for Paxos: Raft was developed at Stanford about ten years ago as a more easily understood variant of Paxos, but it basically provides the same guarantees. Raft actually shows up a lot more often in newer distributed database systems because people basically wrote libraries that implement Raft — like 'libraft' — that you can then incorporate into your database, in a ton of different languages; there's no 'libpaxos' that everyone can use. ZAB was developed for Apache ZooKeeper. And then Viewstamped Replication is not that common, but it actually turned out to be the first provably correct atomic commit protocol — it came out before Paxos, but people didn't recognize the properties this thing had until Paxos came along much later. So again: for distributed databases that are not the blockchain — the ones you're actually going to encounter in the real world — you're most likely to see two-phase commit or Paxos, and for newer systems maybe Raft. For this lecture we'll just cover those two: two-phase commit and Paxos.

Actually, quick show of hands: who here has ever heard of two-phase commit before? All right, less than half. Okay. So two-phase commit is exactly what it sounds like: it's a commit protocol with two phases. Let's look at an example where everyone agrees to commit a transaction. Assume that at this point the application server has executed whatever queries it wants — to make changes on the database, or to read whatever data it wants on our different nodes — and it wants to go ahead and commit. So it sends a commit message to this guy here; assume this is the master node. In two-phase commit vernacular, we say this guy is the coordinator: it's in charge of asking its friends involved in the transaction whether it's allowed to commit. The other two nodes here we call participants. Now — I'm not going to show you examples of this — but the coordinator node itself could also be a participant: this node here could also have been modified by the transaction and then be involved in the two-phase commit process. For simplicity, assume this transaction only modified data on the two other nodes. So in the first phase, called the prepare phase, we send out a network message from the coordinator to the participants to ask them: 'hey, here's this transaction you know about.'
'Is it okay to commit?' The participants do whatever validation they need to do to determine whether this transaction is allowed to commit, and if they determine that it's okay, they send back an OK message. Once the coordinator gets back OKs from all the participants, it goes into the second phase, called the commit phase, where it tells all the participants: 'good news, everybody said we can commit this transaction, go ahead and commit it.' Then, likewise, these guys have to send a response saying, 'okay, we did that, this transaction is committed.' And at that point — when, in the second phase, we've gotten back OKs from all our participants — we can tell the outside world that our transaction has successfully committed.

There's one thing I'm not showing here, which I think the textbook talks about: at every step of the protocol, every single node involved is writing out log records to keep track of what messages it received and what responses it sent. So at this point here, when I say 'go ahead and commit this,' these guys are going to write a log record that says, 'for this transaction, I entered the commit phase and I said it was okay to do.' That way, if we crash and come back, we can say: 'oh, we were involved in this transaction — how far into the two-phase commit process did we get?' — to determine whether we need to undo it or redo it. Another distinction to point out — and this will differ from Paxos in a few more slides — is that all participant nodes in the commit protocol for a transaction have to say we commit this transaction. It's either everyone or no one.

So let's go to the next example, where we have an abort. Again, same thing: my transaction finishes, I send the commit request to my coordinator, the coordinator enters the first phase and sends the prepare message to the two participant nodes. But let's say this bottom guy here, for whatever reason — say its concurrency control protocol decides that we cannot commit this transaction — sends back an abort message. As soon as the coordinator gets the first abort message from any of the participants, it's no longer in the prepare phase: it immediately goes into the next phase, in this case the abort. At this point we can already go back to our client, our application, and say 'hey, this transaction can't finish, we're going to abort' — even before we go to the second phase, even before we hear back from anybody else. One abort message kills the entire thing. So now, in the abort phase, we say 'hey, we're aborting this,' everyone comes back and says 'okay, we've aborted,' and at that point the transaction is done. So the idea is that we need two network round trips to get everyone to agree, and then we go ahead and commit or abort the transaction and apply that change.
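Here's a minimal sketch of that coordinator logic — in-memory stubs standing in for network messages and a disk-resident log, with hypothetical participant methods (prepare/commit/abort):

```python
# Minimal two-phase commit coordinator, sketched with in-memory stubs.
# Real systems send these as network messages and flush the log to disk.

def two_phase_commit(txn_id, participants, log):
    # Phase 1: prepare. One "no" vote aborts the whole transaction.
    for p in participants:
        if not p.prepare(txn_id):          # participant votes commit/abort
            log.append((txn_id, "ABORT"))  # the decision is logged first
            for q in participants:
                q.abort(txn_id)
            return False

    # Phase 2: commit. Every participant voted yes, so the decision is
    # COMMIT, and it is durable once this log record is flushed.
    log.append((txn_id, "COMMIT"))
    for p in participants:
        p.commit(txn_id)                   # participants ack in a real system
    return True
```

Note how the abort path matches the example: the loop bails out on the first negative vote, without waiting for the other participants.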
Yes? So this question, for the commit case, is: do I need this second round here? Do I need to go to the nodes and hear 'yes, you've committed' before I tell the application? Right — so his question is: say I'm in the prepare phase and this transaction should commit. I send my prepare request, these guys send back OK. Do I really need to wait for the next round trip — 'okay, go ahead and commit this,' and the acknowledgments back — before I can tell the application that it committed? In practice, no; for absolute correctness, yes. Because what if I crash at this point here? There's a trade-off between the time it takes to recover and the time it takes to send a response. If I'm logging everything to disk, then you're right: as soon as I get back here, I have my two OKs from my participants, and I would have logged that in the coordinator. So now if I crash — say the coordinator goes down — it comes back, looks in the log, and says, 'oh, I saw this thing, I got these messages saying to commit,' and makes sure it applies any changes. If one of the participants goes down, it would come back, but it would still need to find out whether the transaction actually, truly committed before it can redo everything. So: for absolute correctness, yes; for performance, you don't have to.

Yes? So this question is: I get to this point, everyone agrees that we're going to commit, and we write to disk on the coordinator that everyone agreed to commit — so we can enter the second phase and actually do the commit — but then I crash before telling the application. When I come back, I can reapply the change, but I never told the client it committed. So: that is actually not a guarantee we can provide in our database system at all, whether it's single-node or distributed. If we get to the point where we've flushed the log to disk and everyone agreed that we're going to commit, but we crash before we tell the outside world — that transaction is still considered committed, and it's up to the application code to come back and figure out whether the request it sent actually committed or not. We can't guarantee that, because this message back to the client could always get lost; the database server shouldn't be responsible for figuring that out.

Yes? For which one — this one, the second phase? No — it's just like when we abort a transaction on a single node: as soon as I know it's aborted, why wait to flush any CLR or anything to disk? I just tell the outside world immediately that it's aborted, and who cares, because it's as if the transaction never executed. So we tell them right away. Do you have a question, or no? Okay.

So, actually related to his earlier question — do I really need to wait for the OKs before I tell the outside world that I committed — there are two optimizations you can do to speed up the protocol, in exchange for having a longer recovery time.
The second one is what he proposed. The first optimization is called early prepare voting. This is where, if I know that my application is sending the last query it will ever execute to one of the participant nodes, then in addition to sending the query, I piggyback another message that says: 'oh, by the way, I'm never coming back to you to ask you to do anything else, so go ahead and send me your response as if we were in the prepare phase of two-phase commit.' So now it's one network message to both execute the query and run the prepare phase, and in response I get the result of the query plus the result of the prepare check. This obviously requires the application server to know that it's never going to go back and run another query on that node.

What he was proposing is the second optimization: early acknowledgement after prepare. Once the coordinator knows that everyone agrees to commit this transaction, it can already tell the outside world that the transaction is committed, and then take care of the commit phase afterward. Visually it's like this: I request the commit, we run the prepare phase, everyone votes that it's okay; then, at the point where I've gotten my two responses from my participants, I can go ahead and tell the application I committed. The idea is that the likelihood that I crash during this last round trip is low, so it's okay for me to go ahead and do this. If I do crash during it — before I hear back from these guys — then I have to do extra work to figure out whether I truly committed and resolve things correctly, but that's okay.
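Here's a sketch of that early-acknowledgement variant — same hypothetical stubs as before, with the client reply moved to the decision point:

```python
# Sketch of the "early acknowledgement after prepare" optimization.
# The client hears "committed" as soon as the decision is durable on
# the coordinator; the second phase happens after that reply.

def two_phase_commit_early_ack(txn_id, participants, log, reply_to_client):
    # all() short-circuits on the first "no" vote, as in the abort example.
    if not all(p.prepare(txn_id) for p in participants):
        log.append((txn_id, "ABORT"))
        for p in participants:
            p.abort(txn_id)       # harmless no-op for ones never prepared
        reply_to_client("aborted")
        return

    log.append((txn_id, "COMMIT"))   # decision point: flushed to disk here
    reply_to_client("committed")     # answer before the second round trip
    for p in participants:
        p.commit(txn_id)             # if we crash here, recovery finishes this
```

The trade-off is exactly the one just described: a faster answer to the client, paid for with more work during recovery if the coordinator crashes mid-commit.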
Yes? The question is: if we're in the prepare phase and a participant gets the prepare message and says, 'okay, I'm going to commit this,' what does it log at that point? Okay, so: I'm here, I send back my OK, the coordinator tells the application it committed — but now this participant crashes, and in its log it says, 'I told the coordinator I was going to commit.' What do I do when I come back? Well, you assume the coordinator would know. This guy crashed; now he comes back, and in my log on the coordinator I see, 'I saw commits from everyone, this transaction should commit.' So you have the coordinator fill in the missing information as needed on the participant when it comes back up — the commit message that died with the crash gets re-sent. This requires you to have — and we're not going to cover this, it's more distributed-systems stuff — something like a heartbeat, to keep track of whether a node is up, so that if you haven't heard back from a node in a while, you go into some recovery mode or failure mode to handle that case. Yeah.

Yes? Your question is: why are we not sending the success message to the application only after the commit phase? So that's the original two-phase commit protocol — that's how it works. Right, but think about it: I'm not saying where these nodes are located. They could be in the same rack in the same data center, or they could be across the world. So rather than waiting for the next round trip, which could be a hundred milliseconds, maybe longer, I'll just go ahead and send 'hey, your transaction committed,' because I assume I'm not going to crash during this time, and I have the recovery mechanisms necessary to handle the failure scenario he mentioned — to recover myself if I come back after a crash.

Yes? So your question is: if you're here — you've told the coordinator you want to commit — what is this participant node actually doing? Could you just keep everything in memory? Yes. I'm being vague here, but in the original two-phase commit protocol, I log to disk and then send my response. Nobody does that, right? So in theory you can just buffer the log messages, and if they get flushed out as part of group commit, who cares. I know that if I crash and come back, maybe some of the log records that recorded how I voted on committing this transaction got lost — but I can come back, and the coordinator can get me up to speed and fill in the missing details that I lost. Yeah, you could do that as well. I don't think anybody does the hardcore 'flush every log record on every single node every single time' — nobody does that, as far as I know. Okay.

So, just to reiterate: as I said, the nodes record what happens in each phase — what messages they receive and what messages they send out — to a log, and that allows them to fill in the missing details when they come back after a crash (there's a sketch of this recovery step below). If, while we're running our transaction, the coordinator crashes before we resolve what actually happened, it's up to the participants to decide how they want to proceed. The simplest thing to do is: if the coordinator goes down, assume the transaction aborts, and you just roll back any changes. But you could also have the participants recognize, 'oh, our coordinator is down and our transaction is still open,' so somebody becomes the new coordinator, figures out how everyone voted, and then decides whether to commit that transaction or not.

Yes? So her question is: what if we're here, and this guy sends a commit message, it arrives at node two, but before it can send the other one to node three, it crashes — what happens? Right, so again, the first option is: if we can recognize that the coordinator crashed — whether through a heartbeat or a timeout or whatever — we just say, 'we have this open transaction, we're going to abort.' But at this point, once the coordinator has told even one node, 'hey, this transaction is committed,' that is the ground truth of what actually happened. So now it's up to that node to coordinate with everyone else, or tell everyone else: 'the coordinator said the transaction committed, so we should actually go ahead and commit this.'
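Pulling that crash-recovery discussion together, here's a sketch of what a participant might do on restart. The log format, the outcome_of RPC, and apply_outcome are all hypothetical, just to show the shape of the step:

```python
# Sketch: on restart, a participant scans its log for transactions where
# it voted "ok" in the prepare phase but never learned the outcome, then
# asks the coordinator (or the other participants) what happened.

def apply_outcome(txn_id, outcome):
    """Stub: redo the transaction's writes on COMMIT, roll back on ABORT."""

def recover(log, coordinator):
    undecided = set()
    for txn_id, record in log:
        if record == "PREPARE-OK":
            undecided.add(txn_id)
        elif record in ("COMMIT", "ABORT"):
            undecided.discard(txn_id)      # outcome was already logged
    for txn_id in undecided:
        # Hypothetical RPC: the coordinator fills in the missing decision.
        apply_outcome(txn_id, coordinator.outcome_of(txn_id))
```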
Right — and again, this is what I'm saying: this doesn't work if our nodes are malicious. It only works because everyone is playing on the same team, so that hearing one commit message from this guy is enough to validate it for everyone else and tell them, 'yes, we should commit this.'

Okay, so now what if a participant crashes? For this one, again, under two-phase commit we just assume the participant is gone, we replace its missing response with an abort, and we go ahead and abort the transaction. That's the simplest thing to do. The key thing to point out about what's happening here is that the nodes have to block until they find out what's supposed to happen. And to avoid blocking forever, you have a timeout — how long you set it can vary depending on the operating environment — to say: 'all right, I haven't heard anything about this for a certain amount of time, so we go ahead and abort this transaction.' So you could have a livelock issue, or an issue where you're just not making any forward progress, because your nodes are sitting around waiting.

So, an alternative to two-phase commit: certainly any distributed database built in the 1980s and '90s is using two-phase commit, but the newer ones can use variants of two-phase commit, or they can use Paxos or Raft. Two-phase commit is sort of a degenerate case of Paxos, and hopefully that makes sense as we go through it. Paxos comes from the distributed computing world, so instead of calling it an atomic commit protocol, they call it a consensus protocol, but the idea is the same: you're trying to get a bunch of nodes to agree that this is the correct behavior — the correct change to our state machine. What's going to happen under Paxos is that you have a coordinator propose whether a transaction is allowed to commit, and then a bunch of participants vote on whether it's going to succeed — whether that transaction is allowed to commit. But under Paxos we only need a majority of the nodes to agree to commit the transaction; in two-phase commit we need all of them. So now, as long as you have a majority of nodes agreeing to commit a transaction, you don't have to block the entire system or block the entire protocol — you can still make forward progress — whereas, again, in two-phase commit, having one participant become unavailable blocks the whole thing.

So, the story of Paxos is quite interesting. The first description of Paxos is in this paper written by Leslie Lamport — who won the Turing Award a few years ago — called 'The Part-Time Parliament.'
I think this paper is dated 1998, but he actually invented it in 1992. What he was trying to do was come up with a proof by contradiction — an example showing that you couldn't have a consensus protocol with this fault-tolerance property — and in the process he ended up actually inventing one. If you ever read this paper, it's the craziest thing, because it's written as if he's an archaeologist: he goes to this Greek island of Paxos, finds these stone tablets, and derives what the algorithm is from these ancient inscriptions. It doesn't read like a computer science paper; it's all this illustrative story. The story goes that he wrote this paper in '92 with all the Greek island stuff, the reviewers hated the story and wanted him to rewrite it to make it more computer science, and he refused and didn't make any changes. So he retracted the paper, put it in his filing cabinet, and didn't touch it for six or seven years, until people started publishing papers that looked a little bit like Paxos. Then he pulled it out and said, 'aha, you're either way off or you're close, but I've already solved this problem.' That's the story you get from Leslie Lamport's website. When I was in grad school, I took a class with Maurice Herlihy — the guy that invented linearizability and transactional memory; he used to be a professor here at CMU — and he said he was actually one of the reviewers of this paper. He said they were okay, back in 1992, with all the Greek island story; they just wanted him to have an appendix with the algorithm, to show what the thing actually was, with a brief description. And Leslie Lamport was so stubborn — he thought the paper was so perfect — that he didn't want to change anything. But it's an interesting paper; you should definitely go read it, though read it for amusement — you're not going to learn much from it, at least I didn't. He has a follow-up paper called 'Paxos Made Simple' that doesn't help either. For me, the Google one, 'Paxos Made Live,' was the one where it actually clicked and I understood what the protocol actually was.

So let's go through a brief example. For this we're now going to use an additional node, because we need to have a voting majority. And the difference is that now, when we get a commit request, in the Paxos parlance, instead of a coordinator as in two-phase commit, we have what Paxos calls a proposer, and the set of participants are called acceptors. The proposer says: 'hey, we want to go ahead and commit this transaction — is that okay?' Now let's say in our example this middle node here crashes and goes down. The first two nodes come back and say, 'yes, we agree to go ahead and commit this.' Under Paxos we just need a majority, so two out of three nodes agreeing to commit this transaction is enough.
That's enough for us to go ahead and try to commit this. Under two-phase commit, if this guy goes down, we have to abort the whole thing. So now we get the majority to agree, we go ahead and commit, they come back and say, 'yes, we accept that commit,' and then we send back our success message. Note that for this one we actually have to wait until we get the second-phase response back — we can't shortcut it the way we can in two-phase commit — because under Paxos we could still come back and get rejected in the second phase.

So let's look at a different example, in terms of a timeline. Say at the exact same time there are two different proposers in our distributed database. The first guy says, 'hey, I want to commit this transaction.' Think of what they're really doing to the state machine: they're just appending a log message saying, 'here's the change we made,' and the state machine is the database. Proposing that this transaction should commit — and that therefore its changes should be applied to the database — is moving the state forward. So it says, 'I want to commit the change at timestamp n.' It goes out to all the acceptors, and they come back and say, 'yes, we agree to go ahead and commit this.' But now this other proposer comes along and says, 'I have another transaction that made a change, and its timestamp is n+1' — a logical timestamp. So now what happens is: when the first guy comes back and says, 'hey, let's commit n, because you all agreed to commit it,' they're going to reject it, because they saw n+1. Even though they don't know what the outcome of n+1 is going to be, the mere fact of seeing a newer proposal for committing a transaction — for changing the state of the database — requires them to reject the one they all agreed to before. So then we send the agreement to commit n+1, the proposer says, 'all right, great, let's go commit n+1,' and once they all accept it, at this point the transaction is actually committed.

Yes? Her question is basically: what is this n, and if I come along with n−1, would that be immediately rejected? Yes. Now, how do you actually get a globally valid timestamp or counter that everyone agrees goes up in the correct order, so that time always moves forward? The simple way is to assume your clocks are reasonably in sync and append a logical counter, maybe prefixed with the host name so that you can break ties. There are standard tricks to handle this.

Yes? Right, so her question is: couldn't this go on forever? In theory, couldn't these two proposers just keep clobbering each other back and forth? Yes — we'll handle that on the next slide. Yes? Your question is how the majority of acceptors figures into this. If any one of these acceptors sees n+1, it has to reject n. And I think they cover Paxos in the distributed systems class, because there you actually have to implement it.
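Here's a heavily simplified sketch of the acceptor-side rule from that timeline — single-decree Paxos, with made-up method names, just to show why proposal n loses once n+1 has been seen:

```python
class Acceptor:
    """Single-decree Paxos acceptor, heavily simplified."""

    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted = None    # (number, value) of the last accepted proposal

    def on_prepare(self, n):
        # Promise to ignore anything below n; report any prior acceptance
        # so the proposer can adopt it.
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def on_accept(self, n, value):
        # Reject if we've promised a higher-numbered proposal since --
        # this is the rejection of n after n+1 in the example above.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return "accepted"
        return "rejected"

def majority(num_acceptors):
    # A proposal needs responses from a majority: 2 of 3, 3 of 5, etc.
    return num_acceptors // 2 + 1
```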
Again, I'm going through this very briefly, just to show the distinction between two-phase commit and Paxos: the high-level idea is the same, except that under Paxos you can still get rejected in the second phase, and only a majority has to agree.

So now, her observation: couldn't I get starved forever if I just have two proposers clobbering each other, proposing n+1, n+2, n+3, with everything getting rejected? Absolutely. The way you handle this is called Multi-Paxos. The idea with Multi-Paxos is that you select some node to become the leader for your Paxos group, and then it's the sole node responsible for proposing changes — for committing transactions. Think of it as delegating or designating a coordinator: the one piece that everybody has to go to to determine whether they're allowed to commit or not. You have a lease on being designated the leader — say, some 60 seconds or so — and after that 60 seconds is up, you do a round of voting, which is just another round of Paxos, to determine who the next leader is going to be. Once that's resolved, the new designated leader is responsible for applying all the changes. This avoids that starvation issue, because the leader is the only one proposing changes.

Yes? Correct — so she's asking: isn't this just moving the problem, because now can't you get starved in the leader election? So again, we assume our nodes are friendly. In our database system we just say: all right, after my lease is over, I'll try to propose myself as the new leader. How do you handle two guys calling an election at the same time? You just back off: I tried, I got rejected, so instead of immediately proposing something new, maybe I wait 10 milliseconds; if I propose again and get rejected, maybe I wait 20, and just do it that way (there's a sketch of this below).

Yes? The question is: how many proposers can you have? As many as you want — the algorithm doesn't say anything about a limitation. In practice, for a Paxos group, you'd typically have one, under Multi-Paxos, to avoid the starvation issue. If we cover Spanner at the end of the semester — which every year we always do — I'll show you how it does Paxos. Question over here? Okay.
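Here's a sketch of that randomized backoff idea for dueling proposers — the propose_once callable is a stand-in for one full Paxos round:

```python
# Sketch: after a rejected round, wait a randomized, growing delay
# before proposing again, so competing proposers stop colliding.

import random
import time

def propose_with_backoff(propose_once, max_attempts=8, base_delay=0.01):
    delay = base_delay
    for _ in range(max_attempts):
        if propose_once():                    # one full round; True on success
            return True
        time.sleep(random.uniform(0, delay))  # random jitter breaks the tie
        delay *= 2                            # exponential backoff
    return False
```

The jitter is the important part: if both contenders waited the same fixed delay, they would just collide again on the next round.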
So, the main takeaway from this: you can use both two-phase commit and Paxos to commit transactions — to determine whether everyone agrees to go ahead and commit. In practice, for distributed databases whose nodes are local to each other — meaning they're running in the same data center, not spread over wide geographic regions — two-phase commit is what people mostly use, because the number of round trips can be less and you assume the nodes aren't going to crash as often. Again, there's a bunch of extra failure-scenario code to deal with — failure-handling code for when the coordinator goes down or a participant goes down — so while two-phase commit can be slightly faster than Paxos, there's still a lot of stuff you have to do to make sure the whole system doesn't go down and you don't lose data. And as I said before, the inventor of Paxos, Leslie Lamport, and Jim Gray — the guy who invented two-phase locking — had a paper in the early 2000s, before Jim disappeared, that showed that two-phase commit is a degenerate case of Paxos. The coordinator runs what amounts to the same Paxos round of voting; it's just that everyone has to agree rather than a majority. Okay.

All right, so let's talk about replication now. As I said in the beginning, most people don't need a partitioned distributed database to handle their workload; most of the databases you'll encounter in the real world will probably be using some kind of replication, and I would say that still counts as a distributed database. The idea with replication is that we want to make multiple copies of every object — whether it's a page, a tuple, or a table, whatever you want — and store them on multiple nodes, so that if one of those nodes goes down, we have a backup available. We don't have to wait for the system to reboot and replay the log to get back to the correct state; we can just fail over — using Paxos to decide who to fail over to, to determine the new location for writing data.

There's a bunch of design decisions we have to think about when we build our replication scheme, so we'll go through them one by one. The first issue is how you configure the replicas in the system, and where the reads and writes go. The most common approach is what's called master-replica replication, sometimes called leader-follower (it used to be called master-slave, but people try to avoid that term). The idea is that there's some designated master for a given object in the database, all the writes go to that master node, and the master node is then responsible for propagating those changes — the updates — to its replicas. Reads can go either to the master or, in some systems, also to the replicas, so you can offload work from the master, because the writes can be very expensive (there's a sketch of this routing below). As I said, if the master goes down, we hold a Paxos round to do a leader election and determine which replica becomes the new master, and that's where all the writes go. Question? His question is: will the system then only have eventual consistency? Not necessarily — we'll get there, it's a few more slides.

Okay, the other approach is multi-master, where we have replicas stored on different machines and transactions are allowed to write to any of those replicas — sometimes called multi-home. Now the replicas are responsible for determining, when you have two transactions that try to update the same thing running on two different replicas, how to coordinate: which one should actually commit, which one should abort — how do you actually deal with conflicts?
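Here's a sketch of the master-replica routing just described — the master and replica node objects, with apply_write and get methods, are hypothetical stand-ins:

```python
# Sketch of master-replica routing: all writes go to the master; reads
# can be offloaded to replicas if the caller tolerates stale data.

import random

class ReplicatedTable:
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def write(self, key, value):
        # Only the master accepts writes; it propagates them to replicas.
        self.master.apply_write(key, value)

    def read(self, key, require_latest=False):
        if require_latest:
            return self.master.get(key)               # always up to date
        return random.choice(self.replicas).get(key)  # possibly stale
```

The require_latest flag is the design decision in miniature: reads that can tolerate lag get spread across replicas, and everything else pays the cost of going to the master.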
So let's look at these visually again. Master-replica: you have a master node, all your writes go to this guy — and in some systems all the reads go here as well — and it propagates the update information over the network to its replicas, so it can get applied there. And in some systems, again, you can have the reads go to the replicas, so that you reduce the amount of work you're doing on the front end. If your reads don't need the most up-to-date information, you can offload them to these other guys here. This can still be consistent in one sense: with snapshot isolation, I can be guaranteed that I'm not seeing torn updates or partial updates from transactions still running on the master, so I can still guarantee the consistency of the data I'm reading on my replicas. It just may be the case that I'm not seeing the latest information that's on the master. The multi-master approach, again, is that transactions can read and write any copy of the data, and then there's some procedure to resolve conflicts — again using Paxos or two-phase commit — to decide, when you have overlapping changes on two replicas, what the latest version should be.

As a quick anecdote: Facebook originally used this master-replica setup for their giant data centers. The main data center was, I think, in California, and then in different regions around the world they would have replicas that would follow along with the master and get the updates propagated to them, so that you could see things. And the way they would fake it out, to make it look like your changes happened really fast locally — like if you updated your timeline — is that they would store that as cookie information in your browser, so that if you refreshed the page you would see your update even though it may not have been propagated yet to the replica you're reading your timeline from. Because there's a bit of delay for the write to show up at the master and then get pushed out to the replica. And then, probably five or six years ago, they moved to the multi-master setup. So, an important concept — yes?

Yeah, so there's a lot of public information about this from a few years ago. The way it worked: if I write a post on my timeline — say I'm in Brazil, and my local data center has a replica of the master — then if I refresh my page and come back, I wouldn't see my post, because it hasn't been propagated from the master to the replica; there's always a delay for this. So they wanted to avoid the issue of someone posting on their timeline, hitting refresh, and then thinking their post went missing, because now you're reading from the replica, right?
So the way they handled that is they would actually store what you wrote in your browser cookie, and then fill that in as if it was coming from the database — it's not. Now — yes? If they quit their browser, and assuming the cookie got blown away — right, or an even better scenario: I make a change to my timeline on this machine, and I have another machine right next to it, and I hit refresh on that machine. It would go to the replica database down in Brazil, and it would not see the post; it'd be a couple hundred milliseconds before it actually got propagated. But they would say: what's the likelihood of someone hitting refresh on two machines at exactly the same time?

Yes? Correct — so your question is: in my scenario with Facebook, if the replica is in Brazil and the master is in California, when I actually make the post, does that mean the application server needs to communicate with the database back in California from Brazil? Yes. And you're saying that's a huge bottleneck — absolutely, and that's why they did the cookie thing to hide it. Because doing this is hard, right? They had to build that, and getting it right is not easy.

His question is — again using the Facebook example — if someone comments on that post, would I never see it? No, because again, the cookie trick only deals with you, the person writing the post: if you hit refresh, it pulls your post from the cookie, so you think you got it from the master, but you really got it from the replica, with the missing information it knows should exist for you filled in. Eventually the master's changes get propagated to the replica, and from then on, when I do a refresh, instead of coming from the cookie, it comes from my replica. So if somebody comments on my post, there'll be a delay before I can see it. Correct — in the old system, yes, and in the new system, yes. Everything is — well, now we're getting into the GDPR world, which I don't want to get into — like, where can data actually live — but in general, think of it as: Brazil, the US — everyone has a complete copy of the entire database. Whether Facebook actually does that anymore, I don't know.

Again, this is a good example of — it's sort of like MP3s. MP3s take advantage of what we as humans can perceive in audio: they throw away frequencies that we humans are never going to be able to hear, to compress the actual file. This is the same kind of thing: they know that if it takes a hundred milliseconds for a comment on my post to get from the master to the replica — who cares if it takes me a hundred milliseconds to see your comment about my stupid picture, right?
The thing they were trying to avoid was someone posting and then immediately not seeing what they posted — that's why they do the cookie trick. For everything else, you just have to wait until it gets propagated, and again, if it's a hundred milliseconds to see a comment from your friend, who cares?

Yes? Like the comment use case: someone in California comments on my post in Brazil — yes, then all of a sudden that has to be coordinated all the way down to Brazil. And just to repeat his comment: for this one, I'm showing partition P1 and assuming everyone has a complete copy of it. But in a really large distributed database with a lot of data, I may want to replicate P1 multiple times, so there could be multiple copies of P1. And maybe, if I'm down in Brazil, I'll keep more copies of my data down in Brazil, because I can update them more quickly if people are mostly posting on my timeline from Brazil. Now if anybody updates something in California, that has to then get propagated down to Brazil, so that when I refresh I can see it. Yes — they handle all that. The Facebook architecture, by the way: the core storage engine of their giant distributed database system is MySQL. They're getting rid of InnoDB and eventually replacing it with RocksDB, but all the layers above that are independent of the actual underlying storage system — all that coordination stuff, like keeping the multi-master setup in sync, is written by Facebook.

All right. So, an important property we care about in a replicated environment is this notion of k-safety, and the idea here is just keeping track of the number of copies of an object we have to have in order for our system to remain online. I don't know whether k-safety is a standardized term — it's something Mike Stonebraker uses when describing Vertica and VoltDB — but basically it's a human-defined threshold that says: I need to have at least k copies of a particular object at all times in my distributed database, and if I ever go below that k, I grind the system to a halt, and I stop until either I can bring up a new copy of that data or a human comes in and makes a correction. The idea is that we want to avoid losing data. Obviously I want my k-safety to be at least one, because if I lose the node that has the only copy of a piece of data, then I'm screwed: I could get false negatives or false positives for different queries, and my database is incorrect. What this threshold actually is depends on how paranoid you are about keeping things online, and you can also vary it: like in my example, maybe I have more copies down in Brazil and one copy up in the US, because I care about keeping local copies down in Brazil.
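A minimal sketch of what a k-safety check amounts to — the data layout here (a map from object id to the set of nodes holding a live copy) is invented for illustration:

```python
# Sketch of a k-safety check: if the number of live copies of any object
# drops below k, stop accepting writes until a replica is rebuilt.

def check_k_safety(objects, k):
    """objects maps an object id to the set of nodes holding a live copy."""
    unsafe = {obj for obj, nodes in objects.items() if len(nodes) < k}
    if unsafe:
        raise RuntimeError(f"halting: {len(unsafe)} objects below k={k} copies")

# e.g. check_k_safety({"P1": {"node1", "node2"}, "P2": {"node3"}}, k=2)
# would halt, because P2 has only one live copy left.
```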
All right, so next: what are we actually propagating to our replicas, and how? This is what he asked about earlier — does replication mean we only get eventual consistency — and I said no, and you'll see why. The propagation scheme is about when we should tell the outside world that our transaction has committed: how long do we have to wait? And this is somewhat independent of the two-phase commit stuff. The question is: with my replicas, should I wait until the replica acknowledges that it got my change and has it safely stored on disk before I tell the outside world that I've actually committed? In general, the two approaches are synchronous and asynchronous. Synchronous gives you what's called strong consistency, which means I can guarantee that if I tell the outside world my transaction has committed, then if I go read that data from any replica, I'm guaranteed to see the changes of that transaction. With eventual consistency, the idea is that the change will eventually get propagated to my replicas; so if I hear back that my transaction is committed and I immediately go try to read it on a replica, I may not actually see it.

Again, let's look at this visually. With synchronous, say we have two nodes and we're doing a master-replica setup. We want to commit on our master, so we go to the replica and say, 'hey, here's a bunch of log messages — the updates this transaction made — go ahead and flush them.' Then we pause and wait until we hear a response back from our replica saying that the transaction has been successfully committed and is durable on disk. Once it's done flushing, it sends back the acknowledgement, and at that point we can tell the outside world that we've committed. So at this point here, when we get back this acknowledgement, if we try to read whatever this transaction modified, we're guaranteed to see the change we'd expect — on both the master and any replica.

With asynchronous, you don't wait for that response. I go ahead and say I want to commit, I say, 'hey, go ahead and flush the change,' but then I can immediately come back to the application and say my transaction has committed, and at some later point this thing will eventually get flushed on the replica. It'd be nice for the master to be told when that happens, but technically it doesn't have to be. This is one of the big distinctions between the traditional transactional relational database management systems and the NoSQL guys. In a transactional database system, we don't want to lose any data and we don't want any inconsistent reads, so we would always do synchronous replication. The NoSQL guys would do the asynchronous one, because the idea is that eventually the change gets propagated to my replicas, so maybe in a small window — like 50 milliseconds — I could get a stale read on my replica. Who cares? Maybe it's a website of stupid cat photos with comments — who cares if I can't see the last 50 milliseconds of cat comments? That's probably good enough. But if it's money, I certainly want to use the synchronous one. Because what could happen here? I tell the outside world my transaction committed, but then this guy crashes and this guy crashes, and say the master didn't flush anything to disk and the replica didn't get the message yet, or didn't apply it. Now, when I come back, my transaction is gone.
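Here's the difference in sketch form — the master and replica objects with a flush method are hypothetical stubs:

```python
# Sketch: synchronous replication waits for the replica before reporting
# commit; asynchronous reports commit immediately and ships the change later.

from concurrent.futures import ThreadPoolExecutor

def commit_synchronous(txn, master, replica):
    master.flush(txn)                # durable on the master
    replica.flush(txn)               # blocks until durable on the replica too
    return "committed"               # strong consistency: any copy has it

def commit_asynchronous(txn, master, replica, pool: ThreadPoolExecutor):
    master.flush(txn)
    pool.submit(replica.flush, txn)  # ship the change in the background
    return "committed"               # fast, but lost if the master's disk
                                     # dies before the replica receives it
```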
As an aside, I'll note that a lot of the NoSQL systems from ten years ago that all said "we eschew SQL — we're going to avoid joins, avoid transactions" have reversed course: a majority of them have added transactions, and a majority of them have added SQL and joins. That's not to say that certain aspects of NoSQL systems are invalid; there are certainly use cases, like websites, where we don't really need strong consistency. But in general there are enough applications out there where this matters a lot, because you don't want to lose any data. Say I get the commit message, I immediately come back and acknowledge it, and I don't log anything — actually, even if I did log it, say I logged that we committed this transaction, but now this machine catches on fire and its disks melt. The replica crashed too, but it just crashed; it comes back, looks in its log, and it never got the flush message because that didn't arrive in time. So I told the outside world I committed, but the replica never saw the change, and I crashed, so when I come back the transaction is gone. And if that's your bank account with a money transfer in it, you're pissed. So it's up to the application to decide what trade-offs it wants to make. Are you super conservative and don't want to lose any data? Then synchronous replication is the way to go. Are you okay with maybe losing the last 5, 10, 50 milliseconds of data? Then asynchronous is the way to go.

Okay, the next issue is when we actually send our changes, and what those changes look like. One approach is to have the master continuously send all the updates that transactions make as they occur. Think of this as tailing the write-ahead log: any time I create a log record that I'm going to write out to disk, I also send it out over the network to my replicas, and they can start applying the changes as they come in. Of course, this means I not only need to send a commit message, but I also need to send an abort message — just as I would if I were replaying the write-ahead log — because the replica needs to know which changes it has to roll back.

The other approach is to only send the log messages when a transaction actually goes to commit. We just buffer all our log messages in memory on the master node. If the transaction aborts, who cares — we drop them; we don't send anything over the network. If it commits, then we push everything over to the replica. The advantage of this approach is that you're not wasting time sending log messages from transactions that are going to abort. But of course, if I'm doing synchronous replication and I need to wait until the replica acknowledges that it has applied all the changes, then I'm sending one huge batch of updates all at once and I have to wait until they all get flushed, whereas with the first approach I can do it incrementally. As far as I know, most systems do the first one.
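A minimal sketch of the two shipping policies, with hypothetical names throughout. Note that the continuous shipper has to ship aborts so the replica can roll back, while the on-commit shipper just drops its buffer.

```python
# Hypothetical sketch of WHEN the master ships log records to a replica.

class Replica:
    def __init__(self): self.inbox = []
    def receive(self, msg): self.inbox.append(msg)

class ContinuousShipper:
    """Ship each log record as it is created (like tailing the WAL)."""
    def __init__(self, replica): self.replica = replica
    def log(self, txn, rec):  self.replica.receive((txn, rec))
    def commit(self, txn):    self.replica.receive((txn, "COMMIT"))
    def abort(self, txn):     self.replica.receive((txn, "ABORT"))  # must ship

class OnCommitShipper:
    """Buffer records in memory; ship the whole batch only at commit."""
    def __init__(self, replica):
        self.replica = replica
        self.buffer = {}                      # txn -> [records]
    def log(self, txn, rec):
        self.buffer.setdefault(txn, []).append(rec)
    def commit(self, txn):
        for rec in self.buffer.pop(txn, []):  # one big batch at the end
            self.replica.receive((txn, rec))
        self.replica.receive((txn, "COMMIT"))
    def abort(self, txn):
        self.buffer.pop(txn, None)            # aborted work never hits the network
```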
The last design decision is a bit more nuanced: it's determining how the changes are actually applied on our replicas. Again, in databases the terminology is often vague, or people use different terms to describe different things, but I think active-passive versus active-active is standardized enough that this makes sense; I don't know whether the textbook covers it.

With active-active, the idea is that each transaction runs independently on each of our replicas. Say we have a transaction that wants to update four tuples: we run that transaction on the master and we also run it on the replica — or, if we're doing multi-master, it runs on every copy of the node. Then, when they commit, all we need to do to determine whether they committed correctly is check whether they all produced the same result. Now, this is not easy to do if you're using a nondeterministic concurrency control scheme like two-phase locking or timestamp ordering — all the things we talked about before — because now you need to guarantee that transactions run in the same order on both replicas without checking every single query. We talked a little bit about deterministic ordering schemes when Prashant gave that lecture: with deterministic concurrency control, the system guarantees that transactions run their operations in the exact same order on both sides. So active-active is not that common, because you have to do a bunch of extra work to make sure the replicas execute exactly the same schedule.

What is more common is active-passive: transactions execute in one location, on the one master node, and then the master propagates its changes to the replicas. These changes could be the write-ahead log — we can send out the physical updates to the actual tuples themselves, the literal bytes we changed — or we could stream out the SQL queries the transactions ran and just replay those queries on our replicas. There are advantages to both, just as we discussed before with respect to recovery time. Physical replication is usually the most common, because all you're really doing is sending out the write-ahead log messages, and the replicas replay them.

Is this clear? Yes. His statement is: if you're sending the SQL queries, isn't that the same thing as active-active? I would actually agree with that statement. I was thinking in terms of active-passive, where I run the SQL query on the master and then the log message that comes over contains the SQL query. Active-active, in the context of stored procedures, is more like two transactions running in their entirety, independently, on the two replicas. But in your example — yeah, this is what I'm saying, the terms are nebulous — I would agree that's active-active, even though it happens after the fact. Active-passive means I run it on the master and only after that do I send it to the replica. But you could say: I'm going to run this query, and right before I run it on the master I send it over to the replica — is that active-active? I would agree, yes.
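To make the physical-versus-logical distinction in active-passive replication concrete, here's a minimal sketch: the physical path just installs the shipped values, while the logical path re-executes the shipped SQL on the replica's own engine. Python's built-in sqlite3 is used purely as a stand-in engine; all names are made up.

```python
# Hypothetical sketch of applying active-passive changes on a replica.
import sqlite3

# Physical replication: the master ships the bytes/values it wrote,
# and the replica installs them without re-executing anything.
replica_pages = {}
def apply_physical(key, value):
    replica_pages[key] = value

# Logical replication: the master ships the SQL statement, and the
# replica replays it on its own engine.
replica_db = sqlite3.connect(":memory:")
replica_db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, bal INT)")
def apply_logical(sql, params=()):
    replica_db.execute(sql, params)

apply_physical("accounts:1:bal", 90)                          # shipped bytes
apply_logical("INSERT INTO accounts VALUES (?, ?)", (1, 90))  # shipped SQL
```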
All right, we have about eight minutes left, and this is one of the hardest topics; let's roll the dice and see if we can get through it. So there's this thing called the CAP theorem that people apply to distributed databases, and it's a way to characterize and understand the properties and guarantees a distributed database can provide for you. It's broken up into three parts: consistency, availability, and partition tolerance. It was originally proposed as a conjecture by a Berkeley professor named Eric Brewer in the late 1990s, and then in 2002 it was formally proven at MIT to be correct — it's a true theorem. The basic idea is that if you're going to have a distributed database, you have to pick two of these three things: you get two out of three. It's sort of like looking for a husband or a wife: you can pick someone who's smart, good-looking, or not crazy, but you only get two out of three. Same thing for distributed databases.

So let's go through these one by one. Think of it as a Venn diagram with C, A, and P, but you can never be in the middle: you can never get a system that guarantees all three. Consistency here means linearizability — think of it as a stronger version of serializability. Availability means that at any given time we can access any node and get any data in our system. And partition tolerance means that if we start losing messages because the network goes down or a machine goes down, we can still process any request we could ever want.

The NoSQL systems are going to be AP: they provide availability and partition tolerance in exchange for giving up consistency. That's the eventual-consistency thing: I can't guarantee that if I tell you your write succeeded, everyone is going to see that write. The NewSQL systems, and the traditional transactional distributed systems, are going to try to do CP or CA: in their world, if I can't talk to a node, then rather than keep running, I shut the whole thing down, and in that case I give up availability.

All right, let's go through each one of them. I think we've covered most of these already, but just to show them visually so we understand what they actually mean. With consistency, the idea is that if we do a write on one machine, everyone should see that write before we tell the outside world that the write succeeded. Say a transaction running on this application server wants to set A to 2. We propagate that change to the replica, and only then do we tell the outside world that we acknowledge the write. At that point, whether we read A on the replica or on the master, we'll see A equals 2. So another application server, immediately after the write succeeded, can read A equals 2 and get back the correct response.

Availability says that if this replica goes down, then either this application server or the other one can still read and write anything it wants. And the last one is partition tolerance: say the network I'm using to communicate between these two machines goes down — the machines don't go down, but the network does, or my packets are getting lost on the network. So now what happens?
Well, remember, before we had the master-replica setup, and I said that with a master-replica setup you run Paxos to decide who the master is, and that's where all the updates go. Now there's a network partition, so these nodes can't communicate with each other, but each one knows it's still up. So each side runs Paxos and concludes: oh, I'm still alive — I'm the new master. Now say my two application servers send updates to the database at the exact same time: one sets A equal to 2, the other sets A equal to 3. Both nodes think they're the master, because they ran Paxos and that went fine — nobody contradicted them about who the master was — so each one says, okay, it's fine for me to make this change, and sends back the acknowledgement that the change was made. But at some point the network comes back, and I need to reconcile and synchronize those changes. And now you're screwed, because one node says A equals 2, the other says A equals 3, and we told the outside world that both writes succeeded.

So yes — the question is, what would CP look like? CP says: if the network goes down and these two nodes can't communicate, what should I do? Say I'm doing the K-safety thing where I need at least two copies of the data at all times, and I have another node over here. These two nodes together say: hey, we have at least two copies, we're fine. They do leader election, one of them becomes the master, and now anybody can do writes on this side. The node over here says: my K-safety threshold is two, and I only have one copy, so I have to shut down — I can't run anything. So there I'm giving up availability: I'm technically handling the network partition by being unavailable on one side, while the other side stays okay.

The two-masters scenario is what's called split brain in distributed systems: two sides, like two halves of a brain that can't communicate, and both think they're king of the world. In a traditional transactional database system, you basically stop the system when you realize you can't communicate with everyone — or, if you have a majority, you say "I'm the new master." And in this example, if that isolated node had been allowed to make changes because it thought its K-safety factor was satisfied, then when it came back I'd have to have a human come in and resolve the conflict; we can't magically do that in our system, so again we stop the world and go offline until someone fixes it.

Yes — correct, so his comment is: how do I avoid split brain? Well, if your K-safety factor is half the nodes plus one, then you're always guaranteed that only one side can be the master, and the other side fails. Yes, that's it — there's no magic beyond that.
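Here's a minimal sketch of that majority-quorum rule: a side of the partition may elect a master only if it can see a strict majority of the cluster, so two masters can never coexist. The cluster and node names are made up.

```python
# Hypothetical sketch: a strict-majority quorum prevents split brain.

CLUSTER = {"n1", "n2", "n3", "n4", "n5"}

def can_elect_master(visible):
    # A side may elect a master only if it sees more than half the cluster;
    # two disjoint sides of a partition can never both satisfy this.
    return len(visible) > len(CLUSTER) // 2

# A network partition splits the cluster 3 / 2:
side_a = {"n1", "n2", "n3"}
side_b = {"n4", "n5"}

assert can_elect_master(side_a)        # majority side keeps accepting writes
assert not can_elect_master(side_b)    # minority side shuts down (gives up A)
```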
And again, going back to the NoSQL systems: in their world, they're traditionally dealing with websites that you want to have online 24/7. So they would rather have the system be available and still serving requests — albeit maybe slightly wrong, or delayed in getting all the changes — than be completely offline. If you're dealing with money, you can't have that: I don't want to give out a million dollars to you over here, and another million dollars to you over there, money I don't have, because I had a split brain. In that world, that can't happen, so they'd rather take the whole thing down. I'm not saying one is better than the other; I'm saying that for certain application scenarios, one is preferable. It's just good to understand, when you start designing a distributed system, what trade-offs you're actually making.

Okay, so let's finish up quickly. The CAP theorem, again, has been proven correct. In the late 2000s — if you go Google the phrase "defeated the CAP theorem" — there were a bunch of people making wild claims about how their databases defeated the CAP theorem, and they were roundly put down: you can't have a distributed database that does everything. You can do a bunch of extra work to mitigate the bottlenecks, or the issues you'd have when machines go down — through redundancy, to reduce the likelihood of a network partition, things like that — but at the end of the day, these failures are unavoidable. At some point you're going to run out of money, or the system is going to get too slow, and you're going to be beholden to it.

All right, so let's finish up quickly. I just want to briefly mention what a federated database is, so that if you ever see one, or think about building one, you know what it is. In everything we've talked about so far, we have assumed that all the nodes are running the exact same database system software: a distributed version of MySQL, a distributed version of CockroachDB, whatever. But in some large organizations you have these one-off applications using one kind of database system, and other applications using another type of database system, and they need a way to do transactions across all of them, and queries across all of them, so that they appear as a single database instance even though, underneath the covers,
they're running completely different software. This is the problem a federated database is designed to solve. The idea is that we provide a single logical database instance, and we know how to take a query against that single instance, break it up into plan fragments that we can execute on the separate systems, and then put the results back together.

This was a big thing in the late 1980s and early 1990s: as companies and organizations got larger and there were more database deployments, people thought, wouldn't it be great if we had a single interface to all our databases? It didn't pan out, because you end up designing a system that has to work with the lowest common denominator of all your systems — this system doesn't do transactions, that one doesn't do these types of queries, so we can't do those things across the other systems. People have tried this, and people still try this; it's usually a bad idea and it's not going to end well. You can do simple things, but this beautiful all-in-one federated database is not going to work.

The basic architecture looks like this: you have your application server, you have your middleware system, and you have separate back-end databases. A single query goes to the middleware, which recognizes what each of the different back-end systems can actually support, and rewrites the portion of the query that it wants to run on each machine for that machine's API. MySQL speaks SQL, MongoDB does JSON queries, Redis does its own thing, and so on. The middleware knows how to break the query up, run the pieces on those separate systems, and give you back the result. The components that communicate with the different databases and pull their data into the single system are usually called connectors.

The one database that is probably best positioned to do a federated database architecture is actually Postgres. Postgres has this thing called foreign data wrappers: think of it as an API where you can plug in different data sources that live outside normal Postgres storage. There are foreign data wrappers for Mongo and all these other systems. So I write all my SQL queries against Postgres, and the foreign data wrapper knows how to go out to those individual systems and suck the data in, which I think is pretty cool.
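As a concrete illustration of foreign data wrappers, here's a minimal sketch using the postgres_fdw extension that ships with Postgres, driven from Python via psycopg2. The connection string, server name, table definition, and credentials are all hypothetical, and this assumes the extension is available and a remote Postgres is reachable.

```python
# Hypothetical postgres_fdw setup: expose a table on a remote Postgres
# as if it were local. All names and credentials are made up.
import psycopg2

conn = psycopg2.connect("dbname=local user=app")   # hypothetical DSN
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("""
    CREATE SERVER remote_db FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'sales', port '5432')""")
cur.execute("""
    CREATE USER MAPPING FOR CURRENT_USER SERVER remote_db
    OPTIONS (user 'app', password 'secret')""")
cur.execute("""
    CREATE FOREIGN TABLE remote_orders (id int, total numeric)
    SERVER remote_db
    OPTIONS (schema_name 'public', table_name 'orders')""")

# Queries on the foreign table are shipped out to the remote system.
cur.execute("SELECT count(*) FROM remote_orders WHERE total > 100")
print(cur.fetchone())
```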
Okay, any questions about any of this? Yes, in the back. All right, so the question is: can I recommend any distributed OLAP or OLTP systems? For OLAP, let's take that offline, because that's a complicated question: what are you trying to do, what does your data look like, how much data do you have, do you want SQL or NoSQL, and why? You've seen dbdb.io — there are six hundred and eighty databases on there, six hundred and sixty, whatever. There's not going to be one magic system that solves all the world's problems; you have to look at your application requirements and end up making compromises about what features you need, what features you don't, and how much money you're willing to spend. Popular distributed OLAP systems we'll cover next class. As for popular distributed OLTP systems: all the major vendors — SQL Server, DB2, and Oracle — have their own distributed offerings, and there are newer startups like CockroachDB, TiDB, YugaByte, Fauna, and MongoDB's distributed setup. They all make different trade-offs. I'll list out some OLAP systems next class, and — let's talk offline — maybe there's something we can cover in the potpourri in the last class.

All right, so the main takeaway, going back to the beginning: we assumed all the database nodes in our system are friendly, and that makes our life easier in terms of how we do commits, transactions, and replication. If they're not friendly, that's what blockchains are for — that's the extra work they have to do to prove that when we say a transaction committed, everybody actually committed it. In the case of Bitcoin, that's all the hashing they do with the Merkle trees.

Okay. Monday's class next week will be the last lecture on distributed databases; we'll cover distributed OLAP systems, and I think that will be the end of the material covered on the final. When we come back after Thanksgiving, there will be the guest lecture from Oracle, and then the systems potpourri and the final review. Okay? Any questions? All right guys, have a good weekend. See you.