So this is the last lecture of the semester for the course. And as I said, this will be a combination of a sort of final review for the final, and then we'll finish off with the systems that you guys voted on. So for what remains in this course, there are sort of three things. Project four is due next week on Monday. The extra credit will be due a week from now on Wednesday. And as I said, today is the day to submit your article on the Google form that I sent out on Piazza if you want me to review it; from now until the next week or so, I can do that and provide you guys feedback. And then the final exam, which you guys are all super excited for, again, is on Sunday morning at 8:30 a.m. And that'll be in Gates.

So any questions about project four real quickly? Any questions about the extra credit? Yes, in the back. I'll start doing that today; I'll go through the list in the order of who submitted first. I think right now it might be like five or six, and I'll just send them back out as I go. Any other questions? Yes? The question is: there's a flush-log function in the disk manager, and are you using that to make sure that threads wait? Yes, because it'll block until it's fsynced. It's calling fsync on the operating system, and that stalls the thread because you're sitting in the operating system waiting for the hardware to confirm that the data has been written out to disk.

All right, final exam. So who needs to come to this? If you're enrolled in the class and you want a grade, you need to come to the final exam. So this is the URL, and I'll post it on Piazza. The website is live, but I haven't posted the practice final exam; I'll do that later this week. That web page will take you to, again, the overview of what's covered on the final exam. We'll talk about what topics are covered in a minute, but the page will say explicitly, from this slide to this slide is what's covered. Right, so obviously the VoltDB stuff from the guest lecture on Monday, I'm not going to put that on the final exam. But everything else before that is applicable. So again, it won't be in this room. It'll be in the Gates Building, 4401, which is the giant Rashid auditorium. And I'll bring breakfast for everyone.

So what do you need to bring? As always, bring your student ID. You can bring a calculator if you want, but obviously, if you think about what we've done in the homeworks and projects so far, we haven't really done a lot of calculations; we're not estimating the number of disk I/Os for a join. So you can bring one if you want, but you're not really going to need it. And I'm now allowing you to bring two pages of notes, handwritten, double-sided. So don't take the slides and shrink them down. Everything has to be written by hand, okay?

We had one kid a year or two ago who decided halfway through the exam that he wanted to change into pajamas. I'm okay with that, but you have to change in the bathroom; you can't change in front of everyone else, right? And do not bring your roommate. I also had this happen one year. A kid shows up and I'm like, who's this? And he's like, oh, this is my roommate, he just kind of wanted to chill for a bit. It's a three-hour exam. And it took me like ten seconds: wait, he's not in the class and he's just here? He's like, yeah, we're just kind of hanging out. I couldn't tell whether they were lovers. It was really bizarre. Do not bring your roommate, okay?
So we don't have that awkward conversation. The other thing too, and I'll send a reminder out on Piazza: please fill out the course evaluation. And don't just say "Andy's great" or "Andy sucks," right? Useful feedback is what we're really looking for. Specifically, if there's anything you don't like, by all means, take a shit on me, I don't care. If there's anything you don't like about the projects, the lectures, the homeworks, the exams, anything, just let me know, and that way I can try to make the course better, right? I teach this once a year, and I take the feedback from previous years and try to improve it for the next people. So if you really hate your future classmates, you can screw them and be like, oh, there should be five exams, there should be 20 homeworks, right? But honestly, if there's something about the projects, whether the description wasn't clear; yes, I know about the issues with Gradescope, but by all means, you can list those out as well. One of the things we're thinking about maybe doing in future years is expanding the number of test cases we give you, or at least writing in English what each test case is doing, right? So there are things like that where we see the issues as the course progresses and we want to make it better, and I learn this through your feedback. The undergrads are awesome; the undergrads just rail on everything. It's the master's students who are like, oh, it's all so great, right? And that's not helpful. So I don't care, I can't see who wrote it, it doesn't affect your grade. By all means, go to town. And I already won a teaching award last year, right, because all the master's students wrote that Andy was so great. So they're not gonna fire me if you say, this course sucks, here's why, okay? So be vicious.

For next week, I'm actually traveling on Monday, Tuesday, Wednesday, so I'm not gonna be around at my normal office hours times, but I'll have office hours on Friday afternoon on the 14th, the Friday before the final exam. If you can't make this, send me an email and we can figure something out. If I have to call you on Skype, I can call you on Skype, okay?

All right, so what do you need to know from the content we discussed before the midterm? I'm listing these five topics, but these are pretty high-level, right? I'm not gonna ask you specific details about how a hash table handles overflows or things like that. You just need to be aware of what these things are and how they fit into the overall system. Like, if you magically forgot SQL immediately after you took the midterm, then you're screwed, you have other problems, right? But I may show you SQL, and you can't say, I don't remember SQL, that was before the midterm, right? So: buffer pool management, obviously understanding how things are swapped in and out, why we're doing that, and how that relates to transactions and logging, plus the stuff you guys have done in project three and project four; B+trees and how they relate to concurrency control; and the overall storage models, like column store versus row store. And again, the questions will be couched in what we talked about after the midterm, but they'll rely on these concepts to frame the questions, okay? All right, so the first thing we talked about was parallel execution.
So you need to understand the difference between inter-operator parallelism and intra-operator parallelism, and then inter-query and intra-query parallelism, right? So what would be intra-query parallelism? What does that mean? A single query that I can execute on different cores or different workers in parallel, right? And that's different from inter-query parallelism, where I can have different queries, possibly from different transactions, running in parallel on different workers. So we talked about the Volcano model using the exchange operator to coalesce results between different parallel workers, and then you pass things up the query plan once you have all the intermediate results.

We spent some time talking about embedded logic. We talked about the advantages and disadvantages of various techniques for taking logic you would normally have written in the application and putting it directly inside the database system. The real focus would be on user-defined functions and stored procedures, right? So understand the difference between these two concepts, and understand when they do and do not work well. I'm obviously not gonna ask you to write PL/pgSQL, because we didn't ask you to write that in the homework and we don't want to have to grade it. But again, it's the high-level concepts that actually matter.

Then we spent a lot of time talking about transactions, right? So obviously you should understand what it means for a database system to provide ACID guarantees: atomicity, consistency, isolation, and durability. What is the difference between those four concepts, and how can a database system provide those guarantees? Like, what about the architecture, what additional functionality do you have to implement in the system to provide ACID transactions? Then we spent a lot of time talking about conflict serializability and view serializability. So you should understand how to check, given an execution schedule, whether it's serializable or conflict serializable, but you also need to understand how to generate a schedule dynamically, on the fly, that will be conflict serializable, right? That's essentially what the two-phase locking and timestamp ordering stuff we talked about does. We talked a little bit about view serializability, and obviously I'm not gonna ask you to write any proofs, but you should understand at a high level what view serializability allows that conflict serializability does not allow.

What does it mean for a schedule to be recoverable? How do you deal with cascading aborts? What are the different isolation levels that transactions can run at, and what are the anomalies they may or may not be exposed to at these different isolation levels, right? Serializable was at the top, read uncommitted was at the bottom, and then you had read committed and repeatable read in between. Then we talked about protocols for actually generating serializable schedules, and we had one entire lecture on two-phase locking. Understand the difference between strict versus non-strict two-phase locking, as defined in the homework and the projects, and how you would actually implement deadlock detection versus deadlock prevention. Know the difference between wound-wait and wait-die. I think for the project, you guys are implementing wait-die or wound-wait... yeah, I'm pretty sure it was wait-die. So understand what the difference between those two things is.
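To make that difference concrete, here's a minimal sketch of the two deadlock-prevention policies. This is an illustration with made-up names, not the project's actual API; the only assumption is that every transaction carries the timestamp it was assigned when it started, and a smaller timestamp means older. In both policies, an aborted transaction restarts with its original timestamp, so it eventually becomes the oldest and can't starve.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    id: int
    timestamp: int  # assigned at start; smaller = older = higher priority

def wound_wait(requester: Txn, holder: Txn) -> str:
    # Wound-Wait: an older requester "wounds" (aborts) the younger holder;
    # a younger requester waits. Waits only ever go young-waits-for-old,
    # so a cycle (deadlock) can never form.
    if requester.timestamp < holder.timestamp:
        return f"abort T{holder.id} (wounded)"
    return f"T{requester.id} waits"

def wait_die(requester: Txn, holder: Txn) -> str:
    # Wait-Die: an older requester is allowed to wait; a younger
    # requester "dies" (aborts). Waits only go old-waits-for-young.
    if requester.timestamp < holder.timestamp:
        return f"T{requester.id} waits"
    return f"abort T{requester.id} (dies)"

old, young = Txn(1, timestamp=10), Txn(2, timestamp=20)
print(wound_wait(old, young))  # abort T2 (wounded)
print(wait_die(young, old))    # abort T2 (dies)
```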
Understand how multi-granularity locking comes into play, specifically how you use intention locks: intention-exclusive, intention-shared, shared-intention-exclusive, and what taking an intention lock at a higher level means for the lower levels in the hierarchy. We could show you two transactions and you could tell us the minimum number of locks you need to acquire in the hierarchy to allow them both to run.

Then we spent more time talking about timestamp ordering and optimistic concurrency control protocols. So for the basic timestamp ordering protocol, understand how it works and how you can apply the Thomas Write Rule to ignore writes from transactions that would otherwise abort. Understand the three different phases in OCC. What does it mean to have a private workspace for a transaction when it applies updates? How does this all relate to multi-version concurrency control? So understand how you actually store the different versions, how you order them, what the implications are for performance, and how it affects other things like indexes. If I'm going oldest-to-newest versus newest-to-oldest and I update a tuple, how do I update my indexes? How do we do garbage collection? How do we go back and find old versions we don't want anymore and clean them up? And can we do all of this in a context where everything is transactional?

Then we spent time talking about crash recovery. So you should understand the two key policies you can have in your buffer pool management implementation: steal versus no-steal, and force versus no-force. So what does a steal policy mean? Yes, sorry. Yes. Okay, absolutely yes. So she said the steal policy means that the database system is allowed to take dirty pages from the buffer pool, evict them, and write them out to disk even though those pages were modified by transactions that have not committed yet. Under a no-steal policy you cannot do that: everything that an uncommitted transaction modifies has to stay in memory. And of course we said that if you go with no-steal, then you never have to worry, when you crash and come back, about dirty pages from uncommitted transactions being on disk, because they never got written out. So you don't have to do any extra work to put the database back into a correct state. But under a steal policy, because you could be writing out dirty pages, you have to use the log to figure out what the hell was going on: which pages should actually be rolled back, and which pages should be left alone after a crash.

So now, what's the difference in force versus no-force? Somebody other than her. Yes. Absolutely yes. So he says, and let me rephrase what you said: force means that when a transaction commits, you have to write all its dirty pages from the buffer pool out to disk and flush them, right? That's the fsync call again. And the transaction is not considered committed until all its dirty pages are written to disk, right? With no-force, you don't have to do that. So for your project four, what are you implementing? Steal versus no-steal? Steal. And force versus no-force? No-force, right. So if you use a write-ahead log, then you're essentially doing steal plus no-force, because when a transaction commits, all I care about is that the write-ahead log records are written out and flushed to disk.
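Here's a minimal sketch of why steal plus no-force is safe once you have a write-ahead log. All the names here are made up for illustration; this is not project four's actual interface. The two invariants are the ones from lecture: the log record hits disk before the page it describes, and the commit record hits disk before we acknowledge the commit.

```python
log_buffer = []      # WAL records not yet durable
stable_log = []      # WAL records that have been flushed (fsync'd)
buffer_pool = {}     # page_id -> (data, dirty?)

def txn_write(txn_id, page_id, before, after):
    # Write-ahead rule: append the log record BEFORE dirtying the page.
    log_buffer.append(("UPDATE", txn_id, page_id, before, after))
    buffer_pool[page_id] = (after, True)

def txn_commit(txn_id):
    log_buffer.append(("COMMIT", txn_id))
    # NO-FORCE: flush only the log at commit, never the dirty pages.
    stable_log.extend(log_buffer)   # stand-in for fsync() on the log file
    log_buffer.clear()

def evict(page_id, disk):
    # STEAL: the buffer pool may write out an uncommitted page at any
    # time; that's fine because its before-image is already in the log,
    # so recovery can undo it if the transaction never commits.
    data, dirty = buffer_pool.pop(page_id)
    if dirty:
        disk[page_id] = data
```

Recovery then just replays the stable log: redo updates from transactions that have a COMMIT record, and undo updates from transactions that don't.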
I will flush the dirty pages out at some later point, but because the log is on disk, that's all I need to be able to recover the database if I crash and restore the pages to the correct state.

All right, so again, we talked about write-ahead logging. We talked about different logging schemes: physical versus logical versus physiological, undo versus redo, and what information you need to store for these different techniques. We talked about how to do checkpoints. There are sort of three variants of checkpoints. They don't really have names; one was like bad, okay, and best. The best one was fuzzy checkpoints. The middle one, I think, is what you guys are implementing in project four, where you sort of pause everything and write the pages out. And then we talked about how to recover from our write-ahead log with checkpoints under ARIES. So the main thing to understand here is the three phases of ARIES: analysis, redo, undo. How do the log sequence numbers permeate throughout the entire architecture of the system to allow you to figure out what was going on, like which pages modified by different log records have been written out? We talked about compensation log records; understand how they work. You're not doing that in project four, so again, we're not gonna ask you nitty-gritty details, but understand how they work in concert with the dirty page table and the active transaction table during recovery.

And then we finished up the semester talking about distributed databases. We talked about the different kinds of system architectures you could have: shared disk, shared memory, shared nothing. We talked about how to partition the database. We talked about how to do replication: master-replica versus multi-master, active-passive versus active-active. Then we talked about how to use Two-Phase Commit and the consensus protocols. But obviously, again, we're not gonna ask you to implement Paxos on the final exam, because you'd be here for days and we'd have to grade it. So again, it's the high-level concepts we care about here: understand the distinctions between how these different architectures work and what they mean, okay?

All right, so any questions about the final? Yes? You mentioned Two-Phase Commit; the question is about Two-Phase Commit versus the locking stuff, strict Two-Phase Locking and rigorous Two-Phase Locking. We will be very specific. Right, Two-Phase Locking is different than Two-Phase Commit. Yes. And we'll see Spanner in a second, which uses both Two-Phase Locking and Two-Phase Commit. Any other questions? Everyone's really excited for the final exam, right?

All right, so in the remaining time, let's talk about the different database systems which you guys voted on. So here are the top 10 systems from the last three years that I've done this, right? And what you see is that the top one and two have always been MongoDB and Spanner: MongoDB was always number one, and then last year Spanner actually turned out to be number one. Neither of them is number one this year; take a guess which one is. Cockroach, right? So these are the results for this year. So I always ask: for those of you that voted for CockroachDB, why? What's that? It's a funny name. All right, that's one reason, you know? Who here has used CockroachDB? Nobody.
All right, is it because you see it on Hacker News? Are you interested in it, or have you been talking to startups that are using it? I'm actually really curious. So in previous years, for MongoDB and Spanner, I asked the kids: why did you pick Mongo, why did you pick Spanner? For Mongo, they said they maybe wanted to use it for a pet project or a little hobby thing they wanted to build, like a website; they were like, oh, let me use Mongo to try it out. For Spanner, they said they wanted to be able to go to a Google job interview and say something intelligent about Spanner, right? And I told them, look, Google's not gonna hire you and have you immediately go work with Jeff Dean on Spanner, right? He doesn't even work on it anymore. So with Cockroach, is it just the name, or just because there's a lot of buzz, or are you just dying to know about it? Maybe someone submitted it multiple times; maybe one person here is like, I fucking love Cockroach, right? Just, yeah. Okay.

All right, I'll go through these other ones real quickly, because we're gonna focus on the top three in the time we have. Amazon Aurora. Aurora is Amazon's fork of MySQL and, well, let me be clear, it's not exactly that. Aurora is an umbrella for a class of database systems based on either Postgres or MySQL; you can get Aurora for MySQL or Aurora for Postgres, right? And what they did is they modified the system to use a custom storage layer on EBS, where they've pushed logic about transactions and replication down into the EBS layer. If you and I use EBS, we just get this simplistic block storage, right? Amazon controls the whole stack, so they can actually put logic about transactions in the storage layer itself. And that's essentially what Aurora is doing. So even though it's gonna look like MySQL, it's like MySQL on steroids, built to run in Amazon's data centers, right?

Redis is an in-memory NoSQL system. Think of it as almost like a Swiss army knife of a database system: rather than doing SQL, you get complex data types like sets and lists and hash tables, and you do simple manipulations on them. So it's in-memory, it runs a single-threaded engine, sort of like VoltDB, and that's how they get better performance. But as far as I know, they don't do transactions.

Cassandra is another NoSQL system. It was actually written by Facebook. Facebook decided they didn't wanna use it, gave it away, and open-sourced it, and then a bunch of different companies saw it, picked it up, and expanded upon it. It's sort of a hybrid between Google Bigtable's column families and Amazon Dynamo's consistent hashing with the rings. Again, my hand gestures don't mean anything to you guys, but this one's pretty popular.

Elasticsearch is a document database; we'll see Mongo in a second. Think of it as storing JSON documents, but they can do full-text searches on them, which we talked about earlier. They just went public; I think they just had an IPO this year.

Apache Hive is a query engine on top of HDFS and Hadoop. So you basically write SQL queries, it converts them to MapReduce jobs, and they run in Hadoop, right? Hive is actually pretty slow. Presto, or, hmm, the other one's name is escaping me.
There's another one that's actually better than this. You know, I'm drawing a blank, it's terrible. There's another one, it's better. Facebook Scuba. Scuba, and I'm not sure if it's actually still around anymore, is an in-memory database for time-series metrics. So they ingest all this information from their various applications, like pages read, pages written, lock times, things like that, and you can run analytics on it in there. And MySQL we've already spent a lot of time talking about.

All right, so let's jump into this. We'll start with Cockroach, then Spanner, then MongoDB. So CockroachDB was started in 2015. I actually went to go visit them last year. This is my view at the conference table; I sort of showed up to say hello, and they pelted me with questions for like an hour. It was kind of fun. But they're really all in with the Cockroach stuff. This is in the lobby: they had the logo and made sort of pixel art out of it, right in the wall. Somebody just asked, are they gonna change their name? No, they're going all in with Cockroach, right? And it's called Cockroach because it's meant to be the database that never dies; cockroaches are notoriously hard to kill, so I get the metaphor. When they first announced CockroachDB on Hacker News, people were like, oh, this name is terrible. It's not that bad, right? I'm okay with the name.

Okay, so Cockroach was started in 2015 by ex-Google employees. And although the people that started CockroachDB had not worked on Spanner, they had used Spanner at Google. And the goal they set out for the startup was to build the open-source version of Spanner. We'll see Spanner in a second. What I'll say is that at a high level, yes, they're trying to achieve the same goals as Spanner: a distributed, fault-tolerant transaction processing database system. But underneath the covers, the architecture is slightly different, so it's not an exact clone of Spanner. And I don't think the company ever advertised themselves as being the open-source version of Spanner; that sort of got assigned to them by the internet. But I think they've done a pretty good job of making clear that they're not claiming to be the open-source version of Spanner.

So at this point in the semester, what's awesome is that I can rattle off a bunch of keywords about the architecture of the system, and you should understand what I'm talking about. At a high level, you can conceptualize how the system works without me describing all the nitty-gritty details. So: it's a distributed database. It's decentralized, shared-nothing. They're using a log-structured storage architecture at the individual nodes, using RocksDB to make that work. The concurrency control protocol they use is multi-version OCC, and the only isolation level they support is serializable, right? So again, what's nice is that I don't have to go into details about how all this works; from this slide right here, you can understand: oh yeah, it's a distributed, shared-nothing transactional database system.

So the way they like to describe the architecture is in terms of these layers, and that's sort of like what we talked about at the beginning of the semester.
We talked about how we had those layers: we had the disk manager, above that the buffer pool manager, above that the concurrency control, and above that the execution engine. So that's how they're describing their architecture too. At the very bottom, on a single node, you have the storage layer, and again, that's just RocksDB. So the way to think about this diagram is: in this semester, we've covered this box here, right? We now know how to build a single-node storage manager that could be used underneath the covers of something like CockroachDB, which has all the layers on top of it. So above that, you have a replication layer; then you have a routing layer, which is like the coordinator for transactions, or the routing table to decide where the data you actually need is located; above that, you have a distributed, transactional key-value store API; and then above that, you have a SQL layer.

So again, think about this in the context of our single-node architecture. This is sort of like the page directory telling you where to go find things, and above this is like the buffer pool manager saying, hey, I want this page, or, in their parlance, I want this key. And it knows how to go find all the data you need from down below. And when you write to a single node, they're gonna use something that looks like Paxos, as I'll show in a second, to replicate it to the different nodes. And they can do this all in the context of a transaction that's fault-tolerant.

So instead of having the concept of pages, at the upper layers of the system CockroachDB has a key-value API. With the page directory, you'd say, I want page 123, and the page directory tells you where on disk to go get it. In CockroachDB, the transactional key-value store lets you say, hey, I want key 123, and it tells you which node to go get it from. And a single key range can cover what would be multiple pages, so you can fetch larger chunks of the database at a time.

So again, the bottom is RocksDB, and then the replication and consensus protocol is this thing called Raft, which is a variant of Paxos. Paxos was invented in the 1990s and is considered notoriously hard to implement. Raft came out of a project at Stanford called RAMCloud, which is their own distributed key-value store. They proposed this consensus protocol called Raft and touted it as being easier to understand and implement than Paxos. I slightly disagree with that characterization; I think it's just as difficult to implement as Paxos. But what the Raft people did that the Paxos people didn't do is make sure there's a ton of open-source implementations of the Raft protocol available in a bunch of different languages. So if you're building a distributed system in Go, and CockroachDB is in Go, there's an open-source implementation of Raft in Go that you can just download and use in your system. There's effectively a libraft, a library available for the Raft protocol; there's no libpaxos. So I think that's part of the reason you see a lot of systems using Raft instead of Paxos. But at a high level, they're just as hard to implement.

So the concurrency control protocol in CockroachDB, as I said, is multi-version OCC. In Spanner, as we'll see in a second, they assign transactions timestamps from hardware-based clocks.
In CockroachDB, and in every other system, you can't rely on the clocks being that tightly synchronized. You can have drift over time; NTP can't guarantee that the clocks are synced within really tight bounds. So what they instead use is a hybrid clock method, where you assign a timestamp that's a combination of the current physical time, loosely synchronized through NTP, and a little local logical counter that you use to augment it. And that's how you guarantee that transaction timestamps are globally unique and always increasing. So this link here is to the paper, out of the University at Buffalo, where a distributed-computing professor describes how you actually implement hybrid logical clocks; there's a rough sketch of the idea just below. We have our own approach in another system we helped build, which we call physiological clocks, and at a high level the idea is the same: a combination of physical clocks plus logical clocks. I'm gonna have to cut all this out and bleep it, but this paper actually describes how to do it, and there's another professor down in Texas who apparently took his ideas and patented them. Whether there's litigation or not, I don't know. So whether CockroachDB gets sued, because this guy published it and is okay with CockroachDB using the ideas but the Texas guy may not be, I don't know what's gonna happen.

So transactions get timestamps, and we use those to figure out the global order of their operations. We're doing OCC, so normally we'd write into a private workspace, but with multi-versioning there's not really a separate location where you write things; we still write into the global database, where anybody can see them, but we mark the tuples that would normally be in our private workspace as what they call intents. So there's additional metadata that says: this transaction wrote this record, here's my timestamp from when I started, and I haven't committed yet. So other transactions that come along and start scanning versions to find the right version they want will know not to read this. And that's a basic MVCC protocol. Now, all the metadata about the transactions themselves is stored in the distributed key-value store. This is completely decentralized: there's no central coordinator you can always go to and ask, what's the state of my transaction, or what's the state of this record? We store that in the key-value store itself and allow anybody else to read it and figure out what's going on in the system.

So at a high level, the architecture looks like this. We have this global logical key space; just think of it like the page directory in the disk manager of a single-node database, but now it's key-value pairs. At the beginning of the key space, you map from keys to locations, and that tells you, for a given key, where to go find the data you need on a different node. Then, in the namespace for the tables and indexes and everything else you're storing in the database, the key-value mapping is from the key to the actual data. So this first part tells you what node to find the data on, and anything in here is the actual data. So again, it looks just like the disk manager's page directory.
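Here's that rough sketch of a hybrid logical clock, loosely following the scheme in that paper. To be clear, this is an illustration, not CockroachDB's actual code; the class and method names are made up.

```python
import time

class HybridLogicalClock:
    """Timestamps are (physical, logical) pairs: the physical part tracks
    the NTP-synchronized wall clock, and the logical counter breaks ties
    when the wall clock can't distinguish two events."""

    def __init__(self):
        self.physical = 0   # highest physical time seen so far (ms)
        self.logical = 0    # tie-breaking counter within one physical tick

    def now(self):
        """Timestamp for a local event or an outgoing message."""
        wall = int(time.time() * 1000)
        if wall > self.physical:
            self.physical, self.logical = wall, 0
        else:
            self.logical += 1   # clock hasn't advanced: bump the counter
        return (self.physical, self.logical)

    def observe(self, msg_physical, msg_logical):
        """Merge a timestamp received from another node, so our next
        timestamp is ordered after everything we have seen."""
        wall = int(time.time() * 1000)
        prev = self.physical
        self.physical = max(prev, msg_physical, wall)
        if self.physical == prev == msg_physical:
            self.logical = max(self.logical, msg_logical) + 1
        elif self.physical == prev:
            self.logical += 1
        elif self.physical == msg_physical:
            self.logical = msg_logical + 1
        else:
            self.logical = 0   # our wall clock is ahead of everything
        return (self.physical, self.logical)
```

Comparing timestamps as (physical, logical) pairs then gives an order that never goes backwards, while staying close to real time as long as NTP keeps the wall clocks loosely synced.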
So, back to the page directory analogy: my entire database is broken up into pages, and if I have a page ID, I can go get that page. Here, if I have the key of the record I want, or the range that I want, then I can do a lookup in the system table that tells me which node has the data, and I can go get it.

So they have master-replica replication. We run some rounds of Raft and elect a leader, and all your writes go to that leader, through the key-value API. So they convert all the SQL statements you would normally invoke in your application into the underlying key-value operations. We send all the writes to the leader, and then there's distributed concurrency control to figure out whether our transaction conflicts with any other transaction running at the same time. We use the hybrid timestamps that we got assigned when we started to figure out what order we should allow these transactions to commit in. If we know this thing's allowed to commit, then we just propose the change through the Raft protocol to replicate the data to the different nodes.

So again, at a high level, it's all concepts we talked about before. I showed you earlier all the different possible design decisions; when you actually implement this, you pick the ones you want and put them together to make your full system. So they're doing master-replica replication with OCC, and instead of using Paxos or two-phase commit, they're using Raft, but it achieves the same thing. Any questions? So it's SQL-based, it's transactional, it's fault-tolerant, it uses RocksDB underneath the covers, and they have their own transaction coordination layer above that. Yes. He asked whether I can be more specific about the key-value API.

So this here, what I'm trying to show you is a logical key space. And the keys are not gonna be for individual records; they're gonna be for ranges of records, sorted on whatever value you wanna keep things sorted by. So say I have a column called ID and I wanna find my tuple where ID equals 123, and I wanna use an index to find out where that is. What I need to do is consult the system catalog and say, I wanna do a lookup in this table, and here's the key I'm looking for. The catalog knows you have an index that can do a lookup on that key, so it tells you, okay, the starting point for your traversal into the index is at some offset inside the key space for this index, because that's the root of the B+tree. Then the key-value API goes and fetches all the, for lack of a better term, pages that correspond to the index you're traversing. At the very bottom of the index, you get the mapping to the record ID, which is gonna be again in this key space up here. So think of it as: my query's running here, and I'm getting the pages I need from the key space over here. And then I can do all the same things we talked about before, but instead of going through the buffer pool manager, which gets pages from the disk as you would on a single-node system, I'm getting them from this distributed key-value store.
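As a toy illustration of that routing step, here's what the "which node has this key?" lookup boils down to. The split keys, node names, and the single-level table are all made up; the real system has more indirection, but the shape is the same.

```python
import bisect

# Sorted upper bounds of each key range, and the node holding each range:
# range 0 holds keys < "g", range 1 holds "g".."p", range 2 holds >= "p".
split_keys = ["g", "p"]
range_nodes = ["node1", "node2", "node3"]

def lookup_node(key: str) -> str:
    """Find which node holds the range containing `key` (like asking the
    page directory which page holds a record on a single-node system)."""
    return range_nodes[bisect.bisect_right(split_keys, key)]

# lookup_node("alice") -> "node1"; lookup_node("mango") -> "node2";
# lookup_node("zebra") -> "node3"
```

The structural point is that the routing table itself is small and sorted, so finding the owner of any key is a binary search, and the table can itself live at the front of the same key space.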
This is not unique to CockroachDB; this is how a lot of distributed databases work. FoundationDB does the same thing. If you go read the literature for CockroachDB, they're gonna talk about ranges and keys; just think of them as pages.

Google Spanner. All right, so Spanner has been around since, I think, 2011, when the paper came out. Yes, sorry. So his question is: what do I think is the most unique aspect of CockroachDB that separates it from other distributed databases, or rather, other distributed transactional databases? There's nothing unique about what they're doing at a high level from a scientific standpoint, right? MVCC is old, OCC is old, Raft is from Stanford. All those concepts are not new. The hard thing, though, is the engineering, right? Getting this thing to actually work and be correct and fault-tolerant, they're pulling that off, and that's actually a big deal. And dealing with all the weird corner cases, lost messages, one node goes down, one node comes back, all that kind of crap: it's one thing to write a paper describing how you would implement this, but actually having something that's reliable and running in the real world is a big deal. So whether they're more reliable than the other transactional distributed databases that are available, TiDB, YugaByte, FoundationDB, I don't know, but if they're able to achieve the fault-tolerance claims they're making, then that's impressive.

Okay, Spanner. So again, Spanner is Google's distributed transaction processing database system. And as I said, the CockroachDB guys used Spanner when they worked at Google, and then built something that's similar to it at a high level. So I gave a talk in 2013 at Google's office in Kirkland, near Seattle, and I met with some of the Spanner SQL team, which is different from Spanner; we'll cover that in a second. There are actually four versions of Spanner. When I gave the talk, I wanted to talk about a bunch of different NewSQL transaction processing database systems, and I had to show something on my slides to represent the Google Spanner logo, because back then there wasn't one, so I actually made this myself. They didn't like that, and then they gave me this piece of art; this is actually the real internal Google Spanner logo as of 2013. But now, with the advent of the cloud-based version that you as an outside, non-Google employee can actually use, this is the current Google Spanner logo.

So to understand why Google built Spanner, you have to understand what they were doing before it. Back in 2006, Google wrote a paper on a system called Bigtable. Google does a really good job of publishing research papers that describe the various database systems they've built, and they're extremely influential because people assume, oh, Google is the best, they must be doing something right. So people take whatever they publish and go try to recreate it and make their own version of it, right? So this is the system called Bigtable from 2006. The open-source clone of this is HBase: HBase is the Hadoop-based clone of the Bigtable system, and Hadoop itself is a clone of MapReduce out of Google. So Bigtable was sort of one of the first NoSQL systems, coming out in 2005, 2006.
And in this paper, they talk about how, if you really want to achieve the kind of scalability that Google needs in a distributed database for their infrastructure, you had to give up SQL, you had to forgo joins, you had to give up transactions, right? And in the case of Bigtable, it was a column-family system, sort of like a key-value system, that ran on GFS, which was the early version of the distributed Google file system. So this was how they thought they were gonna go forward. At the time, though, Google AdWords, which was then and still is now the biggest moneymaker for Google, was actually running on a sharded version of MySQL. So they wrote their own middleware layer to route queries over different MySQL installations, because they needed transactions in that environment.

Then there was another paper in 2011 on a system called Megastore. Most of you have probably never heard of this. I don't know whether it's still used, but this was sort of an early incarnation of Spanner, where they wanted to add support for transactions in a distributed environment. As far as I know, this did not get a lot of traction internally at Google because the performance was not that good. So based on this experience, they went off and ended up building what we now understand to be Spanner. I think the paper came out in 2011, but Google had been working on it for five or six years before then. So it's been around since 2008, 2009, maybe earlier.

So it is a distributed transaction processing system that's geo-replicated, with decentralized transaction management and a shared-disk architecture. Just like Bigtable would write data out to a shared-disk system running on GFS, Spanner writes out to some underlying distributed file system as well; I forget what it is now, the name has changed. It does log-structured storage on disk, which we talked about. And it supports physical denormalization, which we'll see on the next slide; they use that to reduce the amount of I/O you have to do to process queries.

Now, the concurrency control part. This is the most interesting aspect of Spanner, the part that's the most complicated to understand, and the part that makes it really unique from everyone else. So they're doing strict two-phase locking with multi-versioning, using multi-Paxos, where instead of having anybody be able to propose something, you take turns having a leader, who's the only one allowed to propose things, and every so often it renews its lease on being the leader. So they use multi-Paxos for transactions that are within a single Paxos group; if I have to talk to different Paxos groups, I use two-phase commit.

So the key thing Spanner achieves that no other system achieves, and as far as I know this is unique to Google's operating environment, I haven't really heard of other applications that actually need this, is what's called external consistency, sometimes called strict serializability or linearizability, right? And the basic idea here is that the order in which transactions commit in the system is the order in which they entered the system, right?
Remember, when I talked about serializability before, I said that transaction one can show up, transaction two shows up later, but because we're allowed to interleave operations however we want, it may actually be the case that transaction two commits first. Under external consistency, you cannot do that. So they're gonna guarantee that for write transactions, and then any read-only transactions can run without acquiring any locks, on consistent snapshots in your Paxos group.

I'm gonna spend most of our time talking about the concurrency control part, but let me talk a little about the physical denormalization, because I think we mentioned this before, and we'll see it again in a second with MongoDB. One of the big claims in the Bigtable system and the early NoSQL systems was that joins are really slow, you don't wanna do joins. And in an OLTP environment, they're not wrong, because most of the time you wanna do foreign-key lookups between two tables, and if those two tables are physically located in different pages, now I'm doing additional I/O: for a single record or a single query, I'm going to different locations to get the data.

So what Spanner allows you to do, if you have a foreign-key reference between two tables, say a users table and an albums table, where the albums table has a foreign-key reference on the user ID, is to add this extra pragma at the end of your CREATE TABLE statement saying that all the tuples for this table should be interleaved in the pages of its parent table. So that means physically on disk, say each of these boxes represents a single page, I'll have one user record with all the columns that table has, and then inline with it will be all the album records for that user. So now, if I wanna do a join, for a single user get all their user info and all their albums, ideally I only have to do one page fetch into the distributed file system to get all that data. Whereas if I was not denormalizing like this, I'd have to go get the page from the users table and also the page from the albums table. And they wanna do this because, again, they're running in a shared-disk architecture, so the disk they're reading from is not gonna be local to the machine where the query is actually running. The latency in that case might be 40 to 50 milliseconds to go get something from the underlying file system. So if I can pack everything into a single page to complete this query, that's one less round trip to go get data.

So this is not unique to Google; I think I mentioned this during the semester. IBM first did this in the 1970s, in System R, and they decided to bail on the idea in the 1980s when they made DB2, because it became a pain to implement. Yes? The question is whether user 1001 and another user share the same album rows. In this example, no, these are different: user 1001 has albums 990 and 991. Oh, okay, so that means the album is in the same disk page as that user? Yeah, so this is a page: for user 1001, with one fetch, I get the page and I get the user plus their albums. So the albums are not shared by multiple users.
Right, so the question is whether that means the albums are not shared by multiple users. In this example, they're not. If they are shared, then you either can't denormalize like this or you have to make multiple copies. In many use cases, though, they're not gonna be shared. We'll see this again in a second with MongoDB: think of Amazon. You have your customer account, you have your orders, and you have the order items you bought in each order. So it's a one-to-many mapping going all the way down: every customer can have multiple orders, but those orders belong to only one customer. So I can pack those together and not worry about sharing. Something that would be shared is the product information for the item I bought, and that I wouldn't want to denormalize.

Okay, so let's talk about the concurrency control stuff. So buckle up. They're doing multi-version concurrency control with strict two-phase locking, using wound-wait deadlock prevention. The reason they do deadlock prevention is that then you don't have to do distributed deadlock detection, right? All you need to do is look at the timestamps on the transactions to figure out who should go first and who should get killed, right? But now, if I'm geo-replicated, I don't just have copies of the database on one rack or even in one data center; they're spread across the world, and you have to keep everything synchronized. This is where things get really tricky. Like, how do I ensure that the clocks are in sync at all these different data centers? Again, I can use NTP, but NTP is not gonna be super, super accurate. I think the range might be like 250 milliseconds, right?

And in Google's environment, for their particular workload, this is targeting the ads, the Google AdWords stuff. They want these things to be externally consistent because they wanna make sure they're deducting the right amount of money every time they show an ad. Think of it this way: if I give Google $100 to show ads, Google wants to make sure they show exactly $100 worth of ads, right? Because if they're not completely synchronized, I may show an ad in the US and an ad in China, and when I showed the ad in the US, I ran out of my $100, but the node in China didn't know that, so it still showed an ad in China. Google just gave away its product for free, and it doesn't wanna do that. So they keep everything synchronized to avoid people getting more ads than they paid for.

So the way they keep their clocks more in sync than you could through software alone is actually through specialized hardware. In every single data center, they installed atomic clocks, which count seconds based on how, I think, electrons come off some metal, barium or beryllium or something like that, and they also have GPS receivers on the roof that pick up the clock signals from the satellites and use those to keep the clocks tightly synchronized. I think the bound they can achieve is within about seven milliseconds of accuracy at every single data center, which is amazing. So we'll discuss how this all works in a second, but first let's understand how they do replication. So the database is gonna be broken up into tablets.
They're called tablets, I think, for historical reasons, because in the Bigtable paper they called them tablets; think of them as just logical shards. Everything is stored on some kind of shared-disk system, and the different nodes are responsible for some portion of it, which they define as a tablet. So within each tablet group, you use Paxos to elect a leader; that's where you send all your writes, to that master node, and then you use Paxos to keep the replicas in sync within the tablet group. But any time a transaction needs to span tablet groups, you use two-phase commit to make sure you're allowed to commit on the other side and everything's completely synchronized.

So let's look at an example. I have one tablet, tablet A, and I have two other replicas of it across three different data centers; this is gonna be called a Paxos group. So I run Paxos, elect a leader, and that's where all my writes go for transactions that modify any data within this tablet. So any modifying transaction always goes there, and I run two-phase locking with deadlock prevention, because if I have transactions running at the same time, I can guarantee they're executing in a serializable order. So when a transaction wants to commit, we run a round of Paxos and propagate the updates from the leader to the replicas, right? And then they come back and agree to apply the changes, and then the transaction is considered committed, right? And just to be clear, this is across geographic locations. This might be a data center in Virginia and this might be a data center in Seattle. It's not different nodes in the same data center; this is going over the wide-area network.

So now, what you can do in Spanner is run read-only transactions on the replicas, the non-leader nodes in your Paxos group, using snapshot isolation. So again, the transactions are assigned timestamps when they show up, and they can run at the different data centers and be guaranteed not to see any data from transactions that haven't committed yet. You see a snapshot of the data as of a given timestamp, and for some queries that's okay. And that's all within a single Paxos group. If I have a transaction that has to span Paxos groups, then I have to run two-phase commit to coordinate with the other groups and ask, can you commit as well? So again, within a Paxos group I just need a quorum of the replicas to agree to commit; here, with two-phase commit, I have to have everyone agree. And then within the Paxos group for this other tablet up here, it runs Paxos internally as well to synchronize with its own replicas.

So hopefully you can see how we discussed all these building blocks throughout the entire semester, and now we can build up something more complicated and see how it all fits together. So what's the tricky thing about doing deadlock prevention in a distributed environment? I've already sort of said this. If you have two transactions that conflict because they want to acquire the lock on the same item, how do you decide whether to kill one versus the other? What was that? Say it again, sorry.
Right, he said timestamps. So Spanner is gonna order transactions based on physical time, the wall-clock time. Now, that's different from CockroachDB: CockroachDB was wall-clock time plus a logical counter, right? They had a hybrid timestamp. Spanner's timestamps are gonna be completely physical. And this is how they achieve linearizability: since every single node, in all their data centers throughout the world, has tightly synchronized clocks, if I'm given timestamp one and you're given timestamp two, I know that in real time I come before you. I don't have to worry about your clock being way ahead of mine. It's still gonna be slightly off, but we'll see how they deal with that in a second. You have to do this if you wanna guarantee external consistency or linearizability, because you wanna be able to say that if T1 finishes before T2, then T2 should see the results of T1. It needs to know about them, because that's how we achieve strict serializability. So within each Paxos group, we decide how to commit transactions based on these timestamps using the wound-wait technique, right? It's when we have to synchronize across the different data centers that things actually get tricky.

So this is where Google introduced something called TrueTime. TrueTime is, again, another unique aspect of Spanner that no other system has. TrueTime is a service API, internal to Google, that Spanner can invoke to get the current time, where that clock is synchronized based on the hardware devices. But instead of getting a single value for the current timestamp, like timestamp one, timestamp two, timestamp three, you actually get a range. And they use this to bound how long you have to wait to see whether anybody else shows up at your node with a timestamp that's less than yours, because if they're less than you, then to achieve linearizability they need to commit before you do. Without this, you could have to wait forever, because you wouldn't know how far off the clocks at the different data centers might be. But because TrueTime is backed by these hardware devices, there's a bounded error, and therefore a bounded amount of time you have to wait.

So the way to think about this is: the devices have some error defined by epsilon, and the current time is either within epsilon above you or within epsilon below you. So I have the earliest point where the timestamp could actually be and the latest point where it could be. I don't know where in that range I actually fall, but it doesn't matter; I can use the bounds to figure out how long I need to wait. So again, every data center has the GPS and atomic clock devices. They go from software-based synchronization, which might be 250 to 500 milliseconds, down to about seven milliseconds: every 30 seconds or so they re-synchronize, tighten the bounds, and get back to about a seven-millisecond drift. And you have to keep doing this because the hardware clocks on the actual CPUs themselves are not super accurate, right? They're like some little crystal thing that counts.
A quartz crystal just counts forward in time, whereas the atomic clocks count how electrons come off a piece of metal. So they have these clock-synchronization daemons in every single cluster; that way you have fault tolerance within the data center to keep all of the clocks in sync, and they can spread the load of all the different Spanner nodes that need to ask for the current timestamp across these different services within the data center. And the daemons keep talking to each other within the data center and across the different data centers to make sure everyone knows what's going on and everyone stays in sync.

So let's see how we use TrueTime. Say I have a transaction that comes along and wants to execute. It runs, it starts acquiring locks, and it makes whatever changes it wants. Then now it wants to commit; again, we're doing strict two-phase locking, so we hold our locks to the very end. So I want to commit at this point here, and the node asks TrueTime for the current timestamp range: what does "now" give you back? It gives back the high watermark and the low watermark, the earliest and the latest that the current time could be. So the current timestamp is somewhere in this range here; I think I'm in the middle, but I don't know. And I also don't know whether anybody else is gonna show up at my node with a timestamp that's less than mine, in which case they should get preference over me. So I need to wait until this point here, the total amount of time defined by my range, basically the two epsilons added together, to see if anybody shows up with a smaller timestamp. After this point here, I've waited long enough: I'm allowed to commit, and anyone who shows up late with a lower timestamp will have to abort. (There's a small sketch of this commit wait below.)

So the basic idea of what TrueTime gives you is a tight bound on how long you have to wait to see if someone's gonna show up, because they might be running in another data center. If they do show up in time, then we just use regular two-phase locking with wound-wait to decide who gets preference. In the back, yes. So the question is: if someone shows up in the middle here with a timestamp that's less than mine, how do you know who gets preference? I think the answer is that there's a global ordering of the node IDs within the data center. So if we have exactly the same timestamp, then if your node ID is less than mine, you get preference. Using that for tie-breaking is a standard technique as well.

All right, so as I alluded to at the beginning, what I've been describing was defined in the original Spanner paper, and there are three other versions of Spanner. So, two years later, in 2013, Google put out a paper about a system called F1. Spanner was a pessimistic system using two-phase locking; F1 is an optimistic system using OCC, built on top of Spanner.
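Here's that minimal sketch of the commit-wait rule. The `tt_now()` function is a hypothetical stand-in for the real TrueTime API, which I'm only assuming returns an (earliest, latest) interval; the 7 ms epsilon is the ballpark figure from the lecture.

```python
import time

EPSILON = 0.007   # ~7 ms clock uncertainty from the GPS/atomic-clock hardware

def tt_now():
    """Stand-in for TrueTime: an interval guaranteed to contain true time."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit_with_wait():
    # Pick the commit timestamp at the latest edge of "now", so it is
    # definitely >= the true current time.
    commit_ts = tt_now()[1]
    # Commit wait: block until even the EARLIEST possible current time has
    # passed commit_ts. After that, no transaction anywhere can be handed
    # a smaller timestamp, which is what gives external consistency. The
    # wait is bounded by roughly 2 * EPSILON.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)
    return commit_ts   # now it is safe to acknowledge the commit
```

The design point is the one from the lecture: the smaller the hardware can make epsilon, the shorter every committing transaction has to stall, which is why Google bothers with the GPS receivers and atomic clocks at all.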
All right, so as I alluded to in the beginning, what I'm describing here was defined in the original Spanner paper. There are three other versions of Spanner. Two years later, in 2013, Google put out a paper about a system called F1. Whereas Spanner was a pessimistic system using two-phase locking, F1 is an optimistic system using OCC, built on top of Spanner. The other tricky thing about Spanner is that in the original paper it looks like they're describing something that supports SQL. The original version of Spanner didn't actually support SQL. So they built F1 to have support for SQL, because they wanted to use it to replace the MySQL cluster they were using for all the ads. F1 was built specifically for the ads system, where Google makes all their money. So it's basically the same thing we talked about with OCC: they use these timestamps to figure out the order in which transactions should be allowed to commit, and they have a validation phase to figure out who conflicts with whom and who's allowed to proceed. They just store this timestamp information inside of Spanner as a hidden internal column. So from Spanner's point of view, underneath the covers, it just sees a column; it doesn't know anything about how it's being used. But up above, the F1 coordinator uses that information to figure out who acquires the lock and who's allowed to do what. (There's a little sketch of this validation idea at the end of the Spanner discussion below.)

So F1 was built for the AdWords team down in Mountain View. At the same time, they were building another version of Spanner, called Spanner SQL, up in Seattle, which at a high level had the same basic architecture as F1. I asked them, why did you guys build two versions of F1? And they said that because F1 was for the ads, that's the golden calf of Google, that's where they make all their money, the regular riff-raff of Google is not allowed to use it. So they built Spanner SQL for all the unwashed masses at Google. The fourth version of Spanner is a database-as-a-service that Google will sell you, called Google Cloud Spanner. Just like on Amazon you can get RDS or Aurora as a service, you can give Google your credit card and get access to their distributed database that uses Spanner. I've never used it. The benchmark numbers look pretty good, but it is quite expensive. So if you're an employee at Google, you're probably gonna end up using Spanner SQL. If you're at a startup outside of Google and you wanna use Spanner, you end up using Cloud Spanner. How much code they share, I don't actually know.

All right, so any questions about Spanner? The main takeaway to remember is that they're using the hardware clocks to bound the amount of time a transaction has to wait for anyone with a lower timestamp to show up. But they're still doing two-phase locking and multi-versioning, just like we talked about before.
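Here's that promised sketch of the F1-style validation idea: a generic OCC commit check where every row carries a hidden last-modified timestamp column. This is not F1's actual code or API; the dictionary standing in for the database, the `_ts` column name, and the counter standing in for Spanner's commit timestamps are all made up for illustration.

```python
import itertools

_commit_ts = itertools.count(1)  # stand-in for Spanner-assigned timestamps

def occ_validate_and_commit(db: dict, read_set: dict, write_set: dict) -> bool:
    """Generic OCC validation sketch: abort if any row we read has been
    modified (its hidden timestamp changed) since we read it; otherwise
    install our writes under a fresh commit timestamp."""
    for key, ts_seen in read_set.items():
        if db[key]["_ts"] != ts_seen:   # someone committed after our read
            return False                 # conflict: abort, caller retries
    ts = next(_commit_ts)
    for key, value in write_set.items():
        db[key] = {"value": value, "_ts": ts}
    return True
```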
Okay, let's finish up quickly with Mongo. So actually, the same day I went to go visit CockroachDB in New York City, I also went to go visit Mongo. This is a picture of their lobby; I took a picture of the sign because I liked it. CockroachDB is in a cool part of New York City, in Flatiron. MongoDB is near Times Square, which is basically hell on earth, so you have to walk through all the people wearing the Elmo costumes and the Superman costumes to get there. So MongoDB is a distributed, shared-nothing document database system. It's one of the first NoSQL systems that really got a lot of traction outside of Google. Now, a document database doesn't mean a Microsoft Word document or a PDF. Think of a JSON object, an XML document, a Python dict, something like that. So at a high level, it's gonna look a lot like a relational database in terms of how the data is organized; they just use different terms. Instead of a tuple, they call it a document. Instead of a table, they call it a collection. It uses a centralized routing architecture with shared-nothing storage. And just in MongoDB version 4, which came out this year, they added support for multi-document transactions. So now you can have transactions that span multiple objects across the entire distributed infrastructure; I'll show what one looks like from the application side at the end of this Mongo discussion.

The one thing I think is interesting to point out is that the open-source license is this thing called the Server Side Public License, a license that MongoDB invented. They were originally AGPL; they came out with this new license this year, with extra precautions to make sure people don't take the MongoDB open-source code, run it as a cloud service, and make money off of it, because Mongo wants to sell their own cloud service. I actually think this is a good idea. I don't know whether this is in response to Amazon or some company in China, but it's a very interesting concept that is unique among open-source databases, and we'll see where this goes.

So with MongoDB's documents, they're pushing the idea of physical denormalization, because, again, MongoDB does not support SQL. It originally didn't support transactions; it does now. It originally didn't support joins; it does now. But they were always pushing this idea of pre-joining things, right? The example I gave before was customers, orders, and order items. From a relational standpoint, they'd be separate tables and you'd have to do separate fetches if you want to join them together, but Mongo would argue what you really want to do is just embed everything inside a single document. So for a single customer, I have the list of orders, and for every single order, I have the list of order items. In JSON, it looks something like the sketch I'll show in a moment. And as we said, this works out great as long as you're not sharing data across different objects, because otherwise you have duplicated data.

Query execution is through a JSON-based API. So again, MongoDB originally came out and said, oh, we're not gonna do joins, we're not gonna do transactions, we're not gonna do SQL. Now they do joins and they do transactions. The last time I talked to the founder, a few years ago, I asked him whether they're ever gonna implement SQL. He said no, but I don't know whether that's changed since then. So we have this JSON-based API. They don't have a real query optimizer or query planner; they basically just have heuristics to figure out what indexes to use, plus some light statistics to do join ordering. The one actually interesting thing they do for their query optimizer, which sort of seems like a hack but is actually an interesting idea, is that when they want to figure out the best query plan, rather than trying to pick the best one up front, they generate a bunch of them, execute all of them, and see which one comes back first. Whichever one's fastest, that's the one they end up using. That's what I mean by a random walk: they sort of just try different query plans and figure out which one is faster. And every 1,000 executions of a query, they rerun that race.
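Going back to the embedding idea for a second, here's roughly what that pre-joined customer document looks like, written as a Python dict since that's the closest thing to JSON. All field names and values here are invented for illustration.

```python
# One "physically denormalized" customer document: the orders and their
# line items are embedded instead of living in separate tables.
# Field names and values are made up for illustration.
customer = {
    "_id": 1001,
    "name": "Alice",
    "orders": [
        {
            "order_id": 5001,
            "date": "2018-12-03",
            "items": [
                {"sku": "A-100", "qty": 2, "price": 9.99},
                {"sku": "B-200", "qty": 1, "price": 24.50},
            ],
        },
    ],
}
# One fetch returns the customer plus all their orders and items, with no
# joins -- at the cost of duplicating anything shared across documents.
```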
So they support UDFs. This is all very JavaScript-influenced, so the UDFs are in JavaScript, but they encourage you not to use them, for whatever reason. In the newest version, they support server-side joins, but I think it's only left outer joins. And now, in 2018, they support multi-document transactions. So we talked about the distributed architecture before, but I'll go over it again. They have heterogeneous distributed components, meaning different nodes are responsible for different activities or roles in the system, with a shared-nothing architecture, a centralized query router, and a transaction coordinator. They do master-slave replication. And probably the biggest selling feature of MongoDB, in addition to the JSON stuff, was that they support auto-sharding. The idea here was you shard your database, and over time, if a shard gets too big, they'll automatically split it for you and move it to another machine, to load balance automatically. And I think what happened was a lot of startups ended up using MongoDB even though they didn't need a distributed database and didn't need this auto-sharding stuff; no startup says, oh yeah, we're gonna go under in a year, right? Every startup thinks they're gonna be huge. So they're like, all right, my database is small now, but it's gonna grow, and MongoDB is gonna be able to grow with me through this auto-sharding stuff. Because MySQL and Postgres can't do that for you; you have to do it manually. So I think this was one of the really interesting ideas they had at the very beginning that made the system very popular.

So we already talked about this before: this is the architecture. You do a lookup from the application server through this mongos router. It looks in the config server, which has all the information about how the shards are split up, and it gives that information back and tells you what node to go to for the data you need. And now this mongos actually supports transactions in a way that the early system couldn't.

So MongoDB was originally written using mmap. I have made strong declarations about this before, and I'll say it again: using mmap for your database is a terrible idea. Think of it as a poor man's buffer pool. Instead of running the buffer pool manager yourself, you let the OS do it, and the OS is always gonna do a bad job. So they were all in with mmap; they made public claims about why mmap was a good idea. But then in 2015, they bought this startup called WiredTiger, and now the mmap engine is gone. It's been deprecated, and now they use WiredTiger, which is an awesome database system, an awesome storage manager with a real buffer pool manager. And they got way better performance and way better reliability. So in the same way that in MySQL you can plug in different storage engines, replacing InnoDB with other things, in MongoDB version 3 you can replace the backend storage manager with different engines. The default is WiredTiger, but you could use RocksDB or other things if you wanted to. So I realize that was a raunchy story for me to tell you guys, but at the end of the day, just realize: any time you're talking to somebody on the street, make sure they're not a cop, and make sure they don't have a gun, okay?
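One last concrete thing before the wrap-up: here's that promised sketch of a multi-document transaction from the application side, using the pymongo driver. It assumes a MongoDB 4.0 replica-set deployment; the connection string, database, and collection names are all made up for illustration.

```python
from pymongo import MongoClient

# Sketch of a MongoDB 4.0 multi-document transaction via pymongo.
# Assumes a replica-set deployment; all names here are invented.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.shop

with client.start_session() as session:
    with session.start_transaction():
        # Both writes commit atomically, or neither does.
        db.orders.insert_one(
            {"order_id": 5002, "customer_id": 1001}, session=session)
        db.customers.update_one(
            {"_id": 1001}, {"$inc": {"order_count": 1}}, session=session)
```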
All right guys, so: my concluding remarks for the semester. I love databases. You have no idea, right? I think they're awesome. Anything you're gonna do throughout your life, whether you stay in computer science or go out into other fields, you're gonna come across databases. And there are so many different aspects to them and so many different problems you have to solve. The 25 lectures we had this semester barely covered everything there is to know; even I am learning all the time, and there's stuff I don't know about. But what I want you to get out of this course is not so much that you can go off and build your own database. You could at this point if you wanted to. It's that when you go out in the real world and have to make an engineering decision about how to store data or manage it, or what database you wanna use, now you can actually make an educated decision. Now you're not picking things because they showed up on Hacker News, or because you like the name. Now you can say: my workload looks like this, this system's architecture looks like that; is it gonna make the correct trade-offs for my application to achieve the things I wanna achieve?

The other thing I'll say, too, is that no one system is completely better than another. There are obviously shit systems, right? But there are a lot of really good ones, open source and commercial. What I would caution, though, is: do not make premature optimizations in your choice of database system based on what you think you're gonna need months or years from now. You may never get there, and the system you're picking may not be what you need right now. Another way to say this is: MySQL and Postgres are probably good enough for 99% of the applications out there. They're open source, they're rock solid. All this specialized, exotic stuff, you may need it later, but if you're at the point where Postgres can't keep up with what your workload needs to do, chances are you have money, right? And that means you can hire people who are really smart, or pay people like you a lot of money, to make a decision on how to actually scale that thing up or out, and then maybe look at other database systems. That's the one thing I see over and over again: students come back to me and say, hey, I'm thinking about doing a startup, I wanna pick a database system, and I always say Postgres or MySQL. They go, oh no, what if I want a distributed database? And I'm like, you have one gigabyte of data; that fits on a single machine. You don't need a distributed database. You're executing ten transactions a second; you don't need anything fancy. So avoid a lot of headaches by not doing things that are overly complicated at the very beginning. Get something rock solid first, and then you can figure out how to scale later, okay?

All right guys, so that's it. Good luck with all your exams, good luck with project four, good luck with the extra credit, and good luck with the snow, okay? All right guys, it's been awesome. Thank you, see you.