He's dealing with his lawyer. That's his problem. Let's jump into this for you guys. So again, the midterm exam will be available for review in my office during my office hours tomorrow. Please bring your CMU ID. Project 3 went out this week on Tuesday, and that'll be due November 13th, which is a Sunday. I'll announce it on Piazza, but we'll have the Q&A session next Tuesday at 8 p.m. over Zoom. Again, the slides and the recording will be available afterwards. Then they've announced the final exam. It's Friday, December 16th at 1 p.m. Please do not book a flight prior to this. This will be in person. Okay, yes. The question is, does the final exam cover everything or just the second half? It will be focused on the second half, but again, if you forget SQL and you can't do SQL, that's your problem. It'll be like the midterm. Yeah. Yes. No, but just to follow up on that: will we ask more obscure stuff from the first half? I wouldn't call anything I'm doing in this class obscure, but it'll be focused on the material we've covered since the midterm, including parallel execution. But again, I'm not going to ask you what the join cost is for some setup, right? When I say it'll cover everything, again, if you forget SQL, if you forget how a buffer pool manager works, you'll have problems. So don't do that. Yes. I don't know how other classes do it. It'll be the same length and complexity as the midterm. If you want to use the entire three hours, go ahead and do it. Some people do. We'll get coffee and donuts and whatever you want. Okay. He said no. Okay. No, I don't get fired. No. I don't know the answer to that. Okay. I can't get fired, right? I have a child. I've got to pay child support. I can't. That's what he's got to deal with. All right. So again, we'll have more information. I'm just saying, hey, we have a date, we have a time, we have a location. Please don't book a flight before this. Okay. All right. So at this point in the semester, we've reached the summit. We've reached the top, right? We started off with the disk manager and worked our way up through the different layers. And now we know how to take a SQL query, convert it to a query plan, execute it, and let it read some data and write some data to disk, and so forth. So what we're going to do for the next two or three weeks is come back down and look at a bunch of other parts of the system, now in the context of concurrency control and recovery. And the reason we cover this after we've gone through all the layers is that concurrency control and the protection mechanisms we need for recovery are intertwined with all the different parts of the system. It just makes it easier to go through all those layers without worrying about the extra stuff we're talking about now, because now you understand the basic concepts, like what a buffer pool manager does. So now we can ask: how do we make sure that the changes we write to disk are recoverable and durable, and how do we make sure that transactions can run safely, protected from each other? Okay. All right. So to motivate this discussion of concurrency control, I want to bring up two basic scenarios, two sort of obvious problems.
Say we have an application that's recording your financial status, your bank account. Let's say we have a case where two threads or two application programs are running at the same time, and they issue queries at the exact same time that update the same record in the same table. How are we going to make sure we avoid a race condition, so that we don't have some weird torn writes or a corrupted database state? Then let's say there's another scenario: it's my bank account, and I'm taking $100 out of my account and putting it into another account. But in between the time I take the $100 out of my account and before I put it in the other account, there's a power failure; the building is struck by lightning or something, the machine loses power, and the database system crashes. When we come back, what should the correct state be? So the first example is what's called a lost update. How are we going to deal with two threads or two queries trying to write to the same thing at the same time? And it doesn't have to be a record: they could write to the same table at the same time, or even the same database; we can have different levels of granularity. The system is going to protect itself from these problems using concurrency control, which we'll talk about for the next three or four classes. And then the second problem, how we make sure we don't lose anything when there's a failure, that's going to be in the context of recovery mechanisms, which we'll cover after concurrency control. Again, these things are intertwined, as we'll see throughout today, but I'm going to really focus on the lost update, the first problem, for the next two or three lectures. So it's sort of obvious why you need these things, why you want these protection mechanisms. This is the advantage you get from using a database management system over going off the rails and writing your own data manager in your application. If you're a new startup trying to get your application out the door so people start using it and you make money, you don't want your developers spending time writing all this concurrency control and recovery mechanism stuff. The fact that your application doesn't lose data when it crashes is not a differentiating factor for your application versus your competitors. So why would you spend your time writing this stuff? You're going to let the database system do it for you, and it's obviously going to do a much better job than you possibly could. So the core concept we're going to use to reason about these protection mechanisms, about what it means for the system to execute correctly, is the acronym ACID. Quick show of hands, who here has heard of ACID before? All right, actually not bad. Good, excellent. It's actually more than in previous years, so that's very good. We'll go through each letter of ACID, what they all mean, and talk about how the system implements the mechanisms to enforce them. But first we need to define what a transaction is. A transaction is the execution of one or more operations.
In our case these will be SQL queries, but as we'll see later today, from the database system's perspective it just sees read and write operations; these operations execute to perform some higher-level task or function in our application. So in my example of taking money out of my account and putting it into my bookie's account, the first step of taking the money out and then putting it into the other account, those two together are a transaction. It's the thing the application is trying to do to achieve some state change in our database, reflecting some aspect of the real world or something that occurred in the real world. So the transaction is going to be the basic unit of change in our database system, and the key idea is that we're not allowed to have partial transactions. If I want to take money out of my account, that's one write, and put the money in the other account, that's another write, my transaction can't do only one of them. I can't take the money out and not update the other account, and I can't update the other account without taking the money out. We'll simply say: if I can't do all of them, then I do none of them. Now you might ask, what if I have one operation by itself? Well, that's a one-operation transaction; it's still a transaction. And there will be some other interesting things that come out of that, because you can obviously have a single SQL query in a transaction by itself, but that query can update multiple records. So again, going back to my example: I'm taking the money out of my account and putting it in my bookie's account, and the transaction we would define is these three steps. First we do a lookup, a read on my bank account, to see whether I actually have $100; I probably don't. Then we deduct the $100 from my account, and then we add it to the bookie's account. Now, you can imagine other checks being done here, like whether my account is closed or open, or whether the bookie's account is flagged for fraud or something like that. There's a bunch of extra stuff you would actually do, but for our purposes it's three steps. So the write operations are the deduction and then the addition; these have to occur together, or none of them occurs.
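To make this concrete, here is a minimal sketch of the transfer as a single transaction, written with Python's built-in sqlite3 module; the accounts table and the account names are made up for illustration. The point is that the read, the deduction, and the addition sit between one BEGIN and one COMMIT, so they take effect together or not at all.

```python
import sqlite3

conn = sqlite3.connect("bank.db", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE IF NOT EXISTS accounts (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT OR IGNORE INTO accounts VALUES ('andy', 1000), ('bookie', 0)")

conn.execute("BEGIN")
try:
    # Step 1: read my balance and check that the $100 is actually there.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE name = 'andy'").fetchone()
    if balance < 100:
        raise ValueError("insufficient funds")
    # Step 2: deduct $100 from my account.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'andy'")
    # Step 3: add the $100 to the bookie's account.
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bookie'")
    conn.execute("COMMIT")    # both writes become visible atomically
except Exception:
    conn.execute("ROLLBACK")  # no partial transactions: undo everything
    raise
```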
So let's look at a straw man system, how we could actually implement a database system to support this. Let's say we build a single-threaded database system that only executes transactions one after another, in serial order. So if multiple applications submit a bunch of transaction requests at the same time, they get put into a single queue, and one thread pops those transactions off the queue and executes them one after another; only one transaction is ever running at a time. Now, when we pop a new transaction off the queue and start executing it, the first thing the database system does is copy the database file to a new file: just the copy command, nothing special. Then as the transaction runs and makes changes to the database, we write all those changes to this new file. Then, if we're able to commit, meaning the transaction says it wants to commit and there were no mistakes or failures, we just flip a pointer somewhere that says: if you want the latest version of the database, it's this new file we wrote to. Would this work? Would this solve everything? Well, he's shaking his head yes. Right, would this avoid the partial transaction problems? The statement is, can we assume the overwrite is atomic? Overwrite of what, the pointer? Right, so maybe I should draw a diagram. There's one file that holds the file name of the latest version. When my transaction starts, I copy the database, the heap file, and make a new one on disk, make all my changes there, flush those out to disk, and then when I go to commit the transaction, I update that one pointer file to say: here's the new file location. So yes, you can assume that updating that pointer file is atomic. With those assumptions, this works. Is this a good idea or a bad idea? Yes, you said you know it's a bad idea; why? Because you don't want to start from scratch and copy everything, you're juggling lots of files, and either you do a lot of updates at the end or things get mangled. Why would they get mangled? Well, you have all these dirty copies; what happens if you're partway through updating when there's a crash? You delete the file, who cares? That's easy. Yes? His statement is: even if you're modifying this 50 GB file, maybe you only touch one kilobyte, so couldn't the OS do the same pointer trick and share the rest of the pages? You're describing mmap. You're going to have problems with that. Yes? And her point is that every time we run a transaction we copy the entire database; there's no concurrency because transactions execute in serial order one after another, and every new transaction has to copy the entire file. If the database is only four kilobytes, who cares. If it's four petabytes, then yeah, you care. So there are two ways to think about this: is it correct, and is it fast? This is correct. You can guarantee that your transactions execute in the correct order and produce a correct database state, and I'll define what correctness means in a second. But as you pointed out, it's going to be slow. Now, think about the problem you had with Checkpoint 2: you had to write a concurrent B+tree. If you make it single-threaded, you don't have to do any of that, right? And there are some systems, like Redis and BoltDB, that actually do that. But in general, most systems are going to have concurrent threads and concurrent transactions, and concurrency control. Yes? But aren't Redis and BoltDB quite fast? Why are Redis and BoltDB fast? Because they're single-threaded and in memory. If your database fits entirely in memory, fantastic, great; if you can execute all the operations you need with a single thread, then yeah, great. Yes. His statement is, and we're getting to this: instead of copying the entire database file, we could copy pages. That's shadow paging. We'll get to that in a second, yeah. Okay, so this will work, but it's going to be slow.
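Here is a sketch of that straw man, assuming the one thing the question established: the pointer flip is atomic (os.replace is an atomic rename on POSIX). The file names and the helper are made up; the point is the mechanism: copy everything, write to the copy, flip the pointer on commit.

```python
import os
import shutil

POINTER = "current.txt"  # one file holding the filename of the latest database version

def run_transaction(apply_changes):
    """Single-threaded straw man: one transaction at a time, whole-file copy."""
    with open(POINTER) as f:
        current = f.read().strip()
    scratch = current + ".new"
    shutil.copyfile(current, scratch)   # copy the ENTIRE database file
    apply_changes(scratch)              # all of the transaction's writes go to the copy
    # Commit: atomically repoint to the new file. A crash before os.replace
    # leaves the pointer naming the old version, so the half-written copy
    # is just garbage to delete; that is the whole recovery story.
    tmp = POINTER + ".tmp"
    with open(tmp, "w") as f:
        f.write(scratch)
    os.replace(tmp, POINTER)
    os.remove(current)                  # the old version is no longer needed
```

Correct, but every transaction pays to copy the whole file, which is exactly the slowness everyone pointed out.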
And in a modern system, we actually want concurrent transactions, right? Just like when we talked about parallel execution, we want to take better advantage of the hardware we have and allow multiple threads to run at the same time. And any time a transaction tries to read something that's not in memory, when there's a miss in our buffer pool, we have to go out to disk and get it. Well, we can let that transaction wait and stall while the system fetches the pages from disk, but in the meantime we can let other transactions keep running. So essentially what we're trying to do is interleave our transactions to get better parallelism, better concurrency, in our database system, but we want to do it in a way that guarantees the database is still correct. Again, I'll define what correctness means shortly. And obviously, in some cases we also care about fairness: we don't want transactions to get starved because they keep getting preempted by some other transaction. So the challenge we face is that when we start interleaving the operations of our transactions, it can put the database in an inconsistent or invalid state. Take my example of moving money out of my account: there will be a moment where I've taken the money out, and at that point in time that money doesn't exist anywhere, because I haven't yet updated the other account with the new value. That's unavoidable. Think of the low-level execution of the system: there's no magic instruction that lets us update two different pages atomically at exactly the same time. These temporary inconsistencies are unavoidable, but that's okay, because we're not going to expose that inconsistent state to the outside world. Again, I'll explain what all this means in a second. What we want to avoid is permanent inconsistency: if I take the money out of my account, the system crashes, and I never put the money back, now the bank has magically lost $100, and that would be bad. These are the protection mechanisms we need to make sure everything goes okay. So to define what it means to be correct, we first need to define what a database is, and this will be a simplified abstraction of how a real database works. We'll use it for this class because it makes it easier to understand these concepts as we go along, and then on Tuesday next week, when we talk about two-phase locking, we'll talk about how to do this in the context of a real system. In this abstraction, a transaction can carry out as many operations as it wants on the data in the database, and it can read and write whatever it wants. I'm not going to define what an object in the database actually is; we'll just use variables A, B, C, D. In practice it's usually going to be a tuple, but you can apply these same techniques to pages, to tables, to whole databases. The other important thing to understand is that the database can only control things that are within its purview, meaning data that is actually stored inside the database. If the application does something that leaves the orbit of the database system, we have no control over it, and therefore we can't reverse it or roll it back. Meaning: if I take the money out of my account, put it in my bookie's account, and then the application sends an email confirmation saying, hey Andy, we did the transfer, but then the transaction aborts and rolls back that change, the database system can't magically retract that email. There are external things the application may do in the context of a transaction, but because the database system isn't the one doing them, it has no control.
So we're going to define a database, for this class, as a fixed set of named objects, as I said: A, B, C, D. We don't need to define what those objects are; all the techniques we'll talk about today, and pretty much all the techniques going forward, don't depend on the granularity. The other key thing to point out is that I'm saying the database is fixed, meaning we're not going to support inserts and deletes; we can only read and write database objects that already exist. Again, we'll see how we handle inserts and deletes next class. And then a transaction is just a sequence of reads and writes, and so forth; for our purposes here, we're not going to actually see SQL queries. In this case, a write is like an update, right? Yes, a write here is an update: it overwrites whatever value was there before. So if B equals Andy, I overwrite it with B equals Charlie. So in SQL, and we won't dwell on this too much, but you'll see it in the schedules I show: you start a new transaction every time you call BEGIN. There is something called auto-commit, where basically any single SQL query is assumed to be the start and end of its own transaction, but in general you explicitly start a transaction in SQL using BEGIN. And then you complete the transaction with either a commit or an abort. Commit says: I want to save all the changes I made. Abort says: roll back everything. I think the SQL standard also supports ROLLBACK; instead of ABORT you can say ROLLBACK. I think Postgres supports both ABORT and ROLLBACK, and MySQL only supports one; I can't keep track, but I'll just use abort because it's fewer letters. So if I abort, it's as if the database never executed the transaction at all. Now, one important thing to understand also is that if the application calls commit, there's no guarantee the database system will actually commit your transaction. You want to commit; that doesn't mean it's going to let you. It may be the case that you call commit, and the database says you did something you're not allowed to do, and you actually get an abort: you get an exception thrown back at you saying we can't commit this transaction, and it usually tells you why. There are also partial rollbacks, which we'll talk about later, but in general you have to write your application so that when you call commit, you handle the exception that says you weren't allowed to. And the abort could be either because you called abort yourself, or, as I said, because the database system decided it couldn't proceed and killed you. The question is, why would the database system fail on the commit step and not on the earlier queries? Because you could be doing what's called optimistic concurrency control, where you assume everything's okay, you let the transaction do whatever it wants, and only on commit do you go and verify what it did.
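In code, that means treating COMMIT as a request that can fail. A minimal sketch of the pattern, again with sqlite3 (with Postgres under a serializable isolation level you would catch its serialization-failure error the same way; the retry count here is arbitrary):

```python
import sqlite3

def transfer_with_retry(path, attempts=3):
    conn = sqlite3.connect(path, isolation_level=None)
    for _ in range(attempts):
        conn.execute("BEGIN")
        try:
            conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'andy'")
            conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bookie'")
            conn.execute("COMMIT")      # the system may refuse and raise instead
            return True
        except sqlite3.OperationalError:
            conn.execute("ROLLBACK")    # as if the transaction never executed
    return False                        # gave up; the caller decides what to do
```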
So it sounded like almost everyone has heard of ACID before, but I'll go through it quickly and then we'll go into more detail. ACID stands for Atomicity, Consistency, Isolation, and Durability. The first one, atomicity, is basically the partial-transaction thing: all the operations in a transaction occur, or none of them do. Consistency is a bit vague, and we'll see why when we discuss it; it'll make more sense when we talk about distributed databases. But basically it says that if the database starts off consistent and the transaction is consistent, then the database has to be consistent afterwards. That's pretty confusing. It basically means that if the database was correct, and the transaction does correct things, then the database will still be correct after I execute the transaction. Now, what does it mean to be correct? It depends. The story, supposedly, is that the guy who invented the term ACID admitted he shoehorned the C in there to make it spell ACID, to make it fun for his wife or something. I don't know if it's an urban myth, but supposedly his wife didn't like sweets, so he called her an acid woman, and he named this after her. He's German. Who knows. Isolation says we want to give the illusion that our transactions are executing in serial order, one after another, even though they really aren't. And durability means that if a transaction commits, and the outside world gets the notification that it committed, then no matter whether the database system catches on fire or blows up or crashes, we should be able to see those changes afterwards. Now, if someone later commits something that overwrites those changes, we obviously have to support that, but assuming that's not the case, our changes should be durable. All right, so we're going to go through these one by one. Today's class is the high-level concepts: understanding what this all means and how you might potentially implement it. How to actually implement it, we'll cover starting next week. I just want to make sure you understand what's going on, why these things matter, and what correctness guarantees we need to provide. We're mostly focused on isolation, because that's the trickiest one. All right, so as we've already said, atomicity means no partial transactions: we guarantee that all the operations in a transaction happen atomically, all at once. Again, the system can't truly execute them atomically at a single point in time, but to the outside world it will appear that way. So if we commit, all our changes get applied; if we abort, everything gets rolled back. Here are the two scenarios we talked about before, taking money out of my account and putting it in my bookie's account. If we take $100 out of my account, but then the transaction aborts before we apply the money to the other account, we need to make sure we roll that write back. And if we take $100 out of my account but then there's a crash, a failure, it's basically the same thing: we need to make sure that money doesn't just vanish. The correct state of the database, in both scenarios, is as if my transaction never executed at all. And the database system uses its concurrency control protocol and its recovery scheme to ensure this is the case. There are two basic ways to support this. The first is logging, sometimes called the write-ahead log, related to the log-structured stuff we talked about before: as transactions make changes to the database, we write log records that keep track of each change being made, and each record carries the information needed to undo that change if necessary.
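Just to give a feel for it, here is roughly the shape of such a log record; the fields are made up for this sketch and don't match any particular system's format:

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    txn_id: int     # which transaction made the change
    object_id: str  # which database object, e.g. "A"
    before: object  # old value: lets recovery undo an uncommitted change
    after: object   # new value: lets recovery redo a committed change

log = []  # in a real system this is flushed to stable storage

def write(txn_id, db, obj, new_value):
    # Append (and, in a real system, flush) the log record BEFORE applying
    # the change to the database itself; that is the "write-ahead" part.
    log.append(LogRecord(txn_id, obj, db.get(obj), new_value))
    db[obj] = new_value
```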
So now, if there's a crash, I can go back and look at my log and say: okay, what was I doing at the time of the crash that hadn't committed yet, and potentially reverse those changes. In the case of the log-structured storage we talked about before, there really isn't any recovery you have to do, because the log itself tells you exactly what happened. Traditionally, most systems maintain a log separate from the database heap files; if you're using log-structured storage, the log itself is everything you need. Beyond durability, logs are useful for a bunch of other reasons: the log gives you sequential writes instead of random writes, and oftentimes in enterprise systems you need the log for auditing, for regulatory reasons. I think in the US you have to keep the last seven years of any financial transaction, and the write-ahead log essentially gives you that: you can use it to figure out what actually happened. The other approach is what he brought up earlier, and it's called shadow paging. It's basically the straw man I described at the beginning, except that instead of copying the entire file, you copy just the pages the transaction touches. There's some data structure that keeps track of what those dirty pages are, and when you commit, you just flip a pointer to say: okay, here's the latest version of that directory data structure, with all my newly updated pages. This technique is rare; the logging one is the common one. But shadow paging is actually what they implemented in the first relational database system, IBM's System R. They ended up abandoning it because it suffered from a lot of fragmentation: pages would get invalidated as transactions updated them, you'd have to allocate new pages, and you ended up with this hodgepodge of pages making up the latest version of the database at any given time. You had fragmentation, you had garbage collection, it was expensive, and it produced a lot of random reads, whereas with the write-ahead log from the last slide you get sequential access, and that's better. So they abandoned it, and when they built DB2 in the 1980s, they switched over to the logging approach. The modern systems I know of that do something like this are CouchDB, Tokyo Cabinet, and LMDB. LMDB is an embedded B+tree key-value store that uses mmap, and the guy who wrote it is very, very opinionated about how great mmap is; he emails me about it. He's wrong. CouchDB uses something similar, and Tokyo Cabinet was an embedded key-value store that was popular in Japan a few years ago; I don't think a lot of people still use it. But again, this is rare. The advantage of this approach, the reason you would want to do it, is that you get instantaneous recovery. The write-ahead log is like the black box on an airplane: if the plane crashes, you have to open the black box and figure out what actually happened. Same with the database: the system has to go through the write-ahead log, figure out what was going on at the time of the crash, and put the database back into a correct state with any partial transactions removed. With shadow paging, when you crash and come back, that pointer, the directory, is always pointing at the latest committed version of the database.
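A sketch of that idea, with the database reduced to a page table mapping page ids to page contents; committing is a single pointer flip, so there is nothing to replay on recovery:

```python
# The committed version of the database: page id -> page contents.
current_page_table = {"pg1": b"old data", "pg2": b"more old data"}

def run_transaction(dirty_pages):
    # Shadow copy: start from the committed page table and copy-on-write
    # only the pages this transaction actually touches.
    shadow = dict(current_page_table)
    shadow.update(dirty_pages)
    return shadow

def commit(shadow):
    global current_page_table
    # The flip: one atomic pointer update installs every dirty page at
    # once. If we crash before this line, the old table is untouched and
    # the orphaned shadow pages just get garbage-collected on restart.
    current_page_table = shadow

# Usage: new_version = run_transaction({"pg2": b"updated data"}); commit(new_version)
```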
Any transaction that was running at the time of the crash was writing to a secondary copy, the shadow copy, so you just ignore that and garbage-collect it when you come back up. So it's not that this is a terrible idea; it has particular advantages. But most systems choose the write-ahead log, the logging approach, for performance reasons. Yes: if there's a crash while updating the pointer? You assume that the pointer update is atomic; it's a single pointer, and you can update that atomically. The one interesting system that did use this, and I don't know its name, was from the 1970s: a database built for the Puerto Rican telephone company. Puerto Rico has problems now, but in the 1970s they had even more problems with power; the power was always going out randomly. So they built a database that used shadow paging, because when the power went off the system would crash, and since it happened so many times a day, when the power came back on they got instantaneous recovery. Yes, the question is: does this allow concurrent access to pages? You would have to use the isolation mechanisms above it to protect that, but you could; back in the day, though, it was a uniprocessor with one core. Yes, and she's correct: with shadow paging you don't copy the entire database. As I said, you copy individual pages, and there's an additional data structure that records which pages you've copied. As for the question about in-memory databases, where this might make more sense: I think write-ahead logging is still better there; we can cover that later, let's take that offline. You still have a log; you log to disk. Okay. All right, so let's talk about consistency. Again, this one is kind of vague, so bear with me. The thing to understand is what the database is trying to do: it's trying to model something in the real world. So think of a database that records which students take which classes. You're all students, you're all in this class, so somewhere there's a record that says student X is enrolled in 15-445/645. If you're enrolled in the class, you obviously want a row there; if you drop the class, that row gets marked as dropped, and so forth. We want the database's representation of the real world to be logically correct, so that when you ask questions about it, like whether this student is enrolled in this class, and the person truly is enrolled, it comes back with yes, true. That's what I mean by consistency. And there are two levels of this: one of them is something we can guarantee, and one of them is really left up to the application programmer, because the system just doesn't understand the semantics of correctness at that level, so it can't do anything about it. So the first example, modeling class enrollment, is what's called database consistency. It means the database actually reflects whatever it's trying to model in the real world, and it doesn't violate any integrity constraints, like a student can't be enrolled in the same class twice in the same semester; the database can enforce that if you try it. And when a transaction makes a change to the database, any future transactions will see that change, assuming it wasn't overwritten; the change persists.
So again, for a single-node database, maybe this doesn't seem to say much: yes, of course, if I insert a record, I should be able to see it. This will make more sense when we talk about distributed databases, where the database is stored across multiple machines: if I insert a record on this node over here, and I'm told my transaction committed, then I should be able to immediately see that change on another node. For a single node it's sort of trivial, but distributed databases with strong consistency will provide that guarantee, and we'll cover how, I think, after Thanksgiving. Transaction consistency is the one that's really fuzzy. It basically means that if the database is consistent before the transaction runs, and the transaction does consistent things, then the end state of the database after the change will be consistent. It's subjective, because the database doesn't understand the semantics, the high-level meaning, of the changes the transaction is making. Again, something like "a student can't be enrolled in the same class twice": you can put integrity constraints in place for that. But something like "a student shouldn't be enrolled in a class that meets on a Tuesday because they have some allergy or whatever": that's a high-level rule beyond the database, something only the application or a human could know. So if a transaction violates it, the database is technically inconsistent, but we just don't know it. The question is, in my example of transferring the $100 out of my account, is the notion that when the transaction commits the $100 has to be somewhere database consistency, because there are integrity constraints we can put in place? Well, these are all tied together, because you don't want to allow partial updates; so yes, if the transaction commits, the money has to be somewhere. Again, let's not get too hung up on consistency here; it will make more sense for distributed databases. In general, transaction consistency is something the database system doesn't control and can't enforce, so it's more of a theoretical notion, not something any system actually implements. All right, so let's spend most of our time talking about isolation, or isolation levels. The thing we want to achieve is that when an application or user submits a transaction, that transaction executes under the illusion that it's running by itself: there's no other transaction running at the same time, so it won't see updates from other transactions, and it doesn't have to worry about them clobbering its updates. The reason we want this is that it's a way easier programming model: in your application code, you don't have to worry about whether you're seeing things in the database you shouldn't be seeing. If I write data, I should be able to read my own writes. That seems kind of obvious, but the whole NoSQL movement was all about not doing this, and you had to write a bunch of extra code to deal with eventual consistency, with the lack of transaction isolation, in your application code, because maybe I write something and don't get back my own write.
I might actually get back a set of values, and I've got to figure out which one I actually want to look at. So the challenge is going to be that we want transactions to think they're running by themselves, but as we said, we also want to interleave the reads and writes of transactions running at the same time, because that gives us better concurrency, better parallelism. So we want to interleave our transactions in such a way that they appear to make changes to the database as if they ran one at a time, in serial order, even though we actually interleave them, and the database still has to be correct. This is what the concurrency control protocol is going to provide for us. You can think of it as a traffic cop, responsible for deciding at runtime which operations from which transactions can do what on which database objects; it can sometimes deny a request, or it can allow it and then later say "you shouldn't have done that" and abort you. There are two categories of these protocols. You asked earlier why a transaction can go to commit and the system can later decide to abort it: that's an example of optimistic concurrency control, where you assume conflicts are rare, so you let everybody do whatever they want, and only when they go to commit do you look at what they actually did and say either "okay, you can commit" or "no, that wasn't okay, you have to abort." Pessimistic concurrency control assumes problems will be frequent, so it requires transactions to acquire locks on the objects they want to touch before they do anything; that way, if you hold a lock on something, I know you're allowed to do whatever you want with it. In some cases optimistic is better, in other cases pessimistic is better. At the end of the day, if you have the worst contention possible, a million transactions all trying to update a single tuple, they're basically the same; it makes no difference. We mostly want to think about the average case. And none of this comes for free: just because it's optimistic doesn't mean it's magically faster than pessimistic. There's always bookkeeping; there's no free lunch in databases. So let's look at an example. We have two transactions now, and two accounts, A and B, that both have $1,000. The first transaction, T1, wants to take $100 out of A's account and put $100 into B's account; then there's another transaction, T2, that wants to compute interest on the bank accounts, where everyone gets 6%. So what are the possible outcomes of running T1 and T2, assuming they're submitted to the database at the exact same time by two different threads or two different clients? Well, there are many different ways to interleave them, but at the end of the day we need to guarantee that if we add up A and B after executing T1 and T2, the total amount in the bank across both accounts is $2,120. The database system is not going to guarantee that just because T1 was submitted before T2, it will execute them in exactly that order.
And this is going to seem slightly weird when you think about other parallel programming you've done, where there are memory barriers, or x86 is very careful to make sure things execute in the right order. In our world of databases, we can interleave these operations however we want, but at the end of the day we just need to guarantee that the state of the database we end up with is equivalent to executing the transactions in serial order: either T1 followed by T2, or T2 followed by T1. So the possible outcomes: if we execute T1 first, followed by T2, we get $954 and $1,166; if we execute T2 followed by T1, we get $960 and $1,160. Either way, when we add them up, it's always $2,120. So let's see what this looks like visually. Here are the two serial orderings for these two transactions: execute T1 first followed by T2, or execute T2 first followed by T1. All that matters, again, is that the values of A and B, added together, equal $2,120. Both are correct, even though A has less money and B has more in one ordering than in the other; from the database system's perspective, both are valid orderings. We talked before about why we want to interleave our transactions: we avoid blocking or stalling on a slow disk or network. That was the big deal they cared about in the 1970s, because disk was super slow and you had limited memory, so transactions were stalling all the time while they fetched things from disk. Even though you only had one core in your CPU, you could pick up another transaction and run it while the first one was still waiting on disk. In modern systems, disk has gotten faster, memory has gotten larger, and for all these systems you want to keep most if not all of the data entirely in memory, in your buffer pool; but now you have multiple CPU cores, and you want to take advantage of those. So this is why you still want to interleave transactions: maybe one transaction stalls because it's waiting on a latch or a lock, and you let another transaction on another core keep running, while it all still appears as if they executed in serial order. So let's look at a good interleaving of these two transactions. In this case, we execute T1 first: it takes $100 out of A, but then there's a context switch for whatever reason; maybe it stalls on disk, maybe the OS swaps it out, it doesn't matter. Then T2 starts running and computes the interest on A. Then there's a context switch back to T1, which puts the $100 into B, then back to T2, which computes the interest on B. This is equivalent to executing T1 followed by T2. The basic idea is that we always compute the deduction or the addition on an account in T1 before we compute the interest on that account in T2. Likewise, if we instead do something like this, the outcome is not equivalent to either of the two possible serial orderings, and we end up missing $6 in our total, which is bad. Now, the database system doesn't see operations like "B = B + 100," so it can't play tricks like "oh, these are commutative, let me reorder them." All it sees are read and write operations. It doesn't understand the high-level meaning: if I read a record, what is the application actually going to do with it before writing it back? We don't know, because that's in the application code. We only see these read and write operations, and we have to deal with that.
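You can replay both interleavings from the slides and watch the $6 vanish; a small sketch with integer dollars and one function per step (the step names are mine):

```python
def run(schedule):
    db = {"A": 1000, "B": 1000}   # both accounts start with $1,000
    for step in schedule:
        step(db)
    return db

# T1 moves $100 from A to B; T2 pays 6% interest on each account.
t1_debit  = lambda db: db.update(A=db["A"] - 100)
t1_credit = lambda db: db.update(B=db["B"] + 100)
t2_int_a  = lambda db: db.update(A=round(db["A"] * 1.06))
t2_int_b  = lambda db: db.update(B=round(db["B"] * 1.06))

good = run([t1_debit, t2_int_a, t1_credit, t2_int_b])
bad  = run([t1_debit, t2_int_a, t2_int_b, t1_credit])
print(sum(good.values()))  # 2120: equivalent to the serial order T1 then T2
print(sum(bad.values()))   # 2114: $6 is gone, not equivalent to any serial order
```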
So we need a programmatic way to understand and reason about whether an interleaving, a schedule, is actually correct. Visually, I'm just pointing at stuff and drawing circles and saying "this would be problematic," but that's two transactions in a pretty simple schedule. I need to be able to reason about this at larger scale, and obviously in a way that a program, our database system, can actually run. Yes: if you actually run this in Postgres, will it ever give you this bad result? The question is, if I run this in Postgres, or really any database system, could I end up with this result? Yes, if it's not running its concurrency control protocol; if you turn off locks and other mechanisms, you could. So, basically, would the database system do this isolation checking for you? If you leave those mechanisms on, yes, it does it for you; what I'm telling you about is what those mechanisms are actually doing. You don't want people writing their own checks in application code, asking "did this happen before that," because there's no guarantee you'll always submit the two transactions at exactly the same time. What we're discussing today is what it means to interleave things and still end up with a correct state. Today is: I give you the schedule and a fixed set of database objects. Next class is: if I'm receiving incremental queries, incremental read and write operations, how do I guarantee I end up with a schedule that is correct? So we're going to say that a schedule is correct if it's equivalent to some serial execution. We've already said what a serial schedule is: there's no interleaving; we execute the transactions one after another. And we'll say two schedules are equivalent if, for any given database state as input, executing the transactions in one schedule produces a new database state equivalent to the state produced by the other schedule. Again, these are low-level read and write operations; we don't care whether it's an addition or a multiplication or whatever being done to these objects. We just want the states of the objects to be the same. So the gold standard we want to achieve is what's called a serializable schedule, which just means the schedule, with its interleaving, is equivalent to some serial execution of the transactions. And obviously this means that if the database is consistent and the transactions are consistent, then the serializable schedule we choose preserves that consistency. Again, as I said before, this is a kind of weird concept: transactions can get submitted in one order, but I can commit them and apply their changes in a different order inside my database system, and that's okay. We want that, because it gives us more opportunity to permit interleavings and maximize the parallelism we can get. If you care about the commit order of transactions matching their arrival order, then you either enforce that in the application with your own barrier: submit T1, wait until it comes back, then submit T2.
Or there are some systems, and they're rare, that will guarantee the commit order is the same as the arrival order. That's called strong serializability, or external consistency. There aren't many systems that do this; Google Spanner and Fauna are probably the two main ones. Yes? The question is: why are we making a big deal about this being isolation and not consistency? Look at this sentence here: we want the state of the database to be as if the transactions executed in isolation from each other. That means not only do we end up with a state equivalent to some serial ordering, some serializable schedule, but as part of that, we don't see any effects of the transactions running at the same time as us. So I guarantee that the state of the database at the end is equivalent to a serial ordering, and as part of that, while I'm running, I won't see the effects of anybody else running at the same time as me. Consistency, by contrast, basically means the database was correct when the transaction started and is correct afterwards, even though I interleaved things. And again, as I said, you can have temporary inconsistencies; that's okay, as long as everything is consistent at the end. Okay, so as I said, most systems don't guarantee arrival-order execution. Spanner does; in the paper they claim it's a big deal for Google Ads, that's why they need it, and that's actually true. But most systems don't do that. In fact, most systems may not even give you the serializable execution we're describing here: it's not on by default in most systems, and if you ask for serializable in Oracle, they'll lie to you and actually give you something weaker. But in general, this is the gold standard, because it's guaranteed not to produce any anomalies or problems in your application code or your transactions. All right, so now we need a programmatic way to decide whether a schedule is serializable. What we're going to do is identify the conflicting operations in our transactions. Two operations conflict if they occur in different transactions, they access the same object, and at least one of them is a write: you can have a read-write, a write-read, or a write-write conflict. There's no read-read conflict, because who cares if two transactions read the same thing; that's not a conflict. Then we can look at these conflicts and identify anomalies that would not have occurred under a serial ordering, and if we can identify such anomalies, we know this is not a serializable ordering, or rather, not a serializable schedule. So I have the three basic anomalies: read-write, write-read, and write-write conflicts. There are two additional anomalies we'll cover next week, phantom reads and write skew, but for our purposes here we can focus on the three main ones. All right, the first one is a read-write conflict, also known as an unrepeatable read. This means a transaction reads the same object multiple times while it's running and gets back a different value at some later point.
So T1 starts: it reads A and gets $10. Then there's a context switch; T2 starts running, reads the same $10, but then writes back $19 and commits. Now when T1 reads A again, it sees $19. It shouldn't: if it were truly running in isolation from every other transaction, it would see $10 again. So this is called an unrepeatable read, and it would violate a serial ordering. A write-read conflict, also known as a dirty read, is where one transaction reads data from another transaction that hasn't committed yet, and then takes some action based on that dirty data it shouldn't have seen. So T1 reads $10 here and writes back $12. Then there's a context switch; T2 starts running, reads the $12 that T1 wrote, and then there's some additional logic that says, okay, let me add $2, so now it's $14. T2 goes and commits, and let's assume that at this point we tell the outside world: yes, your transaction committed, T2 is done, you're good. But then T1 aborts, and now we have to roll back the write T1 made to A. That wouldn't have happened if they were running in serial order. So this is bad; we don't want it. The last one is a write-write conflict, also known as a lost update, and this is where a transaction is allowed to overwrite uncommitted data from another transaction, so you can potentially end up with torn writes. So T1 starts and writes A. T2 starts, overwrites A, then writes its own value on B, and commits. Then T1 runs again and writes its value, "Andy," on B. So in this case, the value of A was written by T2, but the value of B was written by T1, which couldn't happen under a serial ordering, because you'd see either all the changes from T1 or all the changes from T2. Yes, the question is: is the conflict with the first read or the second read? It's the second read; really, the combination of the two. If I see $10 here, then any time I read A again, I should see $10. The statement is: if T1 only does one read, there's no problem. Right, assuming I read A here and don't do anything else before I commit, there's no conflict. Yes: so instead of an abort, T1 does a commit here; is that still okay?
That would be okay in this example, yes. If T1 commits, then since it didn't update anything else, technically the end state is correct, because T2 overwrote anything T1 wrote anyway, so it would be as if T1 ran followed by T2, and that would still be correct. The statement is: say the writes get queued up, one commits, and then the other overwrites it. We'll get to this in a second; that would be equivalent to view serializability, where at the end of the day the value of A is correct because it's whatever the last committed write was. But if T2 were truly running in isolation, in serial order, it should not have been able to see that uncommitted update at all; that's the violation. There's a high-level idea here, "this particular overwrite doesn't matter," and that sounds fuzzy because it's very hard for us to enforce. So we'll be conservative; I can't say pessimistic, and I can't say strict either, those words mean other things here. We'll enforce that you can't see writes from other transactions that haven't committed. If you want to commit instead of aborting, what would the value of A be? It depends on the implementation of the concurrency control, because it could be the case that the transaction runs in its own private workspace, and only when it goes to commit does it apply its changes. So the other transaction has already committed, and this one overwrites it, even though in wall-clock time it wrote first: it wrote in a private workspace that nobody ever saw. This is why I love transactions: there's this concept of wall-clock time and this separate notion of transactional time, and you can do whatever you want within them. In this case here, the final value is $12. Is that not correct? In this case it is the problem, yes, because that write should have been rolled back. Think about why you don't want to allow this: say I read A and you don't know what the application does with it; maybe there's an if-then-else that says if the value is not equal to 12, don't write $14, but if it is equal to 12, write $14. So if the first transaction got aborted, the second should never have been able to read the $12, and therefore it shouldn't have written what it wrote. That's why you don't allow this. In this other case, we have the write-write conflict on B. The question is: T2 already committed, so why is this a problem? Because, again, the result has to be equivalent to a serial ordering, so it has to be either all the values from T1 or all the values from T2. Assuming changes are applied on commit, I would see the write on A from one transaction and the write on B from the other, and that could not happen under a serial ordering. Yes, so the question, the thing I was trying to say before: does the application have to be okay with the idea that the result could be either T1 followed by T2 or T2 followed by T1? The answer is yes. If you care about the exact ordering of T1 followed by T2, then you either need a database that supports external consistency, which we're not covering here, or you write your own barrier in your application: execute T1, and when it commits, then execute T2. Either in the application code, or with some extra stuff on the server side, but that's hard and it's more work, and most of you don't care.
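Before moving on, here is that conflict rule as a sketch, over a schedule written as (transaction, operation, object) triples; the encoding is made up for illustration, and the example is the unrepeatable-read schedule from a few slides back:

```python
def conflicts(op1, op2):
    # Two operations conflict if they are from different transactions,
    # touch the same object, and at least one of them is a write.
    (t1, kind1, obj1), (t2, kind2, obj2) = op1, op2
    return t1 != t2 and obj1 == obj2 and "W" in (kind1, kind2)

schedule = [
    ("T1", "R", "A"),  # T1 reads A, sees $10
    ("T2", "R", "A"),  # T2 reads the same $10 ...
    ("T2", "W", "A"),  # ... writes $19, and commits
    ("T1", "R", "A"),  # T1 reads A again and now sees $19
]
pairs = [(a, b) for i, a in enumerate(schedule)
         for b in schedule[i + 1:] if conflicts(a, b)]
# Two read-write conflicts on A between T1 and T2: the fingerprint of
# an unrepeatable read.
print(pairs)
```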
The truth is, too, and it's important to understand this: what we're describing is the gold standard of what you would want in a database system, but in actuality most systems don't achieve it. They run at a lower isolation level, where some of these anomalies are allowed. Sometimes that's okay, sometimes it's not, but it's hard to know; this is actually an open research problem. Nobody knows how bad things actually get, because if you're doing a million transactions a second and one of these anomalies causes a problem, would you actually know? Probably not. But it's important to understand what serializable means, and then we can strengthen or weaken it. This now gets to the earlier comment, where I thought she was going: in that write-write conflict, did it actually matter that the write from one transaction was overwritten by another? So the things we're talking about going forward are techniques, a method, for determining whether a schedule is correct, whether it's equivalent to a serial ordering. Next class will be how we actually generate a serializable schedule at runtime. So this is the pencil-and-paper point of view: we have the transactions ahead of time, and we know the operations they want to execute. In a real system you usually don't have that; there's only one system, Fauna, that actually looks at whole transactions ahead of time and then does its scheduling. In most systems, the application incrementally sends queries, gets back a result, does some additional logic, and then sends more queries. The protocols we'll talk about next week handle that on the fly, as new queries come in. Today's class is just about looking at a schedule, with the operations of the transactions known ahead of time, and figuring out whether it's serializable. So there are two levels of serializability we care about; there are actually more, but those live in the theory world and we don't care about them. They are conflict serializability and view serializability. When you ask for serializable from a system that actually implements it correctly, and Oracle doesn't, you get conflict serializability. View serializability is sort of impossible, or at least very difficult, to actually achieve, because it requires semantic understanding of what the application wants to do with the data, and that would require potentially parsing the application code, which is not easy, or understanding the high-level concepts that humans care about. That's why no system actually does it. All right, so we're going to say that two schedules are conflict equivalent if they involve the same transactions and the same operations, and every pair of conflicting operations is ordered the same way in both schedules. And then a schedule is conflict serializable if it's conflict equivalent to some serial schedule. Everyone's eyes are glazing over; this probably doesn't make sense yet. The basic idea of how we check this is a swapping technique.
The basic idea of how we check this is the swapping technique: we can take non-conflicting operations and swap their order in the schedule, so that we push all the operations from one transaction to the top and all the operations from the other transaction to the bottom, and we end up with a serial ordering. If we can achieve that, then we know we have something that's conflict serializable.

So say we have two transactions; T1 and T2 both want to do a read on A, a write on A, then a read on B and a write on B. What we're going to do is find non-conflicting operations where we can swap their order, and try to push all of T1's operations to the top and all of T2's operations to the bottom. In this case here, this one wants to do a read on B and that one wants to do a write on A; they're not conflicting, so we can swap their order. Now we have a read on B next to a read on A; swap their order. Same thing with this write on B and that write on A: we can swap that. Write on B, read on A: swap that.

Yes? Again, we're not doing this at runtime. This is theory, pencil and paper, proving that we can determine whether this is actually conflict serializable or not; we're realistic that we're not going to actually do this. So your question is, would this kill parallelism? I'm not saying I would execute it in this order, because then it's a serial ordering. I'm saying: if I start here with my original schedule, is this thing conflict serializable, yes or no? By swapping, pushing these operations up and down, you can determine that it is equivalent to a serial ordering. Move the commit? Sorry, I think what I was trying to say is that by the time this commits, I can ignore everything below that. Yes, the statement is: if I start with a serial ordering, would I get the most parallel schedule by doing the reverse of this? Yes, but in a real system you wouldn't do this, because you wouldn't know what's coming: the application calls BEGIN, and maybe you see the first read, but you don't know what comes after that, and by the time you get to the bottom here, you've already executed the other operations. Two-phase locking will fix this for us on Tuesday next week.

The statement is, some systems require you to send the entire transaction, from beginning to end, to the database ahead of time. Yes, DynamoDB does that, and Fauna does that. In that case they basically wouldn't use this algorithm, because it's obviously inefficient, but you would figure out the interleaving within a batch and then submit them all at once. Most systems don't do that; most applications aren't written that way. Yes? His question is, if instead of T1 committing it aborts, would this still be considered serializable? Going back here, we can ignore aborts for this; it depends on how it's implemented, and this is just pencil and paper. Again, that's an implementation detail we can ignore for now.

All right, so this schedule is equivalent to this serial one. Here's one that isn't: if we try to swap this write on A with the read on A here, we cannot, because they conflict; and we cannot swap it with this other write, because they're both writes. Therefore this is not equivalent to a serial ordering. So this swapping thing works, and it's easy to do for two transactions, but it's cumbersome with many transactions.
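Here is a minimal C++ sketch of that pencil-and-paper swapping argument, using the example above; it is my own illustration, not course code. Two operations conflict if they come from different transactions, touch the same object, and at least one is a write; adjacent non-conflicting operations may be swapped until a serial order falls out:

```cpp
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

// Two operations conflict if they belong to different transactions, touch
// the same object, and at least one of them is a write.
struct Op {
  int txn;      // transaction id
  char kind;    // 'R' or 'W'
  char object;  // 'A' or 'B'
};

bool Conflicts(const Op &a, const Op &b) {
  return a.txn != b.txn && a.object == b.object &&
         (a.kind == 'W' || b.kind == 'W');
}

int main() {
  // The interleaved schedule from the example:
  // R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) R2(B) W2(B)
  std::vector<Op> s = {{1, 'R', 'A'}, {1, 'W', 'A'}, {2, 'R', 'A'}, {2, 'W', 'A'},
                       {1, 'R', 'B'}, {1, 'W', 'B'}, {2, 'R', 'B'}, {2, 'W', 'B'}};

  // Bubble T1's operations upward: adjacent operations may swap only if
  // they do NOT conflict. (This demo only tries to reach the serial order
  // T1;T2; a full check would also try T2;T1.)
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i + 1 < s.size(); i++) {
      if (s[i].txn == 2 && s[i + 1].txn == 1 && !Conflicts(s[i], s[i + 1])) {
        std::swap(s[i], s[i + 1]);
        changed = true;
      }
    }
  }
  for (const Op &op : s) std::printf("%c%d(%c) ", op.kind, op.txn, op.object);
  std::printf("\n");  // prints R1(A) W1(A) R1(B) W1(B) R2(A) W2(A) R2(B) W2(B)
}
```

If the loop stalls before reaching a serial order, swapping in that direction could not prove conflict serializability; the precedence graph described next is the general way to decide it.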
So a faster approach is to use what is called a dependency graph; I think Wikipedia calls it a precedence graph. The basic idea is that we have one node per transaction in our schedule, and an edge from transaction Ti to transaction Tj if there's an operation in Ti that conflicts with an operation in Tj and Ti's operation appears earlier in the schedule than Tj's. So we have a directed edge from Ti to Tj. We look at the entire schedule, build out this dependency graph, and if there's no cycle in the graph, then we know the schedule is conflict serializable.

Go back to our example here. In this case we have a conflict between the write on A and the read on A, so we have an edge from T1 to T2. Then here we have a conflict on the write on B, so we have an edge from T2 back to T1. We have a cycle, therefore this is not conflict serializable.

Let's look at a more complicated example; now we have three transactions. The read on A in T1 conflicts with the write on A in T3, so we have an edge from T1 to T3. There's another conflict from the write on A to the read on A in T3, but since we already have an edge from T1 to T3, we don't need to draw it again. Same thing with the write-write conflict here; we don't need to draw that again. Then on this side, the only conflict we have is the write on B to the read on B, so we have an edge from T2 to T1. And then there's the write on B in T2 and the write on B in T1, but again we already have an edge there, so we don't draw an additional one. So is this equivalent to a serial execution? By definition, yes, because there's no cycle in the graph, and it would be T2 followed by T1 followed by T3. Again, this is the idea that we can reorder the commit ordering of transactions to be different from their arrival order: in this case T3 called BEGIN before T1, assuming this is wall-clock time, but in our serial order T3 gets committed after T2 even though it started before, and that's still okay; it's still equivalent to a serial ordering. Yes: the state of the database will be equivalent to one where T3 committed after T2, and that's okay.
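Here is a minimal sketch of that precedence-graph check in C++ (again my own illustration with hypothetical types, not BusTub code): add an edge from the transaction whose conflicting operation comes earlier to the one whose operation comes later, then look for a cycle with a depth-first search:

```cpp
#include <cstddef>
#include <cstdio>
#include <set>
#include <vector>

struct Op {
  int txn;      // transaction id
  char kind;    // 'R' or 'W'
  char object;  // 'A' or 'B'
};

bool Conflicts(const Op &a, const Op &b) {
  return a.txn != b.txn && a.object == b.object &&
         (a.kind == 'W' || b.kind == 'W');
}

// Depth-first search for a cycle. Colors: 0 = unvisited, 1 = on the current
// DFS path, 2 = finished. Reaching a color-1 node again means a cycle.
bool HasCycle(int u, const std::vector<std::set<int>> &adj, std::vector<int> &color) {
  color[u] = 1;
  for (int v : adj[u]) {
    if (color[v] == 1) return true;
    if (color[v] == 0 && HasCycle(v, adj, color)) return true;
  }
  color[u] = 2;
  return false;
}

bool ConflictSerializable(const std::vector<Op> &schedule, int num_txns) {
  // Edge Ti -> Tj for every conflicting pair where Ti's operation appears
  // earlier in the schedule than Tj's.
  std::vector<std::set<int>> adj(num_txns + 1);
  for (std::size_t i = 0; i < schedule.size(); i++)
    for (std::size_t j = i + 1; j < schedule.size(); j++)
      if (Conflicts(schedule[i], schedule[j]))
        adj[schedule[i].txn].insert(schedule[j].txn);

  std::vector<int> color(num_txns + 1, 0);
  for (int t = 1; t <= num_txns; t++)
    if (color[t] == 0 && HasCycle(t, adj, color)) return false;  // cycle found
  return true;  // acyclic => conflict serializable
}

int main() {
  // The two-transaction example that is NOT conflict serializable:
  // W1(A) conflicts with the later R2(A)  => edge T1 -> T2
  // W2(B) conflicts with the later R1(B)  => edge T2 -> T1  (a cycle)
  std::vector<Op> s = {{1, 'R', 'A'}, {1, 'W', 'A'}, {2, 'R', 'A'}, {2, 'W', 'A'},
                       {2, 'R', 'B'}, {2, 'W', 'B'}, {1, 'R', 'B'}, {1, 'W', 'B'}};
  std::printf("conflict serializable? %s\n",
              ConflictSerializable(s, 2) ? "yes" : "no");  // prints "no"
}
```

On the three-transaction example above, the same code would build edges T1 to T3 and T2 to T1, find no cycle, and report yes, matching the serial order T2, T1, T3.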
All right, in the interest of time I'm going to skip view serializability; again, no database actually supports it. Quickly, the basic idea is this: say I read A and B, add them together, and print the sum. If I run my dependency graph analysis, I'll get cycles, and it will say the schedule is not conflict serializable. But suppose I change the application so I don't actually care about the exact sum; all I want to know is the number of accounts that have more than zero dollars in them. If I rewrite the application that way, then, assuming these values are above zero, the schedule could actually produce the correct result. But this is hard: how would the system know that this is what you're trying to do, and that the exact value doesn't really matter? You could do deeper static analysis to understand what's going on, but then you'd have to support every possible programming language that someone writes an application in, and that would be super hard. That's what view serializability means. Or there's the other example I think someone brought up before: blind writes. If T2 writes A and T3 writes A without either of them ever reading it, then even though there's a cycle in my dependency graph, at the end of the day all I care about is which one wrote it last, and that's okay.

So the main thing I was pointing out is that there's a looser notion of serializability you can get through view serializability, but it requires you to understand whether certain things are okay in the application, whether the program is actually going to care, and that's really hard to do, so nobody does it. Pretty much, if you ask for serializability, conflict serializability is what you get.

The way to think about what possible schedules could exist is through this sort of nested structure. Say this outer region is all possible schedules: no notion of correctness, no notion of serial ordering, anything you possibly could do. In the middle are the serial orderings, then around that the conflict serializable schedules, and around that the view serializable ones. So any schedule that is serial is also conflict serializable and view serializable, but not every view serializable or conflict serializable schedule is serial. We aim for this middle part, conflict serializability, because you can implement it efficiently, you can efficiently guarantee the property, and there are enough opportunities for parallelism that you can get better performance than you would with a serial ordering.

All right, so quickly, I won't say much about transaction durability. We've already talked about a bunch of things, like logging and shadow paging, and we'll go into more detail about how this all works in two weeks. I realize I keep saying we'll cover things in the future, but there's a lot to cover here; you ask questions like "hey, what about this?", and the answers to those things are in future lectures. So I just want to say we will cover this in the future. And we've discussed all the ACID stuff before.

In my opinion, the concurrency control and recovery mechanisms in a database system are some of its most important features, along with the query optimizer. This is really hard to get correct: making sure you don't lose data and that transactions don't do things they shouldn't be doing. You want transactions because they make it easier for the programmer to write code without worrying about weird issues. There was, I think, a Bitcoin exchange that went down recently because it lost all its money because it wasn't doing transactions correctly. So you want a database system that does transactions, so you don't have to worry about these anomalies, because the system will guarantee that for you. We'll see in the next class how we're going to use concurrency control to automatically guarantee that we can take any arbitrary set of queries showing up from our application and still achieve serializability. Then we'll talk about weaker isolation levels, where we can start relaxing some of these guarantees and allow some anomalies to occur, because that's potentially going to give us better performance.

And don't just take my opinion that transactions are super important. There's this paper from Google about Spanner, over ten years ago at OSDI. Google was at the forefront, the vanguard, of the NoSQL movement, because they built all these systems that didn't do transactions, because they cared about performance and scalability. And everybody copied them: Google was making a lot of money, they're smart, so let's do what they did. And then Google realized, oh, transactions are actually a good idea.
So there's this line in the paper that says: we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than coding around the lack of transactions. Coding around the lack of transactions is what I was describing before in these NoSQL systems, where you have to write your own application code to deal with all those anomalies I talked about. The average programmer is probably not going to be able to reason about these things correctly, so let the database system do it for you.

Okay, I want to quickly talk about Project 3. Project 3 is out; you're going to be implementing query execution support in BusTub. BusTub now supports basic SQL, and we have a rule-based optimizer that can take a SQL query, convert it to an AST, and then convert that AST into a physical plan. Your job is to build the query executors, the operator executors, that can execute those query plan nodes. Everything is posted on the project website; pull the latest version from GitHub.

There are two major tasks. Most of the work is in the plan node executors: you do a sequential scan and an index scan; we're only doing inserts and deletes, so you don't have to worry about updates; then a basic nested loop join and an index nested loop join, and then aggregations, limits, and sorts. For those who struggled on Project 2, Checkpoint 2, with the concurrent index: if you think your implementation has bugs, just put a latch on top of the entire data structure, and that'll be okay for Project 3. Okay? What? We'll get to that in a second. And then, for the rule-based optimizer, there's one task at the end: you have to convert an ORDER BY plus LIMIT clause into a top-N operator, which is more efficient; see the sketch below.

So now the question is, will that be efficient? No. But for the leaderboard, rather than relying on whoever had the fastest Project 1 and Project 2 to crush it also for Project 3, we've added additional tasks that don't necessarily rely on you having the fastest implementation of your buffer pool manager and your index. The basic idea is that you have to implement new optimizer rules to get better query plans, because for certain queries, the optimizer we have now will produce horribly inefficient query plans, and no matter how fast your B+ tree is, you would lose out. The idea is to make it so it's not the same people always getting the top rankings in the subsequent projects; you can do some extra stuff and still beat them. The tasks here are to implement optimizer rules for join reordering, column pruning, and a more aggressive predicate pushdown. Yes, his question is, if I don't do this, will that interfere with Project 4? No. Yes, this is optional.

All right, so my advice: start with insert and sequential scan; do those first so you can actually read data. There is a mock table with a mock sequential scan; one of the TAs built this, and it's fantastic. So there are, how to say this, internal tables that you can run queries on to test other parts of your database, like your sort clause and so forth. But I recommend you start with insert and sequential scan. You don't need to worry about transactions just yet; all the things we talked about today, don't worry about that.
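Back to that ORDER BY plus LIMIT rewrite: the point of a top-N operator is that you never materialize and sort the whole input. Here is a hypothetical sketch (not the actual BusTub executor or optimizer rule) using a bounded heap:

```cpp
#include <cstddef>
#include <cstdio>
#include <queue>
#include <vector>

// Top-N for an ascending ORDER BY: keep a max-heap of the n smallest rows
// seen so far. Cost is O(m log n) for m input rows, versus O(m log m) for
// sorting everything and then applying the LIMIT.
std::vector<int> TopN(const std::vector<int> &rows, std::size_t n) {
  std::priority_queue<int> heap;  // top() is the worst of the current best n
  for (int v : rows) {
    if (heap.size() < n) {
      heap.push(v);
    } else if (v < heap.top()) {
      heap.pop();  // evict the worst, admit the better row
      heap.push(v);
    }
  }
  std::vector<int> out(heap.size());
  for (std::size_t i = out.size(); i-- > 0;) {  // heap pops in descending order
    out[i] = heap.top();
    heap.pop();
  }
  return out;
}

int main() {
  // Morally: SELECT x FROM t ORDER BY x ASC LIMIT 3;
  for (int v : TopN({9, 4, 7, 1, 8, 3, 6}, 3)) std::printf("%d ", v);  // 1 3 4
  std::printf("\n");
}
```

The heap never holds more than n entries, which is also why top-N needs so much less memory than a full sort followed by a limit.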
The aggregation and sorting executors do not need to be backed by the buffer pool: you don't need to do an external merge sort, and you don't need your hash table to spill to disk. Assume that everything fits in memory, and then use the standard template library containers and functions for these things (a minimal sketch of this appears below). And again, of course, Gradescope is not meant for debugging. Write your own little tests. You're not changing the files that you submit to Gradescope, so make sure you pull the latest changes from us, and of course post on Piazza and come to office hours.

But the one thing we have this time for you, which is super, super exciting, is that there's now a browser version of BusTub. We used Emscripten to convert the C++ code into JavaScript, so you can run queries. Is the project already implemented in it? That I don't know. But you can create tables. Did I write this? No. Did she? Yes. INSERT INTO... the question is, why is it called BusTub? Yes. All right, so you insert a row, SELECT * FROM foo, and you get it back. It also supports EXPLAIN, so you can see the query plan. This is fully implemented for the project; as far as I know, the optimizer rules are implemented too. So you can throw a query at it and see what you get back, and check whether your implementation matches it. It runs in your browser; it's all client-side JavaScript. Okay. Don't cheat. Okay. Next class is locking and isolation levels. Okay, see you guys.
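For reference, here is the minimal sketch of the "everything fits in memory" advice above (my own illustration, not BusTub's executor interfaces): a std::unordered_map for hash aggregation and std::sort for ordering, with nothing spilled to disk:

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
  // Morally: SELECT dept, SUM(salary) FROM emp GROUP BY dept ORDER BY dept;
  std::vector<std::pair<std::string, int>> emp = {
      {"db", 100}, {"os", 80}, {"db", 120}, {"ml", 90}};

  // Hash aggregation entirely in memory: no buffer pool, no spilling.
  std::unordered_map<std::string, int> sums;
  for (const auto &[dept, salary] : emp) sums[dept] += salary;

  // In-memory sort of the aggregated rows.
  std::vector<std::pair<std::string, int>> rows(sums.begin(), sums.end());
  std::sort(rows.begin(), rows.end());

  for (const auto &[dept, total] : rows) {
    std::printf("%s %d\n", dept.c_str(), total);  // db 220, ml 90, os 80
  }
}
```

A production-quality executor would run these through the buffer pool and spill to disk when the working set doesn't fit, but for this project the in-memory versions are exactly what's being asked for.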