Alright, let's just start. Alright, real quick: any questions about the project that's due today, in 12 hours? None, because everyone's off doing it. Okay. Also real quick, a reminder that today at 4:30 we have a friend from Battery Ventures giving a talk in Gates, and then tomorrow at noon is one of the time series lectures; we have somebody from KDB coming to give a talk. Alright, so we're now entering a new chapter in the semester, where we're going to start talking about concurrency control and allowing multiple queries or transactions, and I'll define what a transaction is in a second, to run at the same time and modify the database at the same time. Hopefully you can get the impression of how excited I am, because this is my favorite lecture. In my opinion, this is one of those fascinating things that database systems can do, this sort of magic of allowing things to run at the same time inside of our system while giving them almost the illusion that they're running on the database by themselves. So, to motivate what we're talking about today, and actually a lot of the material for the rest of the semester, I want to pose two simple scenarios. The first one is that you and I issue queries at the exact same moment on the same database, and we want to access the same record and modify it at the same time. The question is, what should actually happen here? How do we avoid one write incorrectly overwriting another write, or seeing half of each? If I update ten attributes and you update the same ten attributes, we don't want to end up with five of mine and five of yours; we want to avoid all of these problems. Another scenario: say you have a bank account and you want to transfer money from one account to another. You start the transfer and take the money out of the first account, and then all of a sudden there's a power failure, and you never complete the process of putting the money into the other account. When we turn the system back on and recover, what should be the correct state of the database? We don't want to lose money, obviously, because that would be really bad. These two problems are roughly called the lost update problem and the durability problem, and the solutions we're going to have inside our database management system for them are the concurrency control protocol, which is what we'll talk about today, and the recovery protocol, which we'll talk about in a couple of weeks. But the thing I want to impress upon you is that these things are not completely separate; they're actually closely intertwined. This goes back to the question somebody asked earlier in the semester: why can't I just use RocksDB to maintain my log and do whatever else I want in the system? It's because the concurrency control protocol needs to know what the recovery system is doing, and the recovery system needs to know what the concurrency control protocol is doing, in order to make sure that in both of these scenarios we don't lose data, we don't end up with incorrect data, and everything goes along correctly. The concurrency control and recovery mechanisms are two of the key contributions you get from using a database management system to manage your data, versus just writing your own files out to disk and hoping that everything goes okay.
And the key concept that's going to encompass all of this, the key principle we need in order to reason about whether our database system is running correctly, is this notion of a transaction. This is a term that's often used in computer science to mean different things, like a bank transaction or adding things to a shopping cart. But in the context we're going to talk about, it has a very specific definition that we need in order to understand how we're going to build our concurrency control protocol and our recovery system. In the context of a database management system, and I'll say it doesn't have to be a relational database system to have these types of transactions, you could have a NoSQL or document database system and the transaction would essentially be the same, but for our purposes we're focusing on relational databases, so we'll use that as our running example. Our definition of a transaction is that it's a sequence of operations that we execute inside our database system on behalf of the application to perform some higher-level function. By higher-level function I mean the things that your application can do. Using Amazon as an example, you can add something to your cart, you can make a purchase, you can update your account information. All of those things are the high-level functions, and the way the application invokes them, or actually has the database perform that function, is through this concept of a transaction. Now, I will say that in some systems like Amazon, adding something to the shopping cart is not actually a transaction, but for our purposes we'll assume that it is. So the basic unit of change that the database management system deals with, and that it exposes to you as the application programmer, is this concept of a transaction. And a transaction, as we'll see in a second, is defined by the operations the application wants to perform in that transaction, and the database system only saves them as a unit: it has to apply all the operations in your transaction, it can't apply just some of them or half of them. Now, you can have a transaction with a single query, which is sort of trivial, but if you have multiple queries, I can't have some of them take effect and some of them not; everything has to get saved and run correctly in order for my database to go from one correct state to another correct state. So let's go back to the example I showed in the beginning. Say that I have a gambling problem and I want to take $100 out of my bank account and transfer it to my bookie's account. The high-level function of our transaction is what I've written in English there, but the steps in our transaction are the following: first, you check to see whether I have $100; if I do, you take the $100 out of my account; then you put it into the bookie's account. And this goes back to what I was saying on the last slide: the transaction is always going to be the atomic unit of change in our database, because for these three steps in my transaction, I don't want some of them to succeed and not others.
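To make that concrete, here is a minimal sketch of what the transfer might look like as a single transaction from the application's side, using Python's built-in sqlite3 module. The `accounts` table, the file name, and the account names are made up for illustration; the point is just that the check, the withdrawal, and the deposit are bracketed by one BEGIN/COMMIT pair, so either all three steps happen or none do.

```python
import sqlite3

# Hypothetical schema for illustration: accounts(name TEXT PRIMARY KEY, balance INTEGER)
conn = sqlite3.connect("bank.db", isolation_level=None)  # manage transactions explicitly

def transfer(conn, src, dst, amount):
    """Run the whole transfer as one transaction: all three steps or none of them."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        # 1. Check that the source account actually has the money.
        (balance,) = cur.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < amount:
            cur.execute("ROLLBACK")          # abort: nothing happened at all
            return False
        # 2. Take the money out of the source account.
        cur.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                    (amount, src))
        # 3. Put it into the destination account.
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                    (amount, dst))
        cur.execute("COMMIT")                # only now are all the changes permanent
        return True
    except Exception:
        cur.execute("ROLLBACK")              # any failure: undo everything
        raise
```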
I don't want to take money out of my account and then not put it into the bookie's account, because if I crash, that $100 just disappears into the ether, and that's bad because now the bank is losing money and customers complain and try to figure out what's going on. So we're going to talk about how we're actually going to support this in our database system over the next couple of lectures. And this is the simple case; I'm only talking about one transaction running at a time. Things get really tricky when you have multiple transactions running simultaneously, but for this case here, assume we have one transaction at a time. Let's propose a straw-man system that could actually solve this problem, that would make sure everything is atomic and all our changes are saved. What we're going to do is have a database management system where transaction requests arrive one at a time, and there's a single thread in our system. It takes transactions off a queue as they arrive, executes each one from beginning to end, makes all the changes it wants to make, and then completes the transaction and saves it. And what that thread does is, every single time you start a new transaction, you take your database file, make a complete copy of it at another location on disk, and have your transaction make all of its changes to that copied file. If the transaction commits, completes successfully, then you just overwrite the original file with the modified version. And if the transaction fails, or we lose power, it's not a big deal, because when we come back we see the original file that was unmodified, and we see our modified file that has some of the changes from our transaction but not all of them. So we just throw away that modified one, go back to our original file, and now we know we don't have any partial transactions. So, why would or wouldn't this be a good idea? Yes. His statement is that it's guaranteed to be atomic because of the pointer swing at the end: there's a master pointer that says where the correct version of the database is, either the original one or the new one the transaction saved. Yes? What's that? Suppose you're Amazon, meaning what? Right, this is trivial to do if your database is, you know, four kilobytes, but if it's anything larger than that, then you're copying a huge file every single time. So this seems kind of simplistic, but it's actually used in a very, very well-known database system. Can anybody guess which system uses this? Nobody? He says MySQL. No, not MySQL. SQLite actually does this. Now, this is the older version of SQLite; in the last couple of years they've added a write-ahead log mode, but the ability to use this type of journaling is still there in SQLite today. And again, this works really well for embedded systems or small applications with small databases. The database is not going to be very big, I copy it, make all my changes, and commit. And even though I'm running with a single thread, that's fine, because I'm not trying to support concurrent users accessing my database; SQLite is running inside your cell phone and things like that.
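Here is a toy sketch of that straw-man approach, assuming the whole database is just one file on disk. The file names and the shape of the `do_writes` callback are made up; the point is the full copy, the private modifications, and the single atomic swap at commit.

```python
import os
import shutil

DB_FILE = "database.db"        # the "master" copy of the database
SCRATCH = "database.db.txn"    # the transaction's private copy

def run_transaction(do_writes):
    """Straw-man atomicity: copy the whole file, let the transaction modify
    only the copy, then atomically swap it in if (and only if) it commits."""
    shutil.copyfile(DB_FILE, SCRATCH)     # 1. copy the entire database file
    try:
        do_writes(SCRATCH)                # 2. all changes go to the copy only
        os.replace(SCRATCH, DB_FILE)      # 3. commit: the "pointer swing"
    except Exception:
        if os.path.exists(SCRATCH):
            os.remove(SCRATCH)            # abort/crash: throw the copy away,
        raise                             #    the original file is untouched

# usage sketch
open(DB_FILE, "a").close()                                  # make sure the file exists
run_transaction(lambda path: open(path, "a").write("A=1100\n"))
```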
I will also say, and we'll talk about this maybe later in the semester, that this is sort of what IBM did in the very first version of System R. They used a technique called shadow paging, which is like this, except instead of copying the whole file you only copy the pages you actually need to modify, and then you flip a pointer at the end to say, here's the correct version. This is pretty simplistic, but it serves our purpose. What we're going to talk about in the next couple of classes is how to allow multiple transactions to run at the same time, reading and writing the database concurrently, without making an entire copy of the database file every single time. It's sort of obvious why you'd want to do this, because we already covered it when we talked about parallel execution: we get better utilization and throughput because threads aren't blocking on each other waiting in a single queue, and this reduces the response time for our users. The time from when they submit a transaction to when it finishes and they get the acknowledgement back from the database system will be much less in this environment. But there's a whole bunch of other stuff we have to deal with that is actually very, very hard in order to make this parallel execution work. When we talked about parallel execution before, I was completely hand-wavy, or I completely ignored the issue of queries or transactions updating the same record at the same time. We talked about read-only queries scanning large segments of a table; who cares if they interfere with each other, because they're both just reading and that doesn't cause a problem. But now, if we want transactions modifying things, we have to care about correctness, because it's not good if we start losing money, losing records, or corrupting data. Then there's another issue of fairness, which we'll talk a little bit about when we cover two-phase locking and the other concurrency control protocols. Ideally, all the transactions that get submitted to the system have a fair chance to actually run; you don't want one client's transactions to always finish and everyone else's to always get killed or blocked. What I'll say, though, is that both of these things are very, very hard. It's hard to ensure correctness, because again, if you have threads or transactions trying to access and modify the same records at the same time, how do you make sure they don't interfere with each other? And it's hard to execute quickly, because if you take the really simplistic view of having a single thread execute every transaction one after another for your entire database, it's going to be slow. Now, I will say there are some database systems, in particular VoltDB, a system I helped create, at least the earlier version of it, when I was in grad school, that actually run single-threaded, but they run a single thread per core on multiple cores, so it's not a single-threaded machine. This lets you avoid some of the problems I'll talk about as we go along, but in general, for a general-purpose workload or for any application, doing both of these things is hard.
And this is part of the reason why the NoSQL systems became in vogue a few years ago: they got much better performance than the traditional transactional database systems because they didn't do any of the things we're talking about here today, because in their environment, for the applications they were targeting, they decided it wasn't necessary. Again, using the shopping cart example: if I add something to my shopping cart, maybe it doesn't matter if it's a few milliseconds delayed from showing up on another machine, or things like that. Or if two people buy the same item at the same time, rather than making sure you only sell the items you actually have, they might just decide: if two people try to buy the same item and we run out of it, we'll issue a coupon to the one person who didn't get it, and that's better than having to maintain transactions and spend more money on hardware. Again, we'll cover that as we go along. His statement is that at some point on Black Friday they sent him a box of underwear and socks because of a transaction failure. That might not be the database, it might be other things, but we can talk more about that when we get to distributed systems. So again, I just want to impress upon you that doing transactions very fast and doing transactions correctly is very hard, but that's why they pay database developers a lot of money to do these things. With that in mind, let's keep going. Okay, so the issues we have to deal with when transactions try to modify the data at the same time are the following. We know we want to allow transactions that show up at the same time, issuing different queries, and we want to interleave their operations in a way that maximizes the parallelism in our system. The issue is that if we interleave transactions in incorrect ways, we can end up with inconsistent data, meaning there may be data that appears that shouldn't actually be there, or there may be negative values that shouldn't be there, things like that. Now, as we go along, there will be cases of temporary inconsistency, where the database is in an invalid state while transactions are running. That's okay, and it's unavoidable, as long as these things get resolved by the time the transactions commit. What we don't want are permanent inconsistencies, where we allow you to reserve a seat on a plane that isn't actually there, or buy something that isn't actually there. To understand these issues, we need a formal definition of what it means for a database to be correct when you execute transactions on it. And this is going to look a lot different from what you'd see in a parallel programming course when you talk about concurrency or linearizability; it's going to look a lot, lot different. Okay, so in our environment, transactions carry out one or more operations, but the database management system is only in charge of scheduling the operations in those transactions that read and write data inside the database. That's the only purview of control it has: your reads and writes of the database.
That means, say you have a transaction that does a bunch of reads, then a bunch of writes, and in the middle of that transaction it decides to send a confirmation email to the customer saying, yes, your order succeeded. If that transaction then fails and you need to roll back all the writes, the email it sent is outside the control of the database system; consider it the external world. It can't retract that email. So it's very important to understand that when we say a transaction, the scope of what we're dealing with is only inside the database. Anything that goes outside the database system, which you could do when we talked about UDFs, right, you could have your UDF in C and call whatever outside system you want, the database system can't control or roll back, and it can't make sure those external changes are failsafe. So we're only talking about things internal to the database system. As we go along in our examples, we'll say our database is defined as a fixed set of named objects. They could be tuples, they could be pages, they could be individual attributes, they could be entire databases; for all the protocols we'll talk about, we're not going to care. The other thing I'll say is that for now we're assuming the set is fixed, meaning when you start the system and run your transactions, you have the same number of elements afterward as you did before. We'll relax this in two lectures when we talk about other things, because it makes things harder; for now, assume a fixed number of elements. And our transactions are defined using only read and write operations on these objects. We don't care whether an update touches some attributes or all the attributes; we just know we're writing that object. This is the only thing the database actually sees of what the application is trying to do in a transaction. It doesn't know that you read an object and then did some funky math on it or ran it through some Python package; all of that is external and it can't control it. So, in SQL, the way we start a transaction is with the BEGIN statement, and a transaction finishes or terminates with either a COMMIT or an ABORT statement; I think the SQL standard also says ROLLBACK is an alias for ABORT. When a transaction commits, all the changes it made with the write operations I defined on the last slide get saved into the database, and once they're saved they're permanent, persistent, durable. You, as the application programmer, do not get an acknowledgement from the database system that your transaction successfully committed until all the changes have been applied to the database and are persistent. If there's an abort, all the changes the transaction made get rolled back, and it's as if the transaction never ran at all. Now, one of the key things to understand is that an abort can either be self-inflicted or forced upon you by the database system. What I mean by that is, I start a transaction in my terminal, I make a bunch of changes, and then I call ROLLBACK or ABORT; that's self-inflicted, that's me telling the system to roll back my transaction.
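Here's a small demonstration of those statements, again sketched with Python's sqlite3 just because it's handy; the table is made up. The first transaction is rolled back, a self-inflicted abort, and it's as if it never ran; the second one commits, and only then is the change permanent.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)   # explicit BEGIN/COMMIT/ROLLBACK
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES ('andy', 1000)")

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'andy'")
conn.execute("ROLLBACK")          # self-inflicted abort: undo everything since BEGIN
print(conn.execute("SELECT balance FROM accounts").fetchone())   # (1000,) -- as if it never ran

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'andy'")
conn.execute("COMMIT")            # now the change is saved and visible to everyone else
print(conn.execute("SELECT balance FROM accounts").fetchone())   # (900,)
```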
But it may also be the case that the application tries to commit, says I've made all my changes and now I want to commit, and the database system says, I can't let you do that because it's going to interfere with some other transaction and violate the guarantees I need to ensure, so you can't commit; I'm going to kill you and you have to roll back. You'll see this in the third project when you implement two-phase locking: when a transaction tries to acquire a lock and it can't because somebody else holds it, the database system can say, no, you can't keep going, kill yourself and roll back, right? Again, this will come up later when we talk about the different concurrency control protocols, but an abort can happen either because the application wants to abort or because the database system forces it to abort. So now, the correctness criteria we're going to use are defined by the acronym ACID. As a quick show of hands, who has heard of ACID before? Okay, most of you. So ACID came out in the early 1980s from a researcher in Germany, and it stands for atomicity, consistency, isolation, and durability. If you say you're a transactional database system or an ACID database system, you want to ensure that your transactions have all of these properties. And as far as I can tell, the way the lore goes for this acronym, the guy who came up with it was trying to make fun of his wife, because his wife didn't like sweet candy, she only liked sour things, so he came up with the term ACID to name it after her. But in order to make it work, he kind of had to throw consistency in there to get a C, and as we'll see in the next couple of slides, consistency is this really nebulous thing; it doesn't have an exact definition like the other ones do, but people always include it, and we'll talk about it as it comes along. Alright, so atomicity means that either all of the actions or operations in the transaction happen, or none of them happen, and this goes back to the no-partial-transactions requirement we had before. Consistency means that if every transaction is consistent, which I'll define in a bit, and the database starts out consistent, then when I execute the transaction and it commits, the end result has to be consistent: if I go from a consistent database with a consistent transaction, I should end up with a consistent database. Again, I'll explain what consistent means in a bit. Isolation is the one we'll spend most of our time on today; it means that a transaction executes inside our database system under the illusion that it has exclusive access to the database. So if other transactions are running at the same time, my transaction can't see any of their changes; I only see the data as if I were the only one running. And durability is sort of obvious: it means that if a transaction commits and our application gets back the acknowledgement that it successfully committed, then all the changes or modifications we made in that transaction persist forever; any transaction that comes along afterward should see our changes, and no matter how many times we restart or crash, all our changes are durable. A shorthand way to think about all of these: atomicity means all or nothing for the operations in our transaction; consistency means it looks correct to me, and that's good enough; isolation means the transaction runs as if it were alone by itself; and durability means we can survive failures.
So, for today's agenda, we're going to go through each of these one by one, and we'll spend most of our time on isolation, because that's where we really get into transaction schedules and things like that. For durability, we'll talk more about logging in a few weeks. Most of today is about atomicity and isolation, and when we start talking about concurrency control protocols like two-phase locking, timestamp ordering, and all the index stuff, those mostly focus on atomicity and isolation as well. As I said earlier, there are basically two possible outcomes when you execute a transaction: either the transaction commits after all of its changes have been applied to the database and everything was successful, or the transaction aborts after executing some number of operations, either because the application said to abort or because the database system said you have to abort. The database system is going to ensure that all the changes a successfully committed transaction makes are atomic, meaning they all appear in the database at once. From the application programmer's standpoint, this means that all the operations we do in a transaction take effect if the transaction commits, or none of them do; no partial modifications. The way we're going to ensure this is through one of two approaches. Go back to the example we had before, where I take money out of my account and put it into my bookie's account, but before we put it into the bookie's account there's a power failure and we crash; we need to come back and make sure everything is always in the correct state. The two ways to do this are logging and shadow paging. For logging, the way to think about it is that every single time we make a modification to the database, we first write a record into a log file saying, here's the change I made to this particular tuple. The contents of what we write in the log can vary with the implementation: it could be the SQL statement, it could be the actual low-level bytes that were modified, but it's a record of the changes that were made, and we write those changes to the log before we apply them to the database. That way, if we crash, we come back, look in the log, and ask what was going on at the time of the crash, and check whether those changes should actually be in our heap files, in our tables, or not. You might look in the log and see a transaction that was running, and then the log stops because of the crash, and there's no commit message for that transaction, so you know you need to go back and roll back its changes so that none of them are there. Or, if you do see a commit message, you know the transaction successfully committed, so you go to the table heap and make sure those changes are persisted. Think of it like the black box in an airplane; it's a bit of a morbid metaphor, but the plane crashes, everyone dies, and you want to know what caused the crash, so you get the black box, look in the log, and see why it went down. Logging is the most commonly used technique to ensure atomicity, and it's used in a ton of different systems, not just database systems; file systems do similar things.
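Here's a toy sketch of the logging idea, not how any real system formats its log: the record layout, the file name, and the JSON encoding are all just for illustration. The key property is that the change is appended (and flushed) to the log before it's applied to the database pages, and recovery figures out which transactions have a commit record.

```python
import json
import os

LOG_FILE = "db.log"   # made-up name for the sketch

def log_change(txn_id, obj, before, after):
    """Append a record of the change *before* applying it to the database pages."""
    rec = {"txn": txn_id, "obj": obj, "before": before, "after": after}
    with open(LOG_FILE, "a") as f:
        f.write(json.dumps(rec) + "\n")
        f.flush()
        os.fsync(f.fileno())      # make sure the record survives a crash

def log_commit(txn_id):
    with open(LOG_FILE, "a") as f:
        f.write(json.dumps({"txn": txn_id, "commit": True}) + "\n")
        f.flush()
        os.fsync(f.fileno())

def recover():
    """After a crash: changes from transactions with no commit record must be
    undone; changes from committed transactions must be redone if needed."""
    committed, changes = set(), []
    if os.path.exists(LOG_FILE):
        with open(LOG_FILE) as f:
            for line in f:
                rec = json.loads(line)
                if rec.get("commit"):
                    committed.add(rec["txn"])
                else:
                    changes.append(rec)
    undo = [c for c in changes if c["txn"] not in committed]
    redo = [c for c in changes if c["txn"] in committed]
    return undo, redo
```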
Logging has two benefits. One is that we can turn what would normally be random writes to disk pages into sequential writes to a single log, which is much faster if you're using a spinning-disk hard drive. It's also useful as an audit trail for compliance or regulatory purposes: if you want to know who transferred money to whom in your bank, you can look at the database log and see all those operations. I think in some enterprises you're required to keep seven years of audit history if you're doing financial transactions, and you can use the database system's log to do this for you. The other approach is what I alluded to before when we talked about one of the ways you can run SQLite: this idea called shadow paging. The idea is that if a transaction wants to change a tuple, that tuple lives in a page; you copy that page first, make all your changes to the copy, and then when the transaction commits you just flip a pointer to say, here's the correct version of this page. This ensures that if you crash and come back, and the transaction didn't finish, you don't even bother with the modified pages; you only look at the original pages. And if the transaction did finish, it's sort of a tautology: if the transaction committed, the correct pages are the correct pages. Again, this was originally used by IBM in the 1970s. They eventually abandoned it because it has a bunch of implementation issues that are really tricky to get right, and there were other aspects of the system that were difficult to make work with it, so they got rid of it, and DB2 doesn't do this at all nowadays. As far as I know, the only systems that take this approach today are LMDB, from the OpenLDAP folks, and CouchDB does this as well. It's an interesting idea; we actually tried doing this on non-volatile memory, thinking an old idea could be used with new hardware, and it didn't really work out: using a write-ahead log with in-place updates was always faster, and a general-purpose database has a lot of different problems than something like LMDB. So that's everything about atomicity: either all the changes in the transaction occur or none of them occur, and the way the database system enforces this is either through logging or through shadow paging. And actually, now you can start to see how these things are intertwined, because logging and shadow paging are also how you ensure durability, but they're how we ensure atomicity as well.
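And here's the same flavor of sketch for shadow paging: a made-up in-memory page table where a transaction starts from a copy of the master page table, installs new versions of only the pages it touches, and commit is a single pointer flip. Real implementations obviously deal with pages on disk, but the shape of the idea is the same.

```python
class ShadowPagedDB:
    """Toy sketch of shadow paging over an in-memory dict of page contents."""

    def __init__(self, pages):
        self.master = dict(pages)       # the committed page table: page id -> contents

    def begin(self):
        # The shadow page table starts out pointing at the same pages as master.
        return dict(self.master)

    def write(self, shadow, page_id, new_contents):
        # Copy-on-write: install a new version of the page in the shadow table only.
        shadow[page_id] = new_contents

    def commit(self, shadow):
        # The "pointer flip": the shadow page table becomes the master page table.
        self.master = shadow

# usage sketch
db = ShadowPagedDB({0: "A=1000", 1: "B=1000"})
txn = db.begin()
db.write(txn, 0, "A=1100")
# If we crashed here, db.master would be untouched -- recovery is trivial.
db.commit(txn)
print(db.master)    # {0: 'A=1100', 1: 'B=1000'}
```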
Yes? The question is, if there's a crash, what do I do? You come back and immediately go to the log and ask, what transactions were running at the time I crashed? You actually do multiple passes, and we'll cover that later, but basically you look at the log, see which transactions were running at the time of the crash, and check whether they committed. If a transaction didn't commit, you go into the database pages and make sure its changes are not actually there, because what could happen is you make a change to a page in memory, the buffer pool manager flushes it out, but the transaction hasn't committed yet; the change made it to the log, though, so you go back and reverse it. Or, if the change was sitting in the buffer pool and hadn't been flushed to disk yet, you make sure you go apply it, if the transaction did commit. So the log, as we'll see later, lets you both undo and redo: anything from a committed transaction makes it to disk, and anything that didn't commit always gets reversed. Excellent, and the next question is, what if I'm recovering from the log and I crash again? We'll have a whole lecture on how to do log recovery with ARIES, A-R-I-E-S, which came from the IBM folks in the 1990s. They have a whole mechanism to deal with crashing during recovery: you write compensating log records that say, I recovered and I made this change, and those go into the log as well, so when you crash and come back you can figure out whether you crashed during recovery or during normal runtime. You'll learn to hate this. Yes, in shadow paging you don't have to do any of this, because you come back and the database is immediately correct: the pointer only points to the things that actually committed. Logging is a bit more tricky, but in general it gives you better performance. Okay, so now we need to talk about consistency. Again, this is very nebulous; I think it's easier to understand in the context of a distributed database, but for now we'll focus on a single-node database and go with the textbook definition. The way to think about consistency is that the world represented by the data in your database is correct, and I'm not going to define what correct means, because it depends on the application; it depends on humans having a notion of what correctness is. The two sub-problems of consistency are database consistency and transaction consistency. Database consistency is something the database system can actually enforce, and it basically means that the data managed by the database system accurately represents whatever real-world entity it's trying to model. Think of it like this: if my database is capturing temperatures of buildings on campus, I could have a constraint that says no temperature can be greater than a million, because otherwise everything would be on fire. The database system will enforce that and reject data that violates the constraint; that's what it means to be correct. The reason I'm hand-wavy about this is that if you don't write a constraint saying no temperature can be above a million, then the database system can't know that such a value is impossible, so it can't enforce it. Another way to think about this, and this will come up more when we talk about distributed databases: any transaction that runs in the future will always be able to see the effects of transactions that committed in the past within our database. This seems obvious on a single node, because a transaction commits, its changes make it out to disk, I know they're durable, and when I come back, of course everything is there; in distributed databases it's much different. So database consistency basically means any data we put in actually models the real-world entity it's trying to model, but it's only as good as what you, the application programmer, tell the database system to enforce.
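To make the temperature example concrete, here's what handing that rule to the database looks like, again sketched with sqlite3; the table, the column names, and the million-degree threshold are just the example from above. The DBMS can only enforce the constraints you actually declare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        building    TEXT,
        temperature REAL CHECK (temperature < 1000000)   -- otherwise everything is on fire
    )""")

conn.execute("INSERT INTO readings VALUES ('Gates', 72.0)")           # fine
try:
    conn.execute("INSERT INTO readings VALUES ('Gates', 2000000.0)")  # violates the constraint
except sqlite3.IntegrityError as err:
    print("rejected by the DBMS:", err)
```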
Transaction consistency is more vague. It basically means that if the database starts out in a consistent state before your transaction runs, and you assume your transaction runs by itself, then the database will be consistent after the transaction completes. Of course, this relies on you as the application programmer to write correct transactions, because otherwise, if you put the database into an inconsistent state, the database system doesn't know that; it just did whatever you told it to do. Again using the temperature example: if I insert something that's a million degrees inside my transaction, the database system doesn't know it shouldn't have done that, that it's not actually modeling the real world; the transaction told it to do it, so it went ahead and did it. There's no way the database system can enforce this at all; it's up to you as the application programmer to write transactions that do the correct thing. This is what I was saying before: they threw the C into ACID to get a nice acronym, but in practice the C part is kind of weird, and that's pretty much all I have to say about it. It looks correct to you, it looks correct to me, it's consistent. That's it. So now we get to a good one: isolation. Isolation means that when a transaction executes in our database system, we want it to have the illusion that it's running with exclusive access to the database and nobody else is running at the same time. That basically means it doesn't see the effects of transactions that haven't completed or committed yet, and actually even that's not quite right: it runs as if all the transactions were executing in some serial order, one after another, and it doesn't see any halfway changes or partial changes, or things committed after it started, depending on that ordering; it only sees things as if transactions were running in serial order. The problem, as I said earlier, is that we don't actually want to execute transactions in serial order, because we want to maximize the performance of our system and maximize parallelism. So we want to allow interleaving of the operations in our transactions, because it may be the case that one transaction has to access data on disk and therefore has to stall while the buffer pool manager fetches the page and brings it into memory; while it's stalled, we may want other transactions to keep running and make forward progress. But we want to make sure that even though they're running in parallel and we're interleaving their operations in arbitrary ways, it still looks as if they're running in isolation from each other. The way we're going to achieve this is through the database system's concurrency control protocol. The definition I like to use is that the concurrency control protocol is essentially the traffic cop of the database system: it decides what operations are allowed to execute and how to interleave them, and the end state of the database after this interleaving should be equivalent to one where the transactions were executed in some serial order, even though they weren't. Now, there are essentially two classes or categories of concurrency control protocols we can have in our system: pessimistic and optimistic. Pessimistic is where you assume transactions are going to conflict, so you apply protections to make sure transactions can't do the things that would cause problems; you prevent them ahead of time. Optimistic is where you assume conflicts are rare, that transactions are less likely to interfere with each other, so you let them do whatever they want, and then when they actually go to commit, that's when you go and figure out whether there were any conflicts.
These categories aren't something I came up with; as far as I can tell, in the academic literature these are the only two classes of concurrency control protocols you can have, optimistic and pessimistic. If anybody tells you there's another one, or that they have some protocol that doesn't fit these two, they probably don't know what they're talking about. I've had people come up to me and claim their protocol is neither pessimistic nor optimistic, and they're wrong; it's one of these. This theory goes back to the early 1980s, when people really started figuring out these different types of concurrency control protocols. As a preview of the next couple of lectures: two-phase locking is an example of a pessimistic protocol, and timestamp ordering is an optimistic protocol. We'll have separate lectures for each of these two classes; for now we don't care about the details, we just know we have mechanisms to ensure the things we'll talk about in the upcoming slides, namely how to ensure isolation. So let's look at some examples of where problems can arise, and this will help us understand how these concurrency control protocols prevent them. Say we have two transactions in our system, and it's a simple banking application. The first transaction wants to transfer $100 out of B's account and put it into A's account, and the second transaction wants to credit both accounts with 6% interest. So transaction T1 takes $100 out of B and puts it into A, and T2 computes 6% interest on both, and assume both accounts A and B start with $1,000. The question is, what are the legal outcomes of interleaving the operations of T1 and T2 in arbitrary ways? Well, there are actually many possible outcomes, but the key thing is that at the end of the day we always want A plus B to equal $2,120: there should always be $2,000 in the system, and once we compute interest on it, it should always add up to that amount. Now, the issue is that there's no guarantee the database system runs these exactly one after another. Assume our application submits both transactions at exactly the same time, and the database can decide to interleave them in any possible way, but we want the end state of the database to be as if the transactions executed in serial order, because that guarantees we don't lose money and we always end up with the right total. So at the end, our only two legal outcomes, and I'll define what legal means in a second, are either the A and B values from running T1 then T2, or the A and B values from running T2 then T1; in either case they add up to $2,120, which is what you get by executing these transactions in serial order. Where things get tricky is that when we start interleaving these operations, we want to make sure we still end up in a state like that. Another way to look at it is through transaction schedules like this, and the way to read them is that going from top to bottom, time is elapsing; we have a single thread on a single socket, and our database system can only execute one operation at a time.
So in the first schedule here, we execute transaction one, we add the money to A and subtract it from B, and then transaction two computes the interest; T1 goes first, followed by T2. In the other one, T2 goes first, followed by T1. And the key point is that at the bottom, the final values of A and B are different between the two, but when you add the numbers together they're consistent; both are correct, both are equivalent to a serial execution. Now, we want to interleave these operations in different orders, because we want to maximize the parallelism in our system; as I said before, it may be the case that one of these transactions has to touch data that's on disk and will get stalled, and we want to allow the other transaction to keep running and make forward progress. It's even worse if you have to go over the network to get data from another node, because now you're dealing with speed-of-light issues, and that's much slower because it's farther away. Alright, so here's an example of a good interleaving. T1 starts, we add $100 to A, then there's a context switch and T2 starts running and computes the interest on A, then we switch back to T1 and take the $100 out of B, then T2 computes the interest on B. The key thing to point out is that for both objects we did the modifications in the correct relative order: T1 modified A and then T2 modified A, and T1 modified B and then T2 modified B, and that ensures the end state of the database is equivalent to one where we executed them in serial order. So that's a good interleaving. Here's a bad one: we add $100 to A, then we compute the interest on A, then we compute the interest on B, and then we take the $100 out of B. In this case, the final state of the database, the values of A and B, is not equivalent to any state where we executed them in serial order; here the bank ended up losing $6. Yes? His question is whether I'm saying this interleaving is good because the end state of the database is equivalent to one where we executed the transactions in serial order, in any serial order. Yes, and that's a very important difference from concurrency in, say, multi-threaded programs. In this case, either serial outcome is actually correct, because if I submit these transactions at exactly the same time, it's up to the database system to decide which one executes before the other, and either order is correct. If you really cared that one transaction can only run after the other, then from the database system's perspective we'd just say: execute T1 first, wait until it comes back, then execute T2. There's actually a notion called external consistency, and as far as I know the only database that enforces it is Google Spanner: if you submit transaction T1 in wall-clock time, by that clock right there, and then a few milliseconds later you submit T2, Spanner ensures that T1 executes before T2 and you end up with the outcome equivalent to that particular serial ordering. When we talk about serializability in the next couple of slides, correctness means I don't care which serial order, as long as the result is equivalent to some serial order.
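Here's a tiny simulation of these schedules, with the operations written as plain Python functions; the interleavings are the good and bad ones just described, and the starting balances are the $1,000 from the example. The only thing worth checking is the A + B total at the end.

```python
def t1_add(db):   db["A"] += 100                 # T1: A = A + 100
def t1_sub(db):   db["B"] -= 100                 # T1: B = B - 100
def t2_int_a(db): db["A"] = db["A"] * 1.06       # T2: 6% interest on A
def t2_int_b(db): db["B"] = db["B"] * 1.06       # T2: 6% interest on B

def run(schedule):
    db = {"A": 1000.0, "B": 1000.0}
    for op in schedule:
        op(db)
    return db

schedules = {
    "serial T1;T2":      [t1_add, t1_sub, t2_int_a, t2_int_b],
    "serial T2;T1":      [t2_int_a, t2_int_b, t1_add, t1_sub],
    "good interleaving": [t1_add, t2_int_a, t1_sub, t2_int_b],
    "bad interleaving":  [t1_add, t2_int_a, t2_int_b, t1_sub],
}
for name, sched in schedules.items():
    db = run(sched)
    print(f"{name:18s} A={db['A']:.2f} B={db['B']:.2f} total={db['A'] + db['B']:.2f}")
# Both serial orders and the good interleaving end with total = 2120.00;
# the bad interleaving ends with 2126.00 -- the extra $6 the bank gave away.
```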
And this is why transaction correctness is different from programming correctness. In our example, the bank ended up losing $6 because it computed interest on B and credited the account when it should have subtracted the $100 first; at the point that operation ran, there was an extra $100 in the system that shouldn't have been there, and it earned interest as if out of thin air. Yes? The question is about submitting T1 and then immediately submitting T2. If you require T1 to execute before T2, then under what we're talking about here, serializability, you would have to wait until you get the acknowledgement back from T1 before you submit T2, because the database system can decide to interleave these things in any way and pick any of the possible serial orderings as the final outcome. Now, if it's a really short transaction, we're essentially talking about milliseconds here, so it's not like I'm telling you to wait an hour, like waiting an hour before you go swimming; you just wait to get the acknowledgement back and then you submit it. There are notions of transaction priorities, where you can say this transaction should run before that one, or give it more resources or higher priority, but that's outside the scope of this. For now, the key thing I want you to understand is that if I submit these at exactly the same time, or maybe one a little bit before the other, the database system can decide to execute them in any way it wants; if you truly care about the order, you wait until T1 comes back and then submit T2. Okay. Alright, let's look at another bad example; actually, sorry, this is the same example we had before. The key thing to point out here is that the database system doesn't actually see these high-level changes; it doesn't see A = A + 100 or B = B - 100, it just sees low-level read and write operations. The database system doesn't understand the semantics of what your application is trying to do; it doesn't understand your notion of correctness, it just knows it was asked to do certain things in terms of low-level reads and writes, and it does them. So when we worry about correctness and the ordering of things, we're not dealing with the high-level arithmetic, we're dealing with schedules of reads and writes like this. Alright, so the first question is: we have this intuition that the operations need to occur in the right relative order, but we want to formally define what it means for a schedule to be correct and produce the answer we're looking for. I've said some of these words already, but in our world we're going to say a schedule is correct if it is equivalent, and I'll define what equivalent means, to some serial execution. I've already talked about what serial execution means: the transactions execute in serial order, one after another. A serial schedule is just that: there's no interleaving of any operations or actions from transactions running at the same time. And we say two schedules are equivalent if they produce exactly the same end state of the database; even though one schedule interleaves the transactions, it produces a database state identical to one where the transactions were not interleaved. And again, it doesn't matter what the high-level operations in your transactions are; it's just, for object A or object B, do they have the exact same value from one schedule to the other.
So what I've been describing, these schedules of arbitrary interleavings that are equivalent to some serial ordering or serial schedule, are referred to as serializable schedules. A serializable schedule is one that's equivalent to some serial execution of the transactions we want to run, and again, it can be any serial ordering; it doesn't have to be one in particular. So now we can say that if our transactions preserve consistency, meaning each one executed alone puts the database into a correct state, then every serializable schedule also preserves consistency, because the end state of the database is equivalent to one produced by some serial ordering of our transactions. Related to the question from the back, this is a less intuitive notion of correctness than what you may be familiar with from other programming environments. The reason database management systems are allowed to do this, and the reason we want to do this, is that because we don't have to enforce one exact serial ordering, we can choose any serial ordering for our final database state, and that gives us more flexibility in how we choose to interleave operations so we can maximize performance and get better parallelism. If you had to pick one particular serial ordering as your final output, there are only so many ways you can interleave the operations to produce that state; but if I allow any possible serial ordering, if I can target any outcome corresponding to any serial ordering, then I can choose from many more arbitrary interleavings. Another thing that's not obvious here: for now, we're going to assume we're given all the transactions we want to execute in our schedule up front, and we reason about correctness over that set of transactions. In a real system, you're not given all the transactions at the same time; things arrive at different times and you may not know exactly what's coming later, so you can't target a particular serial ordering in advance. What we're doing here is saying: assuming I have a finite set of transactions I want to execute, how can I produce an interleaving that generates a serializable schedule? We'll talk about how to do this dynamically when we cover the concurrency control protocols in the next class. Alright, so now that we know what it means for a schedule to be equivalent to a serial ordering, and therefore to be a serializable schedule, we want to understand, in terms of correctness, how we can identify that a schedule is serializable, how we can identify that the database state it produces when we interleave our operations this way will be equivalent to some serial ordering. To do this, we're going to define our correctness in terms of conflicts. We'll say a conflict occurs when two different transactions perform operations on the same object and at least one of those operations is a write. The three classes of conflicts are read-write conflicts, write-read conflicts, and write-write conflicts.
Why are there no read-read conflicts? Because reads don't interfere with each other; it doesn't matter, who cares, right? So we'll go through each of these one by one, and I'll also give the parlance, the vernacular way of describing these conflicts. The first one is a read-write conflict. In the literature these are referred to as unrepeatable reads, and basically it means that if I read something once and then go back and read it again, I can get a different value. Say I have two transactions: T1 starts and reads some object A, and it gets back $10. Then transaction T2 starts, reads A, gets $10, and writes back $19. Now, when transaction T1 reads A again, it gets back $19, a different value than it got the first time. That's an unrepeatable read: you're not able to read the same value from the same object within the same transaction. This shouldn't happen, because T1 is assuming it's running by itself, so it shouldn't see any effects of T2. So that's a read-write conflict, and in this case it would violate the serializability guarantee we want to ensure. Next is a write-read conflict. In the parlance, this is called reading uncommitted data, or a dirty read, and the way to think about it is that you read something modified by another transaction that you shouldn't have seen, because that transaction hasn't committed yet. T1 starts, reads A as $10, and writes back $12. Now T2 starts and reads A; it gets back $12, which it shouldn't see, because T1 hasn't committed yet, so T2 is reading something from the middle of T1's execution. T2 then writes back $14 and commits. The issue is that T1 now aborts, for whatever reason. T2 has already committed, but it read data that was never committed; and suppose that after reading it, there was an if-statement that said, if the value equals 12 then write 14, so it took some action based on data it should never have read, because T1 wrote it and T1 didn't commit. So now T1 aborts and we would have to go back and kill T2, but T2 already committed; we already told the application that it finished. That's bad, because now we're leaking incorrect data to the outside world. So that's a conflict, and we don't want it to happen. The last one is a write-write conflict, which is writing over uncommitted data. T1 writes $10 to A, then T2 overwrites it with $19; then T2 writes "Andy" to B, and then T1 writes "Bieber" to B. So when everything commits, A contains $19 but B contains "Bieber", and that mixes the two transactions. Under any serial ordering, it should either be $10 and "Bieber" or $19 and "Andy"; it can't be an interleaving of the two. So now, based on these three types of conflicts, we can talk in a bit more detail about what it means for a schedule to be serializable. Strap yourselves in, because this can be a bit rough. There are essentially two kinds of serializability: what's called conflict serializability and what's called view serializability. As a spoiler, I'll say that any time a database system claims it supports serializable transactions, it means conflict serializable; to the best of my knowledge nobody actually implements view serializability, because you'd have to understand what the application is actually trying to do in order to support it.
The way to think about it is that conflict serializability is more restrictive, but we can actually implement it in our database system; view serializability allows additional kinds of parallelism, but it requires knowing what the application intends to do, so nobody actually implements it. So again, when someone says they do serializable transactions, they really mean conflict serializable. Okay, so the definition: two schedules are conflict equivalent if they involve the exact same operations of the exact same transactions, and every pair of conflicting operations is ordered the same way in both schedules. And then we say a schedule S is conflict serializable if it is conflict equivalent to some serial schedule, meaning the conflicting operations are always ordered in the same direction as in that serial ordering. The way to think about this at a high level is that if you can take your schedule and keep swapping adjacent operations in the right direction to produce a serial schedule, then you know it's conflict equivalent to a serial ordering. That's probably still vague, so let's go through an example and hopefully it will make sense. Say I have a schedule where transactions T1 and T2 each do a read on A, a write on A, a read on B, and a write on B. What we want to do is take non-conflicting operations, meaning they're operating on different objects or they're both reads, and flip their ordering, trying to move all the operations of one transaction toward the top and all the operations of the other transaction toward the bottom, until we end up with a serial ordering. So the read on B and the write on A: these are on different objects, so they're not conflicting, and we can swap their order. Same with the read on A and the read on B, we can swap them; the write on A and the write on B, swap them; the write on B and the read on A, swap them; and now we end up with a schedule that is a serial schedule. So the schedule we started with, before we did the swapping, is conflict equivalent to this serial schedule. Yes, this is a way to determine whether an arbitrary schedule of interleaved operations is conflict equivalent to a serial ordering; the hand motion I did at the beginning, you can swap these things in this order, and that's how we know it's equivalent to a serial ordering. I'm not giving you an exact protocol for doing this. Now let's look at one where you can't do this. Here we have an interleaving where there's a conflict between a read on A and a write on A, and I can't swap their order, so this schedule is not conflict equivalent to a serial ordering, because I can't transform it into the schedule on the other side. Again, this is to give you the intuition of what it means to be conflict equivalent to a serial ordering or not, but in a real system this would be impractical, because, as I said, you don't always know exactly what all the operations or transactions want to do at the moment you need to figure these things out. So we need another way to determine whether a schedule is conflict equivalent to a serial ordering, rather than just swapping operations around.
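Here's a small sketch of that definition in Python, representing each operation as a (transaction, action, object) tuple; the brute-force check just asks whether every conflicting pair is ordered the same way as in some serial order of the transactions. This is only to pin down the definition, and it assumes each operation tuple in the schedule is distinct.

```python
from itertools import permutations

def conflicts(op1, op2):
    """Different transactions, same object, and at least one of them is a write."""
    (t1, a1, o1), (t2, a2, o2) = op1, op2
    return t1 != t2 and o1 == o2 and "W" in (a1, a2)

def conflicting_pairs(schedule):
    """The set of conflicting operation pairs, in the order they appear."""
    return {(schedule[i], schedule[j])
            for i in range(len(schedule))
            for j in range(i + 1, len(schedule))
            if conflicts(schedule[i], schedule[j])}

def conflict_serializable(schedule, txns):
    """Is the schedule conflict equivalent to *some* serial ordering of txns?"""
    for order in permutations(txns):
        serial = [op for t in order for op in schedule if op[0] == t]
        if conflicting_pairs(serial) == conflicting_pairs(schedule):
            return order
    return None

# The swappable example: conflicting ops on both A and B all go T1 -> T2.
s1 = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
      ("T1","R","B"), ("T1","W","B"), ("T2","R","B"), ("T2","W","B")]
# The non-swappable example: on A they go T1 -> T2, but on B they go T2 -> T1.
s2 = [("T1","R","A"), ("T1","W","A"), ("T2","R","A"), ("T2","W","A"),
      ("T2","R","B"), ("T2","W","B"), ("T1","R","B"), ("T1","W","B")]
print(conflict_serializable(s1, ["T1", "T2"]))   # ('T1', 'T2')
print(conflict_serializable(s2, ["T1", "T2"]))   # None
```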
The way we can do this is through dependency graphs. What happens here is that we have a node in the graph for every transaction, and we draw an edge from one node to another, say from Ti to Tj, if there is an operation in Ti that conflicts with an operation in Tj and the operation in Ti appears earlier in the schedule than the one in Tj. In the literature this is sometimes also called a precedence graph: you're figuring out which operation from one transaction depends on an operation from another.

There's a question about whether this assumes you never branch inside a transaction. In this case we are not doing this dynamically on the fly, where you don't know all the transactions ahead of time. When we talk about two-phase locking, those protocols handle the dynamic environment. Here you know what all the transactions want to do ahead of time, and the universe of transactions you're trying to schedule is fixed; two-phase locking will handle the dynamic case. Again, this is just to give you intuition about what conflict serializability actually means.

Okay, so what we're going to say is that a schedule is conflict serializable if we generate its dependency graph and there are no cycles in it, because cycles are exactly the situations where we can't swap operations around to put the schedule into a serial ordering.

Let's walk through an example. We again have two transactions doing reads and writes on A and B. The write on A in T1 appears before the read on A in T2, so we have an edge from T1 to T2 in our dependency graph, and we'll label it with A to record which object the potential conflict was on. Then the write on B in T2 appears before the read on B in T1, so we have an edge from T2 to T1. Now we have a cycle, and therefore this schedule is not conflict serializable: we can't go back and do the swapping that would put it into a serial ordering. The way to think about it is that the dependency graph is telling us there's an output of T1 that is used by something T2 will modify, and T2 modifies something that T1 depends on; because of that cycle, we must not interleave the operations this way.

Let's look at some other cases. Now we want to do something more complicated and add an arithmetic operation. Here we have a read on A followed by a modification of A, so we get a dependency edge from T1 to T2, and then a write on A followed by another write on A, which gives us another edge, and we have a cycle. So we know this schedule is not conflict serializable either.
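Since the test is described procedurally, here is a rough Python sketch of it, building on the `Op` and `conflicts` helpers from the earlier sketch (again my own illustration, not course code): add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, then call the schedule conflict serializable exactly when that graph has no cycle.

```python
def precedence_graph(schedule):
    """Edge Ti -> Tj if some op of Ti conflicts with a *later* op of Tj."""
    edges = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if conflicts(a, b):          # conflicts() from the earlier sketch
                edges.add((a.txn, b.txn))
    return edges

def has_cycle(edges):
    """Standard depth-first-search cycle check over the transaction nodes."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

# The swap example from before only has edges T1 -> T2, so no cycle:
print(conflict_serializable(schedule))   # True
```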
Now let's look at a more complicated example with three transactions. We go through each pair of conflicting operations: the write on B before the read on B gives us an edge, the read on A followed by the write on A and another read on A gives us edges, and so on through the rest. When we're done, T2 never reads A after that point, so there's no conflict there and the graph has no cycle. So this schedule is actually equivalent to a serial ordering even though the transactions are interleaved: I could run T2, then T1, then T3, and that would generate the same final state, because in the end the final value of B is the one written by T1 and the final value of A is the one written by T3. So even though T1 started before T2, in the equivalent serial ordering T2 finishes first, and everything is fine.

Let's look at another one. Say T2 runs, computes the sum of all the bank accounts, and prints it out to the terminal. Print is obviously not a command you can run in SQL or in your database system; think of it as a way for this transaction to return a value back to the application, in this case the sum of A and B. First we have a conflict between the write on A and the read on A, so we have an edge there, and then a conflict between the read on B and the write on B, so we have an edge going back. That's a cycle in the dependency graph, so this schedule is not conflict serializable.

But there is actually a way to modify the transaction a little bit so that it runs the exact same low-level read and write operations in the database, with this exact schedule, and still produces an answer the application considers correct, even though it's a different answer than the sum. Instead of computing the sum, what if I just check whether each bank account is negative, and as long as it isn't, I add one to a counter, and at the end I print out the count? It's basically counting the number of accounts that aren't negative. This does the exact same reads and writes, but it produces an answer that is still considered correct. So this is an example of an interleaving that is not conflict serializable but still produces a correct answer, and this is what's called view serializability.
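As a quick sanity check with the dependency-graph sketch from above, here is one interleaving consistent with that description (the exact schedule on the slide may differ): T1 transfers money between accounts A and B while T2 reads both accounts to compute its sum.

```python
# T1 moves money from A to B; T2 reads both accounts to sum them.
# T2's read of A comes after T1's write of A, but T2's read of B comes
# before T1's write of B, so we get edges T1 -> T2 (on A) and T2 -> T1 (on B).
sum_schedule = [
    Op("T1", "R", "A"), Op("T1", "W", "A"),
    Op("T2", "R", "A"), Op("T2", "R", "B"),
    Op("T1", "R", "B"), Op("T1", "W", "B"),
]

print(precedence_graph(sum_schedule))       # edges T1 -> T2 and T2 -> T1
print(conflict_serializable(sum_schedule))  # False: the cycle is detected
```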
There's a question: how do you generate the ordering from the graph? For instance, if the graph has two more nodes, T4 and T5, and there are edges from T3 to both of them, how do you derive the actual schedule from the graph? So you're asking, given this graph, how do I generate a schedule? We're not doing that. We're given the schedule; this is the schedule we have, and we want to know whether it's conflict serializable or not. We're not generating schedules yet; that's what two-phase locking does. This is just answering: if I have a schedule, was it actually correct? Again, generating schedules is what two-phase locking is for. Right now we're just dealing with the question of: given a schedule, tell me whether it's conflict serializable. I'm doing this so you understand what it means for something to be conflict serializable, so that when we talk about two-phase locking and its different variants, we'll know it is guaranteed to generate a serializable schedule, because any schedule it generates won't have a cycle in its dependency graph. In fact, some versions of two-phase locking do maintain a lock dependency graph and look for cycles: when you find one, you have a deadlock and you kill something, which is a bit different from what we're doing here.

Another question: is this example conflict serializable? Yes, this one is conflict serializable. And again, don't think about this at run time. The setup is: here is everything that ran, all these transactions committed, now tell me whether I interleaved them in a way that was conflict serializable. We're not worried about a dynamic environment; we'll deal with that in two-phase locking.

All right, so what I want to impress upon you is that there is a different notion I can use to decide whether something is correct if I don't care about strictly ensuring conflict serializability and only care about the high-level semantics of the application, and that is view serializability. I'm not going to go through all of these examples. Roughly, it says that if every transaction reads the same values and the end state of the database is the same as it would have been under some serial ordering, regardless of how the operations were interleaved in between, then that's considered okay. That's still vague, so let's do this example.

Here I have three transactions: T1 reads A and then writes A, T2 just writes A, and T3 just writes A. The writes in T2 and T3 are what are called blind writes, because they modify A without reading it first: they just say, I'm going to overwrite whatever is in there, I don't care what was there before. T1, by contrast, reads it first. Now, in terms of conflicts we have problems: we'll have a cycle, because T1 wants to read A and then write it, T2 writes A in between, and then T1 overwrites that. But the key thing is that at the end of the schedule, the final value of A is whatever T3 wrote to it. So who cares what T1 and T2 did to it? At the end of the day T3 wins; its write on A is the one that remains, and that's all that matters. So this schedule is actually considered view equivalent to the serial ordering T1, T2, T3, even though it interleaves T1 and T2 in a way that conflict serializability would not allow: T3 was the last writer, it wins, and that's okay.

So view serializability allows all schedules that are conflict serializable, plus these blind writes and some other corner cases. The tricky thing, though, and the reason why as far as I know no database system actually supports this, is that the database doesn't know it was okay for these three transactions to get mangled together like this; that depends on what the application wants. Supporting it would require program analysis of the application code, to reason about what the transactions are actually trying to do with their reads and writes and decide that, even though a schedule is not conflict serializable, these interleavings are still allowed because the blind write always leaves the database in a correct state.
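To pin down that informal description, here is a rough sketch of a view-equivalence check, reusing the `Op` tuple and the helpers from the earlier sketches (again my own illustration; it assumes each transaction reads or writes a given object at most once, which holds for these toy examples). Two schedules over the same transactions are view equivalent if every read observes the write of the same transaction, or the initial value, in both, and the final write on each object comes from the same transaction.

```python
def view_summary(schedule):
    """Record who each read 'reads from' and who performs the final write.
    Assumes each transaction reads/writes a given object at most once."""
    last_writer = {}   # obj -> txn that most recently wrote it (None = initial)
    reads_from = {}    # (reading txn, obj) -> txn whose write it observed
    final_write = {}   # obj -> txn that performs the last write
    for op in schedule:
        if op.kind == "R":
            reads_from[(op.txn, op.obj)] = last_writer.get(op.obj)
        else:  # "W"
            last_writer[op.obj] = op.txn
            final_write[op.obj] = op.txn
    return reads_from, final_write

def view_equivalent(s1, s2):
    """View equivalent: same reads-from relationships and same final writes."""
    return view_summary(s1) == view_summary(s2)

# The blind-write example: T1 reads then writes A; T2 and T3 blindly write A.
# One interleaving consistent with the lecture's description:
interleaved = [Op("T1", "R", "A"), Op("T2", "W", "A"),
               Op("T1", "W", "A"), Op("T3", "W", "A")]
serial      = [Op("T1", "R", "A"), Op("T1", "W", "A"),
               Op("T2", "W", "A"), Op("T3", "W", "A")]

print(conflict_serializable(interleaved))   # False: the dependency graph has a cycle
print(view_equivalent(interleaved, serial)) # True: T1 reads the initial value of A
                                            # and T3's write is final in both
```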
The other important thing to point out, though, is that conflict serializability and view serializability are both a bit restrictive, and there are schedules that would produce a serializable result that the protocols we're going to use would not allow. In some cases with two-phase locking, as we'll see, there are examples where you can clearly say, well, this schedule is okay, I shouldn't have to abort this transaction, but because of the rules the protocol puts in place it won't let you run it, and you get what's called a false abort: you abort a transaction that didn't actually need to be aborted. Again, this happens because the database doesn't know what correctness truly means in terms of the application; it just knows reads and writes and how to keep them ordered correctly, so it can't figure these things out. And this is why most databases support conflict serializability: you can enforce it efficiently without having to know anything about what the application is trying to do.

Then there are special cases that get handled in the application, or by you as a business or organization, and not by the database system at all. Take the example somebody gave earlier about airline tickets: airlines always overbook planes, because they know some percentage of people aren't going to show up and they don't want a wasted seat. You can sell more seats than you have because you know enough people won't show up, so you still fly at full capacity. In the cases where they get it wrong, they either drag that poor guy off the plane or they offer you money to take the next flight, because they've done the math and decided that, since a certain percentage of people won't show up, overbooking comes out ahead in the end. That's what I mean by these special cases: sometimes you want to allow additional concurrency, but that decision is being made at the application or organization level, not inside the database system.

So to wrap this all up, the way to think about the universe of schedules is this: you have all possible schedules for your transactions, meaning any possible interleaving; a small subset of those are the serial orderings, where you execute transactions one after another; around those are the conflict serializable schedules; and around those are the view serializable schedules. Anything that is serial is by definition conflict serializable and view serializable; anything that is conflict serializable is by definition view serializable, but not necessarily a serial ordering. When we talk about two-phase locking, there will be some other subcategories in here as well.

All right, to finish up quickly, let's go back to that ACID list and the examples from the beginning. For durability, we've already touched on this when we talked about logging and shadow paging, and we'll have a whole lecture about it in a couple of weeks, but the basic idea is that any change you make to the database has to be persisted, even across a crash or restart, if the system ever told the application that your transaction committed. Now, it may be the case that you submit your transaction, you say go ahead and commit, and the system crashes before you get your acknowledgement. In that case your transaction didn't actually commit, or at least it may not have committed, because you never got the acknowledgement back.
But as soon as the system tells you your data is durable and your transaction is safe, it has to guarantee that it will always be there, and we'll use logging or shadow paging to ensure that. We'll talk about how to do that in a couple of classes.

So again, ACID is these four properties: atomicity, consistency, isolation, and durability. Isolation and atomicity are the major things we'll cover when we talk about concurrency control; durability we'll cover when we talk about logging and recovery; and consistency we'll come back to when we talk about distributed transactions.

To finish up: concurrency control and recovery are probably the most important parts of a database system. They are a clear example of why you want to use a database system to store your application's data rather than rolling your own file format or whatever else. The other important thing is that everything we'll talk about with concurrency control is, for the most part, automatic: when the system starts taking locks on things, you generally don't tell it to lock this table or lock this record; the system figures all of that out for you. That frees you, the application programmer, from having to worry about whether you hold a lock on this or that. Different systems do let you do different things, and sometimes you can provide hints about locks, but we'll cover that next class.

All right, any questions about concurrency control? One question: is there any way to let the system know what kinds of operations will happen on the data, like hints? There are some things you can do along those lines, but I'd say it's really hard to do well. Ideally you want an automatic way to do this, and that means doing program analysis and supporting any possible programming environment, and nobody does that. In general it's hard. If you run everything through stored procedures, you can maybe figure out some of it, but it's more than just what operations the application performs: it also comes down to this notion of correctness, of what results are acceptable for my transactions to see or produce, and only a human can define that. Whether you could codify that through hints, I don't think anybody has really tried.

Another question: is any schedule that is view serializable correct? That depends on your definition of correctness. It will generate a result that is equivalent to some serial ordering of the transactions, so in that sense, yes, it would be correct. And note that when we talk about two-phase locking, two-phase locking produces schedules that are conflict serializable, not view serializable; it may happen to generate a schedule that is also view serializable, but the guarantee is only conflict serializability. And yes, correct: supporting view serializability would be a major software engineering undertaking that I think would be impractical.

I'll also say this, and I realize people are going to complain on Twitter or YouTube about it: serializability is the gold standard in database systems. It is what you ideally want for your application, because you don't have to worry about any inconsistencies.
You know that your transactions will execute as if they ran in some serial order. What we'll talk about next class are different isolation levels that relax some of the restrictions we covered today in order to get better performance. So before people complain, I'll preface next class with this: the dirty secret is that most people don't run with serializable isolation or serializable transactions, and most systems don't actually even support it. If you use Oracle, for example, and you ask for the serializable isolation level because you want everything we talked about here, you don't actually get it; you get something weaker. So in practice, serializability is important to understand, but it isn't as widely used outside academia as people think.

In the last thirty seconds: the third project goes out today; we'll release it tonight. What you're going to build has two parts. First there's a lock manager for two-phase locking: you'll implement strict two-phase locking and regular two-phase locking. Then you'll go back to the B+tree you built in the second project and add support for latch crabbing, taking latches as you go down the tree so that multiple threads can modify the index at the same time, because the B+tree you built in that assignment is single-threaded and has no protection mechanisms. I'll post the project write-up tonight. And again, please don't cheat on this, please don't plagiarize; we're going to run it through MOSS on Autolab. Next class is two-phase locking and isolation levels. We're right at time, so are there any questions? Done? Awesome, thank you guys.