All right, so last class we sort of rushed through the discussion on lock granularities at the end, so I want to go through it again, because I think this is important. This goes beyond what plain two-phase locking covers: you have to understand what to do when you may want locks not just on individual tuples.

So last time we said that in all the examples I gave before under two-phase locking, the lock was essentially on an object, and in actuality you usually envision this as being on a tuple. But in some systems you can actually have a lock hierarchy, where the different elements of the database each have individual locks you can acquire, right? So you have a database, it has tables, the tables have tuples, and the tuples have attributes. The way to think about this hierarchy is that if our transaction comes along and locks the table, that implicitly locks everything below it in the hierarchy. And the idea is that if we want to do a very large update to our database or our table, then instead of getting locks on individual tuples, we get a lock on the table or a page, and that covers a larger segment with fewer invocations of the lock manager.

So the first example we had before was two transactions: the first one wants to get the balance of my bank account, the second one wants to increase the balance of Joy's bank account, and the question is what kind of lock each should obtain. And what we talked about was a new class of locks, called intention locks, that let us provide hints to the other transactions or threads running in the database system at the same time. In the lock hierarchy, they let the upper levels say, more or less explicitly, that there's something going on down below, without anyone having to traverse down to figure out exactly what that is. The three types of intention locks are intention shared, intention exclusive, and shared intention exclusive. So again, an intention lock at a higher level tells you what's going on below it without having to go down and look at everything below you, right? Intention shared says that down below there will be a shared lock. Intention exclusive says that down below there will be an exclusive lock. And shared intention exclusive locks whatever node you're at in the higher level in shared mode, but then also gives the hint that down below there are exclusive locks.

And the compatibility matrix was like this: essentially, intention shared is compatible with almost everything except exclusive locks, and as you go down the list they become less compatible, so you may not be able to intermix them at the higher levels, right? At a high level, the way to think about this is that at the top levels you take intention locks or the explicit shared or exclusive locks, but at the leaf nodes you have to take either shared or exclusive.

So we used our example of these two transactions. T1 wants to read my bank account record from R; it's just going to read this one tuple. So at the table level in the hierarchy it gets an intention shared lock, because it wants to say "down below I'm going to take a shared lock on a tuple," and then when it finds my tuple, it takes the explicit shared lock.
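Since we're about to reason about which of these locks can coexist, here is the compatibility matrix I described, written out the way it's usually presented (IS = intention shared, IX = intention exclusive, S = shared, SIX = shared intention exclusive, X = exclusive — this is the standard multi-granularity matrix, reproduced from memory rather than from the exact slide):

```
          IS    IX    S     SIX   X
    IS    yes   yes   yes   yes   no
    IX    yes   yes   no    no    no
    S     yes   no    yes   no    no
    SIX   yes   no    no    no    no
    X     no    no    no    no    no
```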
And then transaction two, which wants to update this tuple over here, can take an intention exclusive lock at the table, which is compatible with the intention shared lock, and then down below it takes the exclusive lock that it needs, right?

So then we did a more complicated example with three transactions. The first transaction wants to scan R and update a few of the tuples. Transaction two wants to read a single tuple out of R, and transaction three wants to scan all of the tuples in R as well. So the first guy does a scan on R: he wants to read all these tuples and update the last one. He can take a shared intention exclusive lock. That implicitly says all the tuples below are locked in shared mode, with the table itself essentially being locked in shared mode as well. And then he can take the exclusive lock on just the one tuple he needs to modify. Now the second transaction wants to read a single tuple, this one over here. It can land on the table, take an intention shared lock, which is compatible with the shared intention exclusive lock, and then go down below and get the shared lock that it needs. And then the last guy wants to scan all the tuples, so he wants a shared lock on all of them. He could go down below and just take shared locks as he goes along, but when he got to the last one, he wouldn't be able to get it, because the other transaction has the exclusive lock on it. But because he tries to put a shared lock on the table, that actually gets blocked or denied right away, because another transaction holds a shared intention exclusive lock. So we don't even need to go check and try to acquire the locks for everything at the bottom; it gets blocked immediately. (I'll put the full sequence down as a trace below.)

Yes? Can you explain one more time, for T1, why you need a shared intention exclusive lock instead of an intention exclusive lock? All right, so transaction one was "scan the tuples and update a few of them," right? Your question is why he gets a shared intention exclusive lock rather than an intention exclusive one. So we're going to scan all the tuples; we want to read all of them. If you took an intention exclusive lock, that would mean the tuples would not be under a shared lock, and you'd have to go down and take explicit shared locks on each of them. By saying shared intention exclusive, the shared part — the S, the first part — means the table is now locked in shared mode, and implicitly all the tuples are under that shared lock. So to lock all of them for reading, I only have to acquire one lock: the shared lock on the table at the top. But since I'm also going to do an update, I also want an exclusive lock down below, so I need the intention exclusive part anyway. You could acquire a shared lock on R and then try to get an intention exclusive lock on R, but now that's two calls into the lock manager, and the lock manager would have to be smart and say, well, I know you're the same transaction, so maybe I'll convert you now, or figure out how to upgrade your lock so you can have this intermixing. Whereas with shared intention exclusive it's one lock request: I get everything locked in shared mode, and I get my intention exclusive at the same time, right? And again, anybody else who comes along knows, just by looking at the table lock, what's going on below — or at least has an idea of what's going on below.
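Here is that three-transaction example written out as a trace (the tuple names are made up for illustration; the lock requests follow the sequence described above):

```
T1: SIX(R)         -- scan all tuples in R, update one of them
T1: X(tuple_n)     -- exclusive lock on the one tuple T1 modifies
T2: IS(R)          -- granted: IS is compatible with T1's SIX
T2: S(tuple_k)     -- shared lock on the single tuple T2 reads
T3: S(R)           -- BLOCKED: S is incompatible with T1's SIX
```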
So his question is, will this work with B+trees? Hold that thought; we'll cover that later. The answer is no, right? In B+trees, as we'll see, you don't actually have intention locks. You don't need them.

Okay, so this last guy wants to get a shared lock because he wants to scan all the tuples. That's the minimum number of locks you need to scan everything: just the one shared lock on the table. But that gets blocked because it's incompatible with the shared intention exclusive lock, right? So again, in practice, what we want to do is minimize the number of locks a transaction has to take, because that can increase parallelism and means fewer invocations into the lock manager. And these intention locks are essentially the hints we're going to use to figure out when we're allowed to do things. The protocol for acquiring these locks is always top down: you go down the hierarchy and acquire the proper locks you need at the different levels of the system.

Now, in my earlier example I showed you could have a database lock and a table lock; what it didn't show is that you could also have a page lock, and then at the bottom you could have a tuple lock or an attribute lock. In practice, you usually don't have locks on individual attributes, because that's too fine-grained and too heavyweight to maintain. Typically you have database locks, table locks, and tuple locks. Some of the commercial systems also let you acquire page locks. And then, as I mentioned before, we can use lock escalation to upgrade the type of lock we hold, but we always have to go in one direction. So if we have a shared lock and we now want an exclusive lock on an object, we're allowed to do that; we can't go in the other direction, right?

So last class we spent all this time talking about these different locks in two-phase locking, and in all the example schedules I showed you, we had the reads and writes, but then we also had the lock and unlock commands. In practice, you as the application programmer don't actually write lock and unlock commands. You can do it, and I'll show you how in the next slide. But the way to think about two-phase locking is that the application programmer doesn't need to know anything about concurrency control, right? They don't need to know anything about isolation or things like that. They just want to assume that their transaction is the only transaction running in the system at that time, so they don't have to worry about reading and writing data from other transactions going on at the same time. That's what the serializability guarantee provides for us. But it may be the case that the database system needs a bit more information about what you actually want to do.

Yes? So your question is, if your lock hierarchy includes the database at the top, do you have to acquire a lock for that? Yes — the way this multi-granularity lock hierarchy works is that you have to acquire locks at every level on the way down. This one? The compatibility matrix or the hierarchy? This? Okay. Yeah, so his question is why I didn't take the database lock in the example. It was just to simplify the illustration — yes, I should have started at the database, correct. Okay.
Again, the reason we always have to go from the top down is deadlocks, right? If one transaction goes top down and another goes bottom up, they might end up waiting on each other's locks. By enforcing that direction in the hierarchy — in the same way that we did with deadlock prevention, wait-die and wound-wait — we ensure that there can be no deadlocks.

Okay, so the major thing I want to point out is that the way you use two-phase locking in practice as an application programmer is that you actually don't know about it, right? You're not going to call lock explicitly. The database system does this for you. And this is part of the reason why most systems that implement two-phase locking do strict two-phase locking: there's typically no way for you as the application programmer, through SQL, to say "all right, I want to release this lock." It just releases them all when you commit.

Now, this is actually not in the SQL standard, but there are ways to explicitly lock an entire table. In Oracle, Postgres, and DB2, the command is LOCK TABLE in a particular mode, and the modes are share and exclusive. One of the things I like about Oracle and Postgres, and to a lesser extent DB2, is that a lot of the terminology used in the textbooks — not just our textbook — like shared and exclusive, matches up quite nicely with the SQL syntax, right? Whereas in MySQL, the modes are READ and WRITE. So you can lock an entire table this way. In SQL Server they have some extra syntax: you say WITH a table lock hint and the mode you want, and it has to be part of a SQL query. And then in MySQL you can just say lock the entire table, right? Typically you'd do this for things like a bulk update: I want to prevent anybody from modifying the entire table while I do something, so I lock the whole thing. And implicitly, too, when you do something like an ALTER TABLE to modify the schema — to add or drop a column — the database system will take a lock on the entire table while it does this. At least in some systems; some systems can be a bit smarter about it.

All right. Another cool thing you can do in SQL is provide a hint to the database system that says, "I'm reading this tuple now, but I'm probably going to update or modify it later on." So rather than taking a shared lock when you read the tuple, you can tell the database system to go ahead and take the exclusive lock now, right? The syntax is SELECT * FROM your table, then your WHERE clause, and then you add FOR UPDATE at the end. Without the FOR UPDATE, under two-phase locking, when you read this, the database system takes a shared lock for you; if you say FOR UPDATE, it tries to take an exclusive lock instead.

Okay, so that's pretty much it for two-phase locking. I will say that it's in use in almost every single database management system, even ones that say they're using multi-version concurrency control. They're actually using two-phase locking as the actual scheduling protocol — the thing that figures out how to interleave operations — in the context of a multi-versioning system, which we'll cover two or three lectures from now.
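As a rough sketch of the syntax being described (the table name is made up, and the exact set of modes varies by system):

```sql
-- Postgres / Oracle / DB2 style: explicitly lock a whole table
LOCK TABLE people IN SHARE MODE;       -- other reads allowed, writes blocked
LOCK TABLE people IN EXCLUSIVE MODE;   -- block concurrent modifications

-- MySQL uses READ / WRITE modes instead
LOCK TABLES people READ;
UNLOCK TABLES;

-- SQL Server attaches the hint to the query itself
SELECT * FROM people WITH (TABLOCKX);

-- The FOR UPDATE hint: take the exclusive lock up front on the rows you read
SELECT * FROM people WHERE id = 123 FOR UPDATE;
```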
And again, the way to think about two-phase locking, compared to what we talked about at the very beginning when we discussed serializability, is that it's a dynamic protocol that doesn't know what transactions are going to do ahead of time. It sees a transaction start, sees a bunch of queries show up, and it figures out whether you're allowed to read or write certain things based on what you want to do, what locks you need to acquire, and what other transactions hold those locks. By itself, two-phase locking can guarantee conflict serializable schedules, but it's still susceptible to cascading aborts. If you use strict two-phase locking, then you get conflict serializable schedules without cascading aborts. Okay, any questions about two-phase locking? Okay, awesome.

So now let's switch over to the next lecture, which is index concurrency control. In this particular lecture we're actually going to talk about two things. First, how indexes affect what we have to do in concurrency control, related to things like two-phase locking or timestamp ordering. But then we're also going to answer the question he had earlier: how do you make a concurrent B+tree? Do you use two-phase locking or another locking protocol? And that second part — how you actually build a concurrent B+tree — is one of the components of the third project.

All right. So when we talked about transactions in the first concurrency lecture, under two-phase locking, we made this big assumption that our database was static, meaning the number of objects in our database was fixed. We didn't have any inserts, we didn't have any updates, we didn't have any deletes. But once we introduce those operations, we start to hit other problems that two-phase locking by itself is not going to solve for us.

So in this example here, rather than low-level reads and writes, I want to show actual SQL statements. The first transaction reads our people table and wants to find the maximum age of a person whose status is set to lit. Then another transaction comes along and inserts into the people table somebody with age 96 and status lit. And then the first transaction executes that same max-age query again. So the issue is that when the first transaction runs the query the first time, it gets the value 72. But then somebody inserts a person with age 96 into the people table, so when we execute the query again, we get 96, right? This is obviously wrong. Under what we said before about serializable schedules, if we were executing these transactions in serial order, the first transaction shouldn't see 96, because it saw 72 before, right? And if we ran two-phase locking, this would still happen, and it's sort of obvious why: when we ran the query the first time, we took shared locks on all the individual tuples in order to read them, but when the next guy came along, he was inserting an entry that couldn't have been locked, because it didn't exist before, right? So conflict serializability under two-phase locking guarantees serializable schedules only if the number of objects in the database is fixed.
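A sketch of that interleaving in SQL (the table and column names follow the lecture's example; lit is the status value from the slide):

```sql
-- Txn 1
BEGIN;
SELECT MAX(age) FROM people WHERE status = 'lit';  -- returns 72

-- Txn 2, running concurrently
BEGIN;
INSERT INTO people (age, status) VALUES (96, 'lit');
COMMIT;

-- Txn 1 again, same query in the same transaction
SELECT MAX(age) FROM people WHERE status = 'lit';  -- now returns 96: a phantom
COMMIT;
```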
We'll solve this problem next class — I mean, actually, this class, right now. So this is the issue we've got to deal with. And part of the problem, ignoring the lock granularity stuff, is that if I only lock individual tuples, then the first transaction can't lock a tuple that doesn't exist yet, right? So the way we're going to solve this is by relying on the indexes we may have to lock things that we haven't gotten yet. There are a couple of approaches, and I'm going to highlight two of them at a high level. We could spend hours discussing this, and I'll cover it in the advanced class; I just want to give you a high-level view of what real systems do. Actually implementing these is quite difficult, and again, we'll focus on that next semester.

By the way, this is called the phantom problem. It's another anomaly, like dirty reads, unrepeatable reads, or lost updates. A phantom basically means that there was a tuple there before, I read it, and I come back and now it's gone — or there wasn't a tuple there, and I come back and now it appears — and it can affect things like aggregates.

The original idea from the IBM guys in the 1970s to solve the phantom problem was called predicate locking. The basic way to think about it is that for any possible predicate you could have on your table, you maintain this multi-dimensional data structure that can figure out whether what another query is trying to do would overlap with a predicate from your query, right? So in my first query in that first transaction, I did my aggregation and said WHERE status equals lit. Rather than trying to lock all the individual tuples that satisfy that predicate, I could hold a virtual lock that says: for any possible tuple matching this predicate, I now hold a lock. So when the second guy came along and tried to insert a record satisfying that predicate, you'd hit the lock conflict, and the insert wouldn't be allowed, right? This is what the IBM guys proposed in the 1970s when they first identified this phantom problem that can occur under two-phase locking. As far as I know, they never actually implemented it, because it's quite difficult to do. In my simple example here, where status equals lit, it's one thing — we can easily check that. But now think about a WHERE clause with a bunch of conjunctions or disjunctions: now you have all these multi-dimensional spaces where you need to check for intersections, and that would be really difficult to maintain. So in practice nobody actually does predicate locking, even though it seems like the obvious solution. There was a proposal called precision locking, a rough approximation of this, that came out in the 1980s, and as far as I know there's only one database system that does it — HyPer, out of Germany — and we'll cover that in the advanced class, right?
So since we can't have these arbitrary predicate locks, what we're going to do instead is rely on indexes to approximate this kind of locking. The hope is that since an index is already a data structure that knows about ranges and values on particular attributes, we can rely on it to help us do our locking efficiently, right?

So assume you have an index on the status field. What we can do is find all the pages in our index that correspond to status equals lit and take a lock on those pages. Now we know that if anybody tries to insert a new tuple with status equals lit, they have to go to one of the pages we have locked, and that essentially prevents them from doing it, right? The tricky thing, of course, is that if you don't have any records with status equals lit, then you need some virtual page or marker that says: here's where a page would exist if there were records with status equals lit. You lock that, and then if anybody comes along and tries to insert such a tuple, the index figures out "this is where you would go if you existed," and somebody else already holds the lock for that.

Yes? It looks like this only works if you have an index built on the attribute, so it doesn't solve the problem in general. Yes, his statement is that this doesn't work unless you have an index built on the thing in your WHERE clause that you want to lock. You're absolutely correct. But in an OLTP setting, you want to avoid all sequential scans, so anything you're going to do a lookup on, you're probably going to have an index for — otherwise you're scanning the entire database and all your transactions run super slow, right? Now, if you say, "I don't have an index and I'm okay with sequential scans," you're probably in an OLAP environment, where you're not inserting and updating things all the time, you probably wouldn't hit the phantom problem, and you probably wouldn't even care. If I'm doing analytics, I'm not making critical decisions based on the max age of somebody, so I don't care if it goes from 72 to 96. But in an OLTP environment, where you're using your operational database to change things on the fly and make other decisions, maybe you do care about strong serializability. So you would have an index, right?

Yes? But even if you didn't have an index, couldn't you just create one on the fly when you have to do the insert or update? So his question is, if you don't have an index, could you just create one on the fly? No, right? Because building an index is not cheap: if your table is huge, it'll take a long time. Yeah. His next question is, would this not work on a column store? Why would it not work on a column store? All right — we're talking about locking an index page here, the index, not the actual data itself, right? So if you have an index, you lock the index page. It doesn't matter whether the table is a row store or a column store, right?
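So, as a minimal sketch, the kind of index that makes this scheme work for the lecture's example query would be something like this (hypothetical index name, on the predicate column from the example):

```sql
-- With this index, the system can lock the index pages covering
-- status = 'lit' instead of needing a predicate lock.
CREATE INDEX idx_people_status ON people (status);
```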
Again, in this example we're trying to use the index as the central lock location, so we can figure out whether someone's going to insert or update something that may interfere with our transaction, right? So, related to his earlier question: what can you do if you don't have an index? Essentially, you could take a lock on every page, which would be kind of wasteful, just to make sure nobody else ever changes anything, right? Because in addition to inserts, we also have to worry about someone taking an existing record and changing its status to lit — then we'd have the same problem. So we lock every page that does this. We could also lock the entire table, which is essentially the same thing as locking every page. But again, that's wasteful. There are other methods to do this without an index. Precision locking or predicate locking was one example. There are also graph data structures you can use to figure out whether there are conflicts between transactions — Postgres actually does that — but that's a bit advanced and we're not going to cover it here. In practice, most of the time, when database systems say they support serializable transactions and truly support serializable transactions, they're doing some variant of the index locking I'm showing here. And the reason I say "say they're serializable and truly are serializable" is that some systems — Oracle is notorious for this — if you ask for serializable transactions, will come back and say, "yes, you're now serializable," but in practice it's actually a lower isolation level. We'll talk about this later on. And for a lot of applications that doesn't matter, and it's a reasonable trade-off from a systems development standpoint.

The alternative to doing locking, with or without indexes, is essentially what's called repeating your scans. For point queries this is easy: if I update something, I hold the exclusive lock on it, so I don't have to worry about somebody else updating it out from under me. The real issue is a range scan where someone inserts something into my range that I didn't see the first time through. So one approach is to keep track of the tuples you read for every single scan your transaction does, and then, when the transaction goes to commit, you go back, run the scan again, and see whether you get the same result. If you get the same result, you know there weren't any phantoms, and you can go ahead and commit the transaction. All right? It seems kind of wasteful, but this idea of repeating your queries to see whether the result matches what you got the first time does show up in other places. As far as I know, no commercial system actually does this approach. The only major system I know of that does is this thing called Silo, which came out of Harvard and MIT a few years ago. It's an in-memory database engine, and they support serializable transactions by rerunning your scans when you go to validate the transaction: if your scan set matches, you know you're serializable and you commit; if it doesn't match, you abort and try the transaction again. Right?
So the next thing to address is something I've been alluding to over the last couple of lectures. I never know when the right time is to teach you about lower isolation levels, because you kind of need to know what serializable isolation means before I can tell you what the lower isolation levels are, and to understand serializable isolation you have to understand conflict serializability, two-phase locking, phantoms, and all the other anomalies.

So I've referred to this a couple of times: serializability is really nice, and from an academic standpoint it seems like the gold standard of what you would want, right? Your transactions run as if they're isolated from each other, even though we may interleave their operations in different ways. But in practice, most database systems actually don't enforce this. They run at what's called a lower isolation level. And the reason is that the performance overhead of doing everything you need to ensure your transactions run at serializable isolation is expensive, right? It's also difficult to program, difficult to implement, and difficult to make run efficiently. This is part of the reason why the NoSQL guys, when they first came out eight or nine years ago, mostly didn't do transactions at all in the beginning, let alone serializable transactions, because implementing concurrency control correctly is really hard. And then there are cases where they come out with transactions and get it wrong. When Cassandra introduced lightweight transactions a few years ago, someone immediately found a bug in them, right? And there was another example, a system called Aerospike, where they claimed they had strong consistency in transactions, and then this guy in San Francisco came up with a benchmark that proved they didn't, and they had to go change all their marketing literature, right?

So serializability, from an academic standpoint, seems like the ideal case, but in practice most applications want to run at what's called a lower isolation level. The way to think about an isolation level is that you're telling the database management system to what extent the transaction you're going to run is allowed to be exposed to the operations of other transactions running at the same time, right? The things we talked about before — dirty reads, unrepeatable reads, and phantom reads — are anomalies. If you want to avoid all of them and have true serializable isolation, then the database system has to do everything we've talked about so far to make sure your transaction is safe from other transactions; but that limits the amount of parallelism and concurrency in the system and can slow you down, right? So the way to think about the lower isolation levels is that you're not guaranteed to hit these anomalies — the database system is just saying, "I'm not going to do anything extra to prevent them from happening to you," right?

So in the ANSI SQL-92 standard, they specify the following four isolation levels.
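Here are the four levels and the anomalies they permit, laid out as a table (following the slide's convention: "no" means the system must prevent the anomaly, "maybe" means it does nothing extra to stop it):

```
Isolation Level      Dirty Read    Unrepeatable Read    Phantom
SERIALIZABLE         no            no                   no
REPEATABLE READ      no            no                   maybe
READ COMMITTED       no            maybe                maybe
READ UNCOMMITTED     maybe         maybe                maybe
```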
Now, there are actually a handful more that we're not going to cover here, but these are the main ones you have to understand, and they fit nicely with what we've talked about so far. At the very top is serializable isolation. And remember I said that in practice this is always going to be conflict serializability, because most database systems aren't able to implement view serializability. If you run at serializable isolation, you have no phantoms, all your reads are repeatable, and you have no dirty reads. Below that you have repeatable reads, where phantoms may occur, because you're not doing the index locking we talked about. Below that you have read committed, where phantoms may occur and unrepeatable reads may happen as well. And at the very bottom you have read uncommitted, where all of these anomalies can happen.

And as I said, just because you tell the database system you want serializable doesn't mean it's actually going to truly implement that. And the same goes for the one at the bottom: if you ask for read uncommitted, in Postgres, I think, you actually can't get it, because of the way Postgres does concurrency control — it would be more work for them to give you read uncommitted than to run you at the higher isolation level they normally support. So in some cases, systems don't support the extra things you need to be serializable; in other cases, they don't support the extra things you'd need to get to the lowest level, right? And the way to read this table is that for these anomalies, at the highest level the database system will guarantee — or at least is supposed to guarantee; sometimes there are bugs — that you cannot have these problems at all. As you go down, the reason we say "maybe" instead of "yes" is that it depends on what other transactions are running at the same time and whether you're actually exposed to them; it's just saying the database system is not going to make sure they don't happen.

The way you actually implement this in the context of two-phase locking is: under serializable isolation, you do strict two-phase locking, holding all the locks a transaction needs until it finishes, plus the index locking I talked about before — you lock the pages of the indexes that match the WHERE clauses of the things you're looking up. If you go down to repeatable reads, you get basically everything from serializable except the index locks. Under read committed, it's two-phase locking, but shared locks are released immediately. And at the very bottom, you basically take no locks for reads; you can read anything you want.

And we can do this in MySQL and Postgres — I think we tried last time and couldn't get it to actually do it. The way you use this in SQL is with the command SET TRANSACTION ISOLATION LEVEL, and for the isolation level part you put in one of the four levels I listed, right? Not all database systems are going to support all these isolation levels under all scenarios. If the database system doesn't support the upper or lower isolation levels, it just won't provide them — it won't throw an error.
It usually says, "yeah, go ahead" — if you say you want read uncommitted, it'll say, "yeah, I got it," but you may actually be running at a higher level. You have to actually read the documentation to figure out what it really means. And the default depends on what database management system you're using, right? The SQL standard doesn't say anything; it just says, here's what you can provide, and it's up to the vendor to specify what they want to use.

So this is a great table from 2013 that Peter Bailis came up with — he's a professor now at Stanford. It's a survey of a bunch of database systems that were out at the time, and he lists what the default isolation level is and what the maximum isolation level each can support. The first thing to point out is that only VoltDB and Ingres support serializable by default. VoltDB was the system I helped with when I was in grad school — we built the system H-Store, and it became VoltDB later on. I'm not going to talk about how VoltDB works, but essentially all your transactions run as stored procedures on a single thread, by themselves, at a partition. So no other transaction can be running at the same time, so you can't have any of these anomalies, so by default it's serializable. And Ingres was the first database management system that one of my advisors, Mike Stonebraker, built in the 1970s. It's still around today. And I actually learned a few weeks ago that it's open source — it's a weird kind of open source where you have to sign up to get access to it, but it's open source. I haven't poked around to see what it actually does, but if anyone wants to try it out for the extra credit, I think it'd be very interesting to see what the original 1970s database system does compared to a current one. It's obviously not the same code, right? But it'd be interesting to see how much it's matured or aged over the years.

The main thing to point out is that the default isolation level for most of these systems is read committed. We actually did a survey of database administrators earlier this year, and when we asked what isolation level most of their transactions run at, most of the time it's read committed. We can't prove this, but we think it's because it's just the default — and it's good enough for most things. Facebook, most famously, runs their entire MySQL cluster at read committed, and for their environment that's perfectly fine. The other thing to point out, on the maximum level, is that some of these systems can be serializable and some can't, and the one weird one here is Oracle, which has something called snapshot isolation. We're not going to cover that in this class; maybe I'll cover it in the multi-versioning lecture. But it's actually a really interesting one, because it's not one of the four levels I listed before — it's this other isolation level that can exhibit a different type of anomaly that is not covered by the ANSI standard I described here. So again, when you ask for serializable isolation on Oracle, you actually get snapshot isolation. And for a lot of applications, that's good enough.

Another thing you can do is provide a hint to the database system that tells it whether your transaction is going to be read-only or not.
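Sketching both commands as they'd look in a session (the level and mode keywords are standard; exact placement rules vary a bit by system):

```sql
-- Postgres: set these before running any query in the transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION READ ONLY;
SELECT * FROM people;
COMMIT;

-- MySQL: set the level for the next transaction, then start it
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION READ ONLY;
```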
So when your transaction starts, you can set the access mode using that SQL command, and you can tell it whether you want to be read-write or read-only. The reason you may want to use this is that if you're read-only, there are some optimizations the database system can apply, because other transactions don't have to worry about checking whether you wrote anything, right? It can maybe take a snapshot of the database and just run with that, and as long as that's consistent, you're fine. I know that in later versions of MySQL, in Postgres, and in all the commercial systems you can do this. But not all database systems actually take that hint into consideration and apply different optimizations, right? Just because you tell the database system you're read-only doesn't necessarily mean it's going to do extra things to make your transaction run faster. That's up to the implementation.

All right, so any questions about isolation levels or these access modes and things like that? We can do a quick demo and play around with this in MySQL and Postgres if you want. Okay, so let's do this.

All right, so we're going to use the same sample table that I had before from last class — the Wi-Fi connects, good, right? So this is Postgres. We have that same table with the two tuples. What we'll do is start a transaction at the top, and that top transaction will be under serializable isolation, and we'll have this bottom one be under read uncommitted. The first guy updates the first tuple and increases the value by one. The bottom guy updates the second tuple and increases its value by one. So this bottom guy should be able to read the table now and see the change from the guy at the top. He doesn't, right? Because the way Postgres actually implements concurrency control is with timestamps and versions, they can't actually make read uncommitted happen — it would require them to do extra work. So you don't truly get read uncommitted. But now, if I try to do that select here, this actually may stall. Let's see what happens. Yeah, there we go, right? And so, yeah, now you only see your own update. But if I go ahead and commit the top guy, then down below I can now see it's changed, right? So in the bottom case here, I was running at read uncommitted, and when I did the select on the table while the top guy hadn't committed yet, I didn't see his change. The top guy then commits, and now the bottom guy can see it — which is what you get under read committed, right?

So let's try the same demo in MySQL. Same thing — I have my table like that. In MySQL, I think you have to start the transaction and then set the isolation level. So this guy down below, we'll start him — reconnected, that's fine — and then we'll set ourselves at read uncommitted. Same thing: the top guy does an update on the first tuple, the bottom guy does an update on the second tuple. Now, let's see — before, when I ran this in Postgres, the select in the bottom transaction didn't see the change from the top guy. So let's see what happens here. Doesn't see it either. So now go to the top, do a select — he doesn't see it either, right? So now I'll commit this guy — still doesn't see it there either. So now if I commit, then I finally see it. So it looks like, unless I'm doing it wrong, MySQL can't do read committed either.
No, that can't be right. So we're doing something wrong, okay. So I'll start a new transaction, say we're serializable, update this, start this guy at the bottom, say we want to be read uncommitted, and let's see whether we can read that now. There we go — now he's able to read it. Why couldn't I do that before? That's fine, right? So here, the bottom guy sees the write that the top guy did — it sees 101, right? So now if I roll back the top transaction, come back here, and do my read again — now the value disappears, right? So this is an example of an unrepeatable read. This bottom terminal here is currently in a transaction, right? If we were running serializable, we shouldn't see the value flip. But in this case, we do.

Are there any questions about isolation levels? Yes? Is there a way to start a transaction and say, "I don't want anybody else to see what I'm doing"? Yeah, so the way isolation levels work is that they only specify things for your own transaction. You're telling the database system what you're allowed to see; you can't prevent other people from seeing your stuff, right? Although, actually, if you really wanted to do that, you could lock the entire table. So if I lock the entire table here and then try to read from it over there, it stalls, right? So again, the isolation level really specifies what your transaction is allowed to see; it doesn't affect what other people can see. And eventually this will time out — I forget what the default timeout is in MySQL.

Let me also show you how to start a transaction in read-only mode. I forgot the syntax — what was the slide? Sorry. All right, and let me also show you SELECT FOR UPDATE too, because that's kind of cool. Yeah, so SET TRANSACTION, then READ WRITE or READ ONLY. Basically, if I begin a transaction as read-only and then try to do an update, it throws an error and says you can't do that because you're in read-only mode. And again, whether the database system just uses this to protect you, or actually applies optimizations based on your transaction being read-only, depends on the implementation.

All right, the last one I'll show you real quickly is SELECT FOR UPDATE, because that's actually very useful in practice, and I think you'll see it a lot in real applications. So we have two transactions now running in serializable mode. If I do SELECT * here, I can read that — they both can read it. But now, if I do SELECT * FROM the transaction demo table FOR UPDATE, what should happen? Both the top transaction and the bottom transaction are running at serializable isolation, and they both read from the table. So what did they acquire? Shared locks, right? And now we're doing SELECT FOR UPDATE, which says I want an exclusive lock. Let me do this in Postgres first — because of the way they do snapshot isolation, you're allowed to do the plain read, since it's still consistent. But now let me try the same thing in MySQL — which I think is still holding that lock from before, so, yeah, okay. We'll start the first transaction at the top and the bottom one at serializable isolation. The first guy does a BEGIN, SELECT * FROM the transaction demo table. The second guy does the same query. They both can read it. Now he does a FOR UPDATE, and this should block. Yeah, so this one blocks, right?
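A condensed version of those two demos (txn_demo is a stand-in name for the demo table; the blocking behavior is what the lecture shows, though details vary by system):

```sql
-- Read-only mode: updates are rejected
START TRANSACTION READ ONLY;
UPDATE txn_demo SET val = val + 1 WHERE id = 1;  -- error: read-only transaction
ROLLBACK;

-- SELECT FOR UPDATE across two sessions
BEGIN;                               -- session 1
SELECT * FROM txn_demo FOR UPDATE;   -- grabs exclusive locks on the rows

BEGIN;                               -- session 2
SELECT * FROM txn_demo;              -- fine: a plain read
SELECT * FROM txn_demo FOR UPDATE;   -- blocks until session 1 commits
```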
So again, the reason I'm showing you Postgres and MySQL at the same time is because it goes to show: even though I asked for both these transactions, in both systems, to run at serializable isolation, what that actually means, how it's implemented, and what protections you get can vary wildly, right? There's a textbook definition of serializability, and there are different ways to implement it, and based on how you implement it, you get different behaviors. Okay.

All right, so now the question he had at the very beginning: are we going to use two-phase locking on our indexes? And the answer is no. The reason is that indexes are different from the actual data — the tuples in our tables, right? At a high level, the way to think about this is that the logical contents of the index are the only thing we care about. We don't care about the physical structure of the index. What I mean by that is: say I read it at one moment and ask, "is there a record where key equals Andy?", and it comes back and says yes. Then some other transaction comes along and modifies a bunch of stuff, but it doesn't touch that one key I read before — it did splits, it did merges, so the physical data structure has changed. Now if I come back and ask, "do you have a record with key equals Andy?", as long as I get back the same answer I got the first time, I don't care how the index changed underneath the covers. The only thing we care about is protecting the logical contents, which is what the index locking and all the other stuff I talked about before covers. The physical contents can change at any moment. But of course, as you saw when you implemented your hash table, you can have multiple threads trying to modify the physical data structure at the same time, so we need a smart way to protect it.

And here's why you don't want to use two-phase locking. Think of your index as an abstract tree, and say transaction one wants to update node H and transaction two wants to update node I. If the first transaction comes along and I use two-phase locking and acquire all the locks on my way down, I can't release anything. Because as soon as I release one of those locks, I'm in the shrinking phase of two-phase locking, and if I have to come back and probe this index again, I can't acquire new locks — that would violate two-phase locking. So if I use two-phase locking on my index, I'm going to hold each lock for the entire duration of the transaction. And I've acquired an exclusive lock on the root of the index, which means no other transaction can even read the index. So it's essentially useless, right? The reason we're not going to use two-phase locking to protect the physical data structure of our index is that it requires us to hold locks too long. We need something better — something specifically designed for a B+tree that maximizes the amount of concurrency we can have.

The technique we're going to use is called lock crabbing, or latch crabbing. The basic idea is that as we traverse the tree to find the thing we're looking for, we acquire locks or latches on the way down. This is a good example of where I intermix the words latch and lock — here I really mean latch, because it's the low-level mutex.
So we acquire the latch on our parent as we go down from our starting point, and before we go to the next node, we have to acquire the latch for that node, and then we traverse down to it. Then we check whether the node we're now at is considered safe, and if so, we can release the latch above us, on our parent, and keep traversing down. By safe, I mean we know that the node we're looking at will not split or merge. Because if you split, you take the node you're at and add another one, potentially further down the tree, and then you have to modify the parent to say, "here's an outgoing pointer to the new thing I just split off." Same thing for a merge: if I'm coalescing siblings, I have to update my parent and say, "you don't have two pointers anymore, now you just have one." So once we know that the node we're at, at our current level in the tree, is not going to have to split or merge, we can go ahead and release any latches we've taken above us. The reason it's called crabbing is to think of a crab walking: you take latches, one after the other, as you go down. I realize my hand gestures are probably not helping illustrate this, but I'll show some examples.

All right, so for searches: you start at the root and take a shared latch on the parent. And since you know it's read-only, once you get to the child, you can go ahead and release the shared latch and keep going down that way. For inserts and deletes, you always take exclusive latches on your way down, and once you have the latch on your child, you check whether it's safe; if so, you can release all the latches you hold above you.

So I'm going to go through a bunch of examples to illustrate this. Here's our really simple B+tree, and I'm going to label the nodes with letters so we can keep track of what we're dealing with as we go down. In our first example, we want to search for the key 38. Under latch crabbing, we acquire the shared latch at the root, on A, then we acquire the shared latch on B. At this point we know it's safe to release the latch on A, because this is a read-only operation — it's a search, we're never going to modify anything. So as long as we've gotten to our child, we can release the latch on the node above us, right? Then we keep going down, doing the same thing — keep crabbing, acquiring, and traversing. We get to the bottom, and then we can read the record we were looking for.

Yes, in the back? [The question is partially inaudible — something about a pair-wise comparison at each level.] What do you mean by pair-wise? I'm not sure what you're saying — I'm missing what you're saying, sorry. So, start at the very beginning: I always have to get the shared latch on the root, right? Then it's safe for me to check what keys I have in that page and figure out what direction I want to go. So now I know I need to go to B. While I still hold A, I get the shared latch on B. Once I land there, it's safe for me to release the latch on A, right? Because if you did it a different way — if I'm at A here, and I know I need to go to B, and I release A and then jump to B —
someone may have modified A at the moment I jumped to B, and now that pointer I had before goes to nowhere, right? We're really talking microseconds here, but it's still a race condition; it could still happen. So we have to be careful.

Yes? [So when you search for 38, you eventually end up down at D, right? When you get to D, do you first release D or first release C?] So I'm at this point here, I'm at B, right? I can release A, and now I can acquire the latch on C. I jump down to C, and now it's safe for me to release the latch on B. Your question was what, sorry? So I get down here, same thing: I have to acquire the latch on D first, traverse there, and then it's safe for me to release the latch on C. [So you release C first, then release D?] Correct, yes. So again, this is a recurring theme: you want to make sure you always acquire and release latches in the same direction. We acquire latches on the way down and release them behind us, right? And that avoids any deadlocks, because nobody can hold a latch at a leaf node and try to acquire a latch above us — the only way they could get to us at the leaf node is to traverse down and take the latches along the way, as we did. It's always in one direction, so there are no deadlocks.

All right, let's do a delete now — say we delete 38. Here we acquire the exclusive latch on A, then we jump down to B and get the exclusive latch on B. Now at this point, if we deleted an entry from B, we might end up having to do a merge, a coalesce with a sibling, right? Because we don't know what's going to happen below us — we may end up merging all the way up the tree, and if we had to delete an entry from B, then we'd have to merge B as well. So B is considered unsafe at this point, and we can't release the latch on A, because we may have to modify A. Then when I jump down to C — C is over half full. So if I delete an entry from C, I know it can absorb that without having to do a merge, so we won't need to modify anything above us. So now it's safe for us to release the latches on A and B, all right? Then we get down to the bottom, to D. D again is more than half full, so if we delete an entry from it, we'll be fine. We can release the latch on C, and then we can actually apply our delete.

Yes? So — that was this point here, when we're at B, right? If, below us in the tree, we have to do a merge because of our delete, and that merge percolates up to B, so that we have to delete an entry from B, then B would be less than half full, so it would have to merge too. And when you merge, you have to update your parent to say, "here's the pointer to the right thing you should be looking at now." At this point, at B, we don't know what's going to happen below, because we haven't gotten there yet. So at B, if someone below us has to merge, we may have to merge, which means we'd have to update our parent. So at B, we can't release the latch on A. Only when we get down to C do we see that, all right — even if I delete something below me in the tree, C will still be more than half full, so I'm not going to have to modify C and merge it. Therefore I can go ahead and release the latches on A and B. Yes? [When we get to D, we would also release the latch on C?] Correct, yes. So this is one way to do it, right?
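Putting the two traversals side by side as traces (S = shared latch, X = exclusive latch; the node labels A through D follow the examples above — this is a sketch of the protocol, not the exact slides):

```
Search 38:                      Delete 38:
  S(A)                            X(A)
  S(B), release A                 X(B)                     -- B unsafe: keep A
  S(C), release B                 X(C), release A and B    -- C safe (over half full)
  S(D), release C                 X(D), release C          -- D safe
  read record                     apply delete
```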
So now, if someone else comes along and wants to delete 44, that would still be fine: you could delete that entry and D would still be half full, and then you'd delete a key from C and that would still be half full, so you wouldn't have to merge all the way up.

Okay, let's do an insert. Same thing: exclusive latch on A, keep going down to B. At this point we know that if we have to split below us, B has enough room to handle it, so we can go ahead and release the latch on A. Keep crabbing on our way down. We get to the bottom, and we have enough room in E, so we can release the latches on B and C, and then we can apply our insert there, right? Again, it's basically checking whether it's safe. And safe means: if I'm doing a delete, will this node still be at least half full if a merge happens below me? And if I'm doing an insert, will this node still have enough room if a new key gets pushed up into it?

Okay, so now let's do a more complicated insert: 25. Same thing, exclusive latch on A on the way down, and we do our crabbing, releasing latches as we go. We get to this node here; we can release B, because we have enough room. Now we want to insert into H, but H is not going to have enough room. So at this point we still have to hold the latch on F, because we know we're going to do a split, and a new key is going to be written into F. We need to keep the exclusive latch on it to prevent anybody else from modifying or reading it while we're doing this. So we go ahead and make our new node for the key we're trying to insert, and for the key we're moving up from the split. And at this point we still hold the exclusive latch on F, so nobody can find our new page yet. So it doesn't matter that maybe we haven't updated the pointers exactly the way we want yet, or whether we put the key 31 into F first or make the new page first, because nobody else can see any of this — everything will be fine. Now, at this point, we can release all our latches, and our insert operation is done.

In the back, yes? [How would you go about doing bulk inserts?] So I'm not going to cover this in detail, but things get tricky when you want to do operations that go across siblings. The way you typically handle that is, when you need to jump to the next sibling, you just do another traversal from the root and crab down the same way. Because the issue is, if we start scanning along the leaf nodes, I might acquire latches on these nodes down here without having come all the way down the tree to get there, while somebody else could be coming down to do something at the same time. So a lot of times you'll see that a bulk insert is handled the same way as a range scan: you're updating a lot of things that span multiple pages, so you traverse down and do them one after another. Typically, as far as I know, it's not like you can take 20 keys and inject them all at once; each key is a separate insert. True bulk inserts are typically done when you want to rebuild the entire index, and for that you have exclusive access while you do it, so you don't do this crabbing thing.

So, an obvious question: what's the first thing I did in every example I showed you — for the inserts, the deletes, and the searches, what's the very first thing I always did? Lock the root, exactly, right? And for the modifications, I'm taking an exclusive latch.
So that means that, for a brief window while we hold this lock on the root, this thing basically becomes a serial data structure, meaning nobody else can do anything, right? And depending on the operation and the layout of the tree, I may have to hold that lock for the entire duration of the operation, because I don't know whether I'll be splitting or merging down below. So locking the root every single time is always going to become a big bottleneck in a lot of environments, a lot of applications, where you're updating your indexes and inserting new entries all the time. So a better locking algorithm, which came out of the 1970s from a paper by these German guys, Bayer and Schkolnick, is that whenever we want to do a modification, an insert or a delete, we're going to assume that the leaf node will be safe, meaning it's not going to have to split, it's not going to have to merge. And so we'll take shared locks all the way down, except when we get to the leaf node, then we'll take an exclusive lock. And then once we're down there, we figure out: is our leaf node actually safe? If it is, then we're fine, we can go ahead and do our operation; we're not going to have to do a split or merge. But if we guessed wrong, then we just restart the operation and take the exclusive locks that we normally would. So let's do delete 38 again. So now I'll get a shared lock on A, whereas before I got an exclusive lock, and I'll do my normal crabbing where I release my locks as I go down. And then when I get to C, I'll go ahead and acquire the exclusive lock on D, then check to see whether D is still safe. If yes, then I can do my operation without ever having taken exclusive locks on the way down. So this is an optimistic protocol or algorithm, where you assume that splits and merges are rare, which in practice, in a lot of cases, is actually a safe assumption. And therefore you take shared locks all the way down, and you don't have to exclusively lock the root every single time. Yes? So when the leaf turns out not to be safe, you release all the locks and restart? So his question is, I think the next example will show that, right? So his question is: if I do crabbing all the way down and I get to this one here, and I'm doing an insert, so I know I'm going to have to split, so now this node is not safe, I'm going to release any latches or locks that I have and just restart from the very top, right? So his question is, if there's another transaction, say it was insert 24, and we're trying to insert 25 or something, and they're here at F with a shared lock, blocked waiting to acquire our exclusive lock, we do our check, we see that we have to restart, and then we restart, but in between that time they actually can modify it, because it's their turn to go, right? Your question is, is that OK? Right, again, remember what I said before: I don't care about the physical data structure. I only care about the logical contents at a higher level, like in two-phase locking and concurrency control. The index locking sort of covers that, right? So I don't care if somebody else inserted something into my page, and that page that I thought was there before is no longer there because something else happened before I came back the second time. I don't care, right? And that's sort of the beauty of this, and another example of why you don't want two-phase locking here: that really strong protection is not necessary for this, right? OK, cool.
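Here is a sketch of one optimistic pass, continuing the same toy `Node` but now assuming its latch is a reader-writer latch; the minimal `RWLatch` below is invented for this sketch (real systems use the buffer pool's page latches). If the leaf turns out to be unsafe, the function bails out and the caller redoes the traversal with the pessimistic exclusive-latch crabbing shown earlier.

```python
import threading

class RWLatch:
    """Minimal reader-writer latch for the sketch (no fairness policy)."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

def optimistic_descent_for_delete(root, key):
    """One optimistic pass: shared latches on the way down, exclusive
    only at the leaf. Returns the exclusively latched leaf if the guess
    was right, or None if the caller must restart pessimistically.
    (Assumes the root is internal and Node.latch is an RWLatch.)"""
    node = root
    node.latch.acquire_shared()
    while not node.is_leaf():
        idx = sum(1 for k in node.keys if k <= key)
        child = node.children[idx]
        if child.is_leaf():
            child.latch.acquire_exclusive()   # only the leaf gets X
        else:
            child.latch.acquire_shared()
        node.latch.release_shared()           # crab: drop parent latch
        node = child
    if node.safe_for_delete():
        return node        # caller deletes the key, then releases
    node.latch.release_exclusive()            # guessed wrong: restart
    return None
```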
So again, at a high level, the way this works is that the search is the same. Insert and delete basically take shared locks all the way down, except when you get to the leaf, then you take the exclusive lock. If everything's fine, you finish your operation and you're done. If not, then you restart the operation from the beginning and take exclusive locks on the way down, right? And again, the idea here is that we're making this big assumption that most of the time we're not going to be unsafe, so taking shared locks and maximizing the amount of parallelism in our system or in our index is a good idea. And the cost of having to restart in the case where we're not safe is minimal compared to the benefit we get from the increased parallelism. Yes? It looks like you could just make it a tunable, saying, I'll take shared locks from level zero down to some depth, and then once I'm deep enough, say at the second-to-last level, I can just start grabbing exclusive locks, because not many people will be grabbing those nodes. Yes, his statement is that instead of only acquiring the exclusive lock at the very leaf node, you could take exclusive locks as you get closer to it. Yeah, basically you could make that a tunable, like two or three levels or something. Yes, you could do that, yes. OK. So again, the last couple of things I want to stress: in the multi-granularity locking that we talked about before, you basically release the locks from the bottom up. And then for our latching here, we release them from the top down as we go. And we want to do this as soon as possible to maximize the amount of concurrency or parallelism we have in our system for other threads to do modifications. So any questions about crabbing? OK. And actually, I'll say a quick spoiler: in the advanced class, we will cover lock-free data structures, lock-free indexes. This is something that's in vogue in today's systems. And in practice, at least in our experiments, we have found that the lock-free data structures actually don't work as well as crabbing. So even though this idea is from the 1970s, it actually still is state-of-the-art. It still actually works the best. OK. So indexes make concurrency control a bit more tricky, because it's this thing where it's not exactly the primary copy of the data, and other transactions can read it, but we don't want to use two-phase locking to protect it in the same way that we protect our tuples and our tables. And building a concurrent index is actually tricky, and it's important to maximize the amount of parallelism you can have, because in a lot of cases, in OLTP applications, transactions can spend most of their time doing their index lookups. So if you have an efficient, high-performance index, that'll make a big improvement. For OLAP queries, it doesn't matter as much, because you're just doing large table scans anyway. And then most applications do not execute with serializable isolation, even though, again, the textbook and a lot of academic papers make this big assumption that, of course, this is exactly how you want to run your things. So I will say, though, that although serializable isolation is probably too strong for most applications, the lower isolation levels that actually provide some guarantees or protections are actually what you want.
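As an aside on the tunable-depth suggestion from the question above, here is a hypothetical sketch of what that knob might look like, continuing the same toy types; the `x_depth` parameter and `hybrid_descent_for_delete` name are invented here, not from the lecture. With `x_depth = 0` this degenerates to the plain optimistic protocol, and if a merge could still propagate above the exclusive region, the caller restarts pessimistically, just less often.

```python
def hybrid_descent_for_delete(root, key, height, x_depth=2):
    """Shared latches near the top, pessimistic exclusive crabbing once
    we're within `x_depth` levels of the leaves. `height` = number of
    levels, root at level 0; assumes the tree is deeper than x_depth."""
    held_x = []              # exclusively latched, possibly-unsafe nodes
    node, level = root, 0
    node.latch.acquire_shared()
    while not node.is_leaf():
        idx = sum(1 for k in node.keys if k <= key)
        child = node.children[idx]
        level += 1                                # child's level
        if (height - 1) - level <= x_depth:       # inside exclusive region
            child.latch.acquire_exclusive()
            if held_x:
                if child.safe_for_delete():       # child absorbs the delete:
                    for n in held_x:              # drop the unsafe ancestors
                        n.latch.release_exclusive()
                    held_x = []
            else:
                node.latch.release_shared()       # crossing the boundary
            held_x.append(child)
        else:
            child.latch.acquire_shared()
            node.latch.release_shared()
        node = child
    # if the topmost held node was never proven safe (its parent latch is
    # already gone), a merge could escape the exclusive region and the
    # caller restarts pessimistically, as in the plain optimistic case
    return held_x            # caller applies the delete, then releases these
```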
Coming back to isolation levels: if you just run sort of a NoSQL system without any notion of transactions at all, or even with eventual consistency and the other things we'll talk about with distributed systems, the amount of time your application programmers have to spend reasoning about what the correct state of the data they're reading actually is becomes problematic. And if you go read the Google Spanner paper from a few years ago, they have this opening paragraph that says: we thought we were all big on NoSQL and eventual consistency and no transaction support, but it turns out all our programmers were spending their time trying to reason about all the anomalies that can occur in their data, and that became very expensive as a labor cost. And so they found it's actually better to have the data system support stronger transactional semantics, and have a small cabal of really smart people figure out how to make those transactions run really fast and provide that for everyone else; from their organizational standpoint, that was the better decision. So serializable isolation may actually be too strong for most things, but some protections or some transaction support, I think, is important in your database system, OK? Any questions? OK, so next class, even more concurrency control, because I love it so much, right? We're going to talk about a different class of concurrency control called timestamp ordering. Remember I said last week that there are basically two categories of concurrency control: there's the pessimistic and the optimistic, right? It's two-phase locking or timestamp ordering. These are the only two categories that you can have. And then after that, we'll discuss multi-versioning, which is sort of another way to organize your database and run transactions, and it will actually use either timestamp ordering or two-phase locking underneath. So when a system says they do MVCC, they're still going to be doing either two-phase locking or timestamp ordering, right? So we'll go through that and understand how that works, OK? All right, guys, awesome. Thank you so much. See you on Monday.