Okay, so two quick things. Some of you just started project one, so I'm going to quickly go over project number one. The AutoLab site is finally online after us fighting with it, so thanks to Dana for getting that set up. I'll change the URL so that you can now submit to AutoLab, and that'll be the grading system going forward. If you already submitted over the weekend and got an email from me with your results, I need you to submit again, because we can't take what we have on my own machine and easily import it into AutoLab. I'll post on Piazza what you need to do; it shouldn't take you that long, it should just be a login and an upload, and that way we have all the scores in one place. Any questions about project number one? Again, it's not meant to be super hard; it's really just meant to get your feet wet in the system, go through the process of actually setting it up and getting it running on your local machine, okay? All right, the other interesting thing that happened this weekend was that our system got posted on the front page of Hacker News, so that was kind of cool. People were talking about it and saying they were excited about the kind of work we were doing. The only bad thing is that I read the comments and I saw this one from somebody in the course: "I'm in Andy's class right now and his hygiene is terrible. He smells like he has slices of rotten ham stuffed in his pants." So no, look, to be serious here, I don't know who Rico Swabi 1994 is, but it's a medical thing I have, I'm a bit uncomfortable about it, so I'm doing the best I can, all right? So if you have a problem with how I smell, just let me know and I'll see what I can do. All right, so for today's class, we're gonna focus on concurrency control.
So now we're actually gonna start getting into it. Before, we talked at a high level about in-memory databases and the history of databases; now we're actually going to discuss how you build one of these components and what you actually need to do. And in my opinion, concurrency control is one of the most interesting topics in database systems, because this is how we're gonna allow multiple transactions and multiple queries to run at the same time, with the database intermixing them together while still making sure that everything turns out okay. So we're gonna start off by talking about different types of transaction models. This goes well beyond what is taught in the intro course and covers what other types of transactions you can have. Then we'll have an overview of concurrency control, where I'll talk about two-phase locking and timestamp ordering. And then we'll discuss the paper that you guys were assigned to read, the evaluation of how to do concurrency control on a CPU with a thousand cores, what we're calling a many-core system. So again, the idea of this lecture is to show you the challenges you have to face, the problems that can occur with concurrency control. And then on Thursday and next week, we'll look at the modern implementations of these ideas in real systems. All right, so before we begin, we have to define what a transaction is. Everyone here should have some database background, so every single one of you should be able to give me a definition of a transaction in the context of a database. She's wearing a MongoDB shirt, so I'll skip her because they don't do transactions. That's okay. You're also wearing a Mongo shirt, all right, two of them, okay. Anybody here? Somebody who definitely took 615. So everyone should be able to give me a definition of what a transaction is. An atomic unit of change.
So you said atomic unit of change. Sort of, yes, but we're looking for something more high-level. You can do a compare-and-swap, that's atomic, but I wouldn't call that a transaction. Okay, so the definition I like to use for a transaction is that it is a high-level sequence of actions that are executed on a shared database to perform some function for the application. So this doesn't necessarily have to mean a begin and a commit, but it's some higher-level thing that we want to achieve in our code. It could be something like transferring money from one account to another. That's the canonical example people always use for transactions, because what you don't want to happen is you take money out of the first account, then the system crashes and you lose everything. You want to make sure, as he was saying, that those actions occur atomically. So as part of this, we're gonna say that transactions are the basic unit of change in a database management system, and that no partial transactions are allowed. Again, that's the atomicity comment that he made. So this is what we're trying to achieve in our system. Now, when we talk about actions that occur inside of the database system, we can classify them in three different ways: unprotected actions, protected actions, and real actions. So an unprotected action is something that the database system is doing internally as part of its normal operation. We say that they're unprotected because the effects of these actions cannot be depended on for the correctness of the execution of a transaction. So let me give you an example. Say I need to write out five pages to disk. The disk controller can only guarantee that a single page will be written atomically. It can't guarantee that either all five or none of them are gonna be written, right?
And so we say that these actions are not gonna provide any of the ACID guarantees that we want for our transactions, with the exception of consistency. What I mean by that is, again, if I'm trying to write out five pages, the hardware only guarantees that it can write one of them atomically. But it will guarantee that the write I do for that single page is consistent, or atomic in itself, right? It's not like I'm gonna take the byte stream I wanna write out to my page and then get back half of it when I try to read it again. So these are unprotected actions occurring in the system, and we're gonna use them as building blocks to do more complex things. Yes? Does that mean that if you have multiple writes to the page, the order is not guaranteed? So the question is, if you have multiple writes to a page, is the order not guaranteed? When you say multiple writes to a page, do you mean in memory or on disk? Well, let's say it's in memory. So if it's in memory, the order in which those writes occur depends on some higher-level scheduling, and that's not really what we're talking about here. The disk page is probably the best way to think about this: I wanna write out five pages, and the system can't guarantee that all five pages will be written atomically. It can only guarantee that one of them will, because that's what the hardware provides you. Okay? All right, so the next thing we have are the protected actions. Again, this is what you normally think of when you think of transactions, right? These are the SQL queries or the things that we're doing on behalf of the application. And the key thing about these guys is that we wanna provide full ACID guarantees for them, and the system will not externalize any of their results until it knows that all of the actions have completed.
So what do I mean by externalize? If you recall serializability theory, serializable transactions or schedules, what does it mean to externalize a result in a database? Correct, yes. It means that the effects of one transaction, the modifications that the transaction made to the database, become viewable to any other transaction when you commit. Again, under serializable schedules, you cannot read the writes of transactions that have not committed yet. We'll relax that when we talk about different isolation levels and the different types of concurrency control schemes and implementations you can have, but for our purposes here, we'll assume we're running with full serializability. And the last type of actions are what we'll call real actions, and these are the things that affect the physical world, the real world, that are outside the scope of the database system, and therefore it has no control over the properties of those actions at all. The example that I always like to use is: say you have a transaction that makes a bunch of modifications to the database, and then halfway through it decides to send an order confirmation email to the customer, saying, yeah, we processed your order. But then it executes more queries and realizes that it can't actually commit, so it has to abort and roll back all its changes. It can't retract that email, because that's already out in the network and it can't go and pull it back, right? So the database system's concurrency control scheme provides the ACID guarantees for these protected actions, for these queries, and it's gonna use the unprotected actions to make that happen. And for these real actions here, we just can't do anything, okay? All right, so now, given this definition, we can talk about different types of transaction models.
So we're gonna go through a bunch of different types, and what I'll say is that what you learned in the intro class is flat transactions; there are some more complicated models you can use, but they're not as common as flat transactions. So again, I'll go through each of these one by one. The flat transaction is what you'd expect: you call begin to start a transaction, then you have one or more queries, followed by either a commit or a rollback. Or the database system could end up aborting your transaction if there's a conflict, right? That's not you invoking it, but the database system saying you can't commit, and it'll go ahead and kill you. So you basically have two examples here. The first transaction does a read on A, followed by a write on B. You have the begin statement that sets up all the internal state you need for this transaction, and we'll talk about what that state is in a second. It does the read, does the write, and then it commits, and at this point the write, the modification to B, is now visible to other transactions, since we're running under the serializable isolation level, okay? And then for transaction two, again, we do the begin, the read on A, the write on B, and then it invokes a rollback. So the change that was made to B is blown away, and B reverts back to its original value. So again, when you take an introductory database course, and when you go out into the real world and program database applications, chances are you're going to be writing these flat transactions. But no one will ever call them flat transactions; you just refer to them as transactions. I can't prove this, but I would say probably 99% of the transactions that are executed in the real world follow this model. So what are the problems with flat transactions? Why do we actually need other models? There are three issues that I always like to bring up.
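To make the two flat-transaction examples concrete, here is a minimal sketch using SQLite from Python; the table name and values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage BEGIN/COMMIT/ROLLBACK explicitly
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO kv VALUES ('A', 1), ('B', 2)")

# Transaction 1: READ(A), WRITE(B), COMMIT -> the new B is externalized
conn.execute("BEGIN")
a = conn.execute("SELECT val FROM kv WHERE key = 'A'").fetchone()[0]
conn.execute("UPDATE kv SET val = ? WHERE key = 'B'", (a + 100,))
conn.execute("COMMIT")

# Transaction 2: READ(A), WRITE(B), ROLLBACK -> B reverts to its prior value
conn.execute("BEGIN")
conn.execute("SELECT val FROM kv WHERE key = 'A'")
conn.execute("UPDATE kv SET val = 999 WHERE key = 'B'")
conn.execute("ROLLBACK")

b = conn.execute("SELECT val FROM kv WHERE key = 'B'").fetchone()[0]
print(b)  # 101: transaction 1's write survived, transaction 2's was undone
```

In a real multi-client system the second transaction could also be killed by the database itself on a conflict, which the application would see as an error rather than an explicit rollback.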
The first is that the application is not going to be able to roll back any portion of a transaction. Under the flat transaction model, it's all or nothing, right? There are no partial rollbacks that let you say: I modified a bunch of things, but the last one I modified I didn't actually want to do, so I want to roll just that back and let all my other changes commit. The other thing we have is that all of the transaction's work is lost if the database system crashes or fails before that transaction can actually finish. So think of the example where I have to update 100 tuples, I get to the 99th one, and then someone trips over the power cord or my system crashes: all of the work that I did is essentially lost and I have to come back and redo it. I'm not saying that's necessarily a bad thing, because that's what the database system is protecting for you, but from an application standpoint, it's a lot of wasted work. And then the other issue we have is that each transaction can only take place at a single point in time, and this will be clearer when we start looking at examples of the types of applications that you can't support very easily with flat transactions. So the two canonical examples for why you want something more complex than flat transactions are always these two here. The first is multi-stage planning: when you want to do complex things in multiple stages, and you want to make sure that either all stages are successful or none of them are. The example would be something like, say I want to book a trip from Pittsburgh to Florence in Italy. What I need to do is book a flight from Pittsburgh to JFK, then from JFK to Rome. And then when I get to Rome, I want to be able to book a train ticket from Rome up to Florence.
And so what I want is to make sure that either all those reservations can occur, or I don't take the trip at all, because you don't want to book the flight from Pittsburgh to JFK and from JFK to Rome and then find there's no train, for whatever reason, to get you up to Florence. You'd be stranded there. So under the flat transaction model, it'd be difficult to do this, because you'd have one huge transaction where you're trying to do all of these different things. The other issue we're going to have is when you want to do bulk updates. I alluded to this before when I talked about how you can't do partial rollbacks. If I want to update 100 tuples, and I get to the 99th one and want to undo just that last one, I can't do that easily in the flat transaction model unless I actually program the logic into my application. So unless you have code to revert your own operations in a transaction, you can't do this very easily. Question? With the multi-stage planning, why can't you just do the reservations one at a time and roll them back if one of the reservations doesn't work? So the question is, in the multi-stage planning, why can't you just do the transactions one after another in serial order, and then if you can't get the last thing, like the final train ticket, roll that back and roll back all the other ones? Yes, you can, and we'll discuss how you actually do that. But the point I want to make is that you, in the application, have to write the code to do that rollback. We'll see what that looks like in a second. You're adding more complexity for things that the database system should be able to manage for you. It's a good point though, and we'll come to it. All right, so the example I gave before was dealing with a hundred tuples, and that doesn't seem like that big of a deal. But what if you now have a billion tuples?
Say you're a large bank and you need to go through and update the interest for all your customers, and it's gonna take you hours to compute this and apply all these changes. What you don't want is to get four hours into this transaction and then, because of whatever concurrency control scheme you're using, whether it's OCC or two-phase locking, hit a conflict and have to abort the transaction. That means you're gonna throw away all the work you just did, and essentially this transaction is never actually gonna be able to finish. So the flat transaction approach is not good for this, because it's essentially one giant transaction. The way you see banks and other companies get around this is that they usually take the database down for maintenance. A lot of times you see this on financial websites: they'll say we're doing maintenance at like three a.m. on a Sunday. That's because they're doing bulk updates and other internal things to fix up the database, and they don't want interference from other transactions running at the same time, because then the bulk update would never actually finish. So given these limitations, now we can go through different kinds of models and see how we can overcome them. The first type of transaction model that goes beyond what flat transactions can do is called transaction save points. The basic idea here is that we're gonna be able to explicitly declare that we wanna save the current state of our transaction in the database, and assign it some kind of handle that we can refer back to later on. And then as our transaction continues, we can decide whether we wanna roll back to that previous save point, or we can release it and save some memory by giving up the internal metadata that we're maintaining for the save point. So I'll go through an example, right?
So let's say we have a transaction that wants to do a write on A, a write on B, followed by a write on C. I'll say that this arrow corresponds to essentially the program counter; it tells us where the database system is as it's executing the transaction. When we get to the begin statement, we're essentially always gonna create a save point. Many database systems do this, because the save point is essentially how you're gonna keep track of what the transaction is actually doing. Even if you haven't declared a save point yet, the system doesn't know whether you're gonna declare one in the future, so it goes ahead and creates a sort of hidden one for you. So we'll say this is the new save point, but it's not really visible yet; it's just internal to the system. Then we do our write on A, and what'll happen is we store the read/write set of the transaction inside our save point. This keeps track of what modifications we made, because we need to know this when we run the concurrency control protocol to figure out whether we have conflicts or not. Then we get to the save point command here, and I'll specify that the name of the save point, the label, is the number one. So that sets this name to be one, this save point becomes immutable, and then we go ahead and create the next one, the next placeholder where we'll store all our next changes. Then we do a write on B, and that ends up in our new save point, and now it says we wanna do a rollback to one. So what'll happen is we'll blow away all the changes that are queued up in the new save point, and we'll roll the state of the database back to where it was at this save point here. So we're able to do a partial rollback.
Then of course we also have to create a new save point, because again we need a placeholder where we're gonna put our new changes, since once we declare a save point with a label as a handle, it's immutable; we can't make any more changes to it. Then we do our write on C, which goes in our new save point, and then we commit, and the database will only see the updates to A and C. It won't see the change to B, because that got rolled back. All right, this is pretty clear, pretty easy. We'll look at a more complicated example. So now we're gonna do a write on A, write on B, write on C, write on D, and we'll intermix them with save points, releases, and rollbacks. Again, when we start off, we always create our new save point, and when we do our first write, it lands in that location. Then we'll go ahead and save it, give it a label, and create a new one. Now we do our write on B; that goes in the new one. We save that one as well and get a new one. Now we do a write on C and we save that one too. All right, so now what we're gonna do here is a release on two. What's gonna happen is that release doesn't roll back any of the changes like rollback does; it just blows away the handle. The changes are still there; it's just that you can't go back and try to roll back to number two, because that doesn't exist anymore once you've released it. I think most database systems implement this as a purely semantic thing. You still have a save point, because you need a place, a data structure, to store all these changes; it's just that you're not able to roll back to it anymore. So I don't think this actually saves a lot of memory. It's part of the SQL standard, and you sort of have to support it, but you don't get any huge benefit by blowing away the save point, because I don't think systems actually blow it away.
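The first save point example above maps directly onto SQL's SAVEPOINT and ROLLBACK TO commands; here is a sketch with SQLite (the table and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage the transaction explicitly
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [("A", 0), ("B", 0), ("C", 0)])

conn.execute("BEGIN")
conn.execute("UPDATE kv SET val = 1 WHERE key = 'A'")  # WRITE(A)
conn.execute("SAVEPOINT one")
conn.execute("UPDATE kv SET val = 1 WHERE key = 'B'")  # WRITE(B)
conn.execute("ROLLBACK TO one")  # partial rollback: undoes only the write to B
conn.execute("UPDATE kv SET val = 1 WHERE key = 'C'")  # WRITE(C)
conn.execute("COMMIT")

result = dict(conn.execute("SELECT key, val FROM kv"))
print(result)  # {'A': 1, 'B': 0, 'C': 1} -- only B was rolled back
```

RELEASE one would discard the handle instead, after which a ROLLBACK TO one is an error, which matches the release semantics described below.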
The released save point is still there; the system just won't let you refer to it anymore. So now we do our write on D, and that lands in the new save point. And now it says we want to do a rollback to three, so the question is what should actually happen. The way these save points are maintained internally, they're essentially chained together in a little linked list. So when we released two, because three comes after it in the chain, that also released save point three; they're chained together internally. So this rollback to three is actually an error, and the system will roll back the entire transaction, abort it, right? Again, that's because when we released this one, it also ended up releasing that one, so you can't roll back to it; anything from that point up to the new save point got wiped away too. And again, the changes are still there; it's just that you can't roll back to or release those save points. Is that clear? Okay. So save points are nice and all, and most systems will support them, but they're not as common as nested transactions, the next model we're gonna look at. You can think of save points as these linear checkpoints of the transaction as you go along. And I don't want to use the word checkpoint, because that means something else in a database system, but it's like a snapshot, right? A save point. So it's a linear sequence, and you can only roll back to them individually. But with nested transactions, what we generate is a hierarchy of transactions, where the outcome of a child transaction in this hierarchy depends on the outcome of its parent, and that recursively goes up the tree, the hierarchy, until you get to the root. So a nested transaction, an inner transaction, can decide on its own whether it should commit or abort.
But even if it wants to commit, it doesn't actually really commit until its parent commits. So I'll go through an example. Here we're doing a write on A, write on B, write on C, write on D. The first thing I'll point out is that the scope of the root transaction is the topmost begin and the bottommost commit, right? Everything else occurs inside it, so anything that happens inside depends on what the outcome of this root transaction will be. Our program counter starts off and we do our write on A; we're still maintaining internal save points here, but we're not labeling them, so I don't have to show them. Then we get to this begin statement. Essentially what happens is the database system knows I'm already running a transaction, so it makes a sub-transaction; in this case I'll say it's 1.1. All of the operations after it now belong to the sub-transaction, and our program counter moves over there. Then we do the write on B, then we hit another begin, and this creates a sub-sub-transaction, 1.1.1, and the same thing: the operations after it slide over to that guy. Then we get down to the commit, and the database system knows that this sub-transaction has finished, but because it's a sub-transaction, it doesn't know whether it has truly committed until it knows whether its parent commits. Then we come back here and do the write on D in the parent transaction, but then we hit the rollback command. Not only will that roll back the changes to B and D here, but it's also gonna roll back the change to C, because this sub-sub-transaction depends on its parent. Then we come back, resume control over here, and we get the final commit. So in the end, the only thing that ever got written out to the database was the change to A. All the other ones got rolled back because this guy rolled back.
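Standard SQL has no begin-inside-begin, so to make these semantics concrete, here is a toy Python model of the hierarchy (all names invented): a sub-transaction's commit is only provisional, and a parent's rollback discards its children's work too.

```python
class NestedTxn:
    """Toy model of a nested transaction. A child's commit only merges its
    writes provisionally into its parent; nothing is final until the root
    commits, and a parent's rollback wipes out its children's work too."""

    def __init__(self, parent=None):
        self.parent = parent
        self.writes = {}      # provisional writes made directly by this txn
        self.children = []    # open sub-transactions
        if parent is not None:
            parent.children.append(self)

    def begin(self):          # a begin inside a txn creates a sub-txn
        return NestedTxn(parent=self)

    def write(self, key, val):
        self.writes[key] = val

    def _all_writes(self):
        out = dict(self.writes)
        for c in self.children:
            out.update(c._all_writes())
        return out

    def commit(self):
        if self.parent is None:           # root commit: everything is final
            return self._all_writes()
        # sub-txn commit: hand writes up to the parent, still provisional
        self.parent.writes.update(self.writes)
        self.parent.children.remove(self)

    def rollback(self):                   # discards this txn and its subtree
        self.writes.clear()
        for c in self.children:
            c.rollback()
        self.children.clear()

# The example from the lecture: only the write to A survives.
root = NestedTxn()
root.write("A", 1)       # WRITE(A) in the root
t11 = root.begin()       # sub-txn 1.1
t11.write("B", 1)        # WRITE(B)
t111 = t11.begin()       # sub-sub-txn 1.1.1
t111.write("C", 1)       # WRITE(C)
t111.commit()            # provisional: C now belongs to 1.1
t11.write("D", 1)        # WRITE(D)
t11.rollback()           # wipes B, C, and D
result = root.commit()
print(result)            # {'A': 1}
```

A real system would track read/write sets and locks per node; this sketch only models the commit/rollback dependency between levels.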
Again, these are probably the second most common type of transactions that you can have. The way the database system actually maintains this is you'll have a linked list of these transaction states, where you just need to know that one guy depends on another, or you have a tree structure for it. So in the case of nested transactions and save points, all the operations that occur are all in the context of this single global root transaction. But that doesn't really help us for the bulk update example or the multi-stage example, because we're still relying on this one global transaction for the entire scope of all the operations that we're doing. So another, more complex model we can have is called transaction chains. This is the example that she brought up earlier. The idea here is that we have multiple transactions, independent of each other, that execute one after another. What happens is that the commit statement of one transaction is executed atomically with the begin statement of the next transaction in the chain. The idea is that you can immediately switch over and start executing the next transaction, but you release all the locks that you were holding for the first transaction when you move to the next one. Contrast that with nested transactions: there, if I call begin and commit and then call another begin and commit, the root transaction still holds all the locks that its sub-transactions acquired as it was running. So the difference again: when we commit here in chained transactions, we release all of our locks, and we're not gonna be able to roll back any previous transactions in our chain, unlike what we could do with save points and with nested transactions. These things operate independently of each other. So here's a really simple example: we have a chain of three transactions, a write on A, a write on B, a write on C.
So this first guy starts off, does its write, and then when it commits, that occurs atomically with the begin of the next transaction, which then gets fired off and does its commit. But then the third one starts and rolls back, and that doesn't change any of the modifications of these other guys here. So take a guess: why do you wanna do this? What does this buy us? Sorry, go ahead. It saves time, in case you wanna just redo the last transaction and you don't have to start from the beginning. So the comment is that it saves time, because if you need to roll back, you only roll back this last guy; you don't have to roll back all the other ones. That's correct, yes. One of the deficiencies, though, is this: let's say we get farther down the chain and we recognize that we actually wanna roll back number two. Under this model, we can't do that. But we'll talk about how to fix that in the next few slides. The other benefit here, which will come up when we talk about snapshot isolation next class, is that because the commit and begin are atomic, you know that you're gonna have a consistent snapshot of the database going from this transaction to the next; nobody else is gonna be able to cut in between and make a modification. Now, other transactions can come in as you get farther down the chain. They're allowed to come in and read stuff that you've already written, because these transactions are independent in some ways: they've already committed, and those changes are now visible. It's just the transition from one transaction to the next that is atomic, which saves you the problem of people trying to cut in and make changes. Okay, so again, chained transactions help with our multi-stage problem, but they don't help with our bulk update issue, right? And it seems like chained transactions are actually what we want.
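Here is a toy sketch of the chained model (all names invented): a single latch makes each commit-plus-begin transition atomic, each link's changes are final as soon as it commits, and a failing link aborts alone without undoing earlier links.

```python
import threading

class ChainRunner:
    """Toy chained-transaction runner. Each step is an independent txn body;
    the commit of one link and the begin of the next happen under the same
    latch, so nothing can cut in between them, but a committed link's
    changes are final and cannot be rolled back by later links."""

    def __init__(self, db):
        self.db = db
        self.latch = threading.Lock()

    def run(self, steps):
        committed = 0
        for step in steps:
            with self.latch:        # commit(T_i) + begin(T_i+1) are atomic
                try:
                    step(self.db)   # run and commit this link
                    committed += 1
                except Exception:
                    break           # a failed link aborts alone
        return committed

def write_a(db): db["A"] = 1
def write_b(db): db["B"] = 1
def write_c(db): raise RuntimeError("T3 aborts")  # the third link rolls back

db = {}
done = ChainRunner(db).run([write_a, write_b, write_c])
print(done, db)  # 2 {'A': 1, 'B': 1} -- earlier links stay committed
```

A real implementation would also carry locks or snapshot versions across the commit/begin boundary; the latch here only illustrates that no other transaction can slip in between two links.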
But the point that she brought up earlier is that if one of the transactions fails and you wanna roll everyone back, under the chained transaction model you can't do that automatically. You have to write the code in your application to do this. Now, you may say, well, what's the big deal? Why can't I just do that? Well, the issue is that if you wanna be able to roll back automatically in your application when you have these chained transactions, you have to write the code to do it, and that means you also have to maintain state in your application for what the current progress or status of your transaction chain is. So now you have state about what the transactions are doing inside the database system, but you also have it in your application, and that's usually a bad idea, because you're gonna have problems keeping those two things in sync. So we need a way for the database system to automatically reverse the changes of transactions in our chain when somebody down the line actually fails, right? To do this, we're gonna use a technique called compensating transactions. The idea is that this is a special type of transaction that is able to reverse the effects of another transaction that has already committed. This is a higher-level concept of transactions; it's not something you can write in SQL. There's no compensating transaction command in SQL as far as I know. This is usually done in an application framework that provides the ability to write these compensating transactions. The key thing about how compensating transactions need to work, though, is that when one reverses the changes of an already-committed transaction, that reversal has to be done at a logical level rather than a physical level. Can anybody take a guess what I mean by this? All right, so what would be done at the logical level? What's an example of a logical thing?
A logical operation in the database system. A SQL query, right? The physical change would be the actual low-level modifications that you made to the attributes in a tuple. So let's look at a really simple example. Say I have a counter on S3 for the course, and every single time somebody enrolls in the course, I increment that counter by one. Now, if a person gets kicked out of the course, let's say I enroll and I get kicked out for whatever reason, possibly because of hygiene, then I need to decrement that counter by one. If you do this at a logical level, the compensating transaction that wants to reverse the addition I did when I enrolled will just take the current value of the counter and subtract one from it. The physical reversal would instead restore the value the tuple had before I made my change the first time. So let's say that when I enrolled in the course, there were 10 people, and I added one, so the counter became 11. Then I get kicked out of the course a few weeks later, but in that time a bunch of other people have enrolled in the course. If I do the reversal at the physical level, what'll happen is I'll say, oh, when I did it before, the old value was 10, so let me put 10 back in there. But if 10 people have enrolled in the course after I joined, then the correct value should be 20. So this is what I mean by doing this at a logical level: it's gonna be too hard for us to do this at the physical level, because by the time we get to reversing our transaction chain, if we're doing that one-billion-tuple update, some other transactions may have already modified our tuples, and we're not gonna be able to revert back exactly to the values we had before. Don't you already do this with shadow paging, kind of like make the changes there and then commit it? So her question is, don't you already do something like compensating transactions with shadow paging?
Yeah, you make the changes somewhere else before they're written onto the disk. Okay, so in shadow paging, you make changes somewhere else on disk, as in the shadow copy, before it becomes the master. It's not quite the same thing, right? So maybe let me go through the example and it might be more obvious what I mean by this. I can answer your question if it's not clear. Actually, yeah, hold up, let me come back to this. We'll talk about saga transactions in a second and you'll see what the compensating ones look like. All right, so the final model we're gonna look at are called saga transactions. So this is an older model that came out in the late 1980s from Hector Garcia-Molina, the same guy that wrote the main memory survey paper you guys read last class. And so the basic idea with saga transactions is that we can have a transaction chain from T1 to Tn, along with their corresponding compensating transactions, C1 to Cn minus one, right? And it's n minus one because if we get to the last transaction in our chain and we update it, we know we're never gonna need to roll it back. So we only need up through the second-to-last one to do a reversal. And the database system will be able to guarantee that one of two things occurs. Either all the transactions will commit in the order from T1 to Tn, or the transactions will commit in the order from T1 to Tj, followed by compensating transactions Cj to C1, where j is less than n. So again, the idea here is we get from one to j, something happens, and we need to roll back our entire chain. So then we go back in reverse order and execute all our compensating transactions to put the database back into the state we think it should be in from when the chain started. So I think, again, showing an example of this will make it more obvious what's going on. So our transaction chain has three transactions, T1, T2, T3.
And what they're each gonna do is increment a counter by one. So A, B, and C. So we do our chain where this guy commits, we immediately spawn transaction two, that runs, it commits, starts three. And then for some reason something happens and this transaction has to abort. So now we need to go execute our sequence of compensating transactions. So this is where we now do our reversal: we don't care what the physical value of the object we're modifying in our database was before. We just know that we need to take the current value and subtract one from it. And that will reverse the changes that occurred from this. So then we invoke this, and then this invokes this. So this is sort of a chain in the other direction. And this is why it's not exactly shadow paging, because whether you use shadow paging or not is independent of how you actually execute these compensating transactions. You're just making a change at a logical level to reverse the operation from another transaction. So clearly now also, it should be obvious that there are certain types of things that are not gonna be easy for us to revert, right? If there's any kind of aggregation or complex operation that is combining multiple tuples from a join, it's gonna be hard to reverse that, because it's hard to map an aggregation back to its original tuples. So compensating transactions like these are typically used for commutative operations, like incrementing and decrementing counters and things like that. So I'll say that no database system actually supports these saga transactions. So again, the idea is not new. It's been like 30 years now. But there are some application frameworks that actually provide these constructs for you. So I know, like, in Microsoft and their .NET framework, they have a more expressive version of it.
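A minimal saga-executor sketch of the T1–T3 chain above (the helper names are hypothetical; real frameworks wrap this differently): run the transactions in order and, if one aborts, fire the compensating transactions in reverse:

```python
# Illustrative saga executor: steps are (transaction, compensation) pairs.
# The last transaction needs no compensation, per the saga definition.

def run_saga(steps):
    done = []
    for txn, comp in steps:
        try:
            txn()
            done.append(comp)
        except Exception:
            # Roll back the chain: invoke compensations in reverse order.
            for c in reversed(done):
                c()
            return False
    return True

counters = {"A": 0, "B": 0, "C": 0}

def inc(k):
    def t():
        if k == "C":
            raise RuntimeError("T3 aborts")  # simulate the failure
        counters[k] += 1
    return t

def dec(k):
    def c():
        counters[k] -= 1  # logical compensation
    return c

ok = run_saga([(inc("A"), dec("A")), (inc("B"), dec("B")), (inc("C"), None)])
assert not ok
assert counters == {"A": 0, "B": 0, "C": 0}  # chain fully reversed
```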
They have support for writing chained transactions and compensating transactions to do sagas. Very often too, when you get out in the real world, you may find yourself basically re-implementing the same idea here, right? But it's good to know where this idea derived from. So any questions about sagas, yes? What's the benefit of this compared to sort of using one big transaction? So the question is, what's the benefit of using sort of transaction chains and the saga transactions versus one giant transaction? So if you have one giant transaction updating a billion tuples, you have to hold all the locks and maintain the read-write set for the entire transaction. You know, if it's only going to take a few seconds to run, even the extra few seconds is a long time. But if it's going to take hours to run, you're basically holding locks preventing any other transaction from running at the same time. Or if you're running an optimistic concurrency control approach, by the time you get to the end of the chain, clearly some other transaction is going to come and modify the database and then you have to roll things back, if you have one giant transaction, right? Because what'll happen in like OCC is, when you go to commit, you check to see whether anybody has modified anything since you started. And if you have a really long transaction, the probability of that is pretty high. So again, this is why I was trying to say, a lot of sort of legacy systems, financial websites and other things like that, financial firms, they just put the database in maintenance mode in order to do these bulk updates because they don't have a nice way to do this sort of compensating transaction approach. All right, holding locks for hours is a bad idea. So, any other questions? Yes? This only applies to like single transactions.
If you have a transaction which involves the other transactions, like a read of what A plus B wrote, and the values have changed after the transaction, that would be very hard. Yeah, so he makes a really good point. So his statement was, here I'm showing sort of these simple one-operation transactions, so they update one thing and commit. But what if you now have another transaction that read the values that both A and B wrote and did something based on that, right? And so the problem with this is, you're not gonna guarantee serializable order, because I can read things written by this one and read things that have not yet been written by another transaction, and that would violate the serializable order for the entire scope of the system. That's still okay in the context of saga transactions because each of these guys is independent. So yeah, there's this weird thing where this is not exactly serializable if you look at it at the macro level, but for each individual transaction, we can guarantee that they're done in serializable order. Yes. Back to the savepoints. Yes. In all the examples you showed, it looks like you were just manually rolling it back in the course of the transaction, but is there any way to use the savepoints to have the system do it, if there's a conflict or something like that? Yeah. So his question is, in my example here, I'm invoking rollback manually. Is there a way for the database system to recognize that I have a conflict for some operation that occurred after my savepoint, and then roll back to it and retry those queries again? Absolutely, yes. Some of them actually do that, yes. I don't think any open source guys do this. You'll see a bit more about this when we talk about stored procedures.
One of the issues is that, in this case here, we don't actually know what the program logic is; that's on the application side that's actually invoking these things. So if you roll back to a savepoint, you may come back and now get a different value when you do a read, and that could violate the serializable order. Like, say this is a read of D instead of a write to D. If some other transaction wrote to D and I then try to read it, that could violate my serializable order. So therefore I would have to roll back. But in this case here, if it's a blind write, then that might be okay. Yeah, so some systems actually can do this, but it's hard. Yes. Can you explain, when you released this one, why the savepoint here got released? So the question is, when I did the release here, why did savepoint three implicitly get released? The issue is that they're sort of chained together, right? When you release one of these guys, it releases everything up until the current new savepoint. That's just how it's implemented. We can pop open Postgres after class and I'll show you how it works. All right, yes. So between two chained transactions, is it allowed to do other operations? So for transaction chains, your question is, from when I go from one to two, is any other transaction allowed to come in here and do something to the database? Yes, but, because again, these are independent. The idea is that you don't inherit the locks, right? Because if it was a nested transaction, then you'd get all the locks from the outer transaction or from the inner transaction. Here it's like you release all the locks and switch over. Somebody else could come in and try to modify things. Actually, no, take that back. In the serial order of the system, of how it's computing the schedule, there'll be no other transaction that can occur before this. It can occur after this, right?
But actually, how is that enforced? Let me think about this. Yeah, actually, I don't know the answer and I don't want to say the wrong thing, right? Because if no other transaction can come in here, then this is just one giant transaction, right? So that's not real, right? And it could also be the case that there's some other transaction that's modifying things before this thing even gets started, and we have to allow for that too. Isn't the whole concept of this to break down big transactions into small ones so that you don't have to hold all the locks at the same time? Correct, yes, because when you commit, you release all the locks, but then this thing is just done atomically, and now I'm thinking, what does that actually mean? And I don't want to say the wrong thing, so let me come back to that, all right? But again, transaction chains aren't that common; nested transactions are more common, and saga transactions are sort of the overarching thing you can do to solve all the problems we talked about. All right, so this was actually a very good conversation. Excellent, okay, any other questions? Okay, awesome. So, concurrency control. All right, so the paper you guys read was this evaluation of these concurrency control protocols running on these sort of future 1,000-core CPUs. And so again, from the intro class, you would define the concurrency control scheme as the protocol the database system is gonna use internally to manage how it's gonna interleave the operations of transactions running on a shared database, right? And the idea is that we wanna be able to provide the atomicity and isolation guarantees for our transactions, and the isolation guarantee will provide this illusion that each transaction thinks it's running with exclusive access to the database even though the system is actually interleaving the operations, right?
And so the goal for a concurrency control protocol, if we're trying to achieve serializable isolation, is that we want the end state of the database to be equivalent to one where we executed the transactions in serial order, meaning one after another, even though we're actually interleaving the operations. So for our purposes here, for this lecture, we'll only focus on serializable isolation. We'll talk about the other isolation models next class, but serializable isolation is sort of the gold standard of what you want in a concurrency control protocol, because it allows you to not worry about whether you're reading dirty data and other anomalies, and just assume the transactions ran in some serial order, even though the system is actually interleaving them. Now there's another type of consistency that you can have in a system, external consistency, and that's where if you submit transaction one followed by two, the system will guarantee that one executes before two. I don't know of a system other than Google Spanner that actually does this, and not guaranteeing it allows us to have more opportunities to come up with better schedules for interleaving these operations, because we don't care which actual serial order you get. So in order to interleave these transactions and make sure we don't have any incorrect updates or conflicts, we need to maintain internal state for three different things. So the first thing you need to do is maintain the undo log entries for the original values of tuples, or attributes of those tuples, that get modified by transactions. And we're gonna store these in an in-memory data structure, and I'll be vague on what that data structure actually is, but you'll see this when we talk about the actual implementations next time. But the key thing about the undo information in an in-memory database is that it can be entirely dropped when the transaction commits.
So remember that in a disk-oriented system, the undo log entries have to be written out with the redo entries before the transaction commits, because if you ever have dirty pages whose changes you need to revert for transactions that didn't actually commit, you need the undo information. But in an in-memory database, we don't flush dirty pages of memory out to disk, so they can never be corrupted or dirty. So the only thing we actually need to store on disk is just the redo log, right? These are the new values of the tuples that were modified by our transaction. And again, we have to flush these out to disk so that if there's ever a crash, we can restore those changes. And we don't have to store the undo information, because when we take checkpoints, we're only gonna be writing out things that have committed. And this will be enough for us to be able to go back and get the correct value. So the last thing we need to maintain is the read-write set. These are sort of the identifiers of the tuples that the transaction has read or written. And what these data structures actually are will depend on what concurrency control protocol you're using. So the two types of schemes that we're gonna care about are two-phase locking and timestamp ordering. And actually these are the only two types of concurrency control protocols you can have, right? If anybody comes to you and says, hey, I have this other protocol that actually doesn't fit in either of these two categories, they don't know what they're talking about, because as far as I know, these are the only two categories you can have. And this is not something that I invented, right? This goes back to the 1970s, early 1980s, when Jim Gray and Phil Bernstein were trying to figure out what these transactions actually meant, right? So on one hand, two-phase locking is a pessimistic approach where you assume that transactions are going to conflict.
So therefore you require them to acquire locks for any object they want to read or write before they're allowed to do that operation. And compare this with a timestamp ordering scheme, which is optimistic in that it assumes conflicts are gonna be rare. So you don't require transactions to acquire locks. You allow them to proceed and do whatever it is that they wanna do, and instead you check, either as they go along or when they actually try to commit, whether a conflict actually occurred. So I'll go through examples for both of these. So for two-phase locking, right, assume we have a transaction here that wants to do a read on A followed by a write on B. So again, under the protocol, we say that the transaction has to acquire locks on those objects before it's allowed to do those changes. So we have to get a lock on A and a lock on B. Now I'm being vague here on what A and B actually are. It could be a tuple, it could be a page, it could be a table, it could be a database, it could be a node; it doesn't matter what it actually is. The protocol is always the same. So the first part is called the growing phase, because this is where we acquire all the locks as we go along. And then as soon as we release one lock, we enter the shrinking phase, right? And when you enter the shrinking phase, you're not allowed to acquire any new locks. Once you give back one, you can only keep giving back more of them; you can't acquire new ones. And this is how you're gonna ensure that you generate schedules that are serializable. So now for those of you that took the intro class, can anyone tell me what the difference is between normal two-phase locking and strict two-phase locking? In strict 2PL, you only release at commit. Correct, she said in strict 2PL, you only release the locks at the very end.
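The growing/shrinking rule can be sketched as a tiny guard on a single transaction (the `TwoPhaseTxn` class here is hypothetical, not a real lock manager; it ignores other transactions entirely):

```python
# Illustrative sketch of the two-phase rule: once a transaction releases
# any lock, it is in its shrinking phase and may not acquire new locks.

class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, obj):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquire after release")
        self.held.add(obj)

    def unlock(self, obj):
        self.held.discard(obj)
        self.shrinking = True  # first release ends the growing phase

t = TwoPhaseTxn()
t.lock("A")    # growing phase
t.lock("B")
t.unlock("A")  # shrinking phase begins
try:
    t.lock("C")
    violated = False
except RuntimeError:
    violated = True
assert violated  # acquiring after a release is rejected
```

Strict 2PL would simply never call `unlock` until commit, which is what avoids cascading aborts.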
So I'm sort of showing that here, but when I release the lock on A, I could still do another write on B here because I still hold the lock for it. But under strict two-phase locking, it's only when you commit that you give away all your locks. And the reason why you wanna do strict two-phase locking is because you avoid cascading aborts. All right, so let's say that we have another transaction, T2, running at the same time, and it wants to do a write on B and a write on A. So let's say that we have a two-core database system and that each transaction is gonna run on a thread on a separate core, so they're gonna run exactly at the same time. So again, we'll use these arrows to indicate like a program counter for them. So when they first start off, they both wanna acquire a lock. So this one gets a lock on A and this one wants to get a lock on B. For this, we'll assume that there's no other transaction running at the same time, so this is allowed to proceed. And then, because they each hold the lock on their corresponding object, they can each do the read or the write that they wanna do on it. But now we get to this point here, and the first one wants to get a lock on B, but that's being held by transaction two, and this guy wants to get a lock on A, but that's being held by this guy here, so they both have to stall, right? And what is this obviously called? Deadlock, right? So in this case here, the database has to do something to make sure that it breaks this deadlock, because otherwise they'd be stalled indefinitely. So how the database system or the concurrency control protocol deals with deadlocks depends on its implementation. And there's basically two types. So first, you can do deadlock detection, and this is where you're gonna maintain internal data structures to keep track of which transactions a given transaction is waiting to acquire locks from.
And then you'll have a separate thread that runs in the background periodically, looks at this waits-for graph that's being maintained for these lock dependencies, and checks to see whether there's a cycle, meaning there's a deadlock. And then if it finds one, it uses some kind of heuristic to go ahead and kill one of them to break it up and allow the others to proceed and keep on running. So the heuristic you use depends on what you actually wanna do. You can do things like kill the transaction that has been running for the shortest amount of time, kill the transaction that holds the most locks, or the fewest locks. No one of these approaches is better than another, and the different commercial database systems that do this support different things. The other approach is to do deadlock prevention, and this is where you're gonna check when a transaction tries to acquire a lock: if that lock is being held by somebody else, then you know that there could possibly be a deadlock. So you're not gonna just let it wait right away; you'll choose some other strategy. So if the lock that the transaction is trying to acquire is not available, you can either wait for it, commit suicide, or, my favorite one, the most gangster one, you can just kill the other transaction and steal their locks. And again, same thing, no one approach is actually better than another. It really depends on what the application is trying to do. So now we get to timestamp ordering. And for this one, we don't have locks; we're just gonna let the transaction go along and we'll check as we proceed to see whether we're violating the serializable order. So for this, what'll happen is we're gonna assign the transaction some kind of timestamp or identifier when it first enters the system. And this could be the wall clock time, it could be a logical counter, or some kind of combination of the two.
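The waits-for cycle check from the deadlock-detection approach above can be sketched as a depth-first search (illustrative code; an edge T1 → T2 means T1 is waiting on a lock held by T2):

```python
# Illustrative deadlock detector: a cycle in the waits-for graph = deadlock.

def has_cycle(waits_for):
    """waits_for: dict mapping txn -> set of txns it waits on."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in waits_for}

    def dfs(t):
        color[t] = GRAY
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True          # back edge: cycle found
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in waits_for)

# T1 waits on T2's lock while T2 waits on T1's: that's the deadlock above.
assert has_cycle({"T1": {"T2"}, "T2": {"T1"}})
# A plain chain of waits is fine.
assert not has_cycle({"T1": {"T2"}, "T2": set()})
```

Once a cycle is found, the heuristic for which victim to kill is a separate policy decision, as described above.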
But for our example, let's say it's a simple counter that we're always incrementing by one for every single transaction. Now inside our database, we're also gonna keep track of the read and write timestamps of all the different objects. So the read timestamp is the greatest timestamp of any transaction that read the object, and the write timestamp is the greatest timestamp of any transaction that wrote to it. So now as we execute our transaction, when we do a read on A, we check to see whether we're allowed to read it, meaning no transaction with a higher timestamp has already written to it, and if so, then we can do the read and update the read timestamp. And again, this is acting as a sort of signal to other transactions that may wanna write to this object that something has come along and read it at a certain time. Now we do a write, and same thing: you check to see whether the object's write timestamp is greater than our timestamp, but also whether its read timestamp is greater than ours, because we don't wanna write in the past to an object that a transaction in the future has already read. Here we're allowed to go ahead and update it. And then let's say we get to this middle part here where the transaction stalls for whatever reason. Could be because of the network, could be because it's computing the 1,000th digit of pi; it doesn't matter. But in between this time, some other transaction comes along and modifies object A, and its timestamp is greater than ours because it started after we did. So when we wake up and we wanna now do our write on A, we would see that we can't do the write, because some other transaction has modified this object and its timestamp is greater than ours, and this would violate the serial order of our transactions. So the database system will prevent you from doing this.
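Here's a minimal sketch of those basic timestamp-ordering checks (the `TOObject` class is hypothetical and single-threaded; a real system tracks this per tuple with latches):

```python
# Illustrative basic T/O: an object tracks the largest read and write
# timestamps; an operation that would land "in the past" forces an abort.

class Abort(Exception):
    pass

class TOObject:
    def __init__(self):
        self.r_ts = 0      # greatest timestamp that has read this object
        self.w_ts = 0      # greatest timestamp that has written it
        self.value = None

    def read(self, ts):
        if ts < self.w_ts:
            raise Abort()  # a "future" txn already wrote: reading is unsafe
        self.r_ts = max(self.r_ts, ts)
        return self.value

    def write(self, ts, v):
        if ts < self.r_ts or ts < self.w_ts:
            raise Abort()  # a "future" txn already read or wrote this object
        self.w_ts = ts
        self.value = v

# T1 (ts=1) reads A, then stalls; T2 (ts=2) writes A in the meantime.
A = TOObject()
A.read(1)
A.write(2, "x")
try:
    A.write(1, "y")    # T1 wakes up and tries its write
    aborted = False
except Abort:
    aborted = True
assert aborted         # T1 must abort: its write is now in the past
```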
So I'm being super high level here about how you actually do this. We'll discuss how you actually implement this in a modern system next class. But this is the basic approach for how you would do this, and this is what we teach in sort of the intro class. So the protocol that I just showed you, we'll refer to as basic timestamp ordering. With the exception of our system, we think we're the only ones that actually do this. And when we talk about MVCC in more detail, we'll see why doing this approach is actually a good idea. So now another variant is optimistic concurrency control. And the difference here is that instead of having this sort of global view of the database where you check all the read and write timestamps as you go along, what'll happen is you'll store all the changes that a transaction makes in a private workspace. And then only when you go to commit do you check to see whether there was a conflict, right? With basic timestamp ordering, I check as I go along. And the last one is multi-version concurrency control. So when I originally wrote the paper that you guys read, with MVCC we sort of picked what I consider the standard way to actually implement MVCC. But saying there's one standard MVCC protocol is actually not correct. I would say the problem with some of this language in concurrency control is that terms get reused and it's kind of hard to understand what actually is what, right? So OCC, or optimistic concurrency control, is actually a specific protocol or algorithm. But all the timestamp ordering protocols are considered optimistic as well, right? Still, basic TO is explicitly different than OCC. So the same thing for MVCC. MVCC is sort of a class of protocols where the system internally maintains different versions of objects.
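And here's a sketch of the multi-versioning idea itself (illustrative only, not the paper's exact protocol): writers append new versions tagged with their timestamp, and a reader sees the newest version no newer than its own timestamp:

```python
# Illustrative multi-versioning: never overwrite in place, always append.

class MVObject:
    def __init__(self):
        self.versions = []   # list of (begin_ts, value), kept in ts order

    def write(self, ts, value):
        # Always create a new version instead of overwriting the current one.
        self.versions.append((ts, value))
        self.versions.sort()

    def read(self, ts):
        # Newest version whose begin timestamp is <= the reader's timestamp.
        visible = [v for t, v in self.versions if t <= ts]
        return visible[-1] if visible else None

obj = MVObject()
obj.write(1, "v1")
obj.write(5, "v2")
assert obj.read(3) == "v1"   # a txn with ts=3 still sees the older version
assert obj.read(7) == "v2"   # a later txn sees the newer one
```

Readers never block writers here, which is the main appeal of the multi-version approach.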
But there's also the implementation that I described in the paper, which is one specific implementation of MVCC. For our purposes, we'll just assume there's one and only one. But when we discuss next week how MVCC is a class of protocols, you'll see there's actually different variants of it. And you can actually use two-phase locking with MVCC, even though we don't do that here. So the basic idea is that instead of overwriting the current version of an object directly or putting it in a private workspace, we'll always create a new version. And we'll update this version counter as we go along and use timestamps to figure out what the correct version is that you're allowed to see. All right, so the two classes of protocols we have are two-phase locking and timestamp ordering. And these are the acronyms or abbreviations we're using in the paper to describe the different implementations of them. So the first approach, two-phase locking, is used in these systems here. Two-phase locking was the first concurrency control protocol; it was developed by IBM for System R back in the 1970s. And for the most part, a lot of the systems in the 1980s ended up using this, right? The timestamp ordering guys are more common now, in particular the MVCC implementation, right? So this is used in Postgres and Oracle. These systems came out of the 80s, but it's also used in most database systems that have been built in the last five or 10 years, right? As we'll see next week. And this is what we use. We use MVCC in our own system. All right, so for the paper you guys read, we were looking at taking these protocols and running them on a 1,000-core chip simulator to see, when you really try to scale these things up, what are the bottlenecks? And our hypothesis was that there actually really is no difference.
If you implement the state-of-the-art implementations for all these different protocols, there really is no difference up to 32 or 64 cores. But when you go beyond that, beyond the CPUs we can have now, then you actually start to see how these things differentiate. So to do this, we built our own testbed system called DBx1000. And this system was designed to be an in-memory DBMS with a pluggable lock manager, a pluggable concurrency control manager. The idea is that we have a single uniform platform where we can drop in and out the different protocols without having to rewrite all the different parts of the system. And so it's sort of a bare-bones system that only focuses on concurrency control. So it doesn't support running SQL queries, doesn't support connecting to the network, doesn't do logging, and doesn't have concurrent indexes. And the reason why we did this is because in order to run this in the Graphite CPU simulator, we had to have this thing be as fast as possible, because the chip simulator was 10,000x slower than real wall clock time. So if you had, say, an experiment that would have run for a minute on your laptop, it would then run for 10,000 minutes on the simulator. So we're talking days. So he spent a lot of time working on the system; some of the core parts of the system are actually written in assembly, because that was the fastest way to implement things. The other thing to point out, too, is that the chip simulator is actually modeling a tile-based CPU architecture. So if you're familiar with sort of the NUMA stuff from Intel, you can have two sockets or four sockets and there's a QPI bus that allows you to talk between them. This tile-based architecture is different from that. You have a single CPU socket and you have this two-dimensional grid of all these individual cores.
And there's a mesh network that allows you to communicate with your neighbors. So you can send a message directly to your neighbors, but in order to send a message to a core that's far away, the message has to be carried along the internal network of the chip. And this is actually the architecture that some of the newer CPUs that Intel is putting out now actually have. I don't mean like the Kaby Lake stuff, the new thing that's coming out this month. I mean like the Intel Xeon Phi, the Knights Landing; all of those things are tile-based architectures. I think the Knights Landing is essentially 64 Pentium cores, right? Pentiums from like the 1990s, right? And so they follow this tile-based architecture. And I'm not a computer architecture person, but when we talk to them they say, yeah, you have to have a tile-based architecture when you want to go to really high core counts, because the NUMA model doesn't actually scale. So this is not a NUMA system, this is a NUCA system, a non-uniform cache access system. All right, so for our target workload, we're using YCSB. So YCSB was developed by Yahoo Research in 2008, 2009, and this has sort of become the standard key-value store benchmark that people use to test OLTP systems, right? It's actually really simple. You have this single table and every single tuple has a single primary key and then 10 attributes. And then most of your transactions are doing a read or a write to a single tuple. So for our purposes, though, updating a single tuple per transaction is actually not that interesting, and it's not entirely realistic for real-world applications. So we modified the workload so that every transaction will read and write 16 tuples, and that'll increase the amount of contention you can have in the system.
We're also gonna vary the amount of skew you can have in the transaction access patterns, so there'll be a hot spot in the database and most of the transactions are trying to read and write to that hot spot. And for all of these, we're gonna run at the serializable isolation level. So for the first experiment, we ran all the protocols up to 1,000 cores with a read-only workload that's completely unskewed, because this is sort of the best-case scenario you could ever have for a concurrency control protocol: there are no conflicts, so transactions never have to abort. So this is just really measuring the sort of internal overhead of maintaining the state you need to execute transactions inside the system. I'll also say, too, that as you scale up the number of cores, that corresponds to the number of worker threads in the system. So for each core, you're gonna have one concurrent transaction running on that core. So if you have 1,000 cores, you could have 1,000 concurrent transactions running at the same time. So the first thing to point out is that you see that the deadlock detection and the no-wait two-phase locking algorithms actually scale the best. This is sort of expected, because there's never gonna be a deadlock: you're acquiring shared locks on all your objects, so there's no conflicts. And the internal state you have to maintain is actually really cheap. In this case, you're never waiting to acquire a lock from another transaction, so you never have to update the waits-for graph. And you can see the no-wait one is actually even slightly better than the deadlock detection one because, again, there's no internal state to maintain; you just always check to see whether the lock you need is available, and if so, then you go ahead and acquire it. The second thing you see here is this little knee for the MVCC and the wait-and-die protocols.
What we attribute this to is the bottleneck of allocating timestamps, right? Because the timestamp has to come from some kind of global counter that's unique for every single transaction. So now, when you get up to 800 cores and you're trying to hand out millions of timestamps a second, that single counter becomes the bottleneck. And the last thing we see here is that OCC actually does the worst. This is because every single time you read or write a tuple, you have to copy it into your private workspace, so this is the overhead of copying data. All right, so now we can look at workloads where there actually are writes. For this one, we're running a write-intensive workload, which I think is 50% reads and 50% writes, with medium contention, meaning 40% of the transactions are trying to access 20% of the database. The first thing you see is that the deadlock-detection algorithm now does the worst, right? This is because there are always conflicts between transactions, there are all these deadlocks, they're all trying to acquire the same locks, and we're killing and aborting transactions over and over again, and that's why performance tanks. We also see now that the WAIT_DIE and NO_WAIT two-phase locking algorithms actually do the best here. And this is because we're an in-memory system running everything as stored procedures, so the cost of aborting a transaction is super cheap: rolling back a transaction is just reversing its changes in memory, and then we can immediately fire off that same transaction again. So although we're getting the best performance, as we'll see when we look at the breakdown graphs, we're actually aborting a lot of transactions here; just enough of them are getting through that we're getting useful work done.
And then here we see the timestamp ordering algorithms sort of in the middle, and we think they're sloping off because of the overhead of copying things to make sure you have repeatable reads. All right, now our last one. This is the main graph you should remember from this paper: we're gonna run the same write-intensive workload, but now with higher contention, where I think 75 to 80% of the transactions are all trying to access the same 20% of the database. The first thing you see is that NO_WAIT does really, really well up until about this point here, and then it falls off and craters, and it gets the same performance as everyone else. Again, it's getting better performance because there's no complexity in how the protocol works: I check whether I can acquire the lock, and if not, I kill myself, roll back, and come back right away. So I can abort very quickly, roll back, and restart. But at some point, aborting and restarting becomes the only thing the system actually does. The other thing to point out here is OCC. Remember that it was the worst when we were looking at a small number of cores, but it's the one that actually ends up being the best when you get to a thousand cores. And the reason is that the way the validation protocol works, it guarantees that one transaction will always be able to finish. So let's say I have a thousand transactions all trying to commit at the same time, right? One of them is always gonna succeed, and the 999 other ones are gonna fail. So what you see here is that you actually get basically the same performance on a thousand cores as you would with one core, right?
So this is sort of the key thing, the main takeaway we had from these experiments: there's this area in the middle here where we haven't quite figured out how to get better performance with concurrency control. Because over here, at a thousand cores, everything fails. But over here, in the 32-to-64-core range, things are actually doing okay. That's about what our current systems run on today. So this middle part is the research area people are looking into, because we don't really know how to go beyond this point without cratering to the bottom over there. So to get a better understanding of where all the time is being spent, here's a breakdown of the system: we measured how much time is actually spent in different components. The red bar corresponds to useful work, actually reading and updating the tuples. The big yellow bars are the time you're spending aborting transactions. And as you can see, with the exception of MVCC, this is the dominating factor for all these other protocols, right? Because in a case like NO_WAIT, for example, I try to acquire a lock, I can't, so I abort, roll back, and come back again, and that's where you're basically spending all your cycles. So again, the main takeaway here is that we need to figure out something else to reduce the number of conflicts and reduce the number of aborts, because the yellow part is just entirely wasted work. Okay. All right, so what are the bottlenecks? There are three categories, and I've labeled here, for each protocol, which ones it suffers from. So I'll go through each of these one by one: lock thrashing, timestamp allocation, and memory allocation.
So for lock thrashing, the idea is that when a transaction has to wait to acquire a lock being held by another transaction, everything gets backed up into this convoy: the longer I have to wait for somebody else, the longer anybody waiting on me has to wait too. And at some point, you hit a tipping point where all the transactions are doing nothing but waiting, and you just don't get any useful work done. There's actually a way to measure this phenomenon by removing deadlock detection and prevention entirely: you have transactions acquire locks in primary key order, because that way you can never have a deadlock. So if I need locks one, two, and three, I acquire them in that order: one, two, three. Anyone holding lock three can only be waiting on locks higher than three, so they can never be waiting on me, and no cycle can ever form. It's obviously not something you can do in the real world, but it lets us strip out the overhead of the protocol itself and just see whether we can isolate this lock-thrashing issue. So in this graph, the black line and the red line correspond to different skew levels: theta 0.8 is high contention, theta 0.6 is medium contention, and zero is no contention at all. And you see this dip here, where it plateaus and then drops off. If we had more cores, you would eventually see the same thing with medium contention too. This is the effect of lock thrashing: at some point, you just wait too long, and everybody else is waiting on you, and the whole system just keeps waiting and waiting and waiting.
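The acquire-in-key-order trick above can be sketched in a few lines: every transaction sorts its lock requests, so waits can only point from lower keys to higher keys and a waits-for cycle can never close. This is a toy sketch with my own names, just to make the ordering argument concrete:

```python
import threading

def acquire_in_key_order(lock_table, keys):
    """Acquire locks in sorted primary-key order. Because every transaction
    orders its requests the same way, waits only go from lower keys to
    higher keys, so a deadlock cycle can never form."""
    order = sorted(keys)
    for key in order:
        lock_table[key].acquire()   # may block, but can never deadlock
    return order

# Toy lock table keyed by primary key.
lock_table = {k: threading.Lock() for k in (3, 1, 2)}
```

Note this removes deadlocks, not waiting: transactions still queue up behind each other, which is exactly the thrashing effect the experiment isolates.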
What's really nice about this experiment is that when you look up lock thrashing in a textbook, you see theoretical graphs that look exactly like what we measured in our system: this convoy of everybody waiting for each other, and then everything falls apart. For timestamp allocation, the issue is again that we need to assign transactions timestamps that are globally unique and correspond to their serial order. There are a bunch of different ways you can implement this. The worst approach is to use a mutex. We'll cover this more when we talk about locking and latching, but you should know: never use a mutex for this. I don't know if we do this yet, but we should have a script in our system to make sure nobody ever writes any code that does this with a mutex. It's always a bad idea; it's the worst option, so I don't even want to teach you how to do it, because I don't want you to do it. The next approach is to use an atomic addition, like a compare-and-swap. This is nice because you don't have to go down into the kernel the way you might to grab a mutex. But the problem is that if you have a lot of cores all trying to do a compare-and-swap on a single cache line, it gets really expensive, because that line gets invalidated everywhere. Batched atomic addition we'll see when we talk about Silo. The basic idea is that instead of doing a single compare-and-swap to get one new timestamp, you do one compare-and-swap to get a whole batch of timestamps and give that to a thread, and then that thread only has to go back and get the next batch when it runs out. And then there are two more experimental ways to do this: use a hardware clock or a hardware counter.
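The batched approach above can be sketched like this: one update to the shared counter claims a whole block of timestamps, and each thread then hands them out locally. This is a sketch under my own naming, using a Python lock to stand in for the hardware atomic fetch-and-add a real system would use:

```python
import threading

class BatchedTimestampAllocator:
    """Each thread grabs a batch of timestamps with one update to the
    shared counter, then hands them out locally until the batch runs out."""
    BATCH = 1000   # hypothetical batch size

    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()     # stands in for the atomic add
        self._local = threading.local()   # per-thread batch state

    def _refill(self):
        with self._lock:                  # one shared update per BATCH stamps
            start = self._next
            self._next += self.BATCH
        self._local.cur = start
        self._local.end = start + self.BATCH

    def next_timestamp(self):
        if getattr(self._local, "cur", None) is None or \
           self._local.cur == self._local.end:
            self._refill()
        ts = self._local.cur
        self._local.cur += 1
        return ts
```

The shared cache line is now touched once per thousand timestamps instead of once per timestamp, which is the whole point of batching.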
The idea with the hardware clock is that the chip periodically sends an update to all the cores saying, here's the new timestamp, at some nanosecond or microsecond granularity. This happens automatically on every tick and gets pushed out to all the cores. So now, if you want a new timestamp for your transaction, you just read the current value of a special register on your local core, and you don't have to send any messages across the chip. This exists in Intel chips but not AMD chips, and only for the last year or two, I think. And when we talked to the Intel guys, they said, yeah, this is nice and all, but we're not sure this is something we can keep supporting in the future when we have more cores, because it's actually kind of expensive to do. The other approach discussed in the paper, which is not actually implemented in any real system yet, is to have a special hardware counter sitting in the center of the socket that you can do a compare-and-swap on and get new values from without causing a cache invalidation on every single core every time it's modified. The idea was that your core sends a message over the on-chip network, does a compare-and-swap to update this counter, and gets back the current value without ever having to talk to any other core. But again, we implemented this in the simulator; it's not something you can get from a real CPU. So when you measure the performance you get from these different approaches, again you see that at the very bottom, the mutex is always the worst: never use it. The best is, I think, the hardware clock, followed by the hardware counter.
But in all the approaches we'll talk about in the next two classes, we're basically gonna use either the straight atomic compare-and-swap or the batched atomic approach. In this graph, the batched atomic looks better, but there are some scenarios where it's actually worse. Say I get a batch, and the first timestamp in it is less than some other timestamp I'm conflicting with, so I abort, restart my transaction, and grab the next timestamp in my batch. That one is still gonna be less than the other timestamp, so I burn through my entire batch and have to go back and get the next one. In that case, it's no better than doing the straight atomic thing, but you've spun through the transaction multiple times. So again, we'll talk about how the batched atomic stuff works. But the basic point is that this is basically what everyone does; the more experimental things we think are better, but nobody actually does them. I'd also argue that you can allocate 100 million transaction timestamps a second on a single CPU. That's pretty significant. Nobody really needs 100 million transactions a second on a single box, right? Yet. So by the time people actually need to do this, we think the hardware will have caught up. That's why I'm saying the atomic and batched atomic approaches are good enough; we don't need the more exotic things. All right, the last bottleneck is memory allocation, which is really slow. The basic issue is that every single time you have to copy things, whether into a private workspace or to make different versions so you can read them again, it actually gets expensive, right? Before, when we had to write things out to disk, doing a memcpy was nothing, right?
But now, if we're worried about running at bare-metal speed and not writing things out to disk, copying things all around can actually slow us down. I'll talk more about this when we cover storage, how you actually store things in an in-memory system. But the bottom line to remember is that you never want to use the default libc malloc. Everyone either writes their own allocator or uses things like jemalloc or tcmalloc. Now in the case of Shingal, because he was crazy, he decided to write his own version of malloc. Don't do that, right? There are better ones out there; just use those. But you never want to use the default libc one. We use jemalloc in our system, in Peloton, right now. All right, we only have a few minutes left, but I'll finish up real quickly and talk about this last protocol. There's one additional concurrency control protocol that was evaluated in the paper, from the system called H-Store that I helped build. The basic idea of how this protocol works is that you split the database up into disjoint subsets called partitions, or, if you're coming from a NoSQL background, sometimes called shards, and then you assign each partition to a single-threaded execution engine that runs exclusively on one core. And then, in order for a transaction to execute and modify data in a partition, it has to wait to acquire the lock for that partition before it's allowed to start running on it. And because it's a single-threaded execution engine, it knows that while it's running, no other transaction is running at the same time. So you don't need to maintain any lock information, any dependency information, or multiple versions, because you know you're the only one that could ever read or write the things you're touching while you run.
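The partition-per-core idea can be sketched as one coarse lock per partition: a transaction grabs the lock for each partition it touches and then runs without any tuple-level locking at all. This is a toy sketch under my own naming; for simplicity, it locks all needed partitions up front in sorted order rather than modeling H-Store's actual abort-and-restart behavior:

```python
import threading

class PartitionedEngine:
    """Toy H-Store-style executor: one lock per partition. A transaction
    holding its partition locks needs no tuple-level locks or versions."""
    def __init__(self, num_partitions):
        self.locks = [threading.Lock() for _ in range(num_partitions)]
        self.data = [dict() for _ in range(num_partitions)]
        self.n = num_partitions

    def partition_of(self, key):
        return hash(key) % self.n

    def run(self, keys, txn_fn):
        parts = sorted({self.partition_of(k) for k in keys})
        for p in parts:              # multi-partition txns lock every
            self.locks[p].acquire()  # partition they touch, stalling them all
        try:
            return txn_fn(self.data, self.partition_of)
        finally:
            for p in parts:
                self.locks[p].release()
```

Notice that a multi-partition transaction holds every one of those partition locks for its whole lifetime, which is exactly why the multi-partition results discussed next fall off a cliff.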
So now, say you run your transaction and you recognize that you need to touch data from another partition you don't hold the lock for. You actually have to abort the transaction and restart it, because it's sort of like two-phase locking: you can't acquire new locks after you've started running, so you have to restart the entire thing. The main thing I want to show you here, and you'll see this same graph in reverse order, coming from the other direction, in the Silo paper that you'll read for Thursday, is that H-Store does really, really well and is much better than all these other protocols, because there's basically no overhead in assigning a transaction to run on a core, right? But then up here, the timestamp allocation becomes a bottleneck, because now we're doing something like nine million transactions a second, which is the same bottleneck we were hitting before when we looked at MVCC and timestamp ordering in the first graph. And beyond that, it does just as well as the other ones. So again, really great performance up to a certain point, but then timestamp allocation becomes a bottleneck. But then you look at it and say, well, isn't H-Store much, much better than all these other approaches? Why wouldn't I always use H-Store? Well, they have this one graph showing that if you have multi-partition transactions, where you have to acquire locks for multiple partitions, performance actually becomes terrible, right? These bottom lines here are what happens when a transaction has to acquire locks from either eight or sixteen partitions, versus when everything is single-partition. So again, you'll see this in the Silo paper: H-Store will outperform Silo when all your transactions only have to touch a single partition.
But as soon as you have to touch multiple partitions, H-Store gets terrible performance and Silo is actually better. I'm bringing this up now because you'll see it again in the Silo paper, and this helps you understand what's actually going on, right? Think about it: if you have a transaction that touches multiple partitions, say you want to do most of your changes on one partition and maybe just read one tuple from another partition, you do that read, continue on the main partition, and never go back to the second partition. But you still hold its lock, and nobody else can run on it at the same time. That's why these things get worse. All right, so to finish up real quick. Hopefully from reading this paper you got the sense that concurrency control is really hard to get right; you have to worry about all these different things, and we haven't even talked about how indexes play into this. But it's also really hard to perform well and efficiently when you have a large number of transactions running at the same time. The other key thing to point out is that this paper only evaluated pure OLTP workloads, and the next two papers you'll read are also purely OLTP. Things get more complicated, and what was best in the paper we showed here may not actually be the best when you want to run HTAP workloads, because now you're gonna have transactions or queries that want to read a lot of data and aren't actually gonna update things. So maybe you don't want to use the same type of protocol we were using here, and it's not yet clear in the literature or the research what the best concurrency control protocol is when you have HTAP workloads.
Okay, so next class, we'll start off talking about different isolation levels, and we'll focus on snapshot isolation, because that's the thing we'll worry about for the next two classes. We'll also give you a crash course on stored procedures, and then we'll spend most of our time at the end talking about high-performance optimistic concurrency control, how you actually do this in a modern system. Okay, any questions? All right, I'll send an announcement about the AutoLab setup for project number one. Again, if you already submitted over the weekend, you'll have to submit again to the new image, and everything should just work for you. Okay? All right, actually, one more thing real quick. If you're interested in learning more about Peloton and getting involved in the project, we have our team meetings on Wednesdays on the 9th floor here in Gates. So if you're thinking about doing independent study with me, either as a course project or independent study for credit in the fall or the summer, then you should come to the meeting and we can talk about what kinds of things you could work on. Okay? So again, I'll post this announcement on Piazza as well. Okay? All right guys, thanks.