So today's lecture is where we go back to where I said we should have been at the beginning of the semester, and start talking about concurrency control, transactions, and updates in the database system. Last class I said that I did query compilation at the very beginning because it's something important that we should focus on early, even though it doesn't really fit into the flow of the rest of the semester. You needed it for the first project, and it's something you're going to encounter all throughout the semester, so I pushed it to the front. Now we're switching back to where I normally go in the semester and starting on concurrency control and transactions.

Some quick administrative things. Project one is due tonight at midnight. I know there's one particular issue we're having with clang-format, but we'll take care of that when we do the grading. Project two will be released this coming Wednesday. For this project you want to be in a group of three people, because it's going to take about a month or so to work on, and this will probably end up being the same group you have for the third project, the final project. So choose your group wisely. I'll send out a sign-up sheet on Piazza; you can go to a Google spreadsheet and fill in the Andrew IDs for the people in your group. If you don't have a group, there will be a free-agent list where you can add yourself, and come Wednesday, if you still can't find a group, I'll start pairing people together. Okay, so again, you're going to be working with these people throughout the entire semester, so try to pick somebody you think you can get along with. Last year we had fights and had to break people up.
That was never good. The year before, we had somebody with hygiene problems, and I had to figure out who smelled the worst and who smelled the second worst so I could put those two in the same group and they wouldn't bother everyone else. I'd like to not do that again, so I hope you can resolve this yourselves, okay?

The next announcement: now that we've gotten past the add/drop deadline and everyone who wants to be in the course is in the course, I want to announce that if you want to go beyond the things we talk about in this class, there are two opportunities to get involved in things outside the course. The first is that the database group has its weekly meetings on Mondays at 4:30, immediately after this class, up on the 8th floor. That's where my students and other researchers at CMU come and talk about the various research projects they have going on related to databases. We actually have a speaker today who will be presenting his paper on the Bw-Tree, which is one of the papers you'll be assigned to read in two weeks. The other thing is that we also have our Peloton developer team meeting on Tuesdays at 12 p.m. on the 7th floor. There we don't really talk about core research; we talk about the ongoing development of our system: what bugs we have, what major features we're trying to build. Everyone can get involved and help out if you want to go beyond the material in the course. Okay, so again, both of these are entirely optional. I'm just saying that if you want to go beyond the papers in the classes and maybe get involved in a research project, these are two opportunities to enlighten yourself.
Okay, and again, they're entirely optional. And there's food: cookies at one, food at the other, and everyone loves food, right?

All right, so for today's agenda, I'm going to start off with some background material. I've been debating for the last couple of years, as I teach this course, what the right order is for explaining topics related to concurrency control, because there's a bunch of background material that you're not necessarily going to get from an intro class that I think is important. But I don't want to have a class that's all intro material, because then there's no paper you can read. So what I'm going to do instead, over the next three lectures on concurrency control, is present a little vignette, a little side discussion, about what you need to know in modern transaction processing that's relevant to our discussion, and then talk about the main topic from the paper you read. So today I'll start off talking about stored procedures, and then we'll get into OCC, the optimistic concurrency control algorithm that you read about. Silo is a modern implementation of this, and TicToc is the other one we'll focus on. Okay.

All right, so at the beginning of the semester, we said that one of the big things we're going to do in a modern database management system is forgo storing anything on disk and switch to an entirely main-memory-oriented architecture. That means the database is no longer stored on disk.
We're not going to have any stalls from transactions when they try to read data; everything is in main memory, so everything is going to be really, really fast. This assumption about our architecture allows us to reevaluate how we want to design the components of our database management system. Concurrency control is just one aspect of this; there are other things we can do as well to speed up execution. Now, just because we got rid of the disk doesn't mean we got rid of it entirely: we still have to care about logging and writing out records of the changes transactions make to disk. That doesn't go away, but for our purposes we'll ignore it for now and cover it later in the semester.

But even though we got rid of disk stalls from page faults, from reading tuples that aren't in memory, there are still other stalls we're going to have while we process transactions that we can't get rid of. The one most relevant to our discussion of concurrency control here is the network. Someone posted on Piazza asking about how all these papers seem to ignore the cost of sending messages over the network, and that's true. But I'm not going to focus on node-to-node communication like you would have in a distributed database system; I really care about the messages that the application server sends to the database server to execute queries and transactions. Because if the disk is gone and our transactions are super fast, the next long pole in the tent, the next bottleneck we have to deal with, is the network. These stalls come from applications that are written using what's called a conversational-style API. Examples of this would be ODBC or JDBC, or the database-specific wire protocol that the system supports, like when you open up the psql terminal and connect to Postgres. When you type your query and hit enter, that's the conversational API: the client sends a request to the database server, then waits for the server to send back a response with the result of that query.

Now, when you're running transactions, this becomes problematic. Say we have an application server and a database server running on separate machines. For our purposes it doesn't matter whether they're on the same rack, in the same data center, or in another data center; we'll have the same problem. The application wants to invoke some higher-level method in its code, perform some higher-level operation on behalf of a user. We're not talking about a single query here; we're executing a transaction that does some higher-level thing, like add something to your cart or update your payment information. So the application server gets a user request and needs to start executing this transaction. The first thing it does is call begin, to say it's opening a new transaction. Depending on the protocol and how it's actually implemented, you may or may not have to send a network message for this begin.
It doesn't matter either way. Then we start a transaction, and we get the first query to execute. The application server issues a command to execute the SQL query, and that goes over the network, using whatever the protocol actually is, and lands on the database server. The application server's thread then blocks; it has to wait until it gets back the response, because it can't keep processing without knowing the result of that query. When the query lands on the database server, there's a bunch of work to do before we can even execute it. You have to parse the SQL, converting it into tokens and then an abstract syntax tree. Then we run it through the planner and the binder, where we take the names of tables, do lookups in the catalog, and find the catalog objects. Then we run it through the query optimizer, which converts the logical plan into a physical plan, maybe optimizing it if there are joins, and then we can actually execute the query. When we get the result, we send it back to the application.

Now at this point, the database server waits to find out what the next query in this transaction will be. It's not really stalled (other transactions could be executing queries at the same time), but for this particular transaction we started, we have to maintain all the metadata about what it did, its read/write set and everything, in anticipation of it sending the next query. That means if we're grabbing locks for this transaction, we're holding those locks the whole time we're going back and forth between the application server and the database server over the network. Then the application gets a result, does some processing, and executes the next SQL query: same thing, you send a message, you block, and the server sends the result back. Again, the server waits for the next query from this transaction. Then at some point we get the commit message. We send that over to the database server, which can finally check whether this transaction is allowed to commit (run some validation process), and if it does commit, write out the log and release all the locks or whatever else it was maintaining.

So the problem here, again, is that the transaction itself may actually be really fast. These queries are usually pretty simple: look up a key, update a record based on its key. We're typically not doing large updates in transaction processing workloads. So the queries are going to be really fast, but it's the network traffic that's going to kill us. The parse/plan/optimize part is not a huge bottleneck, but it can be slow relative to queries that are really simple. That's why, when the NoSQL guys came out, they basically said they wanted to get rid of all this and have the application be written using the database system's API commands to execute operations directly on the server, rather than having to deal with SQL.

So what are some solutions to this problem? Well, the first is that we can use prepared statements. We tell the database system ahead of time: here are the queries I'm going to execute, and here are placeholders where I'll give you values to substitute in for constants when I invoke it, right?
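To make the prepared-statement idea concrete, here's a small sketch using Python's standard-library sqlite3 module as a stand-in for a networked DBMS; the table and values are invented for illustration, but the `?` placeholder pattern is the same one ODBC/JDBC prepared statements use:

```python
import sqlite3

# In-memory SQLite stands in for a networked DBMS here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 250)])

# Ad hoc SQL: the server must parse/plan the full string every time
# (and string concatenation also invites SQL injection).
acct_id = 1
row = conn.execute(f"SELECT balance FROM accounts WHERE id = {acct_id}").fetchone()

# Prepared-statement style: one query template with ? placeholders.
# The plan can be built once and reused; only the parameters change
# on each invocation.
stmt = "SELECT balance FROM accounts WHERE id = ?"
balances = [conn.execute(stmt, (i,)).fetchone()[0] for i in (1, 2)]
print(balances)  # [100, 250]
```

With a real client/server system, the prepared version also saves the server from re-parsing and re-optimizing the SQL on every round trip.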
So this removes all that preparation overhead I mentioned before: you don't have to parse the SQL and plan it and optimize it every single time you invoke it. You generate the query plan once, cache it, and then invoke that prepared statement by name, avoiding all that upfront cost. These definitely help, but I would say most applications are not written using prepared statements; most applications are written using ad hoc SQL. You construct the SQL string in your PHP code and then you invoke it.

The next solution is to use batches. Instead of sending one query, getting back the result, then sending the next query and getting back its result, you can combine multiple queries into a single request, send those over the network, do all the processing, and get back multiple results. Again, that avoids the round trips. You can't always do this, because sometimes the application is written such that the output of one query needs to be used as the input for the next query, so you have to see that output before you can construct the next query you want to send. You can sometimes finagle it with nested queries to avoid this, but there are also cases where the application server gets the output of one query and then, in the application code itself, has an if-clause: if the output is this, execute that query; if the output is something else, execute this other query. For that, you have to get back the result, so you can't execute everything in batches.

But the solution we're going to focus on here is stored procedures. The idea with stored procedures is that we take all the application logic that was running on the application server and embed it directly inside the database system. Then we no longer have to do this back-and-forth of getting a query result, processing it, getting the next query result, processing it. We can now make a single request, almost like an RPC call, into the database server to execute a single function; it executes entirely on the server side, and we get back a single result. So the way to think of a stored procedure is as a logical unit of application code that performs some particular task on behalf of the application; it would normally be written in the application, but instead we embed it inside the database server.

There is actually a standard-defined stored procedure programming language that everyone is supposed to use, called SQL/PSM, but nobody implements the standard directly. It was first proposed around SQL-92 (it actually came out a little after the SQL-92 standard), and then it was ratified in SQL:1999, so it's been around for a while, but nobody implements it exactly as specified. The most common implementation is what's called PL/SQL, which at a high level looks very close, though the nitty-gritty details deviate from the standard. This is what Oracle and DB2 use, and I think MySQL 8 has this as well. Postgres has something called PL/pgSQL; I don't know the exact details of where it differs from PL/SQL, but there are some Postgres idioms in there that make it not 100% compatible. And then Microsoft has Transact-SQL, or T-SQL, which actually came from Sybase in the 1980s and 1990s.
Remember, I said the SQL Server code base was originally based on Sybase: they ported Sybase to Windows NT, and since then they've deviated quite a lot, but they still maintain support for the Transact-SQL features they inherited from Sybase. So you definitely cannot take Transact-SQL and run it in these other systems.

Again, the high-level idea is that we take the same application code we had before and embed it inside the database system. Now, on the application side, where we would normally call the PHP code or Python code or whatever it is to do all those steps, we just execute a single SQL statement, in this case CALL (in other systems it's EXEC or EXECUTE), and we pass in the input parameters as if we're invoking a function. Now it's one round trip to send over the request of what we want to execute. All the queries are pre-compiled as prepared statements, so they're going to run really fast. It runs as a transaction, and it sends back the final result.
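As a rough illustration of where the savings come from, here's a toy Python model where every client/server exchange counts as one round trip. The `Server` class, the GET/PUT operations, and the `transfer` procedure are all invented for this sketch; only the message counts matter:

```python
# A toy model of why stored procedures cut network round trips.
class Server:
    def __init__(self):
        self.round_trips = 0
        self.data = {"A": 100, "B": 200}

    def execute(self, op, *args):
        # Every call models one request/response over the network.
        self.round_trips += 1
        if op == "GET":
            return self.data[args[0]]
        if op == "PUT":
            self.data[args[0]] = args[1]
        return None  # BEGIN / COMMIT

    def call_proc(self, name, *args):
        # One round trip runs the whole pre-installed transaction body.
        self.round_trips += 1
        if name == "transfer":
            src, dst, amt = args
            self.data[src] -= amt
            self.data[dst] += amt

# Conversational API: begin, two reads, two writes, commit.
srv = Server()
srv.execute("BEGIN")
a = srv.execute("GET", "A")
srv.execute("PUT", "A", a - 10)
b = srv.execute("GET", "B")
srv.execute("PUT", "B", b + 10)
srv.execute("COMMIT")
print(srv.round_trips)  # 6

# Same logic installed server-side as a stored procedure.
srv2 = Server()
srv2.call_proc("transfer", "A", "B", 10)
print(srv2.round_trips)  # 1
```

Both paths leave the database in the same state, but the stored-procedure path pays for one network exchange instead of six.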
So now it's one network round trip, back and forth. And this is how you get rid of the overhead of a conversational API.

Just to give you a high-level idea of what PL/SQL or SQL/PSM looks like: you have the main body of the function or stored procedure, and then you have the various constructs you'd expect in a regular programming language. This looks a lot like Pascal, if you were ever taught Pascal like I was in middle school. I actually don't care much for what SQL/PSM or PL/SQL looks like. The reason it looks like Pascal is that it's based on another language called Ada, which was an extended version of Pascal, and the person on the SQL standards body who defined the stored procedure programming language was a huge fan of Ada. That's why we ended up with something like this; it has the Pascal-like property that you have to declare all your variables ahead of time before you actually execute anything. So this is just a high-level look at what it's like.

All right, so what are the advantages of using stored procedures? Well, as I said, we get rid of all the extra round trips of executing a SQL query and getting back a response, then executing the next query and getting back its response; now it's just one request to execute the transaction as a stored procedure, and we get back one result. That's much, much faster. We also get better performance for our transactions because the queries can be pre-compiled as prepared statements and embedded directly inside the database server, and the server can invoke those prepared statements directly.

From a software engineering side, we get the benefit that our applications can reuse complex logic across multiple implementations, because everything is embedded in a central location inside the database server. What I mean by that is: say I have an application with some complex function that does something in my database. If I rewrite my application and go from Java to Ruby or Python, or build another version for mobile devices, I essentially have to re-implement that logic on every platform I deploy on. But if I have that complex piece of code as a stored procedure, the Python version and the Ruby version and the Java version can all make the same call to the same function inside the database server, and I don't have to recreate it in those different languages. So that's a big deal.

The other thing, and you saw this in the Silo paper, is that because all the logic we need to execute the transaction is inside the database server, we can transparently restart transactions whenever there's a conflict, without having to go back to the application server and ask it to manually restart the transaction. What I mean by that is: say I invoke a stored procedure, it starts executing, and then in the validation phase it realizes there's a conflict and has to abort the transaction. Typically, with a conversational API, when you abort the transaction, that gets sent back as an exception thrown on the application side, and the application server then has to figure out: all right, I had a conflict and my transaction got aborted.
I need to retry it. So you need to handle the exception and know how to go back and retry the thing you were trying before. If everything is on the server side, then when I hit a conflict and have to abort my transaction, the system just re-invokes the stored procedure from the beginning, and this is all transparent to the application server. Your transaction may restart ten times before it's actually able to commit, and the application server doesn't know, and you don't have to write extra logic to deal with those restarts. So that's kind of nice.

So this all sounds amazing, right? Yes, your question is whether an application developer should write their entire application as stored procedures. I think you're asking: if I'm writing a new application, do I have to start with everything being stored procedures from the very beginning? When you say "in the beginning," do you mean when the system boots up, or when you're actually developing the application? You install stored procedures and they're registered in the catalog, so every time you restart the server, they're always there. Correct, yes. There's research that tries to do this, but there's no magic tool that can take an application in an arbitrary programming language, decide where the boundary of a stored procedure should be, pull it out, and automatically install it. No, nobody can do that.

All right, so this all sounds amazing. But early in the Silo paper, and in all the other papers we'll read in this class, they make a huge assumption: that a hundred percent of the transactions are going to execute as stored procedures. This is not true. We actually did a survey with DBAs last year, and a large percentage of them have zero percent stored procedures. The reasons they don't use stored procedures as much as we in academia think they do vary. A big one is that they have trouble finding people who know how to write PL/SQL or SQL/PSM or PL/pgSQL. It's easy to find a JavaScript or Python developer, but finding someone with experience writing these complex stored procedures is much harder. That's not to say the language is so archaic that nobody can write it; we're all smart people, we could probably figure it out, but
We can probably figure it out, but It's not something you come across Every day Now there are cases where some database systems like in Postgres, for example You can write your store procedures in other languages like Postgres will let you write to write store procedures in Java and Python and in JavaScript and Pearl See oracle will let you write your store procedures and see But when you want to write in these other languages Because those other languages can really do anything the database system has to do the extra stuff to make sure that you don't write a crappy store Procedure that takes down the whole system So for example in oracle if you you write your store procedure and see you link it into the database server When you are trying to invoke that store procedure at runtime oracle will fork a process and run your store procedure in a separate Sandbox that's separate from the database system and then has to set up an IPC channel to go back and forth Right because again, they don't know whether you wrote crappy C code that's going to then start trashing memory and corrupting your database Right, so there are other languages you can write your store procedures in but you have to sandbox them protect yourself from a software engineering side a big problem is that the The scope of the application now The entire logic of your application is now broken up into different locations Right, so you're gonna have say you're debugging your program and you're going along in your debugger And you're looking at my line that all sudden there's this invocation to a store procedure that you then got to go find where that source code is or in your and you're and figure out what what is actually do inside that your database server and Then if you have a agile software development environment where you may be putting out updates every two weeks That you may have to change your short procedure every two weeks And if you're doing rolling updates now you make sure you have the old 
version of the store procedure and the new version of the store procedure so that if newer versions of the application Want to reveal your store procedure that they're always running the new one so now the problem with this is that and slated this last comment here is DBAs do not like change So if you're trying to come every two weeks and say hey, here's my new store procedure They just don't take whatever code you give them at least a good DBA Just doesn't take whatever code you give them and install it It will actually bet it to make sure it actually just not gonna cause problems with other things running in your database server So the database servers the DBA is probably not gonna let you just have unfettered access to the the database system And let you install whatever store procedures you want And so the time it takes for me to make a change to your store procedure Between the time it actually gets installed it could be quite long So this becomes problematic and then likes like in sequel dialect in general Nobody really follows exactly the standard. So the different store procedure programming languages are not compatible It's actually even probably worse in store procedures than it is regular sequel. So it's you know Transact sequel definitely you cannot work in in Oracle I said these are these are definitely not portable. So you're almost really locking yourself in more so than you would with regular sequel All right, any questions about store procedures? yes So this question is what's the performance difference from running store procedures versus running a Conversational API when exiting transactions Oh, so you say if you if you if you just run the application server on the exact same machine as the database server That is actually not very common. Right. Most people don't do that, right? So going over the network is a big deal and we'll cover this later, but Going through the kernel To send TCP messages is bad, right? 
There are ways to get around this, and we'll cover them later, but in general you're going over a physical medium, so it's always going to be slow. Now, I can't give you a number for the percentage slowdown; we're actually trying to run those experiments now. But I'd definitely say it's not trivial: the network is slow compared to running on the same box in memory. So I would say it's significant.

So again, the reason I front-loaded the lecture with stored procedures is that, as I said, the Silo paper makes the huge assumption that everything runs as stored procedures. We'll set aside the fact that stored procedures are not as widely used as people might think and just focus on the protocol.

For this part of the semester, we're starting to talk about concurrency control. The concurrency control protocol is essentially the traffic cop, or the coordinator, inside the database system that figures out how to interleave the operations of transactions running at the same time. Before, in a disk-based system, we said we had to do this because at any time a transaction could try to touch data that's not in memory, hit a stall while it fetches from disk, and you want other transactions to run at the same time and still make forward progress.
Now we don't have disk stalls anymore, but in a modern hardware environment we have a lot of cores, and we want to use all of them. So we're going to allow multiple transactions to run at the same time on different cores. We still need all the same protections we had before to make sure we don't violate the serializable ordering of our transactions, and this is essentially what the concurrency control scheme provides. Concurrency control allows the application to invoke multiple transactions at the same time, in a multi-programmed manner, while every transaction runs under the illusion that it has exclusive access to the database. That means a transaction won't see updates from other transactions that haven't committed yet, and its writes won't get overwritten by other transactions running at the same time. Each transaction thinks it's running by itself, even though it isn't.

The goal, as we discussed before, is that this interleaving generates a schedule that is equivalent to one where we execute the transactions in serial order, meaning one after another. For our purposes we care about conflict serializability. View serializability is hard to enforce because you actually have to interpret what the application wants to do with the data. So when we say serializable in these lectures, we really just mean conflict serializable.
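To make "conflict serializable" concrete, here's a small Python checker. It builds the precedence graph for a schedule (an edge from Ti to Tj for each conflicting pair of operations where Ti's comes first) and tests it for cycles; the two schedules at the bottom are invented examples:

```python
from itertools import combinations

def conflict_serializable(schedule):
    """schedule: list of (txn, op, obj), op in {'R','W'}, in execution order.
    Returns True iff the precedence graph is acyclic."""
    edges = set()
    for (i, (t1, op1, o1)), (j, (t2, op2, o2)) in combinations(enumerate(schedule), 2):
        # Two operations conflict if different txns touch the same object
        # and at least one is a write; the edge goes from the earlier txn.
        if t1 != t2 and o1 == o2 and "W" in (op1, op2):
            edges.add((t1, t2))
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def has_cycle(node, stack):
        if node in stack:
            return True
        return any(has_cycle(n, stack | {node}) for n in graph.get(node, ()))

    return not any(has_cycle(n, frozenset()) for n in graph)

# T1 reads A then writes B; T2 writes B then writes A.
bad = [("T1", "R", "A"), ("T2", "W", "B"), ("T1", "W", "B"), ("T2", "W", "A")]
good = [("T1", "R", "A"), ("T1", "W", "B"), ("T2", "W", "B"), ("T2", "W", "A")]
print(conflict_serializable(bad), conflict_serializable(good))  # False True
```

The `bad` interleaving produces edges T1 to T2 (on A) and T2 to T1 (on B), a cycle, so no serial order agrees with it; the `good` one only has T1-before-T2 edges and is equivalent to running T1 then T2.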
That's what most systems provide. So essentially, the concurrency control scheme provides the atomicity and isolation guarantees you'd want in an ACID system, and we have to do this because we have simultaneous transactions running on different cores that could all be accessing the same data.

As a quick refresher on the different classes of concurrency control schemes: remember, I said there are essentially two classes of protocols. There's two-phase locking, and there's timestamp ordering. Every concurrency control protocol I'm aware of falls into one of these two categories. Now, I didn't make these categories up; they came out of the early work in the late 1970s and early 1980s. It's not some groundbreaking idea that there are only two classes, but if anybody tells you they have a concurrency control protocol that isn't one of these two, they don't know what they're talking about. Okay.

Two-phase locking is categorized as a pessimistic protocol. Here we assume that transactions are going to conflict, so we require them to acquire locks on the database objects they want to access before they're allowed to access them. Timestamp ordering is an optimistic protocol: we assume conflicts are rare, so transactions don't need to acquire locks on objects up front, and only later do we try to resolve whether there actually was a conflict. We use the timestamps we assign to transactions to generate an ordering that's equivalent to a serial ordering.

Let's go through each of these. So, for two-phase locking, say we have a simple transaction that wants to do a read on A followed by a write on B. In the first phase of two-phase locking, it has to acquire all the locks before it's allowed to do anything with those objects. In this case, it wants to read A, so it acquires a lock on A; it wants to write B, so it acquires a lock on B first as well. This is a super-simplified example; we all know there can be different types of locks (shared locks, intention locks, exclusive locks), but for our purposes here we don't care: we just have a single lock per object. This is called the growing phase, because this is where you acquire the locks you're going to need to execute your transaction. Later on, when you unlock them, you enter what's called the shrinking phase. As soon as you unlock one object, you automatically enter the shrinking phase, and you're no longer allowed to acquire any new locks. In the shrinking phase you can only shrink the number of locks you hold; you can't go back and acquire more. If you need to do that, you have to abort.

Another thing to note: in this example it looks like the transaction is explicitly calling lock and unlock. In a SQL-based environment you don't actually do this; when you invoke a query, the DBMS automatically locks things for you, and typically it also uses strict two-phase locking, where you only unlock everything at the end, so you never do explicit unlocks. Two-phase locking was shown to be the first provably correct protocol back in the 1970s; that came out of, of course, the IBM folks working on System R.

So let's see how this works with another transaction. Let's say I have two cores in my system.
I can run two transactions at exactly the same time. This bottom transaction wants to do a write on B and a write on A, and same thing: in the growing phase it has to acquire the locks on the objects before it's allowed to do anything, and when it unlocks one of them it enters the shrinking phase. So say again I have two cores, these transactions start at exactly the same moment in time, and these little red arrows represent the program counters. When the transactions first start, the very first thing they're going to do is try to get the locks they need. This one is going to try to get a lock on A, which — again, no other transaction holds it — it can do. This one wants to get a lock on B; same thing, nobody else holds it, so it can get that. Now we go over here and they invoke their first operations. This one does a read on A, which it can do because it has the lock on A; this one does a write on B, which it can do because it has the lock on B. The trouble starts when we get here. Now transaction one at the top wants to get a lock on B before it writes to it, but transaction two holds that lock already, so it's going to have to stall and wait until it can acquire that lock. Transaction two at the bottom tries to get a lock on A, which it can't do because transaction one holds that lock. Then we have — what is this called? Deadlock, right. So at this point here, we have a problem.
We can't proceed, and so we have to do something to either break this deadlock or avoid it in the first place. All the two-phase locking protocols that are out there can be subdivided further based on how they handle deadlocks. The first class is called deadlock detection. The idea here is that we're going to keep an internal data structure to track what locks transactions hold and what locks they're waiting to acquire, and then there's a separate background thread that periodically checks this data structure, which is usually represented as a graph. If it sees a cycle, it knows there's a deadlock, and then it has to make a decision about how to kill one of those transactions to break it. The dumbest thing you can do is just kill all the transactions — but you obviously don't want to do that, because you'd be wasting work. So instead you're going to try to figure out which single transaction to kill to break the deadlock. The surviving transactions can then acquire those locks, make forward progress, and continue until they commit, and the one you killed can come back and re-execute itself. The heuristic you use to figure out which transaction to kill varies across the different systems: you could pick the transaction that has run the longest, the transaction that holds the most locks, the transaction that has updated the database the most, and so on.
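The waits-for graph check that the detector thread performs can be sketched like this — a toy version with my own names. Nodes are transactions, an edge T1 → T2 means T1 is waiting for a lock T2 holds, and a cycle means deadlock.

```python
def find_deadlock(waits_for):
    """Return a cycle in the waits-for graph (list of txns) or None.
    waits_for: dict mapping each txn to the set of txns it waits on."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {t: WHITE for t in waits_for}

    def dfs(t, path):
        color[t] = GRAY
        path.append(t)
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:     # back edge -> cycle found
                return path[path.index(u):]
            if color.get(u, WHITE) == WHITE:
                cycle = dfs(u, path)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            cycle = dfs(t, [])
            if cycle:
                return cycle
    return None
```

With the two transactions from the example, T1 waits on T2 and T2 waits on T1, so the detector finds the cycle and picks a victim to kill.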
All these different heuristics do different things. In commercial systems like DB2, you can actually tune that heuristic to control how deadlocks get broken, but there's no magic formula that's going to work best in all situations; it depends on what your application is actually doing. The other approach is deadlock prevention. The idea here is that there's no separate thread that figures out whether there's a deadlock; instead, at the moment a transaction tries to acquire a lock, if that lock is held by somebody else, you make a decision about how to proceed. The choices are: you can just wait a little bit, hoping that the transaction holding the lock will free it up so you can go ahead and acquire it; you can commit suicide — basically kill your own transaction, give up all the locks you have, go back, restart yourself, and hope that by the time you come back the second time, the lock you couldn't get is now available; or, the more brutal alternative, you shoot the other transaction in the head, steal its locks, and then you get to run. Again, different systems do different things; no one method or heuristic here is better than the others — it depends on the application. You may remember from the intro class that this is all the wait-die and wound-wait stuff; that's what's being provided here. Two-phase locking will come up again later when we talk about multi-version concurrency control. For this class we're going to mostly focus on the other class of algorithms, called timestamp ordering. With timestamp ordering there are no locks, so there's no shrinking phase and no growing phase. Instead, in the basic timestamp ordering protocol, here's what we do when every transaction first arrives in the system.
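The wait-die and wound-wait policies he name-drops can each be sketched as a one-line decision rule — this is my own toy formulation, using a transaction's start timestamp as its priority (lower timestamp means older, and older wins).

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die prevention: return the action for the lock requester.
    An older txn may wait for a younger holder; a younger txn 'dies'."""
    if requester_ts < holder_ts:
        return "wait"   # older requester waits for the younger holder
    return "die"        # younger requester aborts (restarts with same ts)

def wound_wait(requester_ts, holder_ts):
    """Wound-wait prevention: an older txn 'wounds' the younger holder."""
    if requester_ts < holder_ts:
        return "wound holder"   # kill the younger holder, steal the lock
    return "wait"               # younger requester waits
```

Because priority always flows one way, no cycle of waiters can ever form, which is why neither policy needs a detector thread.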
We're going to give it a unique timestamp, and this timestamp is going to be used to figure out what order those transactions' operations should have occurred in. We're going to make sure that transactions don't read things from the future that they shouldn't be reading, and that they don't overwrite things that somebody else didn't get a chance to read yet. There are a lot of different ways to assign timestamps; for now, assume we have a single counter and we add one to it for every new transaction, and that's its unique timestamp. All right, and then inside the database system we now also need to maintain two additional pieces of metadata for every single record. The first is the read timestamp, which corresponds to the timestamp of the last transaction that read this object. That transaction may not have committed yet — it just read the object when it ran, and we update this timestamp. Then we have the write timestamp, which is the timestamp of the last transaction that wrote to this object; same thing, we can update this before the transaction commits. So when our transaction starts running and does a read on A, the first thing we have to do is check the write timestamp and compare it against our timestamp to make sure the value is not from the future — because we want to make sure we can only read things that existed at the moment we started. In this case, 10,000 is less than 10,001, so that's okay. Then we're going to update the read timestamp with our timestamp, to say I was the last transaction that read this object, at this timestamp. Then we come along to do a write on B. Same thing: I want to check both the read timestamp and the write timestamp to make sure they're not in the future, so I'm not overwriting something that a future transaction didn't get to read yet, and I'm not overwriting
something that a future transaction already overwrote. In this case we're fine on both counts, because 10,000 is less than 10,001, so we'll update the write timestamp to say we updated object B with a new value, and our timestamp was this. Now let's say this transaction goes off and does some other processing. We don't know why, we don't care — it could be going back over the network to fetch the next query; it doesn't matter. But during this time another transaction came along and wrote to object A, and its timestamp is 10,005, which is greater than 10,001, so it's allowed to make this change, and we update A's write timestamp. Then our transaction at the top gets back and starts running again. Now it tries to write A, but now we hit this check and say: oh, our timestamp is less than the current write timestamp for this object. If I overwrote this, I would be overwriting a value that was written by a transaction that exists in my future. The way to think about these timestamps is that there's physical time and logical time. In physical time, another transaction, 10,005, wrote to this object and replaced the value; in logical time — as defined by the timestamps we're using to find a serializable ordering — that write occurred before mine in physical time but in my future in logical time, and I can't have this thing go back in time and lose that update. So in this case, since 10,001 is less than 10,005, I can't proceed with this write, and my transaction has to abort. Yes? So the question is: are we assuming that this other transaction did not read the value?
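Putting the checks from this example into code, here is a minimal sketch — my own, not any real system's implementation — of the basic timestamp ordering rules: a read aborts if the object was written in the reader's logical future, and a write aborts if the object was read or written in the writer's logical future.

```python
class Aborted(Exception):
    pass

class Record:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # ts of the last txn that read this record
        self.write_ts = 0   # ts of the last txn that wrote this record

def basic_to_read(txn_ts, rec):
    if txn_ts < rec.write_ts:           # value was written in our future
        raise Aborted
    rec.read_ts = max(rec.read_ts, txn_ts)
    return rec.value

def basic_to_write(txn_ts, rec, value):
    if txn_ts < rec.read_ts or txn_ts < rec.write_ts:
        raise Aborted                   # a future txn already read/wrote it
    rec.value = value
    rec.write_ts = txn_ts
```

Replaying the lecture's scenario: transaction 10,001 reads A and writes B, transaction 10,005 then writes A, and when 10,001 comes back to write A it fails the write-timestamp check and aborts.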
It doesn't matter. Well — I'm just thinking, what if it was trying to do something like... I guess if it's inserting something fresh — not inserting; if it's modifying the value directly, then it makes sense for it to overwrite, but what if its new write depended on the previous value, like it wanted to add five? So the statement is: if someone read this tuple — if transaction 10,005 read this tuple — I would update the read timestamp, and maybe it inserted record C. And now when I try to do my write here, you're right, I'd check this and say I can't overwrite it, because someone read it at this timestamp in my future and they didn't see the update I'm about to put in. So yeah, you would fail that. Yes? So his statement is: in this example, if 10,005 committed and I try to come back and overwrite A, in theory I could just drop that write from this transaction, because no one read my version — 10,005 committed, and everyone will see its version. So can I just ignore the write? Yes, that is called the Thomas Write Rule. We will ignore it for now, but yes, that's one optimization you can do, and it's the most common one. Okay — the exact protocol doesn't matter so much; it's more that I want you to get the idea that we're going to have these timestamps on transactions, and although the physical time when operations occur may not exactly match those timestamps, we can use them to figure out whether certain things should be allowed to happen versus others. Yes? His question is: how do I check these timestamps when they could be getting modified by another thread? So — who here knows what compare-and-swap is? Raise your hand. All right, maybe 25 of you. We'll cover this next class; I thought about deleting these slides.
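The Thomas Write Rule from the student's question changes only the write path: when the record's write timestamp is in the writer's future but its read timestamp is not, the stale write can be silently skipped instead of aborting the transaction. A sketch, under the same assumption as basic timestamp ordering that every record carries `read_ts`/`write_ts` metadata (names are mine):

```python
class Aborted(Exception):
    pass

class Record:
    def __init__(self, value, read_ts=0, write_ts=0):
        self.value = value
        self.read_ts = read_ts
        self.write_ts = write_ts

def write_with_thomas_rule(txn_ts, rec, value):
    if txn_ts < rec.read_ts:
        raise Aborted      # a future txn already read the old value
    if txn_ts < rec.write_ts:
        return "skipped"   # Thomas Write Rule: our write is obsolete,
                           # so ignore it instead of aborting
    rec.value = value
    rec.write_ts = txn_ts
    return "applied"
```

The skipped write is safe because no transaction will ever read the value it would have produced: the newer committed version already supersedes it.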
It does come up when we talk about Silo. There's an atomic instruction called compare-and-swap, where you say: here's what I expect the current value to be, and try to update it with a new value. So instead of having to take a lock or mutex, do the update inside an if clause, and then release the mutex, in a single instruction I can update this thing and not worry about other threads updating it at the same time — compare-and-swap handles that. I'll cover this next class. All right, so the protocol I just showed you here is called the basic timestamp ordering protocol. Nobody actually does this, except for Peloton — for reasons we can take offline; I actually disagree with it. The big thing that's also missing from this demonstration is that to ensure we have repeatable reads — meaning if I read an object now, and somebody else in the future may have updated it, I still see the same value when I come back and read it again — I have to copy those values into my private workspace, and then it starts to look a lot like the OCC protocol that you guys read about. So again, nobody actually does basic timestamp ordering except us, for whatever reasons; instead what everyone actually does is OCC. OCC is a variant of the basic timestamp ordering protocol, and this is also very confusing in databases: these are all considered timestamp ordering protocols, and they're also considered optimistic protocols, but there is one specific algorithm that is specifically called optimistic concurrency control. So the class of protocols is optimistic, but one particular protocol is named Optimistic Concurrency Control. Okay, so just keep that in mind.
We'll just say OCC for short. With OCC, what we're going to do is store all our changes in a private workspace, and then instead of checking the timestamps every time we do a read or a write to see whether we have a conflict, we'll wait until the very end: when we commit, that's when we check for conflicts. The OCC protocol is pretty old — it was first proposed in 1981 by H.T. Kung, and it was actually invented here at CMU. Kung is not a database person; he's actually a networking person, and they sort of stumbled upon this algorithm through their other research and thought it would be useful for databases. This is one of the most seminal papers in this area. The only issue with it, as we'll see as we go along, is that the terms they use to describe certain aspects of the protocol are not how I would describe them — like the "read phase", in which you actually can do writes. Whatever. So it's a timestamp ordering protocol: again, we're going to write everything into our private workspace, and then when we commit we're going to check at the end whether there's a conflict. Let's go back to our example from before. When we start, we're going to do a read on A, a write on A, and a write on B — the same example I showed for basic timestamp ordering. In our database now, we don't actually need to store the read timestamp anymore; we only need to store the write timestamp, and that's going to be enough for us to figure out whether we have a conflict. So when my transaction starts and wants to do a read on A, I use whatever index I need to find the entry that I want, but then I'm going to copy it into my private workspace. The OCC protocol is described in phases: this first part, where you're actually doing the operations on behalf of the transaction, is called the read phase, which again is
confusing, because you can do writes in the read phase, but it is what it is. So when I want to read this object, I find it in the shared master database and copy it into my private workspace — an exact copy, with the record, the value, and its write timestamp. Now, any time I read that same object A again, I can always check my private workspace first to see whether it's in there, and if it's not, I go find it in the shared database and copy it in. Now I'm ready to do a write on A, and instead of going to the master shared database, I always go to my private workspace and write to it there. I don't check for conflicts as I'm doing this. What's going to happen in this case is that we're going to replace the write timestamp with infinity, because unlike before — under the basic timestamp ordering protocol, I got a timestamp when my transaction began — under OCC, when you start, you don't have a timestamp; you're going to get that later. So at this point I don't have a timestamp, and when I do my write I just set the write timestamp to infinity, to say: I don't know what my timestamp is going to be, but I definitely modified this thing, so give me a timestamp later on and I'll know what to put there. Same thing on B: when I want to do my write, I first check whether I have it in my private workspace.
I don't, so I go out to the shared database, copy it in, apply my update, and again I get infinity for my timestamp. Now at this point the transaction says: I want to commit. Unlike in two-phase locking — where you can pretty much just commit right away, because you've already acquired all the locks you needed, so all your operations were protected — when I call commit in OCC, I have to switch into two additional phases to figure out whether I'm actually even allowed to commit, because I was allowed to run at the very beginning without any locks or anything. So now when I commit, I have to check whether this is okay. The first phase, the validation phase, is where I'm going to go through my private workspace and check whether I've written to anything that could have been modified or read by other transactions, either in the past or in the future, depending on which protocol I'm using. If there are no conflicts — no overlap between my write set and other people's read/write sets — then I enter the write phase, where I apply all my changes from the private workspace into the global database. It's at this point, when I enter the write phase, that I get my timestamp. So when I write my changes from the private workspace into the shared database, I can update each record's write timestamp with the one I was just given. All right, is this clear?
Again, there are three phases. In the read phase you do all your reads and writes against the private workspace. In the validation phase we do some checking to see whether we're allowed to commit — I'll show what that is in a second. And in the write phase, if I passed validation, I apply all my updates, at which point I get my timestamp, I commit, and I can blow away my private workspace. All right — this basically repeats everything I just said: in the read phase we do all our reads and writes, and everything always targets the private workspace; if we don't have an object in our private workspace, we get a copy from the shared database and bring it in. Now, the validation phase is where we check whether there's a conflict between the read/write set in our private workspace and other transactions that have either run in the past or are still running now. There are essentially two ways to do this: backward validation and forward validation. I think in the intro class last semester I only covered one of these, just to keep it simple, but in an actual protocol you have to understand both. With backward validation, the way it works is that we look to see whether our read/write set overlaps with any transaction that ran in the past and has already committed. So say we want to commit — sorry.
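The three phases can be sketched end to end. This toy version — my own simplification, single-threaded, using backward-style validation by checking whether any record we copied was rewritten before we commit — copies records into a private workspace during the read phase, checks for conflicts at commit, and only then installs the writes and takes a timestamp.

```python
class OCCTransaction:
    next_ts = 1                 # toy commit-time counter (not thread-safe)

    def __init__(self, db):
        self.db = db            # shared dict: key -> (value, write_ts)
        self.workspace = {}     # private copies: key -> value
        self.read_set = set()
        self.write_set = set()
        self.seen_wts = {}      # write_ts of each key when first copied

    def read(self, key):
        if key not in self.workspace:          # read phase: copy in
            value, wts = self.db[key]
            self.workspace[key] = value
            self.seen_wts[key] = wts
        self.read_set.add(key)
        return self.workspace[key]

    def write(self, key, value):
        if key not in self.workspace and key in self.db:
            self.seen_wts[key] = self.db[key][1]
        self.workspace[key] = value            # never touches the shared db
        self.write_set.add(key)

    def commit(self):
        # Validation phase: if any record we touched was rewritten after
        # we copied it, a committed transaction conflicts with us -> abort.
        for key in self.read_set | self.write_set:
            if key in self.db and self.db[key][1] != self.seen_wts.get(key):
                return False
        # Write phase: only now do we get our timestamp and install updates.
        ts = OCCTransaction.next_ts
        OCCTransaction.next_ts += 1
        for key in self.write_set:
            self.db[key] = (self.workspace[key], ts)
        return True
```

If a second transaction installs a new version of A after the first one copied A, the first one's validation sees a changed write timestamp and aborts.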
We want to commit transaction two, and at the point in time when I want to commit, I have to look back at what other transactions committed before me, look at their read/write sets, and see whether I overlap with them. You can do this by looking at the write timestamps in the shared database and seeing whether you read or wrote something that conflicts. In this case, what's called the validation scope for this transaction runs from this point in time backwards. This other transaction, transaction three, is still running at this point — it hasn't committed yet — so we don't need to check anything it did, and we don't need to check all these other ones either. With forward validation it's the reverse. Same thing: we want to commit transaction two at this point in time, but now we only need to look at transactions that are still running — they will commit at some point in the future — and we need to decide whether we have written to anything that they read or wrote in their transaction, which would cause a conflict when they go to commit. Either one works; it's really about making sure all the transactions do their validation in the same direction. You can't have some transactions doing backward validation and some doing forward validation — everyone always has to go in the same direction, and that's enough to guarantee that your ordering is serializable. So let's think through forward validation for a second.
At this point in time, when I commit, I don't know anything about what this other transaction has actually done yet, and I don't care. If I wrote to something here, and this transaction is going to read it but doesn't read it until here, that's fine, because by then I'll have committed, so it'll read my latest change, and that's correct. But if I wrote to something here, and then this transaction read that same object — since at that point my transaction hadn't committed yet, and my updates from my private workspace had not been installed into the shared database — then when the bottom transaction did its read, it read the old value, because again, I hadn't installed any updates. So in that case I would have to abort my transaction, and the other one would be able to proceed, because it looked at the state of the database as it existed before I made any changes. So again, the key here is that everything always goes in the same direction, and you check whether there's any intersection between the read/write sets of these transactions based on the direction you're going. Yes? His question is: can you do OCC where you abort other transactions instead of yourself? So at this point here I can't abort this one — it's already committed, it's gone. For forward validation, yeah, I think that would be correct; you can still do that, and when you do validation in parallel you do end up aborting other transactions as well. Yes? His question is: how do you deal with a check-and-miss? What do you mean by that? Basically, if through parallelism another transaction reads the thing at the same time, right after you've decided it's okay. All right, so the statement is: while I'm doing validation, someone else — I do my check, there's no overlap in the read set, so I think I'm okay.
Then I try to do my write — but meanwhile the other transaction actually does its read, and it's not going to get the new version, because I haven't installed my updates yet. Yeah — you have to make sure you acquire latches on the tuples so that can't happen, and we'll see this in Silo. Okay. So the example I've shown here, and the way we teach this in the intro class, is what's called serial validation, or single-threaded validation. There's essentially a single global latch — a critical section — that the system uses to require that only one transaction can be in the validation phase at a time. In a modern system you wouldn't actually do that. With parallel validation, essentially what happens is that not only do you need to check other transactions in the past or in the future while you're doing validation, you also have to check the other transactions that are validating at the same time and see whether you have an overlap with them. The way you do this — and we'll see it in Silo — is that you acquire the locks or latches on the records in some fixed global order, and that avoids any deadlock. So say I need to validate records A and B and I have two threads: they're both going to try to acquire the write latch on A, and if one gets it and the other doesn't, the other one knows that somebody else is updating it too, and it goes ahead and kills itself. And because I'm always doing this in a fixed order — like lexicographic or primary-key order — you can't have one transaction acquire B and then A while the other tries A and then B. If everyone always gets A first and then B, there can be no deadlocks, and we don't need any extra coordination or a separate detection thread. Yes? His question is: are we assuming that we're using stored procedures for this? Yes — for Silo, for this discussion.
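The fixed-global-order trick can be sketched in a few lines — toy names, with a non-blocking `Lock.acquire` standing in for a per-record latch. Because every validator latches its records in the same sorted order, two validators can never hold each other's latches in a cycle.

```python
import threading

latches = {}  # record key -> threading.Lock (stand-in for a per-record latch)

def latch_for_validation(keys):
    """Acquire per-record latches in a fixed global (sorted) order.
    Returns the acquired latches, or None if any latch is already held,
    in which case the caller aborts (someone else is validating it)."""
    acquired = []
    for key in sorted(keys):              # fixed order -> no deadlock
        latch = latches.setdefault(key, threading.Lock())
        if not latch.acquire(blocking=False):
            for l in acquired:            # back off: release what we hold
                l.release()
            return None
        acquired.append(latch)
    return acquired
```

A validator asked to latch B and A still grabs A first, so it can never sit on B while waiting for A held by somebody else.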
Yes — but the protocol still works exactly the same as if you had a conversational API; there's no difference, it's just faster with stored procedures. Yes? So his statement is: you could end up reading some value that could be corrupted. What do you mean by corrupted? That word actually means something very specific. So the question is: could you have a dirty read — could you read a value from a transaction that has not committed yet? No, because all your updates go into your private workspace, and the private workspace is private; no other transaction will read from it. Other transactions always read the shared database, and that always contains values created by transactions that successfully committed — you have to get past the validation phase before you get to the write phase, and when you get to the write phase you apply all your updates and you know your transaction is committed. The question I thought you were going to ask was: does this mean you could execute a really long transaction with a lot of queries doing a lot of updates, and only when you get to the validation phase do you realize, oh, I have a conflict from the first query I executed, so I have to abort and roll back and waste all that work? The answer is yes. That's a big difference between this and two-phase locking. With two-phase locking you can never have that, because if you have a deadlock or an issue on the first query you execute, you'd know right away.
Whereas in OCC you have to wait until the very end to figure out whether you're actually allowed to commit or not. All right, so the write phase is basically as I said: you just apply all your updates and make them visible to other transactions. And because we're going to use write latches on the individual records, this ensures that the updates become atomically visible to all transactions. So if one transaction commits and does two updates, no other transaction will be able to see one update and not the other — they'll see both at the same time, because you have to acquire the write latches on the records before you install them, which is cheap because we can do it with compare-and-swap. Now, another big thing that we don't cover in the intro class, but that is super pivotal to this discussion, is: how the hell do you actually get timestamps? In my example before, I basically said, oh, you just have a single counter and you add one to it. You could protect it with a mutex — and again, if I die, have my tombstone say mutexes are your enemy. That's the worst possible thing you can do, because mutexes are really slow: if you have contention or conflict on them — which you would if you have a lot of transactions trying to update the same counter — your threads are going to end up making a syscall down into the kernel, because the operating system has to know to de-schedule your thread, and that's going to be really slow. So mutexes are the worst option; you definitely don't want to do that. The alternative is to use an atomic addition.
This is the compare-and-swap thing I mentioned before. You have a single memory location that holds your counter, and you call compare-and-swap to add one to it, with a loop that says: if my compare-and-swap fails, I come back and try again. That's really, really fast, because it's a single instruction. But the issue is that if you have a lot of cores all trying to update this single counter, then every time you do an update, the CPU has to send cache invalidation messages across the hardware — across the different sockets and different cores — to invalidate the cache line that holds that value, because it got updated. So if you have a ton of cores running transactions at the same time and they're all trying to do an atomic add on the single counter, you're not going to get good performance; this actually ends up being a bottleneck. A way to fix this — and this is what's used in Silo — is batched atomic addition. Instead of doing plus one, plus one, plus one on the counter, you do plus ten or plus twenty. Now a thread running on a core can get twenty timestamps or transaction IDs all at once and hand them out locally to every transaction that starts, and only when it runs out of timestamps does it go back and ask the central counter for the next batch. That solves that problem, but the problem with this approach is that there's a certain degenerate case: you could have a transaction running on one core with a higher timestamp than everyone else, and it conflicts with transactions running on another core whose batch of timestamps is lower. You could burn through all your timestamps, because every time you restart, you conflict with that other transaction, which is always going to have a higher timestamp than your
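The batching idea can be sketched like this. Python doesn't expose compare-and-swap directly, so this toy version (names are mine) stands in a lock for the atomic fetch-and-add on the global counter; the point it shows is that each thread only touches the shared counter once per `batch_size` timestamps.

```python
import threading

class BatchedTimestampAllocator:
    """Hand out timestamps from thread-local batches; only touch the
    shared counter once per batch (stand-in for an atomic fetch-and-add)."""
    def __init__(self, batch_size=20):
        self.batch_size = batch_size
        self._global = 0
        self._global_lock = threading.Lock()  # models the atomic add
        self._local = threading.local()       # per-thread batch state

    def _refill(self):
        with self._global_lock:               # one shared update per batch
            start = self._global
            self._global += self.batch_size
        self._local.next_ts = start
        self._local.end = start + self.batch_size

    def next(self):
        if getattr(self._local, "next_ts", None) is None or \
           self._local.next_ts >= self._local.end:
            self._refill()
        ts = self._local.next_ts
        self._local.next_ts += 1
        return ts
```

The trade-off he describes follows directly: timestamps within a batch are handed out with no cache-line traffic at all, but a thread stuck behind a batch of low timestamps can keep losing conflicts to a core holding higher ones.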
batch, so you burn through your batch real quickly and have to keep going back. So now, two alternatives. One is to use a hardware clock, and we'll see this in the Cicada paper next week. There's a new instruction that Intel added where basically they have a clock that's guaranteed to be in sync across all cores, so you don't have to send any cache invalidation messages — you just call this instruction and read the clock. And it's not a physical clock, meaning it doesn't correspond to wall-clock time; it's just a count of cycles or nanoseconds since the processor started. The other alternative doesn't actually exist — it's something we implemented in a simulator we built — and that's a hardware counter: think of it as a single value in the center of the CPU that everyone can do a plus-one on very efficiently with a single instruction, and you don't have to do any cache invalidation because it's treated as a special register. Again, this doesn't exist; it's something we proposed in a paper that we wrote. So, to give you an idea of the performance you can get with these different methods: this comes from a paper that I wrote with a student back at MIT, doing an evaluation of different concurrency control protocols on a CPU simulator with a thousand cores. You can't buy a CPU with a thousand cores, but these guys at MIT wrote this thing called Graphite that could simulate one. It would run on like 50 machines, and run 10,000 times slower than wall-clock time.
So even though it was a single-socket CPU — at least in the simulator — it had to run on 50 machines just to get the thing to run. What you see in this graph is multiple threads, scaling up the number of threads that are all trying to allocate timestamps at the exact same time — just a loop that says give me the next timestamp, over and over again. And what you see with the mutex case — which I said is the worst and you should avoid — this is a good example why: you can never really get past maybe two or three million timestamps per second, and as you add more cores it gets worse and worse. With the atomic add, the invalidation traffic from the compare-and-swap operation overwhelms the CPU. The one that actually works the best is the hardware clock, which actually does exist in Intel CPUs. You can see how the batching mechanisms help a little bit, but as you add more contention, again the cache coherence traffic starts to hurt you. The only issue with the hardware clock — which seems like clearly the one you would want to use — is that if you go read the errata from Intel, they have language that basically says: yes, this is the new instruction we're adding, but it may not exist in future CPUs, so tread lightly, in some ways. But as far as I know the latest Xeons all have it, so it's fine, and we use it and it works. All right, so this is just to show that you never want to use a mutex. The Silo batching thing does help a little bit — though I would actually say it's very unlikely you really need a hundred million timestamps per second.
That's a high number, but that's where you would sort of fall off up there.

All right, so now with basic OCC, and with ways to efficiently allocate timestamps, we can talk in the remaining 20 minutes about the modern protocols, and in particular the one I want to focus on is Silo, the one that you guys read. Because in my opinion, when this paper came out it made a big impact, and it's been very influential. So Silo is a single-node, in-memory OLTP database system. It uses serializable OCC with parallel backward validation, and it's only able to execute stored procedures. The paper was published in SOSP, which is the top systems conference, not a database conference, so there are certain things in the paper where what we in the database world call one thing, they call something else. They don't say stored procedures; they say a one-shot API, but it's basically the same thing. And when they talk about the different phases of the protocol, I think it's phase one, phase two, phase three; that's the read phase, validation phase, and write phase that we talked about before. So the key idea of Silo, the motivation of the system's design, is based on this observation that they want to avoid any writes to shared memory for any read-only transactions, right?
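Silo's epoch scheme, which the lecture walks through next, can be sketched roughly like this. Everything here is an assumption for illustration: the `EpochManager` name is made up, a plain Python thread stands in for the dedicated epoch thread, and real Silo handles durability and long-running transactions in ways this sketch ignores.

```python
import threading
import time

EPOCH_PERIOD_MS = 40  # Silo's epoch length: arbitrary, but it worked for them

class EpochManager:
    """One dedicated thread advances the global epoch; workers only read it.
    Hypothetical sketch of Silo's epoch scheme, not the real implementation."""

    def __init__(self):
        self.global_epoch = 1
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._advance, daemon=True)

    def _advance(self):
        while not self._stop.wait(EPOCH_PERIOD_MS / 1000.0):
            # Only this thread ever writes the epoch, so workers reading it
            # on the hot path cause no cross-core invalidation storm.
            self.global_epoch += 1

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

def begin_transaction(mgr):
    # Each transaction records the epoch it started in; all transactions
    # from the same epoch get committed together as a batch at epoch end.
    return {"epoch": mgr.global_epoch, "read_set": {}, "write_set": {}}
```

The point of the design shows up in `begin_transaction`: starting a transaction is a read of one value that only one thread ever writes, so the workers never coordinate during normal execution.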
And by doing that, you get better scalability on a multi-core system than you would if you weren't careful about how you organize things. The other thing is that they're going to use the batched timestamp allocation that I mentioned before, and they're going to organize the notion of time in the system based on epochs.

So the reason why I had you guys read Silo is that this system is going to come up again when we talk about logging, because they have a really good paper on how to do efficient logging that we're going to focus on. And there are other aspects of the system that are going to come up in other parts of the course, like when we talk about how to do garbage collection with epochs.

I also like Silo because Eddie Kohler, the professor that worked on the system, helped write it and was on the paper. He's a professor at Harvard, and he's probably one of the best systems professors in the world right now, because not only does he come up with awesome ideas, he actually slings code, and if you look at his GitHub commit history, he's writing code all the time. If you've ever used HotCRP to submit a paper, the online submission website, that was written by Eddie Kohler. In the 1990s he wrote a lot of X11 games that ended up getting ported to weird places; he wrote his own version of mahjong, and for a while if you played the mahjong game on the airplane seat-back system, that was Eddie Kohler's version of mahjong. So he's awesome.

This paper has been really influential even though it wasn't even published in a database conference; there's been a ton of research in database conferences trying to improve Silo. So this is the program list for a session at SIGMOD 2016 that I chaired on transactions, and three out of the six papers were based on Silo: they took Silo and tweaked it in some way to try to
make it faster. My paper is the other one, and these other two are clearly separate things. So there's been a ton of papers since Silo came out where people try to improve various aspects of it, but at its heart the system is pretty solid and it's very, very fast.

Just to give you an idea of how crazy, or awesome, Eddie Kohler is: he doesn't trust anybody else's code. So if you look at the Silo source code, or at Masstree, the index he built for Silo, he basically wrote everything himself. Like, if I need to parse JSON, I'll just find an open-source JSON parsing library and use that. He didn't trust anybody else, so he wrote his own JSON parsing library just for his database system, right? That's the level of, I'm going to say craziness, the level of dedication that I highly admire.

Okay, so the core idea of Silo is that it's based entirely around these epochs. What's going to happen is it's going to divide the notion of time into these fixed-length epochs that occur every 40 milliseconds. I asked them, why did they pick 40 milliseconds? Is there any special meaning behind that number? They said they just picked it and it worked, and so that's what they used. So the epoch could be any length; they just happened to use 40. What's going to happen is that all the transactions that start within the same epoch will be committed together in a batch at the end of the epoch. So unlike in regular OCC that I showed before, where your transaction can start at any time and commit at any time, under Silo with the epochs
you can start at any time within the epoch, but when you go to commit, you have to wait until the epoch ends before you're allowed to commit. And so what's going to happen is that at the end of the epoch, that's when they actually do all the validation and check what the correct ordering of these transactions is. The idea here is that since we want to reduce the amount of reads and writes to shared memory while the transaction is running, we don't let them do any validation or any coordination during their normal execution. It's only at the epoch boundary that we pay this big penalty to go read a bunch of stuff across the different cores, but because we're doing it all at once, we don't slow down the regular execution of transactions during the epoch. Again, the core idea is that the threads only need to synchronize with each other at the beginning and the end of each epoch. Now, they do have to handle the case where you have a transaction that is longer than 40 milliseconds; you have to do some extra stuff to make sure you refresh the epoch so that it gets moved over into the next one, and the paper talks about how to do that.

So the way they're going to allocate timestamps is through batches, as I said before. There'll be one thread that's dedicated as the epoch thread, responsible for figuring out when the epoch starts and finishes. It has its own little counter that it controls, so again, you don't do any invalidation every time you update it with a compare-and-swap. It says, I have a new epoch, here's the epoch number, and then it sends out to all the different worker threads their batches of timestamps that they can use for the transactions running locally to them. The idea here is that at no point when a worker is executing transactions does it ever have to coordinate with anybody else, because it got its batch of timestamps at the beginning. Now, if you run
out, you can always go back to the epoch thread and ask for more, but you want to avoid that as much as possible. And the same thing when the epoch changes, or you increment it: you go out and send the message to everyone else that says, all right, this epoch has ended, here's the new one, and everyone does their validation to see whether their transactions can commit.

So let's look at the commit protocol. For every single tuple, I'm going to prefix it in the header with a transaction-ID word. Basically they're packing a bunch of extra data into a single 64-bit word. You're going to have the epoch; the batch timestamp, which is unique for every transaction within a batch running at that core; and then some extra metadata: the write lock, or latch, whatever you want to call it; the latest-version bit, which you can ignore for now; and an absent bit to say whether this thing has actually been deleted or not. So we can pack all that within a single 64-bit integer.

Now say I have a transaction here. This transaction ran and it did a bunch of reads and writes, so it has updated its read set and write set in its private workspace, and now it wants to go and commit. We need to do validation to see whether we're actually allowed to apply our updates into the database. The first thing we have to do is lock, in the shared database, the tuples we modified, to make sure that no other transaction or thread tries to do an update at the same time. This is being done in the validation phase at the end of the epoch. So in this case here, we can go grab the lock on this guy, and that's fine because nobody else has got it. We do a compare-and-swap to try to get it; if we can't get it, then we abort ourselves and roll back, and our transaction has failed. In this case we got it, so we can proceed to the next step, where we want to
examine the read set to make sure that we read the correct version, and that nobody else wrote to something that we should have read. For this we can take the transaction-ID word that's embedded in the header, compare it against our read set, and check that the epoch timestamp inside the tuple in the shared database is not newer than the one we recorded. And to do this quick check we have to acquire the latch; if we can't acquire that latch, then we know that somebody else is updating the tuple at the same time we're trying to validate it, so we have to roll back and fail. In this case here, that works fine. Then when we try to validate this other guy, we would see that we already hold the latch for it, so we know that it's ours and we're allowed to go ahead; we don't have to worry about anybody else trying to update it at the same time.

Now we've done our validation and we know that everything's correct, so we can install our write set. We take our copy, apply our update into the shared database, and once we're done, we release the lock, and all our updates are installed.

All right, again, we'll do the same ordering that I said before, where we make sure that the order in which we acquire the locks in the shared database follows the tuple ID, or the record ID, or the primary key. That way all the threads are trying to acquire the locks in the same order, and when we release the locks after applying our updates, we do it in reverse order, again to avoid deadlocks.

Another big aspect of this is how to do garbage collection. Silo does what's called cooperative garbage collection: any object that was created by a thread, that thread is responsible for deleting. In other cases, when we talk about MVCC, we'll see that threads can delete objects made
by other threads. For our purposes here in Silo, we can only delete things that we modified ourselves. The key idea of how this works is that we use what's called a reclamation epoch. The epochs we're using to figure out how to order the transactions, we can also use to figure out when it's safe to delete an object. Because we don't want to delete an object while a thread is still looking at it; the thread may follow the pointer, try to actually interpret that data, do an invalid read, and have a problem. This will come up more when we talk about how to do garbage collection in index data structures, but at a high level it's basically the same thing; it's called RCU in the operating systems world. The basic idea is, again, we use these epochs to figure out when it's safe to actually delete things. We'll cover this more next week.

Range queries are also something that you have to handle. You can guarantee serializable order with OCC by itself if you just make sure you order the updates to the shared database correctly, but we can still have range queries done on the index where phantoms could occur, meaning somebody deletes something in our range, or somebody adds something in our range, and when we try to go back and read that same range again, we would get different results. So the way they're going to handle this, and we'll see this later on when we talk about Hekaton, is that they're going to keep track of a scan set of what nodes the transaction read on the index. Then when you go to commit and you do validation, you basically go back and do the exact same scan again on the index and see whether you get the same result. If you don't, then you know that somebody else did something that modified your range since the last time you ran it. If you want to insert
something, you can put in virtual entries that say: hey, I'm going to insert this key here; I haven't actually committed yet, so don't actually read the value, but that will prevent somebody else from trying to insert something at the same time. Again, we have to do this because whenever we have indexes, we still have to maintain serializable order even for queries that touch the indexes; that's why they do this extra stuff. There are other ways to do this, like key-range locking or index gap locking, and we'll cover that next week when we start talking about indexes.

So there's one graph I want to show you from the paper. The reason why I like this graph is that it's a good example of a scientific experiment run on a different machine that corroborates results from another investigation, another paper. In one part of the paper, they talk about this notion of a partitioned data store; they reference H-Store, or VoltDB, which is a system I helped work on. The basic idea there is that we're going to split the database up into shards or partitions, and then have a single thread execute all the transactions at each partition. Since it's single-threaded, only one transaction can run at a time, and if you have to touch multiple partitions, you have to acquire the locks on those partitions before you're allowed to start running. And what they show is that when you have zero multi-partition, or cross-partition, transactions, so at this point in the graph 100% of transactions only touch one partition, the H-Store model, or what they call the partitioned store, gets the best performance. But as you increase the number of cross-partition transactions, the performance gets much worse, because now I have to lock every single partition, and because I'm single-threaded, no other transaction can run at the same time. So I could hold the lock for the entire cluster, and they only
have to update maybe one record per partition, but I still have to acquire all those locks, and nobody else can keep running. So what it shows is that, in their case, around 15 percent, when you have 15 percent cross-partition transactions, the regular Silo implementation actually gets better. If everything is single-partition, the H-Store model works the best, but when you have more multi-partition transactions, the performance of Silo is stable. And what I like about this is that when I ran my own experiments with the H-Store system we built, we saw the exact same measurement: at around 12 to 14 percent multi-partition transactions in your workload, the performance would start to tank and fall apart. So again, it's a nice example: they had a completely separate system implement the same algorithm we implemented, and they got the exact same result. So I like that.

All right, so we have what, five minutes left? Yeah, I'm not going to get through TicToc; that's way too complex. So let's just skip ahead and finish up. Any questions about Silo?

Yes. So your question is: do you actually need to keep the entire tuple contents in your read set? I don't think they talk about that; you could just keep the record ID, right? I think they mention they spin on the lock a lot.

So this question is: could you just always read from the shared database, and not worry about making this copy into your private workspace?
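The 64-bit transaction-ID word from the commit-protocol discussion can be sketched as straightforward bit packing. The exact field widths and constant names below are assumptions for illustration, not Silo's actual layout; the point is that the epoch, the per-epoch sequence number, and the three status bits all fit in one word that can be read or compare-and-swapped atomically.

```python
# Hypothetical field widths; Silo packs similar fields into one 64-bit word.
LOCK_BIT    = 1 << 0   # held while a committing txn installs its writes
LATEST_BIT  = 1 << 1   # is this the latest version of the tuple?
ABSENT_BIT  = 1 << 2   # tuple is logically deleted
STATUS_BITS = 3
SEQ_BITS    = 29       # per-epoch sequence number
EPOCH_SHIFT = STATUS_BITS + SEQ_BITS

def pack_tid(epoch, seq, locked=False, latest=True, absent=False):
    """Pack tuple metadata into a single 64-bit TID word."""
    word = (epoch << EPOCH_SHIFT) | (seq << STATUS_BITS)
    if locked:
        word |= LOCK_BIT
    if latest:
        word |= LATEST_BIT
    if absent:
        word |= ABSENT_BIT
    return word

def unpack_tid(word):
    """Split a TID word back into its fields."""
    return {
        "epoch":  word >> EPOCH_SHIFT,
        "seq":    (word >> STATUS_BITS) & ((1 << SEQ_BITS) - 1),
        "locked": bool(word & LOCK_BIT),
        "latest": bool(word & LATEST_BIT),
        "absent": bool(word & ABSENT_BIT),
    }
```

Keeping everything in one word is what makes the lock acquisition in the commit protocol a single compare-and-swap on the tuple header.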
Yes, but it also depends on where this data is actually stored. You may want to just copy it into your private workspace, because that portion of the shared database, the part you want to read, might be on another socket, and going over the QPI to the other socket would be really expensive, whereas reading it from your local CPU cache should be really fast. So from a correctness standpoint it would still work, but you can go either way.

So again, that was a bit rushed. I think the core thing to remember in the case of Silo is this notion of epochs, and the idea of avoiding writes to shared memory until you actually have to do the commit. And for the garbage collection stuff, we will handle that when we talk about indexes later on.

The other thing I'm skipping over is this OCC implementation called TicToc. At a high level, what I'll say is that instead of having a separate epoch thread hand out these timestamps for transactions, you can derive the timestamps for transactions from the tuples that they read and write. That gives you a decentralized way of allocating timestamps without having a single thread manage everything. But at a high level it's still doing OCC: it's still doing the read phase, write phase, and validation phase.
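A rough sketch of that TicToc-style derivation, assuming each tuple carries a write timestamp (`wts`) and a read-validity timestamp (`rts`): a committing transaction picks a timestamp at or after every version it read, and after every observed reader of anything it overwrites. The function name and tuple encoding here are made up, and this simplifies away most of the actual protocol (validation, timestamp extension, and so on).

```python
# Hypothetical sketch of TicToc-style decentralized commit-timestamp derivation.
# No central timestamp thread: the commit timestamp falls out of the metadata
# on the tuples the transaction touched.

def derive_commit_ts(read_set, write_set):
    """read_set / write_set: {key: (wts, rts)} as observed by the transaction."""
    ts = 0
    for wts, _rts in read_set.values():
        ts = max(ts, wts)          # must come at or after every version we read
    for _wts, rts in write_set.values():
        ts = max(ts, rts + 1)      # must come after every reader of what we overwrite
    return ts
```

The appeal is exactly what the lecture says: timestamp allocation becomes a local computation, so the cache-line ping-ponging from a shared counter disappears entirely.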
So all of that is still the same. All right, so the major trade-off of OCC versus other concurrency control protocols is that with OCC we basically have no runtime overhead, or very little runtime overhead, while the transaction is running, but we don't find out whether the transactions are actually able to commit until we go to do validation. So if you have a workload with a lot of contention, where everybody's trying to update and read the same thing, then you could end up doing a lot of work that gets wasted, and you end up rolling back. In OCC there are some methods to do partial rollbacks and partial reordering of operations, but as far as I know they all require program analysis, so they're not easily implementable. And as I said before, Silo is a very influential system; even today people are still trying to crank out more papers to make it run faster, which to me is not very exciting, but other people do it.

Okay, so for next class,
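That commit-time moment where OCC finally discovers wasted work can be sketched like this: validate the read set by checking that each tuple's TID word is unchanged since we read it, and that no other transaction holds its lock. The dictionary-based representation and names are hypothetical, a sketch of backward validation rather than any system's real code.

```python
# Hypothetical sketch of OCC backward validation at commit time.

def validate_read_set(read_set, db, my_locks):
    """read_set: {key: tid_seen}; db: {key: (tid_now, locked_by)};
    my_locks: keys this transaction itself has already locked."""
    for key, tid_seen in read_set.items():
        tid_now, locked_by = db[key]
        if tid_now != tid_seen:
            return False  # someone committed a newer version: abort
        if locked_by is not None and key not in my_locks:
            return False  # another txn is mid-commit on this tuple: abort
    return True
```

Under high contention these checks fail often, and all the work the transaction did during its read phase is thrown away, which is exactly the trade-off described above.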
we're going to switch over to multi-version concurrency control. I gave a sort of crash overview of two-phase locking, and we spent more time on OCC today. Both of these protocols we can then implement in a multi-version environment. So the paper you guys read for Wednesday will cover not only the concurrency control protocol for multi-versioning, but also all the other design decisions you have to deal with when you build a multi-version database system. Most of the papers out there only deal with, here's how to do OCC or two-phase locking with MVCC. But when you build a system you also have to worry about the indexes, you have to worry about garbage collection, and you have to worry about how you actually maintain pointers between the different versions.

The paper you guys are going to read is actually one of my favorite papers that I've ever written. I think so highly of this paper that when we submitted it, we called it "the best paper ever," and the reviewers came back, and this was the main thing they complained about: please remove "this is the best paper ever" from the title and revise it. We went through multiple iterations; I'll show you on Wednesday. The actual title is something very banal, like "An Empirical Evaluation of In-Memory Multi-Version Concurrency Control," right? The title does not do the paper justice, and we put it that way on purpose. The reason why I think it's the best paper is because it covers everything that you need to know to build an in-memory multi-versioning system. Most of the database systems that have been built in the last 10 years are all doing multi-versioning, and so they all have to make these design decisions that we're going to cover in that paper, which no other paper has really covered until now. And I will also say that this paper came out of this class.
So the first time I taught this class, I had three master's students and a visiting PhD student from Singapore basically do all the work we needed to do to publish this paper. That's why I'm also very excited about it, because it reminds me of the first time I taught the course and we got a paper out of it. Okay, any questions?