Okay, let's jump into the second lecture. So right now we're about an hour south of Reno in this town called Mason. We're kind of holed up at this house of a guy we know from the old days — he's a huge Dino Bravo fan; I used to roll with JL in Seattle back in the day, out on the streets. We were out here anyway and needed a place to chill for a bit and wait till things cool down, so he's letting me record the lecture in this weird shed thing he has on the side of his house.

So for today, what we're going to talk about is transactions. I want to start with some background material to set the stage for what we're going to cover throughout the semester, and then we'll jump into the main material: transaction models, concurrency control protocols, and isolation levels.

As I said in the first lecture, this course is all about building modern data systems. We're focused on how to build systems for today's OLTP (transaction processing) workloads and analytical workloads. The way to think about where we're going in the course material is that the first three weeks focus on transaction processing: for in-memory databases, how can we execute transactions efficiently and ingest new data quickly? After that, we're going to focus primarily on how to do analytical operations — executing complex queries on the data we've collected from our transaction processing workload — so we can derive new information that informs whatever decisions our organization has to make.

So the way to think about this is: first we focus on how to execute transactions and collect data quickly, whether it's coming from a website or from IoT sensors — how can we get new data into our database as quickly as possible? Then, once we've collected a lot of data, how do we execute the queries that do joins, aggregations, and other operations as efficiently and as fast as possible?

The typical way people think about database workloads is along the lines of two primary categories, shown at the top here: OLTP and OLAP. Online transaction processing workloads are that first part I talked about, where you're ingesting information. This is where the application wants to execute operations very quickly that read and update the database in the context of a transaction, but the amount of data each transaction operates on is quite small. The classic example I always like to use to understand this is Amazon.
With Amazon, you go to the website, you add things to your shopping cart, you make purchases, maybe you update your address information. All of those operations are transactions that only operate on just your data: you can only go to the website and update your own account, your own payment information. So each of those transactions only touches a small number of tuples, just to update your information.

Now contrast this with OLAP workloads — online analytical processing. These are queries concerned with looking at the total data, or large segments of the data. Rather than looking at just your account information, they want to look at many people's account information and derive new information or identify patterns that allow the organization to make better decisions. So this is typically complex queries with joins and maybe some kind of aggregation. Again, in OLTP we go to the website and update just our account information; with OLAP we want to look at, say, everyone's account information for the month of January in zip code 15217 who bought blue shirts, or something like that. You're looking at large segments of the table and you want to compute new information.

This is typically how people think about database systems, and it's typically how they're marketed and sold. The change we've seen in the last five, six, seven years is a new category of database systems called HTAP — hybrid transaction/analytical processing. The idea is that we want to be able to process both OLTP workloads and OLAP workloads inside the same database instance, without separate machines in a bifurcated environment.

So to explain what I mean by a bifurcated environment, let's look at an example. This is a pretty typical database setup that you see in a lot of applications. You have your front-end side, your OLTP data silos, and this is where you're going to execute the transactions that ingest new information. The reason they're called silos is that the transactions are really operating on just one database at a time — they're not going across machines or across database instances. For my database instance, I run my transaction on just the data inside of it.

Now you want to do analytics on this data you've collected across all your silos, but traditionally you don't want to do this inside the data silo itself, because that's going to slow down your transactions, and that's a bad idea — this is where you're ingesting information: you're placing orders, you're making stock trades, you're doing financial updates.
You don't want to slow this down, because this is the core business logic — the purpose of the organization. So what people do is use what is called ETL: extract, transform, and load. These are frameworks that specialize in getting data out of these front-end OLTP data silos, doing some kind of cleaning or transformation to put it into a standardized form across all your silos, and then loading it into your data warehouse.

The way to think about what's going on here: say we have a bunch of OLTP silos where one database calls the field for someone's first name first_name, and another silo calls it fname. In the transformation process we would know how to fix the schema so it's standardized across all our database instances (there's a small sketch of this below). This obviously takes time, because you're copying data out of one database, putting it into some kind of bus or message-passing construct or framework, and then shoving it into your data warehouse. So it takes time, and it's expensive if you're loading a lot of data.

Then we load all this data out of the ETL framework or toolkit into our analytical data warehouse, and this is where we run our analytical queries — the complex joins and aggregations that extract new information from the data we've collected on the OLTP side. In some applications you actually end up pushing updates from the back-end data warehouse to the front-end OLTP systems to allow them to modify their behavior. Think of something like: Amazon wants to figure out what items to show you to make you buy them. You collect all the data from the OLTP side about what people have bought, you shove it into your data warehouse, you compute some complex analytical program to figure out that people who bought this also bought that, and then you send an update to the front-end database so that the next time the user comes back, you show them the products you think they're going to buy.

What people want is to cut down the time it takes to collect this data from the OLTP side and produce actionable information on the analytical side. This is where HTAP fits in. We're going to run the same types of analytical queries that we typically could only run on the data warehouse, but now on our front-end transaction processing database. You're not going to run all your queries there — the very complex things may not be suitable for the front-end because they would slow the system down — but things that could previously only be done in the back-end can now be done in the front-end. So we cut down the time it takes to transfer the data through the ETL pipeline into the data warehouse and push results back; we can just do those analytical operations directly on the front-end.

The other thing to point out is that the ETL process doesn't go away. We still have these data silos in the front-end that aren't talking to each other, so we still need ETL to put the data back into our data warehouse. And since we're talking about in-memory databases, we don't want to store all of our data in the front-end, because that would be expensive. So ETL doesn't go away; it's just that things that could previously only be done in the back-end can now be pushed to the front-end.
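To make the transform step concrete, here's a minimal sketch of the kind of schema normalization an ETL job might do, assuming two hypothetical silo schemas where one uses first_name and the other uses fname (the field names, source labels, and the standardized target schema are all made up for illustration):

```python
# Minimal ETL "transform" sketch: normalize records pulled from two
# hypothetical OLTP silos into one standardized schema before loading
# them into the data warehouse.

def transform(record: dict, source: str) -> dict:
    """Map a silo-specific record into the warehouse's standard schema."""
    if source == "silo_a":          # silo A calls the column first_name
        first = record["first_name"]
    elif source == "silo_b":        # silo B calls the same column fname
        first = record["fname"]
    else:
        raise ValueError(f"unknown source {source}")
    return {"first_name": first.strip().title(), "source": source}

# Extract (pretend these rows came out of the two front-end silos)...
extracted = [
    ({"first_name": "andy"}, "silo_a"),
    ({"fname": "jl"}, "silo_b"),
]
# ...transform, then "load" (here we just collect the standardized rows).
warehouse_rows = [transform(rec, src) for rec, src in extracted]
print(warehouse_rows)
```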
Another way to characterize this dichotomy — the distinction between these workload categories — is through this nice Venn-diagram-style chart that Mike Stonebraker came up with in a publication a few years ago. Along the y-axis he marks whether the types of queries these workloads run are simple or complex, and along the x-axis whether the workload is focused on doing a lot of writes or a lot of reads. At one extreme you have the OLTP workloads down in one corner, where the operations are quite simple — you're not doing crazy 10-way or 20-way joins, just joins on foreign keys looking up a small amount of data — and you're doing a lot of writes, because you're making updates based on how the outside world interacts with your application. At the other end of the spectrum you have the OLAP workloads, where it's mostly reads — you're looking at all the data you collected from the OLTP side — and the queries are typically more complex. The HTAP stuff is meant to straddle the middle: a little bit from the OLTP side and a little from the OLAP side.

So that's the way to think about what we'll be talking about for the entire semester, and part of the reason I'm starting off with transactions. If you want to support HTAP workloads, then as we talk about all the analytical OLAP stuff throughout the course, in the back of our minds we need to be thinking: how would I actually apply this in a system where we're still executing transactions, and I don't want to slow those transactions down? How do I make sure I'm not stealing resources or scheduling things that I need for my transactional workload when I want to execute analytical workloads? So throughout everything we look at this semester, you should be asking: how would I actually use this technique, or build this type of internal component, when I know I need to execute transactions as well?

All right, so that's the background of where we're going for the next three weeks and why we're focusing on transactions first. The protocol and methodology we're going to focus on is called multi-version concurrency control, and we're primarily going to talk about that because it's a concurrency control protocol — an implementation method — that supports HTAP workloads quite well; we'll see why as we go into more detail next class.

All right, so now let's talk about transactions. Again, you should already have the background on what transactions are; I just want to go through, at a high level, what I mean when I say transactions and how this fits into what we're talking about throughout the semester. All of this should be review for everyone here. A transaction is just a sequence of actions that are executed on a shared database to perform some higher-level function.
By higher-level I mean not a write to a single record, but something at the application level. Using that Amazon example again: adding something to your shopping cart, updating the state of your shopping cart and what items are in it — that's a higher-level function that can be invoked as part of a transaction.

Transactions are going to be the basic unit of change in the database, and what that essentially means is that all operations will be done in the context of transactions, and the database system cannot allow any partial transactions. That's the fundamental guarantee a transaction processing database system provides for us. Whereas if you use a NoSQL system that doesn't support multi-operation, multi-document, or multi-record transactions, that logic you have to write yourself in your application, and you're probably going to get it wrong. You don't want to do this, so ideally the database system provides it for you.

So now we can talk about what these actions actually are. I like this categorization because it focuses on what is fundamental — what the database system is going to provide — versus what the underlying system it runs on will do for us, which we can then build on top of. We have unprotected actions, protected actions, and real actions.

Unprotected actions are the basic primitives that the computer provides for us. They don't have all the ACID guarantees that we want for our regular, high-level transactions, but we can use them in clever or redundant ways to achieve what we want in our protected actions. One obvious example is writing to disk: most disk drives and SSDs will only allow you to do atomic four-kilobyte writes. So if I need to write, say, twelve kilobytes — three four-kilobyte pages — then the drive itself cannot guarantee that all three writes are atomic; it can only guarantee each one individually. That's what we mean by an unprotected action.

A protected action is something built on top of unprotected actions that allows us to provide the full ACID guarantees we want for our transactions. The way to think about this is that we can use unprotected actions to do single four-kilobyte writes to the disk, but we can also write to a log and write to our table heap, flush those out, and the combination of the two will be enough to guarantee that our transaction is fully ACID.
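As a rough illustration of building a protected action out of unprotected ones, here's a minimal sketch of the classic write-ahead idea: record what you intend to change in a log and flush it before touching the table heap, so a crash in the middle of a multi-page write can be repaired on recovery. This is just a toy under those assumptions, not how any particular system does it; the file names and record format are made up:

```python
import json, os

LOG = "wal.log"      # hypothetical log file
HEAP = "heap.dat"    # hypothetical table heap file (page-addressable)
PAGE = 4096          # assume the device only guarantees 4 KB atomic writes

def protected_multi_page_write(pages: dict) -> None:
    """Write several 4 KB pages 'atomically' by logging the intent first."""
    # 1. Unprotected action: append one log record describing all the pages.
    with open(LOG, "a") as log:
        log.write(json.dumps({p: data.hex() for p, data in pages.items()}) + "\n")
        log.flush()
        os.fsync(log.fileno())        # log is durable before the heap changes
    # 2. Unprotected actions: write each page into the heap. If we crash here,
    #    recovery can replay the log record to finish (redo) the write.
    mode = "r+b" if os.path.exists(HEAP) else "w+b"
    with open(HEAP, mode) as heap:
        for page_no, data in pages.items():
            heap.seek(page_no * PAGE)
            heap.write(data.ljust(PAGE, b"\0"))
        heap.flush()
        os.fsync(heap.fileno())

protected_multi_page_write({0: b"page zero", 1: b"page one", 2: b"page two"})
```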
The last category is called real actions, and these are not something we're going to focus on this semester. These are things that modify or affect the real world — the physical world — in ways the database system has no control over, so it's basically impossible to undo them. The example I always like to give: say we have an order-processing application, we execute a transaction that updates payment information, and halfway through the transaction we send an order confirmation email to the user saying, yes, we processed your order. But then, because the application is written poorly, we keep executing more queries after we send that email, and those queries end up failing — there's a conflict or some other problem — and now we need to abort the transaction. The problem is we already sent that email to the customer; it's out in the physical world, and we can't undo or retract it. That's what I mean by a real action, and since the database system has no control over these, they're not something we're going to care about.

All right, so now we can talk about how we use unprotected actions to build protected actions and support different types of transactions. We're going to talk about a bunch of different transaction models that a database system can provide. The spoiler is that flat transactions, and flat transactions with save points, are really the most common types of transactions used in the real world, and they're the only ones we're going to focus on throughout the semester. But I still think it's useful to think about the transaction models people can use in their applications that go beyond the classic textbook definition of transactions from an introductory class.

The first model is flat transactions. Flat transactions are what everyone thinks of when they think of a transaction — this is the standard approach everyone uses. You have a BEGIN statement, followed by one or more actions, and then the transaction finishes with either a COMMIT or a ROLLBACK statement. The database system could also abort your transaction if there's a conflict, but for now we can ignore that. So here, transaction one just begins, there's a read on A and a write on B, then it commits — and at the commit point, when we notify the outside world that our transaction has successfully committed, we know that our write to B is durable: any transaction that comes after us will be able to see it, and if the system crashes we're guaranteed that our changes are still there when we come back online. In the case of transaction two, there's a read on A and a write on B, then it rolls back — and from that point on, any transaction that comes afterwards, no matter how many times the system crashes, should never see the write to B.

Again, flat transactions are what most people think about when they think about transactions. Any tutorial you read online about how to do transactions — begins and commits — is essentially, without calling it this, the flat transaction model. I can't prove this, but I would say probably 99% of all the transactions executed in today's software are flat transactions.
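Here's a minimal sketch of those two flat transactions using Python's built-in sqlite3 module; the table and values are made up, and any relational DBMS with BEGIN/COMMIT/ROLLBACK would behave the same way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO kv VALUES ('A', 1), ('B', 1)")
conn.commit()

# Transaction 1: begin, read A, write B, COMMIT -> the write to B is durable.
a = conn.execute("SELECT val FROM kv WHERE key = 'A'").fetchone()[0]
conn.execute("UPDATE kv SET val = ? WHERE key = 'B'", (a + 100,))
conn.commit()

# Transaction 2: begin, read A, write B, ROLLBACK -> the write to B is discarded.
a = conn.execute("SELECT val FROM kv WHERE key = 'A'").fetchone()[0]
conn.execute("UPDATE kv SET val = ? WHERE key = 'B'", (a + 999,))
conn.rollback()

print(conn.execute("SELECT val FROM kv WHERE key = 'B'").fetchone())  # (101,)
```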
So what are some of the problems with this? Well, the first problem with flat transactions is that there's no way to do partial rollbacks. When I call ROLLBACK in my transaction, I have to roll back all my changes. Say I need to update a hundred tuples: I update 99 of them, and on the hundredth one I recognize I need to undo that one change — I have to roll back everything; I can't roll back just that last one. Related to this, if the system crashes before my transaction completes, then I lose all my changes. Again, this is what we said at the beginning: the basic unit of change in a database system is a transaction, so there are no partial transactions. It's not surprising that this happens, but it kind of sucks when you think about really large updates — it gets expensive if you have to keep redoing the same work over and over again. The last one is a bit more nuanced, and it's debatable whether it's an issue — it depends on what your application wants — but every transaction under the flat transaction model has to take place at a single point in time. You can't stop and come back at a later date; it's always all or nothing from the very beginning.

This is sort of abstract, and maybe it's hard to see why it's problematic, so let's look at some real examples. The first example of a deficiency with flat transactions is when you want to do what is called multi-stage planning. Think of it like you're a travel agent and somebody wants to book a trip from Pittsburgh to Florence. Well, you can't fly directly from Pittsburgh to Florence, so say you book a flight from Pittsburgh to JFK, a flight from JFK to Rome, and then, instead of taking a plane up to Florence, you take the train. Each of those tickets I have to get is one stage, and I want to make sure that I get all of the tickets I need rather than just some of them — if I can't get the train ticket from Rome to Florence, then who cares that I was able to get a flight to Rome; that's not where I wanted to go. In the flat transaction model you can't do that automatically or natively inside the database system, because it can't do these partial rollbacks — you can't have these multi-stage operations. It's going to be all or nothing, and therefore to support something like this you have to write additional application logic to make it happen.

The other example is doing bulk updates. I touched on this a little bit already, but say I need to update a hundred tuples, I update 99 of them, and on the last one there's some kind of failure or conflict, and now I need to roll back all my changes. This may seem trivial — you may be thinking it's only a hundred tuples, who cares — but now think about one billion tuples. Think about an operation where, at a bank, I need to update all of my customers' accounts with something, and the bank has one billion customers.
It's going to take me hours to complete, and during that time I could hit a conflict or the system could crash, and then all the work I did just gets rolled back when the system comes back online or when I abort my transaction. So I'm doing a bunch of work that's essentially wasted, and I may never actually be able to complete my operation because there's always going to be some conflict.

The transaction models we're going to talk about next don't completely remedy these problems — except, arguably, for one — but we can see how we chip away at them as we look at more sophisticated or complex transaction models that handle these various types of problems.

The first model is transaction save points. As I said, these are probably the second most common type of transaction you see in the real world, after flat transactions. The idea of a save point is that we save the current state of the transaction at the moment you ask to create the save point, and that gives us a handle we can refer to later on, either to roll back any changes made since that save point or to just delete any metadata we've maintained about it. This is all in the SQL standard: I create a save point with the SAVEPOINT operation and assign it a handle; with ROLLBACK TO I revert any changes back to the particular save point I want to roll back to; and with RELEASE I destroy that save point, just to free up some metadata.

What I'll say is that the way this is actually implemented in a given database system is implementation-dependent. Logically, the semantics are all defined in the SQL standard and should be the same everywhere; how it's implemented under the covers can vary.
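Here's a minimal sketch of those save point commands with Python's sqlite3 module, which supports the SQL-standard SAVEPOINT / ROLLBACK TO / RELEASE syntax; the table and values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None        # autocommit mode: we issue BEGIN/SAVEPOINT ourselves
conn.execute("CREATE TABLE obj (name TEXT PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO obj VALUES (?, 0)", [("A",), ("B",), ("C",)])

conn.execute("BEGIN")
conn.execute("UPDATE obj SET val = 1 WHERE name = 'A'")   # write A
conn.execute("SAVEPOINT sp1")                              # save point 1
conn.execute("UPDATE obj SET val = 1 WHERE name = 'B'")   # write B
conn.execute("ROLLBACK TO sp1")                            # undo only the write to B
conn.execute("UPDATE obj SET val = 1 WHERE name = 'C'")   # write C
conn.execute("RELEASE sp1")                                 # drop the save point metadata
conn.execute("COMMIT")

print(conn.execute("SELECT name, val FROM obj ORDER BY name").fetchall())
# [('A', 1), ('B', 0), ('C', 1)]  -- the change to B was rolled back
```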
For example, one system I'm familiar with is Postgres. When you do a ROLLBACK TO in Postgres, what happens is it ends up aborting all of the sub-transactions up to the one you identify in the rollback, and then it re-creates the sub-transaction state — so it's almost like it creates a new sub-transaction after you do a rollback. Other systems just revert the changes and resume from the save point you rolled back to. In the case of RELEASE, this is mostly a logical operation: as far as I know, you don't physically destroy the save point, because that would require copying things — maybe the read/write set of the internal transaction state at that save point — into another save point or the implicit one, and that's unnecessary work. You just make it so you can't access that save point anymore after you release it. One place save points come up really commonly is in user-defined functions (UDFs), to support exception handling, because they let you go back to the last save point right before you entered a try block or before you started the UDF. That's very common in systems that support user-defined functions.

So let's look at an example. There's a simple transaction here that wants to do a write on A, a write on B, and a write on C. When our transaction starts, we call BEGIN, and this creates an implicit save point. That save point doesn't have a handle or a name, but it's created as part of starting the transaction. Now when we do the write on A, that lands in our implicit save point — we don't care about the values, we're just recording that object A has been written within that save point. Now we invoke the SAVEPOINT command, which takes the implicit save point that was created when our transaction started and gives it a handle — in this case we'll call it save point one — and then creates a new implicit save point that all our new writes go into. Once we create a save point and give it a handle, we don't update the read/write set within that save point anymore; everything goes into the new one. Now we come to this command here, a ROLLBACK TO save point one. What happens is we blow away any modifications that are outstanding in our current implicit save point and essentially revert the database back to the state it was in at save point one. Then we create a new implicit save point, any new writes go into that, and when we commit, we see the changes we made to A and C — we don't see the change we made to B.

Let's look at another, more complicated example. Our transaction starts, and it wants to do writes on A, B, C, and D. Again, when it begins we create the implicit save point, and when we do our write on A it gets put into that implicit save point. We invoke the SAVEPOINT command, which gives us a handle for the first save point and creates a new implicit one. We do a write on B, which goes into the implicit one, and then we create save point two. Because we now have multiple instantiated save points in addition to the implicit one, the system links the second save point with the first one.
So it knows that save point one is the parent of save point two. What that means is that if we end up rolling back to save point one, that also reverts any changes that are in save point two. Likewise, if we release save point one, that automatically releases save point two as well — those changes still stick around, we just can't access them anymore through the handle.

So now we do a write on C, and the same thing happens when we create a new save point: save point three is linked to save point two. But now we do a RELEASE on two. This blows away the internal metadata that says save point two exists — the changes are still there, but we can't roll back to it — and then, implicitly, because save point three is linked to save point two, save point three gets blown away as well. Now we do our write on D, which lands in our implicit save point, and then we try to roll back to three. But we can't, because we released save point two, which implicitly released — cascaded that release to — save point three. Therefore we can't complete this operation, the transaction gets aborted, and all the changes get rolled back.

All right, so one way to think about save points is that they are a single sequence of actions that can then be rolled back individually based on the save point handle. Another variant of this is called nested transactions, where you can actually have a hierarchy of actions or operations within the transaction. The way to think about it is that you have sub-transactions within your transaction, and the final outcome of these sub-transactions (or child transactions) depends on the outcome of the parent transaction, all the way up to the root of the tree. My understanding is that save points and nested transactions are semantically equivalent, and you can implement nested transactions with save points if you want — it's almost like they're just syntactic sugar.

So let's look at an example of a nested transaction. The way to think about this is that inside of our transaction we're going to have all these nested BEGINs and COMMITs, but the final outcome — for the topmost BEGIN — depends on the last COMMIT that corresponds to it. Say our transaction starts: we do BEGIN, and underneath the covers we could be maintaining save points; it doesn't matter, what we want to see is what's happening at a logical level. We first do a write on A — that's fine, that's within the outer transaction — but then we see this BEGIN statement here. What that does is invoke a sub-transaction, 1.1, where all of the operations between that BEGIN and its corresponding final COMMIT are executed within the context of the sub-transaction. Typically you don't know all the operations you're going to do ahead of time — I'm just showing them moved over as a way to visualize this; you don't actually know where the last commit or rollback is in these cases, it's just for illustration purposes. So now I do my write on B, and that write on B is in the context of sub-transaction 1.1.
I see another BEGIN, which fires off another sub-transaction, 1.1.1, and again all the operations within that BEGIN/COMMIT pair get executed in that context. Now I do a write on C, I go ahead and commit, and all the changes from 1.1.1 are saved — but they're put into the context of 1.1. In terms of sub-transaction 1.1, I can see the change I made to object C, but technically the sub-sub-transaction has not really committed yet, because I still need to see whether 1.1 commits. So now I do my write on D, but then I hit this ROLLBACK command. What this rollback does is blow away all the modifications made by sub-transaction 1.1 — that's the write on B and the write on D — but it also blows away the modification made on C by sub-transaction 1.1.1. Again, the sub-sub-transaction's final outcome depends on its parent transaction, and its parent transaction is rolling back, so that rolls back everything it did. Now our execution context comes back to the parent — the topmost — transaction. It does a commit, and that only applies the change to A; the modifications to B, C, and D don't get persisted, because they got rolled back in the nested transaction.

Not all systems support save points, and not all systems support nested transactions. At least as of MySQL 5.7 there was some support along these lines, while Postgres doesn't support nested transactions; different database systems do different things, and again, semantically they're essentially the same thing.

All right, so now we can talk about another variant of a transaction model called transaction chains. In the case of save points and in the case of nested transactions, at the end of the day it is essentially one transaction: the commit at the very end determines whether any of the operations or modifications you made in your sub-transactions or save points actually get applied. With transaction chains, the idea is that we're going to execute multiple transactions one after another, and the commit of one transaction and the begin of the next transaction in the chain will be atomic — meaning that no other transaction is allowed to change the database state that will be viewed by the second transaction after the first transaction commits. I'll show an example of this in a second, but the way to think about it is that other transactions that aren't part of the chain could be modifying the database on different objects, and that's fine.
But as long as they don't modify anything that was read by the first or the second transaction in the chain, or written by the first transaction and read by the second — as long as nothing like that is modified in between the switch from one transaction to the next in the chain — then we're fine.

Now this sounds like save points, but the key difference is that each transaction (or sub-transaction, if you want to call it that) within the chain is its own standalone transaction. That means you can't roll back any previous transactions you successfully executed in the chain: once the database system calls commit, things get flushed to disk, you send the acknowledgement back to the application server, and it's done — you can't roll that back. The other difference, from an implementation standpoint, is that since each transaction in the chain is its own first-class transaction, the system can decide whether to release all the locks for the committing transaction before it starts the second one, or whether it can hand off the locks when the two are operating on the same objects. That depends on the implementation; different systems do different things. In general you can think of it as: you could just release all the locks and then reacquire them in the second transaction, but logically, as long as no one modifies anything we'll read in the second transaction, we're fine.

Let's look at an example. We have a three-transaction chain: T1, T2, T3. We start with T1, and it's going to do a write on A. This little image at the bottom shows where we are — what modifications we're making to the database and whether they've been persisted or not. Now when transaction one gets to this point here, it calls COMMIT. Again, it's a full-fledged transaction, so it has to flush all of its changes out to disk before we hand off to the second transaction. That's actually not entirely true — you could speculatively execute the second transaction — but the way to think about it is that, because it's a first-class transaction, we can't tell the outside world that transaction one has committed until we know everything has been safely written to disk. We could start running transaction two assuming transaction one is going to complete, and just make sure we don't commit transaction two unless transaction one commits. But at this point, to keep it simple, T1 flushes its log to disk.

So now we can start executing transaction two, and this transaction wants to do a read on A and a write on B. This is the key part of the chain that I mentioned on the last slide: no other transaction is allowed to modify the state of the database in a way that could be observed by transaction two, in between the completion of transaction one and the start of transaction two. Think about it like this: I have to flush T1's log records out to disk, and some other worker thread could pick up another transaction and start running it before I start executing transaction two. As long as that transaction doesn't modify anything that will be read by T2, that's okay.
In this case here, as long as no one modifies object A — no one overwrites the change that T1 made — then my transaction chain is fine. So I do my read on A, do my write on B, then commit. Again, this fires off transaction three in my chain, and I could flush the change to B before starting it up — that depends on the implementation. Now I start executing transaction three, I'm going to do a write on C, but then I roll back. So the final outcome of this transaction chain is that we persist the changes to A and B, but we roll back the change to C. Under transaction chains, each transaction in the chain is its own first-class transaction, so no matter what happens later down the chain, as long as I get to the commit phase of my transaction, my changes will be persisted.

This solves the multi-stage planning problem, in some ways, but it doesn't solve the bulk update problem. It seems close to what we want, but it doesn't go all the way, because it still requires us to write additional application logic to handle failures and roll back state. What I mean by that is: if I want to do a bulk update where it's all or nothing for one billion tuples, I can do that as a chained transaction, and that may improve the amount of parallelism in my system. But the issue is that if the very last tuple updated by the last transaction in the chain hits a conflict and I have to roll back, I can't roll back all the previous transactions — under the chained transaction model, each one committed on its own. So this requires us to write additional logic in the application to figure out the right thing to do and make sure we invoke the operations that reverse those changes as needed. That means we need to keep track of which transactions we've invoked so we can reverse them, which means we're maintaining the same transaction bookkeeping and other information that the database would normally provide for us.

What we really want is to have the database system manage as much of this as possible — it's already managing the execution of transactions, so it would be nice if it also handled failures and rollbacks in the chained transaction model. This is what compensating transactions are for. Compensating transactions are a special type of transaction written by the application programmer to logically reverse the effects of a transaction that has already committed. The key thing to understand is that these are logical changes, not physical changes. If you think about compensation log records from the ARIES recovery protocol, those are doing physical reversals: I run my transaction the first time and create a log record that has the before-value and the after-value; if I need to reverse that change, I just swap the current value back to the before-value, and that undoes it. That's a physical change.
I'm literally modifying the bits of the values I've stored back to what they were before I did my update. A logical change, or logical reversal, has to understand something about how the application is using the database in order to perform the reversal.

The example I always like to give for this: think about a web application that keeps track of people enrolled in this database course, and there's a counter that tracks the number of students enrolled. When I signed up, the counter was originally nine, and when I got enrolled I added one to it, so now it's ten. Now, for whatever reason — some weird problem — I need to leave the course, and I want to reverse that counter. If I'm doing a physical reversal — physical compensation — I would look at the log record I created the first time I did the update, which says the old value was nine and I set it to ten, and I would go back and set it to nine. The problem is that, say, twenty other people have signed up for the course after I did, so the counter is actually thirty. If I do a physical reversal, I'm going to take the counter from thirty and set it to nine, which is incorrect. But if I do a logical reversal — just decrement the counter by one, the same way I incremented it by one when I ran the first time — then I go from thirty to twenty-nine, which is what I want. These compensating transactions are written by the application programmer, and they have to do these logical reversals, because only the application programmer can understand what it means for the database to be put back into a correct state after a transaction fails.

Now the idea is that we can use these compensating transactions in conjunction with chained transactions to do more complex, larger updates to our database. These are typically known as saga transactions. There's this great paper from SIGMOD 1987 by Hector Garcia-Molina and Ken Salem — probably one of my favorite paper titles, because it's only five letters. What they propose for handling long-running, multi-step transactions entirely inside the database system is to define a sequence of chained transactions along with corresponding compensating transactions that can reverse them. The way to think about it is that I have n regular transactions and n minus one compensating transactions that know how to undo each of my regular transactions; the reason it's n minus one is that if I get all the way to transaction Tn at the end, then I know my overall transaction has committed and there's nothing to undo. So again, this is going to allow us to support these multi-step, long-running transactions that update a lot of data without the application having to worry about where it is in the workflow and making sure the changes can be undone.
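Here's a minimal sketch of the saga idea: a chain of steps, each paired with a compensating action, where a failure part-way through runs the compensations for the already-committed steps in reverse order. This is just an in-memory toy — the counter "database" and step functions are made up — not how any particular framework or DBMS implements it:

```python
# Toy saga: each step is (do, undo). If a step fails, run the compensating
# actions for every step that already "committed", in reverse order.

db = {"enrolled": 9}

def enroll():        db["enrolled"] += 1          # T_i: logical update
def unenroll():      db["enrolled"] -= 1          # compensating transaction for T_i

def run_saga(steps):
    done = []
    try:
        for do, undo in steps:
            do()                  # commit this step of the chain
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()                # logically reverse the already-committed steps
        raise

def failing_step():
    raise RuntimeError("conflict on the last step")

try:
    run_saga([(enroll, unenroll), (enroll, unenroll), (failing_step, lambda: None)])
except RuntimeError:
    pass

print(db["enrolled"])   # back to 9: the two committed enrollments were compensated
```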
Let's look at a really simple example — the counter update we talked about before. The chained transaction goes from T1 to T2 and then T2 to T3, and along the way I'm just updating this counter by one. But say I get to transaction three in my chain and, for whatever reason, it aborts. With just regular chained transactions, T1 and T2 would still remain. But with saga transactions, when I abort this last transaction, I'm also going to go back and invoke compensating transactions to undo the transactions that actually committed. First, I have a compensating transaction for T2, and all it does is a logical write to decrement the counter that I incremented when I executed the first time; likewise for T1 — it just decrements the counter again. The main idea is that I could write these compensating transactions myself in my application and worry about invoking them if my chained transaction fails at runtime, but with saga transactions all of that logic runs inside the database system itself, and it knows how to reverse things as needed.

As far as I know, no database system natively supports chained transactions and saga transactions. Where you actually see these is in application programming frameworks: they expose or provide the API to do these kinds of things, and underneath the covers they just use a database system to invoke it for you. They make it easier for you, as the programmer, to implement compensating or saga transactions, so you don't have to worry about managing that yourself. Even though the idea of sagas is old — over 30 years now — it's actually super common today in application programming frameworks for microservices. Microservices are all about doing small updates on a small number of records, but sometimes you want to do a large update across a lot of them, so you can use saga transactions across your microservice fleet to make sure it's all done atomically and correctly. I think that's a very interesting case where an old concept from the literature that was not widely adopted when it first came out is becoming more prevalent today, which is pretty cool.

All right, so now we can jump in and talk about concurrency control. Again, this should all be review from your undergraduate database course. A concurrency control protocol is essentially the traffic cop or coordinator that allows us to execute transactions on a shared database in a multi-programmed way — meaning we're interleaving their operations — while the application programmer doesn't have to think about any of that. They just write their flat transactions, with a begin, a commit, and the operations in between, and the concurrency control protocol allows the database system to provide the illusion that each of these transactions is running by itself on a dedicated system. This is the goal of having conflict-serializable schedules — conflict-serializable transaction execution.
We want the end state of the database to be equivalent to some serial execution of all the transactions we had to execute, even though we're interleaving their operations. Another way to think about this is that the concurrency control protocol provides the atomicity and isolation guarantees that we want in our OLTP database system.

I briefly want to talk about how these things are actually implemented. For today's class we'll just go over this at a high level, to understand the basic types of protocols that exist and some of the challenges in implementing them. When we get into multi-version concurrency control protocols, we'll see a bit more about where the internal transaction state is actually maintained, as well as the state of the database, with latches and locks and other things like that, and how that varies based on the implementation. But at a high level these concepts are essentially the same; just where they're maintained in the system can vary.

The most obvious thing you need for a transaction is its current status: whether it's active, committed, aborted, terminated, and so forth. Then you need to maintain the undo records for any changes the transaction makes to the database. This is a good example of a difference between a disk-oriented database system and an in-memory database system: these undo records only need to be stored in an in-memory data structure inside the system. When a transaction commits, we don't need to write any of these undo records out to disk, because there are no pages on disk, and therefore we don't have to worry about undoing changes from uncommitted transactions after a crash. So when a transaction commits, we essentially just blow away its undo log records at some point — it may not always be right away — and we don't worry about persisting them. What you do need to persist, though, is the redo records. This is where, in the classic protocol, we write these records out to the write-ahead log and make sure we flush them to disk when a transaction commits, before we tell the outside world the transaction is finished, because the redo records are enough information for us to restore the changes that the transaction made to the database. The last piece of state is the read/write set — sometimes also the scan set or index key set — for each transaction, and we'll see more about this in subsequent lectures.
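As a rough picture of the bookkeeping just described, here's a minimal sketch of the per-transaction state a system might track; the field names are made up for illustration, and real systems organize this very differently depending on the protocol:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class TxnStatus(Enum):
    ACTIVE = auto()
    COMMITTED = auto()
    ABORTED = auto()

@dataclass
class TxnState:
    txn_id: int
    status: TxnStatus = TxnStatus.ACTIVE
    undo_records: list = field(default_factory=list)   # in-memory only; dropped on commit
    redo_records: list = field(default_factory=list)   # flushed to the log before commit is acked
    read_set: set = field(default_factory=set)         # objects the transaction read
    write_set: set = field(default_factory=set)        # objects the transaction wrote

t = TxnState(txn_id=10001)
t.read_set.add("A")
t.write_set.add("B")
t.undo_records.append(("B", "old value"))
t.redo_records.append(("B", "new value"))
print(t.status, t.read_set, t.write_set)
```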
How much information you actually need to maintain per transaction for the read/write set depends on the implementation of the protocol. There are essentially two categories of concurrency control schemes: two-phase locking and timestamp ordering. Two-phase locking is considered a pessimistic protocol, because it assumes that transactions are going to conflict, so it requires them to acquire locks on database objects before they're allowed to access them. Timestamp ordering protocols are considered optimistic, because you assume conflicts are rare, so you don't require a transaction to acquire locks on database objects; instead you check for conflicts at commit time — or in some cases as you go along, but in general at the end. You assume there aren't many conflicts, you don't worry about it until the very end, and then you just see whether your assumption was correct or not.

I'm going to go through each of these at a high level, just so we understand what we're talking about when we get to applying them for multi-version concurrency control — that's the main thing we're going to talk about next week — and so that we have a common vernacular to describe these various parts.

So, two-phase locking. Let's look at a simple example of a transaction: I want to do a read on A and a write on B. Before any transaction is allowed to access an object, it has to acquire the lock on it, so I need to get a lock on A and a lock on B. One key thing to point out here is that I'm not saying whether A and B are tuples — I'm just saying they're objects in the database. They could be tables, they could be pages, they could be whole databases, they could be single attributes within a single tuple; it doesn't matter, the protocol is always the same. The first phase, when I acquire my locks, is called the growing phase. As soon as I release one lock, I enter what is called the shrinking phase, and in the shrinking phase you can only release locks and operate on the objects you already hold locks for — you can't go back and acquire new locks. It's been proven that if you enforce this growing/shrinking discipline, the protocol generates conflict-serializable schedules.

All right, so let's take a look at what happens when two transactions run at the same time under two-phase locking, and some of the problems that can come up. My transaction two here wants to do a write on B and a write on A. At the very beginning, when both transactions start, they have to acquire locks: transaction one wants to read A, so it acquires the lock on A, and transaction two wants to write B, so it acquires the lock on B. Now, in the intro class we talked about intention locks
and shared versus exclusive locks; for our purposes here, that doesn't matter. Those things only provide performance improvements — they allow for higher levels of concurrency or reduce the amount of metadata you have to maintain for transactions — and the protocol is the same whether you use those extensions or not. So now T1 has a lock on A, so it can read it, and T2 has a lock on B, so it can write to it. But now we have trouble, because we end up in a situation where transaction T1 wants to get the lock on B before it writes to it, but that lock is held by transaction T2, so it has to stall and wait; and likewise, transaction T2 wants to get the lock on A, but that's held by transaction T1. Now we have a deadlock, and we have to resolve it, because otherwise these transactions would stall forever.

The variations of the two-phase locking protocol depend on whether you're doing deadlock detection or deadlock prevention. Deadlock detection means you allow transactions to acquire locks as needed, if they're able to — meaning nobody else is holding them — and then you have a separate thread running in the background that, every so often, checks the queues of transactions waiting for locks and sees whether you have a deadlock, which typically shows up as a cycle in a dependency graph. Then you use some kind of heuristic to decide which transaction to kill in order to break that deadlock. This could be things like which transaction holds the most locks, or which transaction is the youngest — there are different approaches; no one is better than another for all applications and all workloads, and different databases do different things.

The other approach is deadlock prevention, where you check, at the moment a transaction tries to acquire a lock, whether another transaction already holds that lock, and if so you make a decision about what to do. One option is to just wait, assuming the other transaction is going to give that lock up soon, and then you can acquire it. Or you can say: I'm never going to get that lock and I might end up in a deadlock, so I'll shoot myself in the head — release all my locks, reschedule myself, and come back later. The last approach is the most gangster one: you basically put a gun to the other transaction's head, kill it, steal its lock, and keep running. Again, different systems do different things; no one approach is better than another for all applications — in some ways it's a software engineering choice.
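Here's a minimal, single-threaded sketch of a lock table with one deadlock-prevention policy — wait-die, where an older transaction is allowed to wait for a younger one, but a younger transaction that wants a lock held by an older one kills itself. The class and policy choice are just for illustration; real lock managers also have lock modes, wait queues, and latching, all omitted here:

```python
class Abort(Exception):
    """Raised when the requesting transaction must kill itself and retry later."""

class LockTable:
    def __init__(self):
        self.holders = {}          # object -> txn_id currently holding the lock
        self.locks_held = {}       # txn_id -> set of objects it holds

    def acquire(self, txn_id, obj):
        holder = self.holders.get(obj)
        if holder is None or holder == txn_id:
            self.holders[obj] = txn_id
            self.locks_held.setdefault(txn_id, set()).add(obj)
            return "granted"
        # Wait-die policy: lower txn_id == older. Older waits; younger dies.
        if txn_id < holder:
            return "wait"          # caller would block until the holder releases
        raise Abort(f"T{txn_id} dies: T{holder} holds {obj}")

    def release_all(self, txn_id):
        # Shrinking phase: once a transaction starts releasing, it acquires no more.
        for obj in self.locks_held.pop(txn_id, set()):
            del self.holders[obj]

lt = LockTable()
print(lt.acquire(1, "A"))      # T1 locks A -> granted
print(lt.acquire(2, "B"))      # T2 locks B -> granted
print(lt.acquire(1, "B"))      # T1 (older) waits for B instead of deadlocking
try:
    lt.acquire(2, "A")         # T2 (younger) wants A held by T1 -> dies
except Abort as e:
    print(e)
lt.release_all(2)              # T2 releases everything and would restart later
```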
All right, so the next class of protocols is called timestamp ordering, and this is where things get confusing. There's a basic timestamp ordering protocol, which every textbook covers and which I'll go over right now, and then there's also the optimistic concurrency control protocol, which is a variant of timestamp ordering. All timestamp ordering protocols are considered optimistic, but there is one specific algorithm literally named optimistic concurrency control. It is what it is. This is going to be a recurring theme throughout the entire semester: the same name gets reused for a bunch of different concepts and a bunch of different things.

With basic timestamp ordering, we're going to check for conflicts on each read and write, and we're going to use timestamps to figure out whether one transaction is allowed to read or write something, depending on whether somebody in its logical future or past has already done something to the thing we're trying to access. With the optimistic protocol, we're going to store our changes in a private workspace and only check for conflicts at commit time.

Let's go through the first one. With basic timestamp ordering, we have a transaction that wants to do a read on A, a write on B, and then a write on A. At the very beginning, under basic timestamp ordering, we need to assign the transaction a timestamp, and that timestamp is then used to determine the serial ordering of operations across all our transactions. Let's use a really simple kind of timestamp: just a counter. We have some global counter, and every time a transaction starts we do a compare-and-swap to add one to it. So the timestamp for this transaction is 10,001. Now, in the database, we're going to maintain, for every single tuple, a read timestamp and a write timestamp — the timestamps of the transactions that most recently came along and accessed the tuple.

Our transaction starts and wants to do a read on A, so we go check in the database: this is the object we want to read, and its write timestamp is 10,000. Our timestamp is 10,001, so we know that whatever transaction wrote to this object was in our past, and therefore we should be allowed to see its change, because that's what would happen under serial-order execution. So we do the read, and then we update A's read timestamp with our timestamp, 10,001. Now we do a write on B. Same idea: we need to check whether anybody in our future has written to the object we're trying to write — in which case our write would clobber it, and that would be a problem — and we also need to make sure that nobody in the future has read this object, because someone with a timestamp higher than ours would then have missed our write, which shouldn't happen under a serial execution order. In this case we're fine, because 10,000 is less than 10,001 for both timestamps, so we're allowed to do the write, install the update, and set B's write timestamp to our timestamp.

Now say this transaction goes off and does something else — it flushes data to disk, it goes out on the network to get some additional information, whatever — it's not running, it's stalled. During this time another transaction comes along and updates object A, and that second transaction has timestamp 10,005, which is in the future relative to us. It's allowed to overwrite the current value in A, because our transaction read it in the past, at timestamp 10,001, and this one is writing in the future — under a serial ordering our transaction shouldn't see that write anyway. But now, when our transaction comes back, resumes execution, and tries to do its write on A, its timestamp is less than A's write timestamp, so it has to abort — otherwise it would be overwriting something that was updated in its logical future with data from the past.
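Here's a minimal sketch of those read/write checks in basic timestamp ordering. The object names, the starting timestamps, and the representation of a tuple are made up; a real implementation would also install the new value, handle Thomas-write-rule variants, and so on:

```python
class TOAbort(Exception):
    pass

# Each object carries a read timestamp and a write timestamp.
db = {"A": {"r_ts": 10_000, "w_ts": 10_000},
      "B": {"r_ts": 10_000, "w_ts": 10_000}}

def read(ts, obj):
    o = db[obj]
    if ts < o["w_ts"]:                    # someone in our future already wrote it
        raise TOAbort(f"T{ts} cannot read {obj}")
    o["r_ts"] = max(o["r_ts"], ts)

def write(ts, obj):
    o = db[obj]
    if ts < o["r_ts"] or ts < o["w_ts"]:  # a future txn already read or wrote it
        raise TOAbort(f"T{ts} cannot write {obj}")
    o["w_ts"] = ts

read(10_001, "A")      # ok: A's write timestamp (10,000) is in our past
write(10_001, "B")     # ok: nobody in our future has touched B
write(10_005, "A")     # another transaction overwrites A in our logical future
try:
    write(10_001, "A") # our delayed write is now behind A's write timestamp -> abort
except TOAbort as e:
    print(e)
```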
So now when our transaction comes back, resumes execution, and tries to do its write on A, its timestamp is less than A's write timestamp, so it has to abort: it would be overwriting something that got updated in the logical future with data from the past.

I'll say that, as far as I know, very few systems actually implement the basic timestamp ordering protocol exactly as I'm describing it here. You'll see this again when we talk about multi-version concurrency control; that's where people typically implement something like this.

A more well-known, more sophisticated approach to timestamp ordering is called OCC, optimistic concurrency control. Think of it almost like a multi-version system, except that instead of the versions being visible to other transactions, they are only visible to our transaction, because we make all our changes in a private workspace. Only when we go to commit, and after we verify that there are no conflicts with any other transactions, do we install the updates from our private workspace into the global database, and only then are our changes visible to everyone.

The OCC protocol is old. It was actually invented here at CMU, going back to 1981, by H.T. Kung, who was not a database professor but sort of stumbled upon this protocol. Like I said, it's been used more in recent in-memory databases and less so back when it came out; most disk-oriented systems implement two-phase locking.

All right, so let's look at how OCC works. We have a transaction that does a read on A, a write on A, and then a write on B. Unlike basic timestamp ordering, where we had to maintain both a read timestamp and a write timestamp per tuple, with OCC we only need the write timestamp. We start off doing a read on A, and unlike basic T/O we don't get a timestamp when our transaction starts. At this point our transaction doesn't really exist as far as anyone else is concerned, and nobody can see our changes, because we're going to put all of them into a private workspace.

This first part is called the read phase, which is confusing because we're actually also going to do writes, but for whatever reason it's called the read phase. When we do a read, we go to the global database, find the record we want, and copy it into our private workspace, along with the metadata that says what its write timestamp was when we brought it in. When we do a write, instead of updating the global database, we modify our private workspace. We would need to set a write timestamp, but since our transaction hasn't been assigned one yet, we just set it to infinity and update the value to whatever we want it to be. Likewise, when we do the write on B, we copy it from the global database into our private workspace, update the value, and set its write timestamp to infinity.

So now at this point the application is going to ask to commit, whether from the command line or within the program. But there's a bunch of actual work we have to do before we're allowed to truly commit the transaction: we have to check whether any conflicts occurred while we were running.
There are two additional phases that come after the read phase: the validate phase and the write phase. In the validate phase we look at all the changes we made in our private workspace and compare them against either the transactions that committed before us or the transactions that haven't committed yet; whether you work backwards or forwards in time doesn't matter, it's semantically the same, it's just a question of how you implement it. We check whether there are any conflicts. If there are, we have to abort and throw away our changes, which is easy because they're all in our private workspace: we just blow it away. If we pass the validate phase, then we enter the write phase, where we finally get our timestamp and install our changes into the global database with that write timestamp. Once all our changes are installed, the transaction is considered fully committed and we can tell the outside world that we finished.

I don't want to go into the details of how you actually implement OCC, because we'll focus on that when we talk about doing this in the context of a multi-version system, so hold off on that for later. The validate phase is the most important part, and it can be quite tricky. Still, here's a rough end-to-end sketch of the flow.
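This is a minimal single-threaded sketch of that flow, under simplifying assumptions I'm making up for illustration: reads record the write timestamp of the version they saw instead of copying the whole tuple, validation just checks whether any of those versions were overwritten (roughly a first-committer-wins check), and there's no latching, so the validate and write phases aren't actually atomic the way a real system would make them.

```python
# OCC sketch: stage writes privately, validate at commit time, then install.

GLOBAL_DB = {"A": {"value": 1, "w_ts": 10_000},
             "B": {"value": 2, "w_ts": 10_000}}
ts_counter = 10_000

class Abort(Exception):
    pass

class Txn:
    def __init__(self):
        self.reads = {}    # key -> w_ts of the version we "copied into" our workspace
        self.writes = {}   # key -> new value staged privately (its w_ts is "infinity")

    def read(self, key):
        if key in self.writes:                        # see our own pending write
            return self.writes[key]
        if key not in self.reads:
            self.reads[key] = GLOBAL_DB[key]["w_ts"]  # remember which version we saw
        return GLOBAL_DB[key]["value"]

    def write(self, key, value):
        self.read(key)                                # "copy it in" first
        self.writes[key] = value

    def commit(self):
        global ts_counter
        # Validate phase: abort if anything we saw has since been overwritten.
        for key, seen_w_ts in self.reads.items():
            if GLOBAL_DB[key]["w_ts"] != seen_w_ts:
                raise Abort()
        # Write phase: only now take a timestamp and install the private workspace.
        ts_counter += 1
        for key, value in self.writes.items():
            GLOBAL_DB[key] = {"value": value, "w_ts": ts_counter}

t = Txn()                      # read A, write A, write B, commit
t.write("A", t.read("A") + 1)
t.write("B", 99)
t.commit()
print(GLOBAL_DB)
```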
One observation we can make now about these protocols is that the timestamp ordering protocols will obviously perform better when there is low contention, because the system doesn't waste its time checking for conflicts that just don't exist. The problem, though, is that for really important applications there is often a lot of contention, and what happens is that the protocols essentially degenerate into generating schedules that amount to the same serial execution. So under extreme circumstances, no one protocol is better than another.

To understand this better, the paper I had you read was a survey paper I wrote with a PhD student back at MIT on evaluating these concurrency control protocols in the context of in-memory databases, but looking at them at a really high level of parallelism. What I mean by that is, instead of looking at a single machine that maybe has 32 or 64 cores, we want to look at really large core counts and see how the protocols actually fare. And we're going to do this in the context of a single test bed system, so we don't have to worry about aspects and features of a particular database system that might interfere with or taint our measurements. In other words, rather than taking MySQL, which does two-phase locking, and Postgres, which does multi-version concurrency control, running them, and comparing them based on how each system happens to be implemented, we just implement all the protocols inside a single test system. That way we strip out all the additional overhead that doesn't actually matter for the thing we want to measure, which is the concurrency control protocols themselves. And again, the idea is to run these in extreme environments, which makes it easier to identify the bottlenecks in a database system that could prevent it from scaling to higher core counts.

The way we're going to do this is to run everything in a CPU simulator called Graphite that was developed at MIT. The way to think about Graphite is that it simulates a single-socket, tile-based CPU with a two-dimensional mesh network that lets cores communicate with each other. Compare this with the Xeons and QPI we talked about: that is a NUMA system, non-uniform memory access, where you have two CPU sockets with a bus between them, and I can communicate with the cores in my own socket much more quickly than with the cores in the other socket. The CPU we're simulating for this paper is NUCA, non-uniform cache access. Here the cost of accessing the cache differs because there can be a shared L2 cache for groups of cores, and I need to go over the mesh network to communicate with other cores; I can't just talk directly to my neighbors or go over a single bus. I have to figure out how to route my messages over the on-chip messaging fabric.
For the most part, for the rest of the semester, we're actually going to be talking about NUMA architectures, non-uniform memory access, with things like Xeons. For this paper, because we're trying to look at really high core counts, we use the simulated chip instead. I'm not an architecture person, but my understanding from talking to the Intel folks is that when you're looking at a system with a thousand cores, you have to use something like what Graphite is simulating here; the kind of interconnect you have between sockets today just isn't going to work.

So we implement these different concurrency control protocols in a single system called DBx1000. This is a system that was built explicitly for this paper, to run these experiments inside the chip simulator. Because the chip simulator is about 10,000x slower in wall-clock time, DBx1000 needs to be very stripped down and not have a lot of additional features, because otherwise the experiments would take forever to run. The student who wrote DBx1000 actually wrote some core parts of it in assembly, because that ended up being way faster than anything you could write in C, which was very impressive.

All right, so we implement a bunch of protocols in this single test bed system, and then we evaluate them with the YCSB benchmark, the Yahoo! Cloud Serving Benchmark. This is a well-known key-value store benchmark meant to simulate the access patterns of online web applications. We'll have two million tuples, each one kilobyte. In the original YCSB each transaction only touches a single tuple, but to make this more realistic we're going to have each transaction read or modify up to 16 tuples. We'll also vary the amount of skew in the transactions' access patterns to simulate different kinds of hot spots and measure how contention affects performance. All transactions execute at the serializable isolation level.

Just so you understand what the labels in the graph legends mean, we group the concurrency control protocols by whether they are two-phase locking or timestamp ordering. For two-phase locking we have deadlock detection, no-wait, and wait-die. No-wait basically means that if I try to acquire a lock and that lock is not available, I immediately abort and roll back.
These kinds of deadlock-handling approaches are what you see used in the major database systems.

The second category is timestamp ordering. We have the basic timestamp ordering protocol and the optimistic concurrency control protocol that I discussed, and then a multi-version variant of timestamp ordering that is really common in a bunch of different database systems. We'll discuss MVCC in the next couple of classes; the way to think about it for now is that it's almost like OCC, except you get a timestamp when you first arrive, and instead of making changes in a private workspace you're updating the global database.

All right, the very first experiment is a read-only workload. Here there is zero contention at the logical lock level, because the transactions aren't making any modifications; everything is a read, so anybody can read anything without any bottlenecks. The first trend we see is that deadlock detection and no-wait perform the best here and actually scale up to a thousand cores, because they aren't really doing any extra work. No-wait says "I need to acquire this lock," which it always can because it's always a shared lock, so it just updates a status saying it holds a shared lock on the tuple and reads it; that's very fast. Deadlock detection is the same: it's acquiring shared locks, there are never any deadlocks, there's never any stalling, so everything runs quickly.

The next trend is that for wait-die and MVCC there's a little dip around 800 cores. I should explain the setup: as we scale along the x-axis, each additional core means one additional worker thread, and each thread executes one transaction at a time, so at 800 cores there are at most 800 simultaneous transactions running. The dip at 800 cores is due to transactions always having to acquire new timestamps from the system, which ends up being a bottleneck based on how we implemented it; we'll discuss that in a second. The last thing here is that OCC actually performs the worst, and this is because the overhead of copying things out of the global database into a private workspace, to apply your changes or make sure you have repeatable reads, ends up being a bottleneck and performance gets worse.

Now we start adding writes to the workload so we can increase the amount of contention. By increasing contention we mean that a larger portion of the transactions access some small subset of the database; I think in this one 60 percent of the transactions access 20 percent of the database. The first thing you see is that the deadlock detection line falls off and dies immediately; it actually does the worst here. This is because now, with more transactions trying to access and update the same tuples, there are going to be more conflicts.
There are going to be more deadlocks, and it's going to take longer for the deadlock detection thread to find those transactions, so in the meantime they sit stalled and we end up wasting work. No-wait and wait-die still perform really well here; no-wait does the best. That's because these protocols are so simple, and the cost of restarting a transaction in this environment, in the DBx1000 system, is really cheap: everything runs as stored procedures, so I check whether the lock I need is available, and if I can acquire it I just go ahead and get it; if I can't, I immediately kill myself and roll back, which is a cheap operation because everything is in memory and we aren't logging anything to disk. So a transaction can essentially keep spinning, retrying over and over again, until it acquires the locks it actually needs. Then there's this middle band where the basic timestamp ordering protocol, MVCC, and OCC are all doing roughly the same, because they're copying data any time they do updates and maintaining different versions, or they're trying to acquire timestamps from a global shared counter. OCC is still always the worst of them, because it has to copy far more data when it does reads.

All right, the last graph is sort of the money shot of all of this. Now we're running up to a thousand cores with a write-intensive workload and high contention: 90 percent of the transactions are trying to update 10 percent of the database. What you see is that all of the protocols essentially flat-line and converge toward zero throughput as you get up to a thousand cores. This is what I was saying before: when there's high contention, the different protocols essentially degenerate into performing the same. You do see that no-wait ends up doing pretty well, it's a champ up until around 800 cores, but we're not executing that many transactions: only about 200,000 transactions a second at most, which is not a lot compared to what we saw in the previous graphs. Past that point there's just so much contention, because there are so many active transactions, that transactions keep hitting conflicts, can't acquire the locks they need, immediately abort, and roll back. They're just spinning, doing a bunch of useless work, starting and restarting over and over again.

The interesting thing to point out, though, is that with a small number of cores OCC does the worst, but at a thousand cores it actually does the best. The reason is that at these extremes OCC is essentially running in serial order: the validation protocol guarantees that at least one transaction will always be able to commit, so the system is effectively executing one transaction at a time.
So going back to this graph, you see that at a thousand cores you're essentially getting the same performance you'd get at around four cores, and that's because we really are executing one transaction at a time. All the other work happening on the other 996 cores, starting and restarting transactions, copying data that ends up getting thrown away, is wasted, but in the end we can still always execute one transaction. This is a good example of all that extra work buying us nothing: we have this amazing CPU, but it doesn't actually help any of our protocols because of this contention point during the validation phase.

The main takeaway I want you to get from this is the following. The circle on the left-hand side, at small core counts, is roughly where we are today in modern systems on a single CPU or a single machine. If you go distributed you can increase your core count, but then the network becomes a big bottleneck and causes a lot of problems. So there's this big gap in the middle where there's currently a lot of active research trying to figure out how to resolve this. No-wait seems somewhat ideal, but again the absolute numbers are not that great, and it relies on everything running as stored procedures, which, as we'll see later on, is not always the case in a lot of application scenarios. So this is problematic, and frankly there's no good answer for how to solve it. There's no piece of hardware that's going to come and save us. I think it's going to take a combination of better algorithms and also exposing more information about what the application is trying to do with the data down to the database system.

The second graph, and I'll go through this quickly, just shows where the time is being spent while executing all these transactions. The yellow bar corresponds to time wasted doing aborts. Again, you see that no-wait does well, but it executes way more aborts than everyone else, because it's just spinning, starting and restarting transactions over and over again. Same thing with OCC: it does a little bit better, it has somewhat fewer aborts, but it spends more time in the transaction manager because it has to copy data. So this is just showing you that the main bottleneck is having to abort transactions because of conflicts, not really anything about the protocol itself. So now we can look deeper at what all the different bottlenecks are.
Right, so break it up into three categories: lock thrashing, timestamp allocation, and memory allocation. These are fundamental to the implementation of the system, not something tied to a single protocol.

With lock thrashing, the idea is that as a transaction has to wait longer to acquire its locks because of contention, anybody waiting for that transaction to release its locks ends up waiting longer too. This is the convoy effect: one transaction waiting longer causes another transaction to wait longer, and so on down the line, and at some point there's a tipping point where you end up doing worse with a higher core count than you would have with a lower one.

One way we can actually measure this phenomenon in isolation is to have transactions acquire their locks in primary key order. Say we have locks on A, B, C, and D: anybody who wants locks on A and B has to acquire A before B, B before C, and C before D. If you enforce this, which you generally can't do in a real system, then there can never be any deadlocks, and the only thing a transaction is ever doing is waiting for the transaction in front of it to release its locks. If you do this in the simulator, you end up with these nice curves. The parameter theta varies the amount of skew in the workload: theta 0.8 is the most skew, theta 0 is no skew. What you see is these dips in the scalability of the system as you increase the number of cores, which is exactly lock thrashing: transactions are waiting longer to acquire locks held by other transactions, and the transactions behind them are waiting longer still. The really cool thing about this graph is that this is exactly what the theory says: the textbook usually shows a hand-drawn graph that looks like this to explain the effect, and here is the exact same problem showing up in a real test bed system. Here's a quick sketch of that sorted-acquisition trick.
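This is a rough sketch of the trick, using plain Python locks as a stand-in for a lock manager; the `lock_table` and `run_transaction` names are made up for illustration.

```python
# If every transaction acquires its locks in primary-key order, deadlock is
# impossible, so any slowdown you measure is purely transactions waiting on
# one another (thrashing), not deadlock detection or resolution.

import threading

lock_table = {}    # primary key -> threading.Lock()

def lock_for(key):
    return lock_table.setdefault(key, threading.Lock())

def run_transaction(keys, body):
    """Acquire all locks in sorted key order, run the body, then release."""
    acquired = []
    try:
        for key in sorted(set(keys)):   # the global ordering is the whole point
            lk = lock_for(key)
            lk.acquire()                # may block, but can never deadlock
            acquired.append(lk)
        return body()
    finally:
        for lk in reversed(acquired):
            lk.release()

# e.g. run_transaction(["acct_07", "acct_42"], lambda: print("transfer..."))
```

The catch, as mentioned above, is that this requires knowing a transaction's full lock set up front, which a general-purpose system usually doesn't.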
The next issue is how to allocate timestamps. We saw this when we talked about timestamp ordering: as you increase the core count, all of your transactions are trying to acquire a timestamp from a single, globally shared resource, and that ends up being a bottleneck. I won't say too much about it right now, but this theme will come up multiple times throughout the semester: how do we handle critical sections efficiently, where multiple threads may be trying to read or write the same piece of memory, without everyone getting bottlenecked on that critical section?

The worst way to hand out timestamps is to use a mutex. I'll explain more about this later on, but in the worst case this goes into the operating system via a syscall, and that's always going to be super slow. The next step up is atomic addition: a single instruction that does a compare-and-swap or fetch-and-add to update the counter, check whether your update took, and get your timestamp that way. The reason this becomes problematic on a thousand-core CPU is that it requires a cache invalidation every time you do the update, and if every transaction is hitting a single counter, constantly invalidating everyone's cache line is going to be slow.

Another way to get around this is batched atomic addition. Instead of grabbing a single timestamp at a time, a group of threads grabs, say, ten timestamps at once, hands them out locally, and only goes back to the global counter when it needs a new batch. This is a pretty good idea, and we'll see it multiple times throughout the semester. You just need to make sure that when transactions abort because the timestamps in their batch are lower than those of some other transaction that's updating a lot of data, you have a way to avoid burning through your timestamps so quickly that you essentially degenerate back into the one-timestamp-at-a-time approach.

The other things discussed in the paper are hardware clocks and hardware counters. I won't say much about these other than: if the CPU could give us fast instructions that provide the primitive we're implementing in software, that would be ideal, because then it's a single instruction and the hardware handles everything for us. The graph here just shows the drop-off rate as you scale up to a really high number of timestamps per second on a thousand cores, and how the different approaches compare with each other. Here's a small sketch of the batched-allocation idea.
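This sketch shows the shape of the batched technique, not how the paper implements it: a plain lock stands in for the atomic fetch-and-add on the global counter, and the batch size is one I picked arbitrarily.

```python
# Each worker thread claims a block of timestamps from the shared counter and
# hands them out locally, so it only touches the contended counter once per batch.

import threading

class BatchedTimestampAllocator:
    BATCH = 32

    def __init__(self):
        self._global = 0
        self._global_lock = threading.Lock()   # stand-in for atomic fetch-and-add
        self._local = threading.local()        # per-thread leftover timestamps

    def next_ts(self):
        loc = self._local
        if getattr(loc, "remaining", 0) == 0:  # local batch exhausted: refill
            with self._global_lock:
                loc.next = self._global + 1
                self._global += self.BATCH
            loc.remaining = self.BATCH
        ts = loc.next
        loc.next += 1
        loc.remaining -= 1
        return ts

alloc = BatchedTimestampAllocator()
print(alloc.next_ts(), alloc.next_ts())        # 1 2 -- only one trip to the counter
```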
The last thing, and this will come up again multiple times throughout the semester, is that any time we have to read and write data and copy things around, that's going to be slow, because we may end up allocating memory, which means a call to malloc. If you're using the default libc malloc implementation you get from Linux, that's going to be terrible. So we need to be more careful about how we use memory, and keep track of where memory is located so that we read memory that's close to us rather than something far away on another socket. There's a whole bunch of stuff to worry about regarding where we're actually reading and running, but we'll cover that later when we talk about storage. The main takeaway is that libc malloc is super slow; you want to use jemalloc or tcmalloc. In our own system we use jemalloc, and it's much, much better.

All right, so to finish up, I want to quickly talk about isolation levels. In the thousand-core paper, all the transactions run at the serializable isolation level, and that's because within database academia serializability is considered the gold standard for how we want to provide transactions to programmers. The tricky thing, though, is that enforcing serializability in a real system may end up reducing the amount of concurrency we can actually have, and then you end up with really poor performance. So in actuality, as we'll see in a second, most systems don't actually run at serializability by default; they run at what's called a weaker isolation level. We do this for performance or engineering reasons: it gives us better scalability, because we don't have to worry about finding a conflict-serializable execution or interleaving for our transactions.

An isolation level is a way to control the extent to which a transaction can be exposed to different anomalies caused by other transactions running at the same time. The weaker the isolation, the more exposure your transaction could have to these anomalies: dirty reads, unrepeatable reads, and phantom reads. Those are what you get from the textbook definition of isolation levels, but we'll see that there are other kinds of anomalies, way more than this; there are two in particular I want to focus on beyond these.

The ANSI standard defines four isolation levels: serializable, repeatable read, read committed, and read uncommitted. Going from top to bottom: serializable guarantees you have no phantoms, no unrepeatable reads, and no dirty reads, meaning the schedules it generates are equivalent to some serial execution of the transactions. With repeatable read you may have phantoms; with read committed you can have phantoms and unrepeatable reads; and with read uncommitted all bets are off, you could hit any of the anomalies. You can map them into a hierarchy like this, where the bottom is the weakest and the top is the strictest, or the most protective.

The key point about these anomalies is that it's not that a transaction will always hit them when it runs at a lower isolation level; it's just that the database system no longer guarantees that it won't. If you only execute one transaction at a time, in serial order, because your website doesn't get any traffic, then you're effectively getting serializable isolation already, since there are no concurrent transactions and therefore there can't be any anomalies. So running at a lower isolation level doesn't necessarily mean you'll hit these problems, only that the system won't protect you from them. This is why most real-world systems don't run at the serializable isolation level by default.
They actually run at read committed by default, and in some cases the maximum isolation level a system supports isn't even serializable at all. In these examples here you do actually get serializable, but there are a bunch of other systems where the strongest level they offer is read committed or repeatable read. We covered these in the intro class, so I won't spend too much time on the details, but I do want to focus on two other isolation levels that don't fit into the ANSI standard I just described: cursor stability and snapshot isolation.

The ANSI isolation levels were part of the SQL-92 standard, and the standards body that developed them defined these anomalies in the context of a two-phase locking database management system. There's a great paper from 1995, written by some very famous database researchers, that basically said: look, what you're describing exists for two-phase locking systems, but for timestamp ordering systems and optimistic systems this doesn't cover everything, and even within two-phase locking systems there are other anomalies that can occur that the isolation levels, as defined in the standard, don't actually protect you from. So I want to go through these one by one. The focus is going to be on snapshot isolation, because that's what we get when we do multi-versioning, and that's really the focus for most of the semester: it's how most people think about executing analytical queries, where you want a snapshot of the database and you don't necessarily care about seeing new updates, you just say "here's my consistent snapshot, run my query on that."

With cursor stability, the idea is that as the database executes a transaction, it maintains a cursor lock on whatever object the transaction is currently operating on, and it does not release that lock until the transaction moves on to the next item. Think of it as a slightly stronger isolation level that sits between read committed and repeatable read, and it prevents what's called the lost update anomaly. This is not the same thing as two-phase locking, because you don't have growing and shrinking phases spanning the whole transaction; the cursor lock covers a single query, or multiple queries that operate on the same object, and as soon as I move on to another object I release that lock. I can only hold one cursor lock at a time, so there can't be any deadlocks.

All right, so let's take an example of what the lost update anomaly looks like. There are two transactions, T1 and T2. T1 does a read on A, then T2 does a write on A, then T1 comes back and does a write on A. Walking through it: T1 does its read on A first.
That's fine. Now T2 does its write on A. Then T1 comes back, does its own write on A, and commits. When T2 then commits, its modification to object A has been lost, even though T2 commits after T1, because T1's write was based on the stale value it read before T2's update. This is what's considered a lost update.

A cursor lock on A held by T1 would solve this problem. T1 would hold that lock from its read on A, and even if it then stalls for whatever reason, T2 wouldn't be allowed to modify A because it can't get the cursor lock on that object. T1 then resumes and does its write on A, and when it moves on to read or write another object, it releases the cursor lock on A and acquires a new one, or it commits and releases the lock, and only then can T2 do its update. So we avoid the problem.

Not very many systems support cursor stability as an isolation level you can ask for. You get it by default in DB2; I'm actually not aware of any other system that lets you declare it explicitly. Some systems will use cursor locks internally to avoid this problem, but as far as declaring it goes, I think only DB2 allows that. Here's a little sketch of the lost-update interleaving itself.
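This is the lost-update interleaving written out on a plain Python dict, with no concurrency control at all; the "transactions" are just interleaved statements.

```python
db = {"A": 100}

t1_saw = db["A"]          # T1: read A  -> 100
db["A"] = db["A"] + 50    # T2: write A  = 150
db["A"] = t1_saw + 1      # T1: write A  = 101 (based on its stale read), T1 commits
                          # T2: commits, but its +50 update is already gone

print(db["A"])            # 101 -- T2's update to A is lost
# A cursor lock on A, held by T1 from its read until it moved on or committed,
# would have blocked T2's write and avoided this.
```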
The second isolation level I want to focus on is snapshot isolation. This one is pretty easy to understand: when a transaction starts, the database guarantees that any reads that transaction makes will be based on a consistent snapshot of the database that existed at the time the transaction started. So if there's another active transaction running when ours starts, and that transaction has modified the database in some way, our new transaction doesn't see any of those changes, because that transaction didn't commit before ours started and so its changes aren't part of our consistent snapshot. Now, what about modifications under snapshot isolation? As long as the modifications our transaction makes don't conflict with modifications made by any other transaction since our snapshot was taken, we're allowed to commit; if the write sets do conflict, one of the transactions has to abort. The tricky part is that if I read something in my snapshot that then gets modified by another transaction, I never see those updates, because they weren't part of my consistent snapshot. That's what leads to the write skew anomaly. The lost update anomaly and the write skew anomaly are different kinds of anomalies that exist but aren't part of the ANSI standard people usually mean when they talk about anomalies in SQL, yet you can still be exposed to them at these weaker isolation levels.

The way I always understand the write skew anomaly is with a really great example invented by Jim Gray, using a database of marbles, you know, little glass marbles, or pebbles. The marbles in this database can be one of two colors, black or white. We're going to have two transactions run at the exact same time under snapshot isolation, and we'll see why this isn't serializable because of write skew. Transaction T1 is going to change any white marbles in the database to black, and transaction T2 is going to change any black marbles to white. Both transactions start, and they both have the same consistent snapshot. T1 sees the two white marbles, T2 sees the two black marbles, and each flips the colors of the marbles it saw.

Now when they go to commit and write their changes to the database, the write set for T1 contains only the two marbles that were white, which it flipped to black, and the write set for T2 contains only the two marbles that were black, which it flipped to white. The write sets of these two transactions don't overlap, so they're both allowed to commit, and we install both sets of updates. But now we end up in a database state where all the black marbles went white and all the white marbles went black, based on the two consistent snapshots, and this is not serializable. If it had been a truly serial execution, we would have executed T1 first and turned all the marbles black, and then executed T2 and turned them all white, or the other way around; either way, all the marbles would end up the same color. So under snapshot isolation I was able to generate a state of the database that isn't equivalent to any serial order. That's why snapshot isolation is not considered fully serializable. Here's a little sketch of that interleaving.
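And here is the marbles scenario written out the same way, with the snapshot isolation rule reduced to "both may commit as long as their write sets don't overlap"; the structures are made up for illustration.

```python
db = {1: "white", 2: "white", 3: "black", 4: "black"}

snapshot_t1 = dict(db)    # both transactions start from the same snapshot
snapshot_t2 = dict(db)

# T1: flip every white marble (in its snapshot) to black.
t1_writes = {k: "black" for k, c in snapshot_t1.items() if c == "white"}
# T2: flip every black marble (in its snapshot) to white.
t2_writes = {k: "white" for k, c in snapshot_t2.items() if c == "black"}

# Validation: the write sets don't overlap, so snapshot isolation lets both commit.
assert not (t1_writes.keys() & t2_writes.keys())
db.update(t1_writes)      # T1 commits
db.update(t2_writes)      # T2 commits

print(db)                 # {1: 'black', 2: 'black', 3: 'white', 4: 'white'}
# Every marble just swapped color. Any serial order would end with all marbles
# the same color, so this outcome is not serializable: that's write skew.
```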
So if we go back to our hierarchy, it's actually a little more complicated than before. We have cursor stability sitting between read committed and repeatable read, and snapshot isolation is this other weird thing off to the side: it doesn't suffer from some of the problems the weaker levels have, but the write skew anomaly occurs under snapshot isolation and it can't resolve it. Again, we're going to focus on snapshot isolation, and we'll see how to actually make it serializable in Postgres and other systems in the next couple of classes. I'll just say that this problem of isolation levels is even more complicated than what I'm showing you here; I'm only showing you six of them. There's a PhD dissertation from the late 1990s by Atul Adya at MIT that maps out what was known at the time for the entire space of different isolation levels and all the different types of anomalies you can have, and there's been a lot of great work in the last five or six years expanding that taxonomy even further. For our purposes we don't care about all of it; we're going to focus on snapshot isolation. But it shows you that by lowering the isolation level below serializable, you potentially expose your transaction to more anomalies, including things that aren't captured by the three basic anomaly types we talked about before.

All right, I realize I went super long on this, but there's so much about transactions I wanted to talk about. If it's not clear, even though I'm hanging out in this weird shed and talking to you without a classroom, and you can't see how excited I am: transactions are awesome. Transactions are hard. They're hard to implement correctly and hard to implement efficiently, but there are so many weird and cool aspects of them that we need to think about as we build out a real database management system. All I did today was talk about how to update abstract objects in the database, not real tuples or records. When we start dealing with real computational limitations, and we start throwing in all the additional components we need in a real database system to support transactions and queries efficiently, like indexes, triggers, catalogs, sequences, and materialized views, that's when things get really hard, because now we need to reason not only about the state of the database objects we're modifying, but also about how the system finds that data when it reads these auxiliary data structures. So hopefully my energy is coming through in the lecture; this is something we're going to cover throughout the entire semester, how to actually do all of this in the context of transactions, and I think it's really cool.

All right, so to finish up: in the next class we're going to talk about multi-version concurrency control, and this is going to be a three-part lecture. We'll go into detail about how to actually implement the protocol, how to do garbage collection, how to do version storage, and how to maintain indexes, all the things you need when you build a real database system that's doing in-memory MVCC, okay? All right, so with that, I'm heading out. The only thing in this town, I think, is a Denny's or something. Like I said, I'm flying back from the west coast this weekend, and I'll be on campus starting on Tuesday next week. So, all right guys, take it easy. See you.