All right, I think we can get started today. Hopefully everyone is recovering from the deadline of the hash table project. How do you feel? No responses? Maybe not recovered yet? First, a little administrative stuff. We'll be releasing a homework on Wednesday, November 7th, which is actually not that far away. Project three has already been released and is still due on November 14th. We also opened a project two practice submission on Gradescope at the request of some students, so if you still want to test your project two submission, especially if you need to build on top of it in a later project, you can. One thing I need to emphasize today is that we observed many people only started their first submission on project two on the due date of the project, either the original due date or the extended one. That is a very tight schedule, and it's pretty difficult for the TAs and for us if you are really trying to finish the project in the last one or two days. So just start early. Starting in the last one or two days is a very risky way to ensure your normal progress in the class, so for project three I definitely recommend starting early. Also, right after today's lecture, we'll have the co-founder of Trino, a federated database query engine startup, who is going to talk about their query optimizer. We just covered the database query optimizer in the last two lectures, so I think it would be pretty interesting to check out this database talk; there are links in my slides as well. Now, back to the content of today's lecture. I don't know whether Andrew has shown this architecture, this workflow of a database system, before, but essentially, so far in the semester we have come all the way from the bottom to the top: disk manager, buffer manager, et cetera, up to query planning. This could be described as the linear workflow of a query. And all the components we have talked about are fairly independent of each other. For example, how you manage access to a page on disk doesn't really affect how you perform a join or a sort algorithm. But there are other components of the database system, namely concurrency control and recovery, that permeate the different components of the database system we talked about earlier in the class. That's why we haven't talked much about these two components yet, but they are very, very important to achieving the desired functionality of the database system. In some sense, you can argue that they achieve the core functionality of the database system, which is to ensure the quote-unquote ACID properties of the system; I will give more details shortly. Essentially, these are two components that cut across different parts of the system — they touch how you execute a query, how you write pages in and out of disk, et cetera — and we are going to talk about them in today's lecture and in the following few lectures. So what are they, and why do we need these additional components? Before I talk about the details, I'd like to give you a little motivation.
So far, we have talked about how to execute queries in the database system, at least how to execute a single query. But in many practical cases, the database could have many clients. For example, this could be your bank: many users log in, transfer their money, and so on. What if different users or clients try to modify the same record? What if one gives one value and the other gives another value? How do you ensure the correct value is not overwritten? This is called a race condition. The other issue is: what if there's a failure, a power failure for example, during the execution of some of your queries? Say you want to transfer $100 to my bank account, and the database has already taken $100 out of your account but hasn't put it into my account yet, and then there's a power failure. How do we deal with such an issue? That concerns the correctness of the database state. These are very important challenges the database system deals with so that it can ensure the good properties of the data. To be more specific, the first issue, the race condition, deals with the challenge of lost updates, and the mechanism to deal with it is called concurrency control. That's what we're going to talk about today as well as in the remaining three lectures, actually; we're going to spend quite a bit of time on it. For the second issue, data loss on a power failure, the property the database system wants to achieve is called durability, and the method to cope with it is called recovery. We'll talk about that later in the class as well. To emphasize the importance of concurrency control and recovery a little: they provide the fundamental, most distinguishing property, if you will, of database systems. Again, as I mentioned, this is called the ACID properties. They let developers conveniently handle their data without worrying about correct values being overwritten by another user, or about power failures, et cetera, so that they can focus on the core logic of the application — developing features and generating business value. Because when you are developing an app or a piece of software, that's what you want: a system that helps your users and generates value, while the database system handles those complicated data issues for you. Of course, there are different performance metrics and properties you may want the system to have, and that's why there are many, many different database systems, providing different performance guarantees and slightly different data consistency or durability properties to the application. This frees the developers, if you will, to focus on the more important logic of the application and to generate the corresponding value for the users. So today we'll start with transactions, a.k.a. concurrency control, and in the next few weeks we'll talk about recovery. To be a little more formal: a transaction is the execution of a sequence of one or more operations, typically SQL queries, on a database to perform some higher-level function.
When I say higher-level function here, an example is that I want to transfer $100 from my bank account to your bank account. The database system does not actually have an interface or query that says "initiate a task of transferring $100"; such tasks are implemented through SQL queries. Maybe you first read my bank account, deduct $100 from it, then read your account and increment it by $100. Typically this is performed through a set of standard interfaces such as SQL queries, and a transaction is just a sequence of such small tasks or SQL queries that together perform a higher-level, meaningful function, such as transferring $100 from one bank account to another. The transaction is the basic unit of database operations, and the database will not allow partial changes: it won't allow a set of queries intended to fulfill such a function, like transferring money, to execute only partially and have that $100 lost into nowhere. To give a more specific example, let's move $100 from my account to, say, someone else in this class, to his or her account. The transaction is the execution of a series of operations, again typically SQL queries: first check whether I have $100 in my account, then deduct $100, then add $100 to the other account. This executes either all or nothing; it is the basic unit of operation. So how do we achieve that? Well, we can imagine a strawman approach, so let's start with that. How about we force the system to execute one transaction after another? If there's a transaction to transfer money, you execute that transaction; if another transaction arrives while it's still executing, you just queue it up and execute one transaction after the other. And how do we ensure the correct update is always persisted? What you can do is, before executing the transaction — again, a very strawman approach — you copy the entire database to a new file, and you make all the changes to the new file: increment the $100, et cetera. Only when all the changes have been applied successfully do you swap the database content from the old file to the new file: you swap a pointer inside the database system to point to the new data file with the updates, and then you delete the old file. This is obviously correct: you don't have the issue of multiple users updating the same record and correct results getting overwritten by concurrent execution. It also ensures durability, because until you finish editing the new file and finish executing the transaction, everything is still preserved in the old file. If there's a power failure, then when the system comes back you realize you haven't swapped the database from the old file to the new file, and you just remove the new file. And if the transaction finished successfully and all the changes were made, you swap the database to point to the new file, and nothing in the old file matters anymore.
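To make that strawman concrete, here is a minimal sketch in Python. It assumes a toy "database" that is just a JSON file of account balances; the file names and account keys are hypothetical, not anything a real system uses.

```python
import json
import os
import shutil

DB_FILE = "bank.json"        # hypothetical "database": a JSON file of balances
NEW_FILE = "bank.json.new"   # fresh copy that the transaction edits

def transfer_strawman(src, dst, amount):
    # 1. Copy the entire database to a new file.
    shutil.copyfile(DB_FILE, NEW_FILE)
    # 2. Apply all the changes to the copy only.
    with open(NEW_FILE) as f:
        db = json.load(f)
    if db[src] < amount:
        os.remove(NEW_FILE)              # give up: the old file was never touched
        raise ValueError("insufficient funds")
    db[src] -= amount
    db[dst] += amount
    with open(NEW_FILE, "w") as f:
        json.dump(db, f)
        f.flush()
        os.fsync(f.fileno())             # make sure the new contents are durable
    # 3. Atomically "swap the pointer": the rename either happens or it doesn't.
    os.replace(NEW_FILE, DB_FILE)
```

The key point is the last step: the rename either takes effect or it doesn't, so a crash at any earlier point leaves the old file untouched and you can simply discard the half-finished copy.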
So this strawman approach ensures the concurrency control property — it addresses the challenge of lost updates — and it also ensures durability, since we can recover from an incomplete state. Does that make sense as a simple, naive approach? Okay. Obviously, one issue with this simple approach is that it could be slow. What we ideally want is to allow multiple transactions to execute at the same time, interleaving their operations. Why? There are two reasons. First, modern machines are multicore. If you only allow one transaction to execute at a time, that's correct — nothing gets overwritten — but if your machine has 10, 20, 40 cores, only one core is used: very low hardware utilization. And since everything is queued up, latency goes up too. Furthermore, many database operations read or write data on disk. If one transaction wants to read a record from a page that is not in memory, it has to fetch it from disk, which can take milliseconds, and while that's happening, with the naive approach, nothing else can happen — the database system just stalls. What you really want, to make a modern database system more efficient, is that while one transaction is waiting for a record from disk, you do a context switch and let the database system execute other transactions whose data is already available in memory; once the record has been fetched, you switch back and continue the original transaction. This gives higher overall utilization of the hardware and eventually higher throughput. Make sense? Very straightforward: we want multiple transactions to execute together so we get higher throughput and lower latency. Of course, in the meantime we need to ensure the correctness properties we talked about earlier, and we also want fairness — if we are scheduling different transactions to execute concurrently, we don't want one transaction to starve and wait forever for its results. And we want to do the whole thing efficiently. This, by the way, turns out to be very, very challenging: executing many things together correctly, efficiently, and fairly is hard, and people are still doing lots of development and research in this space. In fact, even Postgres — a system we've mentioned a lot in this class — still had a transaction bug found just last year, I think, with a newer testing framework developed recently (SQLancer, I believe). So again, concurrency is pretty hard, and transactions are pretty difficult to get right in database systems. So, to give you a little more motivation for the definition of a transaction:
When we say we want to ensure the correctness of the database, there's a subtle distinction here. Consider a scenario where we just arbitrarily let operations interleave with each other. This gives higher throughput and higher performance, obviously, but in the meantime, as I mentioned, records can be overwritten by different transactions, data can be lost, and so on. There are actually two types of incorrect, or inconsistent, behavior caused by arbitrarily interleaving operations among different transactions. The first we call temporary inconsistency. This is an unavoidable intermediate state while executing a transaction. For example, if you want to transfer $100 from my account to your account, then in the middle of that higher-level operation there must be a moment where $100 has been taken out of my account but has not yet been put into yours. At that moment the database is, in a sense, in a wrong state — formally, an inconsistent state. But that's unavoidable, because the transaction hasn't finished yet, and that's fine. The other type is permanent inconsistency. For example, suppose $100 has already been deducted from my account, and then there's a power failure; when the system comes back, it somehow cannot recognize that this was an incomplete transaction, and the database is left in a state where $100 was deducted from my account but never put into yours. If the database is left in that state permanently, that is bad — that's what we want to avoid. So there are different types of inconsistency: some are allowed, some are not. What, then, is the formal definition of "correct" for transactions executing in the database? That's what we're going to talk about. First, let me clarify a few basic concepts we'll define in this class. A transaction carries out multiple operations on the data in the database — reads or writes of different records, et cetera. For the purposes of a database transaction, we are only concerned with the content of the data, not anything else. That may sound a little abstract, so here's an example: suppose one transaction executes some user logic outside the control of the database. Again, it transfers $100 from my account to yours, but in between — say, after it deducts the $100 from my account — it sends an email to the user in the outside world: "this $100 has been deducted, we are processing this transaction." Then, before the transaction finishes, there's a power failure and the transaction aborts — abort is a concept we'll talk about shortly; it means the transaction could not finish. In that case, the email has already been sent. The database system cannot deal with the fact that some application logic written by the user had an external effect outside the scope of the data in the database; that's just not something the transaction machinery can handle. The transaction only concerns the reads and writes on the data in the database.
For the purposes of this lecture, we'll define the set of objects that a transaction operates on as a fixed set of named objects; for simplicity, we denote them A, B, C, and so on. There are two important things to note here. First, I'm defining the set as fixed, which means that for today's lecture, transactions only read or update these existing objects. Transactions can of course also insert or delete records, but handling that is a bit more complicated, and we'll talk about it in later lectures; today's lecture is only an hour and twenty minutes, so we'll focus on reads and updates of a fixed set of objects. Second, when I give these objects simple names, we don't actually care whether each object is an entire table, a tuple, or a page in an index. The algorithms for dealing with concurrent operations on them remain pretty much the same no matter what is inside the objects, as you will see later in the class; the algorithm only cares whether you perform a read or a write, not what the content of the object is. Practically speaking, though, in most systems transactions operate on these objects at the tuple level — even though the concurrency control algorithm doesn't care — since that's just the natural granularity. We then define a transaction as a sequence of read and write operations on these objects, denoted as read on A, write on B, and so on. We're not showing demos today, but we will show some in the next lecture, so I very much encourage you to come to the lecture this Wednesday as well. Essentially, the SQL way to begin a transaction in most systems is simply the BEGIN command — this may vary a little from system to system, but most use BEGIN. To end the transaction, there are usually two commands you can issue. One is the COMMIT command, which tells the system that you are done with the transaction and you want to apply all the changes and make them durable in the database. One important thing to note here is that if you, as a user, tell the system to commit, the system does not guarantee that it will commit your transaction: in order to ensure all the desired properties we talked about earlier while still allowing concurrent operations, there are cases where the system simply cannot allow a transaction to commit, and it has to stop the transaction and roll back all your changes to preserve the desired properties of the database system.
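Just to make the shape of this concrete from the client's side, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical; the point is only that the client brackets its statements with BEGIN and COMMIT and must be prepared for the system to refuse and force a rollback.

```python
import sqlite3

# Hypothetical schema: accounts(owner TEXT PRIMARY KEY, balance INTEGER)
conn = sqlite3.connect("bank.db", isolation_level=None)  # manage transactions explicitly

def transfer(conn, src, dst, amount):
    conn.execute("BEGIN")                                  # start the transaction
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                     (amount, dst))
        conn.execute("COMMIT")                             # ask the system to make the changes durable
    except sqlite3.Error:
        conn.execute("ROLLBACK")                           # the system (or we) aborted: undo everything
        raise
```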
Again, we'll talk more about this later in the class, but the important thing to emphasize is that when you say COMMIT, the system has the option to either successfully apply all your changes and tell you that you are done, or to tell you that it cannot do that and it has to abort. Of course, you can also directly tell the system to abort, which means: I figured out that I made some mistake, or this transaction no longer makes sense to me, so I cannot continue executing it and I want to undo all my changes to the database. Recall that a transaction is the basic unit of operation in the database system, so if you don't want to complete a transaction, you have to abort all the changes in the transaction and start over again, okay? The last point is just that an abort can be issued either by you as a user or by the system itself, to preserve its properties. So now, what are these desired properties? I used the term earlier, but more formally, the fundamental, distinguishing properties a database system provides to developers are called ACID. Some of you may have heard this term before in other computer science classes. It is simply an abbreviation of atomicity, consistency, isolation, and durability. Let me briefly go over them one by one. Atomicity, as I mentioned before, means that either all the operations in the transaction are executed, or we roll back everything and assume the transaction never happened; the transaction is the basic unit of operation. For consistency, we're going to go a little light in these few lectures; it will make more sense in a distributed setting, and in a single-node setting it does not matter that much. But essentially, the definition says that if a transaction is consistent and the database starts in a consistent state, then it will end up in a consistent state. You can think of "consistent" as some desired property. For example, assuming all money transfers happen within a single bank, a desired property would be that no matter how you move money around, the total money across all the accounts in the bank stays at the same value. Consistency then says: if you start the transaction with a specific total, and the transaction only moves money between accounts, you end up with the same total. That's an example of a consistency property, a desired invariant of the database you want to maintain. Next, isolation: the execution of one transaction is isolated from the other transactions. This is a really important concept that makes it much easier for users to deal with their data. When I issue a transaction to the database system, I get to act as if I'm the single, solo user of the system. If there are other concurrent transactions with updates happening in the middle that haven't committed yet, or that are in the process of rolling back or undoing, then I should not be able to see any of these partial effects.
So when I'm dealing with the database system, I should only have to care about my own transaction, as if it were the only transaction running in the system. This is a very, very important property, and we are going to expand on it in this course; it makes the database system much more convenient to use. Lastly, durability is straightforward: if I have committed a transaction, then all the changes are persistent on a durable medium such as disk. You cannot lose data. To summarize them briefly: atomicity means all or nothing. Consistency means the database looks correct — for example, the money in the bank accounts adds up to the correct value. Isolation means it's as if I'm the single user of the database system, executing my transaction alone. And durability means that if there's a power failure, the data is on disk, or can be recovered from disk; nothing is lost. In this course we'll go over these properties one by one, though again, we'll be a little light on consistency. Also, as you can see from the title, today's lecture is called concurrency control theory, which means we are going to talk about the high-level concepts of how a database system achieves these properties, not the specific algorithms or implementation details — there are far too many of them to cover in a single lecture. Today we focus on the high-level methods, or methodologies if you will. Any questions so far? Okay. So, as we already touched on, when we execute a transaction there are two possible outcomes. Just to reiterate: the first is that it commits, which means all the changes are applied to the database and are also durable on disk. The other is that it aborts, which means that none of its actions — whether already performed or not yet performed — should have any effect on the database; everything that was already applied must be undone. And it is the database system's responsibility to ensure this atomic execution of the transaction, not the user's. Here are two example scenarios the database system needs to deal with. The first: again the money transfer example — we take $100 out of my account to put into someone else's account, but the database system aborts in between. In that case, $100 just goes missing from the database, which is not good. The other scenario: again, we take $100 out of my account, but then the database system experiences a power failure. In that case we also need to ensure the original state of the database is successfully recovered, so again no money is lost. How do we ensure that? There are essentially two ways to provide this atomicity property. The first is called logging.
What logging does is that before, or while, the database system applies any change to the tuples — the actual data content — it writes out onto a log on disk a record of what it has changed. Essentially, if you want to modify the data, you need a record on disk describing what was modified and to what value. There are two benefits, serving both of the properties we talked about earlier. First, if there's a conflict — another user is modifying the data and we don't want to overwrite a correct value — we can look at the log record on disk, undo what we have written to the record to restore it to its original state, and abort the transaction. It also solves the other case: if there's a power failure, whatever was written halfway may be lost and you don't know the state before and after the transaction, but you have the log on disk. The log tells you: there was a transaction that made this modification but did not finish, so after coming back from the power failure you can again restore the original state. There are also other uses that matter not for the ACID properties but for practical scenarios: for example, satisfying auditing constraints, which is very common in banks — if an auditor wants to know who purchased this stock, to comply with some regulation, this logging can be useful too. And that's what most database systems do: almost all databases have a logging component to ensure atomicity. When I say "almost," that implies there's a different approach that a few database systems take — very, very rare, but it exists — called shadow paging. Instead of writing before-and-after change records to a log on disk, you do something a little like the strawman approach: you copy either the entire database or just the pages you are going to modify to a separate place — the shadow pages — and make the modifications there. Once you have finished all the changes of the entire transaction, you do a pointer switch: you switch the database to point to the new copy of the pages you have written. This, by the way, is what was done originally in System R, one of the very first database system implementations. Why? First, it's straightforward — shadow paging is an established technique used in many places — and you don't have to take the additional step of writing extra records to disk, which can cause write amplification, et cetera. But it also has many drawbacks: for example, it can cause a lot of data fragmentation, and typically you can only commit one transaction at a time, because you have to switch between the old copy and the new copy back and forth. Eventually System R abandoned it and switched to the logging approach, and almost all current systems also use logging instead of shadow paging.
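Coming back to the logging approach for a moment, here is a minimal sketch of the idea, with a heavily simplified log record format of my own invention — real systems structure their logs very differently, as we will see in the recovery lectures. Each update appends an old-value/new-value record and forces it to disk before the in-memory change is applied.

```python
import json
import os

LOG_FILE = "db.log"   # hypothetical append-only log on disk

def log_update(txn_id, obj, old_value, new_value):
    """Append an undo/redo record and force it to disk."""
    record = {"txn": txn_id, "obj": obj, "old": old_value, "new": new_value}
    fd = os.open(LOG_FILE, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, (json.dumps(record) + "\n").encode())
        os.fsync(fd)              # the record must be durable before the data may change
    finally:
        os.close(fd)

def write(db, txn_id, obj, new_value):
    log_update(txn_id, obj, db.get(obj), new_value)   # log first...
    db[obj] = new_value                               # ...then apply the change in place
```

With records like these, an abort can walk the transaction's records backwards and restore the "old" values, and after a crash the log reveals which transactions never finished.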
There are still a few systems that use shadow paging, in very specific application scenarios. For example, I think the first version of CouchDB did it this way, though I believe later variants switched to logging; and there's another one, LMDB, that also does it this way. But these are rare cases, okay? So before I switch to consistency, any questions? Yes, please. [Student question, roughly: do you have to persist the log record before you apply the change to the data?] Yes, yes. Well, this is getting a little ahead of ourselves — we'll expand on it later; this lecture focuses on the high-level concepts — but essentially there is a technique called write-ahead logging. You have to make sure the log records for a set of changes are persisted on disk first, before you can actually apply the changes. We'll expand on that later; of course there are also performance issues related to it that have to be optimized. So yes, good question. Cool. Next, consistency — again, this lecture only covers it at a high level. Consistency tries to ensure that the quote-unquote world represented by the database is logically correct, so that when you ask the database system queries about the database state, it gives you back a logically correct answer. Think of the earlier example: if you are only transferring money within your bank, then one potential consistency property — not a required one, you don't have to define it — is that at the end of the day the money across all accounts in your bank adds up to the same total. That's a logical property on which you can define consistency. And there are two sides to consistency. First, database consistency: before you execute a transaction, the database is in a specific state, and if you declared that all the money must add up to a specific value, you can check at any point in time whether the database actually satisfies that property. That's the consistency of the database system itself. Second, transaction consistency: if your transaction only moves money between accounts, and whenever it moves money out of one account it always puts the same amount into another, then the transaction satisfies the consistency property as well. If one transaction somehow magically puts a million dollars into my account, it would not satisfy the consistency property we defined earlier. Makes sense? Again, this will make more sense when we talk about distributed database systems; for a single-node database system it does not matter that much, okay? What we will spend most of this lecture on is the isolation property of transactions. Again, this is a very, very important property that the database system provides so that users can conveniently develop their applications without worrying about their data being overwritten by other transactions, et cetera. That essentially summarizes it, okay?
Essentially, this deals with the interleaving issues of different users accessing records and issuing transactions at the same time, and we need an approach that lets the database system provide this isolation property. That mechanism is what we call a concurrency control protocol. To define it formally: a concurrency control protocol is how the database system decides the proper interleaving of operations from multiple transactions running concurrently, such that each transaction — each user — perceives the database system as if it were executing that transaction alone. It will not really see any impact or effect of the other concurrently executing transactions. The mechanism that ensures this is called concurrency control. Generally speaking, there are two types of concurrency control protocols. The first is called pessimistic. As the name suggests, it assumes that conflicts — overwrites among different concurrent transactions — happen pretty often. So before executing each operation in your transaction, you do something to ensure the system stays in a good state, so that whatever you execute will not affect other users of the database system and every user keeps an isolated view of the database. That's pessimistic: you do something before the execution of every operation. The other high-level approach is called optimistic. Again, from the name, this approach assumes that conflicts — concurrent writes by different users to a single record — happen very rarely. In this case you just let the different transactions do whatever they want to do, but before a transaction commits, you go back and check whether, during its execution, it violated some property or conflicted with other transactions in a way that would leave the database in an invalid state. If that happened, you abort the transaction and undo all its changes at the end. That's the optimistic approach: let transactions do what they want and check for problems at the end of the day, when they want to commit, okay? Now let me give you an example of transactions that could be executed, to make this a bit more concrete. Assume we have two transactions, T1 and T2, and two records, A and B. Transaction T1 wants to transfer $100 from A's account to B's account, and transaction T2 credits both accounts with 6% interest. What we want is that no matter how the database system interleaves the operations of these two transactions so they can run concurrently, at the end of the day the database is still in a correct state. We will define that more formally later.
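Before we go through the example, here is a tiny sketch of the difference between the two styles. It is only illustrative: the names are my own, and it glosses over the fact that the optimistic validate-and-apply step would itself need to be made atomic in a real system.

```python
import threading

locks = {"A": threading.Lock(), "B": threading.Lock()}   # pessimistic: one lock per object

def pessimistic_write(db, obj, value):
    # Pessimistic: do something *before* the operation -- here, grab the object's lock.
    with locks[obj]:
        db[obj] = value

class OptimisticTxn:
    """Optimistic: execute freely, remember what you read, validate at commit time."""
    def __init__(self, db, versions):
        self.db, self.versions = db, versions
        self.read_set, self.writes = {}, {}

    def read(self, obj):
        self.read_set[obj] = self.versions[obj]   # remember the version we observed
        return self.db[obj]

    def write(self, obj, value):
        self.writes[obj] = value                  # buffer writes privately until commit

    def commit(self):
        # Validate: if anything we read changed underneath us, abort (caller retries).
        if any(self.versions[o] != v for o, v in self.read_set.items()):
            return False
        for o, value in self.writes.items():      # validation passed: install the writes
            self.db[o] = value
            self.versions[o] += 1
        return True
```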
Intuitively, we can look at the possible results of these two transactions, T1 and T2 above, and ask which results are desirable. Essentially, what do we do? We move money from one account to another, and then we credit both accounts with some interest. One way to say the execution is correct is that, as I mentioned earlier, at the end of the day you check the total amount of money across the two accounts. It's a little more complicated here because interest is involved, but no matter when you apply the interest — before the money transfer or after — at the end of the day the bank, i.e. the database, should hold the same total amount of money after the interest is applied; in this case, $2,120. Whether you execute T1 first, or T2 first, or interleave them, we need to ensure that at the end the database still holds that total; otherwise the result is incorrect. Yes, please. [Student question about the terminology used.] Yeah, sure — probably "interleaving" would be a better term, yes. [Student: so the total across the accounts should always be the same?] Yes. Well, it's just semantics: essentially, the resulting state of the database should total $2,120, and we want to ensure that whether T1 goes first or T2 goes first, we end up with the same result. You can call it the outcome; it's just semantics. Okay, so the possible outcomes? Yes, actually — as we define it here, the outcome is the money left in the database, rather than the invariant we talked about, and the outcomes can differ: account A holds one amount if the interest is applied before the transfer, and a different amount if the interest is applied after the transfer, so each account's balance can differ between the two orders. But if at the end you do not have the same total after the interest — the $2,120 we talked about — then the database is in an incorrect state. In both of these cases it adds up to the same total. Let me give some examples. The simple ones: you execute transaction T1 entirely and then execute T2 — that's on the left — and in the other case you reverse the order. You can check the numbers, but in each case, the total amount of money after the interest in the entire bank, or the database, ends up the same, and any value other than that would be incorrect, okay? So, again, to reiterate why it's important to interleave transactions: we want to improve throughput and we also want to reduce latency, so while we are fetching some records from disk, we want to slip in some operations from other transactions,
so that the database system doesn't just stall and wait — not forever, but wait for the records to come back without utilizing any resources. And on a multicore system we also execute transactions concurrently to improve total throughput. So what would a possible interleaving of these transactions look like? One example: in transaction T1 we first deduct the $100, and then in T2 we apply the interest to account A — after the deduction, we can already apply the interest there. After that, T1 puts the $100 into the other account, and then T2 applies the interest to the other account, including that $100, later. In this case, at the end of the day, the total amount of money remains the same; in fact, the end result of this interleaving is the same as executing all the operations of T1 first and then all the operations of T2 later. So even though the operations are interleaved, the correctness property still holds — this would be a valid schedule. What would be invalid? Here, for example: T1 first deducts the $100, then T2 multiplies the balance in A — already reduced by $100 — by the interest, and then, before the transfer completes, T2 immediately applies the interest to B as well. Only later does transaction T1 finally add the $100 to account B. In this case some amount of money — here, the $100 now sitting in account B — never had the interest applied to it, so the final total is different and the bank is missing $6. This is bad: this schedule would be incorrect, and it's a very bad state to end up in if you are operating a bank; you never want this to happen. Make sense? Okay. To loop back a little to the transaction definition we gave earlier: for the purposes of the transaction, the system only cares about which records are being accessed and whether they are read or written. From the transaction's perspective, as I mentioned earlier, it doesn't know whether an object is a tuple, a bank account, or even an index page; it doesn't really matter. It only knows whether it is reading the data or writing the data. So here I'm writing these operations using the notation we introduced earlier: read on A, write on A, et cetera. That's what the database system sees — at least what the concurrency control component of the database system sees — instead of the concrete semantic meaning of these records, whether they are A and B, et cetera. This is very important. So now we can finally talk about how to define whether a transaction execution is correct, with all those examples and motivation behind us. We have been talking about a transaction schedule, or operation interleaving order, in plain English; more formally, a schedule of transactions is correct if it is equivalent to some serial execution of the transactions.
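Here is a small sketch that replays the two interleavings above. The starting balances are a purely hypothetical assumption — the lecture doesn't give them — chosen so that the correct total after 6% interest is the $2,120 mentioned above.

```python
def run(schedule):
    db = {"A": 1000, "B": 1000}   # hypothetical starting balances
    for step in schedule:
        step(db)
    return db["A"] + db["B"]

def t1_debit(db):  db["A"] -= 100      # T1: take $100 out of A
def t1_credit(db): db["B"] += 100      # T1: put $100 into B
def t2_int_a(db):  db["A"] *= 1.06     # T2: apply 6% interest to A
def t2_int_b(db):  db["B"] *= 1.06     # T2: apply 6% interest to B

good = [t1_debit, t2_int_a, t1_credit, t2_int_b]   # same effect as T1 then T2
bad  = [t1_debit, t2_int_a, t2_int_b, t1_credit]   # interest on B misses the $100 in flight

print(round(run(good), 2))   # 2120.0 -- matches a serial execution
print(round(run(bad), 2))    # 2114.0 -- the bank is "missing" $6
```

The "bad" ordering applies the interest to B before the $100 arrives, so the final total comes out $6 short, exactly the failure described above.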
Let me expand these concepts. A serial schedule means a schedule that executes the transactions one after another, without any interleaving of operations. In our example, that would be T1 after T2, or T2 after T1. That's a serial schedule: no interleaving. Equivalent schedules means that, for any database state, the effect of executing the first schedule is identical to the effect of executing the second schedule. In other words, if there are two different ways to arrange the execution of the transactions, and at the end of the day both schedules leave the database in the same state — same amount of money in A, same amount in B — and return the same results to the users, then we call the two schedules equivalent. This is independent of how the transactions are executed: independent of what join algorithm we use, what index we use, and so on. Makes sense? Finally, we call a schedule a serializable schedule if it is equivalent to some serial schedule — some serial execution of the transactions, as we just defined. Looking back at our earlier example: on the left, even though we interleave some operations of the two transactions, so the schedule itself is not serial, the effect of the schedule on the left is equivalent to the serial schedule on the right. So we call the schedule on the left a serializable schedule, and that is a correct schedule as far as the database system is concerned. And, looking back at consistency: if each transaction is consistent, then every serializable schedule is also consistent. Any questions on these terminology definitions so far? Okay. One question you may have is: why do we have to define this complicated notion of equivalent schedules, serializable schedules, and so on? Couldn't we just say that whichever transaction arrives at the database system first must be executed first — require a schedule that always follows the arrival order of the transactions — and call that the correct behavior? Why don't we do that? The reason is pretty straightforward: if we allow this more flexible definition of a correct schedule — allowing the system to swap the execution order of different transactions as long as the final effect is equivalent to some serial schedule — then we allow many more possible schedules. This lets the database system interleave operations more flexibly and therefore potentially more efficiently. That's pretty much the reason we define serializability this way instead of strictly following the order in which transactions arrive at the database system. Make sense? Okay.
Now, how do we check this serializability property? Again, we are not yet talking about the specific concurrency control algorithms that make the database execute transactions correctly and efficiently; in this lecture we're just going to talk about how to analyze a specific schedule of transactions so that we know whether that schedule is correct or not. We'll cover the algorithms and implementations in later classes. When we try to determine whether a schedule is serializable, we need an important concept called conflicting operations: we need to know what we can and cannot reorder while preserving serializability. To define it formally: we say two operations in two different transactions conflict if, first, obviously, they are issued by different transactions; second, they operate on the same object; and third, at least one of them is a write. In that case, we say these operations of the two transactions conflict, and we'll use that to analyze whether a specific transaction schedule is serializable, okay? Expanding a little, there are three kinds of conflicts — read-write conflicts, write-read conflicts, and write-write conflicts — depending on the types of the two operations. There's actually another special case called a phantom, so this is not the entire list of anomalies, but we'll talk about that later because there's too much to cover here. First, the read-write conflict. A read-write conflict leads to a schedule in which a transaction cannot repeat a read it did earlier. A specific example makes this easier to understand: suppose transaction T1 first reads the value of record A and gets 10. Then transaction T2 reads the same value, still 10, but then applies a change to record A. So you have a read on one side and a write on the other — a read-write conflict. Now, when T1 reads A again, it comes back with a different value. This clearly violates the isolation property, because if T1 were the only transaction, it could not read the value as 10 and then suddenly get back a value of 19. That's a read-write conflict, and the resulting problem is called an unrepeatable read. The inverse case is the write-read conflict, informally known as a dirty read. Again, suppose we have transactions T1 and T2. T1 first reads record A with value 10 and writes the value 12 back to A. Then T2 reads the value 12 and performs a modification: it increments A by 2, making it 14. So before T1 commits, T2 has already read the value, modified it, and written it back — a write followed by a read, hence write-read. But if T1 later aborts, the database ends up in an invalid state: since T1 aborted, none of T1's changes should matter anymore, yet what T2 did was read T1's uncommitted value of A and increment it by 2.
If T2 were the only transaction running, the system could never reach that state, so this is incorrect as well. It is called a dirty read, or a write-read conflict. Make sense? Lastly, there are write-write conflicts, where one transaction overwrites another's uncommitted data. Again, transactions T1 and T2: T1 first writes the value 10 to A, and then T2 writes the value 19 to A as well. So far, this is fine — you can overwrite values. But then T2 writes a value to B, and later T1 writes the value "Andrew" to B. Now the database ends up in a state where B holds T1's value "Andrew" but A holds T2's value 19. This is a write-write conflict: if you executed either transaction in isolation, or in any serial order — T1 then T2, or the reverse — there is no way the database could end up in such a state. That is a write-write conflict, which we also need to avoid if we want a transaction schedule to be serializable. Yeah, this is just not going to work. One interesting thing we can observe from this exercise is that the database system doesn't really try to plan these schedules in advance. Generally speaking, the database system doesn't try to come up with a very clever schedule of the transactions — in some real cases it does, but generally it does not. What it does is let the transactions, and the operations within each transaction, interleave with each other naturally, depending on when an operation arrives or which operation is stalled by a disk fetch (so that it needs to switch to other transactions); it just interleaves when that happens, and then it comes back and checks whether there are conflicts in the resulting schedule, so that it can abort or take some other action to handle them, which we'll talk about later. Generally speaking, it's very difficult to know the exact operations of transactions in advance, so the system can't come up with a clever, concrete schedule ahead of time and follow it. That's one observation. The other is that there are generally two types of serializability that people define. One is called conflict serializability. That's what most database systems support, and it leverages the conflicting operations we just talked about; it's the most common case. There's another called view serializability, which is pretty rare, because it's even more flexible than conflict serializability — it allows more possible interleavings and could make the database system more efficient — but it requires some understanding of the semantics of the application, what it is actually trying to do, rather than just reads and writes. In practice that's very difficult and pretty much nobody really does it, but the definition exists. Now let me define conflict serializability; we'll go through examples, but first the formal definitions. Two schedules are conflict equivalent if, first, obviously, they involve the same set of actions — the same operations of the same set of transactions — and second, every pair of conflicting operations is ordered in the same way.
That is, assuming we have one transaction schedule and another, the conflicts in one schedule are the same set of conflicts, in the same order, as in the other schedule — then we call the two schedules conflict equivalent. And the conflict serializable definition is then straightforward: a schedule is conflict serializable if it is conflict equivalent to some serial schedule. We'll see examples — just bear with me through the formal definition. The way we check whether a schedule S is conflict serializable is to see whether we can transform S into a serial schedule by swapping consecutive non-conflicting operations of different transactions. We cannot reorder the conflicting operations, but we can swap non-conflicting operations as much as we want; if such swapping can turn the schedule into a serial one, then it is conflict serializable. Let me get to the examples — they'll make this easier. Assume we have transactions T1 and T2, each reading and writing A and then reading and writing B, with the operations interleaved: T1 reads and writes A, then T2 reads and writes A, then T1 reads and writes B, then T2 reads and writes B. We want to know whether this schedule is conflict equivalent to a serial one. What we do is look at adjacent non-conflicting pairs in the schedule. For instance, a read on B and a write on A operate on different records, so obviously they don't conflict, and we can swap them. Similarly for other adjacent operations on different records — not conflicting, swap them — and so on. At the end, we can reach a state where all of T1's operations execute first and then all of T2's operations execute later. So the original schedule on the left is conflict equivalent to the serial schedule on the right — in fact, T1 first and T2 second — which means the original schedule is conflict serializable, and it is a valid execution of a transaction schedule. Make sense? Cool. All right, now let's look at a bad example. Say transaction T1 reads A, then transaction T2 reads A and writes A, then transaction T1 writes A. We can again try to swap things to make this conflict serializable, but in this case, to do so we would have to move the write on A. For example, if we want T1 to execute first, we have to move T1's write on A up — and we just can't, because no matter whether you swap T1's write on A with T2's write on A or with T2's read on A, those pairs all conflict. So we cannot swap, and this is not a conflict serializable schedule; in fact, it's invalid. So conflict serializability is defined in terms of swapping operations until two schedules become the same. But one thing you've probably noticed by now is that this is a bit cumbersome: to determine whether a schedule is correct, i.e. serializable, we're doing these somewhat random swaps over and over again.
That's a lot of overhead, and it takes a lot of time to determine whether a schedule is serializable or not. So is there an algorithm, a smarter way, to help us do that? Well, it turns out there definitely is. The better way, the algorithm that helps us determine whether a transaction schedule is conflict serializable, is called a dependency graph. What we do is that for every transaction in the schedule, we define a node in the dependency graph. Then we define a directed edge from transaction Ti to transaction Tj if there is an operation Oi of transaction Ti that conflicts with an operation Oj of transaction Tj, and furthermore Oi happens earlier than Oj. In that case, there is a directed edge from Ti to Tj in the dependency graph. Some textbooks call this a precedence graph as well. Then, with this simple graph, the way to determine whether a schedule is conflict serializable is actually straightforward: just look at whether there is a cycle in the graph. If there is a cycle, then no matter how you swap operations, you will never be able to make this schedule equivalent to a serial schedule. If there is no cycle, then you can always figure out some swapping that makes the schedule equivalent to a serial schedule. Make sense intuitively? Okay, nice.

We'll give you some examples. In the first example, we again have transactions T1 and T2. It's actually an example we showed earlier: each reads and writes A and then reads and writes B, just interleaved. In this case, what are the conflicts? There are definitely conflicts. For example, there is a write-read conflict where the operation of T1 happens first, so we draw a directed edge from T1 to T2. Then there is another write-read conflict from transaction T2 to transaction T1, so we also draw an edge from T2 to T1. In this case, the schedule is not conflict serializable, because there is a cycle in the graph. Make sense? You will simply not be able to swap it into a serial schedule.

The other example is a bit more complicated, with transactions T1, T2, and T3. First, there is a write-read conflict from T1 to T2, because the write happens first, so there is an edge from T1 to T2. Then there is another write-read conflict giving an edge from T1 to T3 as well. But there are no other conflicts in this graph: you can check that all the other operations are on different records, or both of them are reads, so there are no conflicts. In this case the schedule is equivalent to a serial execution of the transactions, for example T1, then T2, then T3, so this schedule is conflict serializable, and it is correct. The transactions are isolated, and we can ensure the correct properties of the data.
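To make the dependency-graph check concrete, here is a minimal sketch of my own that reuses the same conflict rule as before: add an edge Ti -> Tj for every conflicting pair where Ti's operation comes first, then test for a cycle with a depth-first search.

    # Sketch: build the dependency (precedence) graph and check for a cycle.

    def dependency_graph(schedule):
        edges = set()
        for i in range(len(schedule)):
            for j in range(i + 1, len(schedule)):
                t1, op1, obj1 = schedule[i]
                t2, op2, obj2 = schedule[j]
                if t1 != t2 and obj1 == obj2 and "W" in (op1, op2):
                    edges.add((t1, t2))  # earlier conflicting op -> later one
        return edges

    def has_cycle(edges):
        graph = {}
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set())
        visiting, done = set(), set()

        def dfs(n):
            visiting.add(n)
            for m in graph[n]:
                if m in visiting or (m not in done and dfs(m)):
                    return True
            visiting.discard(n)
            done.add(n)
            return False

        return any(n not in done and dfs(n) for n in graph)

    def conflict_serializable(schedule):
        return not has_cycle(dependency_graph(schedule))

    # The earlier non-serializable example: T1 R(A); T2 R(A); T2 W(A); T1 W(A).
    bad = [("T1", "R", "A"), ("T2", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
    print(conflict_serializable(bad))  # False: edges T1->T2 and T2->T1 form a cycle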
Okay, so let me give you another example here. Assume that, in addition to denoting the read and write operations, I hypothetically add back what the application code actually does inside those transactions, just to give you an illustration; in the dependency graph we normally don't have this information. So here, T1 deducts a value from A, and T2 reads A and B, computes the sum, and echoes the sum out. In this case, there is a write-read conflict, so we draw that edge in the graph, and also a read-write conflict, so we draw that edge as well. And obviously, again, there is a cycle, so this schedule is not conflict serializable.

But what if there are some application semantics in these transactions that we hypothetically know? If T2 really looks at the sum of A and B, then scheduling the transactions this way is a conflict and the result would be inconsistent. But what if the semantics the transaction needs are a bit looser? Say that instead of the sum of A and B, T2 only cares about how many of the values are greater than zero. And assume that even after you deduct $10 from A, A is still greater than zero. In this case, even though the schedule is not conflict serializable, if all T2 cares about is how many values are greater than zero, and A always stays greater than zero, then this schedule actually returns the correct result: T2 will always return two as long as both A and B are greater than zero, again assuming A has enough balance. Also, T2 only reads things, it never writes anything back, so no matter how you interleave T2's operations with T1's, all the records in the system end up the same at the end of the day. So strictly speaking, the outcome of this execution is actually correct. But in practice it's very difficult for a system to know what you really want to do with the records, what the semantics are. So in practice we just look at whether each operation is a read or a write, define the conflict graph on those reads and writes, and use this stricter approach to determine whether a schedule is correct or not. It's a bit more conservative, but in most cases that's all we can do.

So this gives you a little heads-up on view serializability, which is the case where, assuming you know a little more about the semantics of the transactions, you can give a bit more leeway: you can potentially allow more flexible transaction executions and not abort a schedule like the one above. But in practice, that's very difficult to do. I'm not going to read the formal definition here today, because as far as I know it's not really used in practical system implementations, but you can check it out later. I can give you a more concrete example, though, so you can understand it, because it is a formal definition of serializability. Say you have these three transactions here: T1, T2, and T3.
Again, I draw the dependency graph from the reads and writes: there is a read-write conflict here, another conflict there, so there are many conflicts. Obviously this is not a conflict serializable schedule, and according to our standard, this cannot happen and we would have to abort. But if you actually look at these transactions, they are just doing blind writes on A. The first transaction reads a value from A and writes a value back, and then that value is never read again. And transaction T3 is the last transaction to write a value, so it doesn't really matter what T1 and T2 did, because if T3 writes the value at the very end, the database system just ends up with that single final record. So in terms of the eventual outcome, this could actually be equivalent to, for example, executing T1 first, T2 second, and T3 third. The outcome would be the same, but according to conflict serializability this schedule is not allowed. That is essentially what this slide shows; I'll show a tiny sketch of this argument in a moment. Make sense? Okay.

So view serializability essentially allows not only conflict serializable schedules, but also schedules like this one with blind writes. If you never read the record and just write a new value, then at the end of the day the result is the same: the value that T3 writes does not depend on any value read or written earlier. So, to summarize: view serializability allows more possible schedules than conflict serializability, but it is difficult for the database system to know the semantics of the operations needed to achieve it. And actually, neither definition allows all the possible schedules we would intuitively consider correct, because beyond blind writes there are other cases where, if you knew the semantics of the transaction ahead of time, say the exact read and write set on each individual record and the exact operation you are going to perform, many other schedules would be acceptable. But it is just difficult for the database system to know that. So in practice, we mostly only care about conflict serializability. Just to note, in some real cases, some systems do look at what is inside the transaction and schedule more flexibly, so this is not 100% absolute. And in some cases the database system also pushes a bit of flexibility up to the application level: if the application specifies that it allows certain types of conflicts because of certain properties, then that is also possible.
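Here is the tiny simulation I mentioned, my own illustration with made-up values: it runs the interleaved blind-write schedule and the serial order T1, T2, T3 and checks that both leave A holding T3's final value.

    # Tiny simulation: with a final blind write, the eventual value of A is the
    # same whether we run the interleaved schedule or the serial order T1,T2,T3.
    # Each step is ("R",) or ("W", value); all steps touch the single record A.

    def run(steps, initial_a=100):
        a = initial_a
        for step in steps:
            if step[0] == "W":
                a = step[1]      # blind write: overwrite A unconditionally
        return a

    # Interleaved (not conflict serializable): T1 R(A); T2 W(A); T1 W(A); T3 W(A).
    interleaved = [("R",), ("W", 2), ("W", 1), ("W", 3)]
    # Serial T1, T2, T3:                       T1 R(A); T1 W(A); T2 W(A); T3 W(A).
    serial      = [("R",), ("W", 1), ("W", 2), ("W", 3)]

    print(run(interleaved) == run(serial))  # True: both end with A = 3, T3's write

The values 1, 2, 3 are hypothetical; the point is only that the last blind write determines the final state, which is what view serializability exploits.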
Okay, to summarize the different properties of schedules we talked about today: assume this is the entire set of possible schedules of a set of transactions, all possible schedules. Then the serial schedules are just a very small subset, with no interleaving among transactions whatsoever. The conflict serializable schedules are obviously a bigger set: you can interleave certain operations, but there cannot be any cycle in the dependency graph defined by the three types of conflicts we discussed earlier. Those are the conflict serializable schedules. And then view serializable is strictly bigger than that: it allows all the conflict serializable schedules, but it also allows other schedules as well, such as the ones with blind writes we talked about earlier.

Then, for durability: again, we will talk more about durability together with recovery, but to ensure the correct state of the database, we also need to ensure that when we commit a transaction, all of its changes are applied to persistent storage, such as disk, so that there is no partial or torn update, no update from a transaction that executed only one or two statements but didn't finish the rest. Logging or shadow paging is used to ensure those properties.

So, to summarize the ACID properties we talked about earlier, because this is very important and you will hear this term many, many times in this lecture and later on whenever you deal with databases: Atomicity means everything happens together, it's either all or nothing; the transaction is the basic unit. Consistency means that if the database is in a consistent state, satisfying certain properties like the examples we gave earlier, and the transaction itself is consistent, then applying a consistent transaction to a consistent database state results in a consistent database state. Isolation means the execution of every transaction is as if the database system were handling that transaction alone. And finally, durability: if a transaction commits, its effects always persist.

So, any questions about serializability, ACID, et cetera, before we move on? No? Okay, nice. Just to sum everything up: concurrency control, which we talked about in this class, and recovery, which we will talk about a few lectures from now, are among the most important functions provided by a database system. The ACID properties are mostly achieved through concurrency control and recovery. And these are a big part of why the database system even exists in the first place: it is not only trying to store and retrieve data efficiently, but also to handle the complicated cases involved in data management, such as different users accessing the same data, power failures, et cetera. It deals with all of those things for the developers, so that the developers only need to focus on their application logic. Also, concurrency control, as well as recovery, happens automatically. Users don't tell the database system, hey, if I'm executing this operation I cannot execute that other operation, or I need to put a lock on this record or that record, et cetera. Users don't specify any of those things; the database system handles everything automatically, again so that users can use the database system easily.
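As a small illustration of what this looks like from the application's point of view, here is a sketch using Python's built-in sqlite3 module, my own example with hypothetical account names and amounts: the two updates of a transfer either both commit or, if anything fails partway through, both roll back, and the application never manages locks or schedules itself.

    # Sketch: atomicity from the application's perspective, using Python's
    # built-in sqlite3. The two updates of a transfer commit or roll back together.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 200), ('bob', 50)")
    conn.commit()

    def transfer(conn, src, dst, amount):
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.commit()      # both updates become visible and durable together
        except Exception:
            conn.rollback()    # on any failure, neither update is applied
            raise

    transfer(conn, "alice", "bob", 100)
    print(list(conn.execute("SELECT name, balance FROM accounts")))
    # [('alice', 100), ('bob', 150)]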
Now, I just want to mention that there is an argument saying: hey, if you put all of this inside the database system, would it be less efficient? Similar to the argument people make about compilers or query optimization, some people will argue that if you just let users specify the ordering of different transactions, or specify what types of conflicts they do and don't allow and what schedule to use, wouldn't that be more efficient? But there was this famous paper from Google about their Spanner system, their globally distributed database that provides many of these ACID properties, and it has a paragraph saying that Google believes it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. So in practice, people found that it is just very inefficient, at the end of the day, for application developers to have to deal with transactions themselves. Even though theoretically they might come up with a better, potentially more efficient schedule, in most cases, if the database system handles the transactions, developers can put their energy into more important things and create more value. All right, that is all for today. In the next class, we are going to talk about the specifics: how we actually use the two-phase locking algorithm to ensure the properties we talked about in the concurrency control mechanism, and we'll also talk about the different levels of isolation.