Okay, so today's class we're going to start talking about logging. As I said, this week is the last week of the basics of building a single-node database system. If you understand today's topic and Wednesday's topic, then that's everything you need to build a full database management system running on a single machine that is fault tolerant, can recover after crashes, and can do transactions. So today we'll talk about logging schemes: how to write data out to a log while we're running transactions so that we can recover after a crash. Before we jump into that, we're nearing the end of the semester, so there are only a couple more deadlines left. Homework four is due today at midnight. Project three is due next Monday at midnight, before the holiday. Homework five will go out next week and will be due December 3rd. Project four will be due, I think, at the end of the semester, before finals week, and the extra credit is due around that time as well. Okay? So we're almost done, we're almost there. So I want to start today's lecture with a simple motivating example to set us up to understand why we're going to talk about what we're going to talk about today. We're going to run a simple transaction that does a read on A and then a write on A, and we want to see where the changes that this transaction makes actually end up physically. When the transaction starts, there's nothing in our buffer pool, and it wants to do a read on A. Out on disk we have a single page, and that page has one record, A. So in order to read this record, we have to copy the page from disk and bring it into our buffer pool, and we know how to do this because we covered it before. Now it's in memory, so we can do a write on it. So we do a write on A. And again, the way database systems work, or just the way storage works in real systems, I can't modify things directly on disk; I want to modify the object in memory first. So we make our write in the buffer pool. Then our transaction goes to commit, right? And commit basically means that the database system is going to give an acknowledgement to the outside world that our transaction is durable, is safe, meaning all the changes it made are persistent on disk. So at this point here, do we want to tell the outside world that we've committed our transaction? Yes or no? What's that? Correct, he says no, because the new value of A is not out on disk. Because what could happen is the most evil person in the world, the Hitler of databases, comes along and zaps our machine. We lose all power and our write goes away. So if we had told the outside world that our transaction had committed and then we get killed like that, we're screwed, because anybody who comes back later is going to say: you told me this transaction committed, I know I made this change to A, where is it? And we can't recover it. So that's what we're talking about this week: crash recovery.
And recovery, obviously, the basic idea is that these are the techniques we're going to implement inside of our database system to ensure that it can guarantee consistency and durability despite almost any possible failure the database may incur. Every recovery algorithm has two parts. The first part is the actions the database system takes at runtime as transactions are modifying the database. The second part is the methods or techniques we use after a crash or restart: how do we use the information that we generated at runtime in the first part to restore the database to the correct state? We want to guarantee that all our transactions are ACID, specifically atomicity, consistency, and durability. Isolation is about concurrency control, how to interleave operations; we're not so much worried about that here. So for this lecture, we're focused on the first part, the runtime piece: what you actually need to do while you're running transactions, what information you want to generate and write out to disk, so that if there's a crash you can come back and restore the database to the correct state. So we're going to talk about a bunch of different things. We'll start off with the different types of failures that we could incur in our database system and that we want to overcome in our recovery protocol, our recovery algorithm. Then we'll talk about the buffer pool management policies that decide when data is actually written out to disk and when you're allowed to do this. Then we'll talk about two runtime recovery mechanisms: shadow paging and write-ahead logging. Then, the order is flipped on the slide, we'll talk about logging schemes next and finish with checkpoints. The checkpoints will segue into what we'll talk about on Wednesday: how you actually take the information we've collected and use it to restore the database to the correct state. Okay? All right, so the recovery mechanisms we're going to have in our database system are going to target the different components of our architecture. Those components, in some ways, have failure models that depend on the underlying storage device they're predicated on, where they're actually storing data. And since these different storage devices have different durability properties, we'll want different policies for how we treat them, how we put data into them. So the first thing we need to do is understand, more concretely, what the storage devices are, and then we can see what failures are possible on those devices. This basic hierarchy we've already talked about at the beginning of the class. We have a dichotomy between volatile storage and non-volatile storage. Volatile storage is DRAM; it is not persisted after a restart or a crash, because the way DRAM works is that the motherboard refreshes it with a small amount of charge every so often, and that's how it maintains the ones and zeros. So obviously, if you lose power to DRAM, the charge dissipates and you lose your data.
There have been studies, I think, showing that DRAM maintains its data under the right conditions for maybe 20 seconds after you lose power. But that's not good enough for what we need in our system. The next category is non-volatile storage. This is what we've been assuming, what we've been calling "the disk" throughout the entire course: your spinning-disk hard drive, your SSD. The key difference between non-volatile and volatile storage is that any write we do to non-volatile storage will be retained after we lose power. There's a bunch of other stuff we talked about earlier: non-volatile storage usually does reads and writes at a page or block granularity, and it's better at sequential access than random access. All of those properties still matter here, and we'll see how we design algorithms to take advantage of them in a second. The last type of storage in our classification is called stable storage. This would be a type of non-volatile storage that can survive any possible failure model we throw at it. I say it's non-existent because if I have an SSD and I light it on fire and melt it, I lose all my data. There's no magical device that's going to persist data no matter whether I shoot it with a gun or blow it up with a bomb. This doesn't actually exist, but it's where we're going to end up storing our log when we talk about write-ahead logging, because we never want to lose the log, and we always want to put it in stable storage. The way you essentially get stable storage is to approximate it through redundancy: RAID, having replicas, or doing off-site replication to another machine or another data center. You can approximate stable storage that way, but you can't buy a single device that gives you this property. So now, given these storage types, let's talk about the classification of the different types of failures we're going to encounter in our database system. And as we go along, we'll see, for the different runtime protocols like shadow paging versus write-ahead logging, which of these failure types they are resilient to and able to overcome. The first type is transaction failures. These are things that a transaction gets tripped up on and that cause the database system to prevent it from continuing and committing. There are logical errors: think of violating integrity constraints, like I can't insert a record with a duplicate key of another record. These are cases where the database system says: you can't do this, you have to abort, you have to roll back your changes. The second kind are internal state errors, where the database system itself stops a transaction from continuing because it's violating some higher-level concept like serializable ordering. This is the deadlock detection or deadlock prevention stuff you're implementing in the third project. In both of these cases, the failures are going to be very common, and we want our recovery mechanisms to handle them. We don't want partial transactions: any transaction that gets tripped up by either of these error types needs all of its changes rolled back. The next category is system failures, which can be hardware or software. A software system failure would be a bug in our database system: somewhere we have an uncaught exception on a divide by zero.
And the OS crashes us, right? We don't want this to happen, obviously, and database vendors spend a lot of time testing the database system to catch as many bugs as possible, but no software is magically bug free, so this can occur. So again, if we have a software failure while a transaction is running, we want to make sure we don't come back and still see the changes from that transaction. Hardware failures are when the actual machine hosting the database system crashes. This could be as simple as someone tripping over the power cord, or the DRAM going bad. For this, we're going to make what's called a fail-stop assumption: we're going to assume that when we have a hardware crash like this, the non-volatile storage where we're writing things out to disk for our database is not going to be corrupted. We can always come back and everything will be in the state we expect; there won't be a sector that gets garbled. This is obviously not always a correct assumption, and this is what RAID and other error-checking mechanisms help handle automatically for us. But for our purposes, from the database perspective, we're not worried about it; somebody else takes care of it. The last one is storage media failure. This is when the storage device itself gets corrupted, gets totally messed up, and we end up losing data and can't use it at all. This happens all the time: you can have bad sectors on your disk, or even on an SSD you can burn out cells because you wrote to them too many times. No database system is going to be able to recover from this, because this is something in the physical world that software can't rectify. Again, if I light my machine on fire, or light my hard drive on fire, no database software is going to magically make that thing come back. We can get around this by using stable storage, by replicating or keeping redundant copies of the data we want to be persistent. And when the media fails, the database is mostly restored from an archive version, essentially coming back from a backup copy that you have, because you can't recover anything from the disk itself. So, as I showed in my example at the beginning, and for this entire class, we've been talking about disk-oriented databases. This is where the primary storage location of the database is assumed to be on disk, and any time we want to read or write data, we have to get the pages from disk and put them into our buffer pool. We do this because DRAM, volatile memory, is much faster than disk, and it's the only place we can actually do direct writes at a fine granularity. So again, as I showed in my example at the beginning: we know what pages we want to access, we copy them into memory, apply our changes, and at some later point we want to write those changes back out to disk so that we can recover them after a crash. The reason we're going to do all this, as I said, is that we want to guarantee that any changes made by a transaction that is able to commit are persisted across any restart or crash.
We also want to ensure that all our transactions are atomic: we don't want any partial changes from uncommitted transactions to show up after a restart. We want to make these guarantees for the first two failure classes, the transaction failures and the hardware/software system failures that can happen inside the database system. The storage media failures are beyond what we can handle here. The underlying primitives we're going to use to make this all work, to make the two guarantees that committed transactions are durable and that there are no partial changes, are undo and redo information. They sound exactly like what they are. Undo is the information that allows us to remove the effects of incomplete or aborted transactions: enough information to say, here's what the old value used to be, and here's how to restore it. That way you undo the change that a transaction made. Redo, again, is just what it sounds like: it's the information you need to reapply a change that a transaction made to the database, okay? We're going to use these two basic primitives to build up a recovery mechanism that we use at runtime to store this information out to disk in some way, so that if we crash and come back, we have enough information to guarantee that all transactions are atomic and durable and the database comes back consistent, okay? Now, which of these two you actually need will depend on how the database system manages dirty pages in its buffer pool. So let's go back to our example. Now we have two transactions. T1 does a read on A and a write on A; T2 does a read on B and a write on B. Again, at the very beginning our buffer pool is empty. T1 starts and wants to do a read on A, so we go out to disk, get the one page that has all our data, and bring it into memory so we can read it. Now we do the write on A, and we just update it directly in memory. Then there's a context switch and T2 starts. It does a read on B; the page is already in memory, so we don't have to fetch it, we're done. Then it does a write on B and applies the change directly in memory. Now T2 goes to commit. So the question we first have to ask ourselves is: in order to tell the outside world that this transaction committed, in order to guarantee that all of its changes are durable, do we have to force the changes sitting on dirty pages in the buffer pool out to disk? Who says yes, for this example? It should be obvious, right? Yes. But what's the problem? Again, this is an entire page: when we write things out to non-volatile storage, we're writing at page granularity. So what happens if I write this page out? What comes along with it? I made a change to B and I want to get that out to disk, but what's the problem? Yeah, A is in here. A got modified by T1, and it's going to get written out too. So the question is: do we allow the database system to write out the changes made by T1 to A, given that at this point T1 has not committed? Say we go ahead and do that. Now we tell the outside world T2 is committed, that's fine, but then over here T1 ends up aborting. So now we need to roll back T1. What do we need to do?
We've got to go out to disk and, first of all, figure out whether whatever T1 wrote to in this page actually made it out to disk, then figure out where the hell it is on disk, bring it back in, and reverse that change. Right? So there are two aspects to this. The first issue was: do I have to force all the pages modified by a transaction out to disk when it commits? The second issue is: am I allowed to write pages that contain changes from uncommitted transactions out to disk? These are what we'll call the steal and force policies of our buffer pool manager. The steal policy says whether the database system is allowed to write out pages with dirty records from uncommitted transactions, even though those transactions have not committed yet. Essentially, the database on disk, ignoring the transactions running right now, holds the current versions written by previously committed transactions. Now my T1 shows up and modifies one of those pages. Am I allowed to overwrite the last committed version of the record with uncommitted data? That's what the steal policy is about. Another way to think about it: I'm running out of space in my buffer pool, and I need to evict pages to make room for new pages I need to bring in. Am I allowed to steal frames from another, uncommitted transaction, write its dirty pages out to disk, and then bring in the ones I need? Steal says you're allowed to do this; no-steal says you're not. The next policy is force. The force policy says whether the database system is required to flush out to disk all the pages modified by a transaction at the moment it commits, before we can tell the outside world the transaction is safely committed. Force says this is required: all dirty pages modified by a transaction will be flushed out to disk at commit. No-force says it's not required. You choose one option from each category: steal or no-steal, and force or no-force. So let's look at no-steal + force. No-steal says I'm not allowed to write out dirty pages from uncommitted transactions, and force says I have to write out the dirty pages from a transaction when it commits. T1 starts, does a read on A, we fetch it into memory, that's fine. T1 then does a write on A, we modify it in memory, that's fine. Then there's a context switch, T2 reads B, that's fine. Now T2 writes B, that's fine. And then T2 goes to commit. Force means that when I get the commit message, T2 has to have all of its dirty pages written out to disk before we can tell the outside world that we've committed. But with no-steal, we're not allowed to write out changes made by uncommitted transactions. So how do we handle this? Well, you'd have to maintain some internal information that says: T1 modified A and is uncommitted, T2 modified B and is trying to commit, so let me write only the change to B out to disk. That gets flushed, and then we can tell the outside world that our transaction is safely committed. Then when T1 later aborts, it's super easy to roll it back, because I know there's nothing out on disk that this transaction modified.
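To make the two policy knobs concrete, here is a minimal sketch of how a buffer pool might consult them. All names here (`BufferPool`, `Page`, `steal`, `force`) are hypothetical illustrations of the definitions above, not how any particular system implements this.

```python
# Minimal sketch of the steal / force knobs for a buffer pool (hypothetical names).

class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.dirty = False
        self.dirtied_by = set()   # ids of txns with uncommitted writes on this page

class BufferPool:
    def __init__(self, steal: bool, force: bool):
        self.steal = steal    # may we flush pages dirtied by uncommitted txns?
        self.force = force    # must we flush a txn's dirty pages at commit time?
        self.pages = {}       # page_id -> Page

    def can_flush(self, page: Page, committed_txns: set) -> bool:
        # NO-STEAL: a page dirtied by a still-uncommitted txn cannot be written out.
        if not self.steal and (page.dirtied_by - committed_txns):
            return False
        return True

    def on_commit(self, txn_id: int):
        if self.force:
            # FORCE: every page this txn dirtied must hit disk before we acknowledge.
            for page in self.pages.values():
                if txn_id in page.dirtied_by:
                    self.write_to_disk(page)
        # NO-FORCE: nothing to do here; durability must come from elsewhere (e.g. a log).

    def write_to_disk(self, page: Page):
        page.dirty = False
        page.dirtied_by.clear()
        # (actual I/O omitted)
```

With no-steal + force you never need undo or redo information; with steal + no-force, which is what write-ahead logging enables, you end up needing both.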
So I don't need to touch anything on disk; I just reverse the change in memory in the buffer pool, and that's really fast to do because it's in memory. So who thinks this is a good idea or a bad idea? I already gave the advantage: rolling back is super easy, because I just reverse things in memory. What's one obvious downside? He says you have to track every tuple. True, but I'm thinking of something even worse. He says you end up doing more writes. True also, but I'm thinking of something even worse than that. So what does no-steal say? No-steal says I'm not allowed to write pages from uncommitted transactions out to disk. So ignore two transactions; say there's one transaction, and I have a database of a billion tuples, but I can only keep one million tuples in memory. Would that work here? No, right? Because I would update the first million and then say, fuck, I need the next million, but I can't write out my dirty pages, because the no-steal policy says I can't do that. So this is super easy to implement, as I said; yes, you have to track every single tuple, that's not a big deal, and yes, you potentially end up doing more writes, but ignoring that, the real problem is that I can't take every possible database I'd want and modify every single tuple that I want. And again, one of the big selling points of a disk-oriented database is that we want to be able to manage databases that are larger than the amount of memory available to the machine, to make it appear as if it has enough memory. So this is easy to implement, because we never have to undo changes from an aborted transaction, since nothing ever got written out to disk, and we never have to redo changes either. Because if I crash here, say before we get to the abort, the system just crashes: I come back, and the only thing I have on disk is the committed change to B. Everything else gets blown away because it was in memory. So I come back and, without any extra work, I'm immediately in a correct state. So this makes recovery super easy as well. But as I said, the downside is that you can't exceed the amount of memory that's available to you. So let's look at one possible implementation of no-steal + force that can sort of overcome this. We'll see its deficiencies, and then we'll see why write-ahead logging is the approach that everyone actually uses. Shadow paging: the basic idea is that we're going to maintain two separate copies of the database, the master copy and the shadow copy. The master copy is the current consistent database state on disk (I shouldn't use the word snapshot). The shadow copy is where we're going to stage updates. So rather than overwriting the original pages in our master copy, we make a copy of them in the shadow area and write there. Then we have a pointer that says: here's the current location of the master copy. I stage all my changes in the shadow, and when I go to commit, I just flip the pointer to point to the shadow, and that becomes the new master. Now all the changes from my transaction immediately become visible. So again, this is an implementation of no-steal and force.
So the way we're actually going to organize this internally is that the page directory is going to be a tree structure. The tree structure allows us to do path copying: here's the portion of the tree that's been modified, and here's the portion that hasn't, and as needed I know how to route transactions to the right location based on what they're doing. At a high level it looks like this: we have a database root, which points to the master page table, which points to the pages out on disk. And this is all still going through the buffer pool manager; we can swap these pages in and out as needed. Any time I want to find a page: if I'm an updating transaction, I look in the shadow copy, but if I'm a read-only transaction, I just follow the database root and find the page I need through the master. To install the updates, as I said, all we need to do is swap that pointer to point to the shadow copy. The advantage of doing it this way, versus maintaining a bunch of information that says "here are all the pages I've modified, and now they become the master versions," is that we can do an atomic write of the database root, because it fits in a single page: I only have to update one location. If I instead had to apply all the changes I made in the shadow copy to the various master pages, I couldn't do that atomically, because the storage device can't do atomic writes beyond a single page. So let's look at an example. Say this is the current state of the database: we only have the master page table at this point. Transaction T1 comes along, and it immediately creates a shadow page table. At this point the shadow page table has nothing modified yet, so all of its entries still point to the original pages of the master page table. Now if I do an update, I apply all my changes through the shadow page table, and any read-only transactions can just go read the master page table. We talked about this before with snapshot isolation; you can think of it the same way: I have a consistent snapshot of the database up here, and down here is where I'm staging all the modifications from transactions that have not committed yet. So as I go along and update these pages, I make a copy of them and update the shadow page table to point to the copies. Then at some point my transaction wants to commit, and again, if I had to atomically apply all these pointer updates into the master page table, that would be hard to do because it spans more than a single page, and the file system or disk can't do that atomically. But because all I need to do is update the root, all my changes become visible at once: this becomes the new master, and any transaction that comes behind me can see all my writes immediately. I'm ignoring the concurrency control protocol here: you still have to do two-phase locking or timestamp ordering or OCC; all of that still applies. This is just how you organize the storage of the data so that you can recover it after a crash. So what do we have to do in this case? How do we recover from a crash with this scheme? Do I have to do anything?
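Here is a minimal sketch of the idea just described, using a hypothetical `ShadowPagedDB` class: updates go to copied pages referenced from a shadow page table, and commit is a single swap of the root pointer. It's an illustration under those assumptions, not System R's actual implementation.

```python
import copy

# Minimal sketch of shadow paging; all names are hypothetical.
class ShadowPagedDB:
    def __init__(self, pages):
        self.pages = dict(pages)                   # page_id -> contents ("disk")
        self.next_free = max(pages) + 1
        self.master = {pid: pid for pid in pages}  # logical page -> physical page
        self.shadow = None

    def begin(self):
        # New shadow table: every entry still points at the master's pages.
        self.shadow = dict(self.master)

    def write(self, logical_id, value):
        # First write to a page: copy it to a fresh location, then modify the copy.
        if self.shadow[logical_id] == self.master[logical_id]:
            new_pid = self.next_free; self.next_free += 1
            self.pages[new_pid] = copy.deepcopy(self.pages[self.master[logical_id]])
            self.shadow[logical_id] = new_pid
        self.pages[self.shadow[logical_id]][0] = value

    def read(self, logical_id):
        # Read-only transactions just follow the master table.
        return self.pages[self.master[logical_id]][0]

    def commit(self):
        # The single "atomic" step: flip the root so the shadow becomes the master.
        self.master, self.shadow = self.shadow, None

    def abort(self):
        # Abort/recovery is trivial: throw the shadow away.
        self.shadow = None

# Example: db = ShadowPagedDB({1: [5]}); db.begin(); db.write(1, 9); db.commit()
```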
Shaka said no, right. Because if I crash and come back, say I crash at this point before I commit T1, all of its changes are in the shadow page table. The transaction died before it actually committed, so nothing it wrote should be visible after a restart. So when I come back, my database root is pointing to the master page table, and I don't have any staged updates in there; I'm immediately in the correct state. Recovery is super easy: all I do is throw the shadow away. And even after I apply the changes and the shadow becomes the new master, it's the same thing: I don't care about anything in the old master; the new one is consistent and I'm immediately correct. So what's one additional problem we have to deal with here? We touched on this last class with multi-versioning. He asks: if you have concurrent transactions, do they use the same shadow page table? Yes. That's what I was saying before. If you have multiple transactions coming in here, they may be doing updates to different things, and we still need two-phase locking, we still need a concurrency control protocol to make sure those operations are ordered correctly. The way this would work is that you have to stop admitting transactions at some point and say: all right, this is my batch, everybody in it has to commit, and then we apply the changes. Of course, the issue there is that if you have one transaction that's going to run for an hour, you're kind of screwed, because that guy might run forever and you can't apply anyone else's changes. Question: so based on your description, it seems like transactions are committed in batches? Yes. In my example here it's one transaction, and when it commits, I just flip the pointer. If you have multiple transactions, then you have to apply them in a batch. Which means you have to stop executing new transactions at some point, wait at a barrier until everybody commits, and then apply all the changes. Question: if one of the running transactions aborts, will it affect the other running transactions? So for that you'd have to maintain undo information in memory to roll back its changes, and that's what we saw with two-phase locking or timestamp ordering. That's orthogonal to this; we're worried about recovery here, not atomicity for concurrency control or isolation. If I crash before I flip my pointer, I come back and my master page table is perfect. Question: say there are three transactions running, and the first two commit, and the third is the last one still running. If the third one commits, you swing the pointer and the shadow page table becomes the new master. But if the third one crashes or aborts, you don't swing the pointer, and then it seems like the first two never committed? They didn't commit yet, but you also didn't tell the outside world they committed. That's what I mean by a batch: only when I swing the pointer do I tell the outside world that whatever transactions were staged in my shadow page table have committed. So you don't have any partial updates.
Some transactions could internally say "I'm committed," but our users can't see it yet: the database won't tell you that you've actually committed, you'll stall and wait. The other issue we have to deal with now, of course, is garbage collection. We have this old master page table that we want to throw away and reuse, and we have these holes in our heap file from pages that are no longer reachable from the current master; you can't get to them anymore. Obviously we want to be able to reuse that space. So, as I said, supporting recovery is super easy with this scheme: to undo, all we have to do is throw away the shadow pages and leave the master page table alone, and we come back in a correct state. Again, this is recovery after a crash, not aborting transactions that are still running in memory. And we don't need any redo information, because there's nothing to redo: all the committed changes already made it out to disk. The downsides, though: copying the page table can be expensive. You can use path copying to speed things up, but it's extra work that write-ahead logging won't have to do. The commit overhead is also very large: we have to flush every single page out to disk and wait until we know it's durable, and we have to stall transactions if we're doing them in batches until everyone finishes. The other big issue is that the data gets fragmented. This is one of the big problems the IBM guys ran into: shadow paging is what IBM implemented first in System R in the 1970s. That's how they did crash recovery, and it turned out to be a bad idea, and they threw it all away when they built DB2 in the 1980s. One of the big issues was fragmented data. Before, say I was doing a sequential scan over a clustered index: I could rip through it real quickly and everything would be in the correct order. But now, as I start making updates and blowing away the old pages, the first page is here, the second page is there, third, fourth; things end up all out of order. This is similar to what Postgres does: Postgres does a sort of shadow paging where they do append-only multi-versioning, and that's why they can't support clustered indexes, because they're always appending new tuples and can't keep them sorted on the page according to the index. So I want to make one correction to something I said earlier in the semester, back in the first class where we did the introduction to concurrency control. I said that SQLite makes a complete copy of the database file every time a transaction starts. Turns out I was incorrect, and some random dude on the internet corrected me, which is nice; he was cool about it. What SQLite actually does, and I misread the documentation, is it makes a journal file, copies the old versions of the pages into that journal file, and then overwrites the original versions in place. The idea is that if you crash before you commit, you come back and the journal tells you how to reverse those changes. So it's sort of the opposite of shadow paging: instead of making separate copies that you then modify, you make a copy of the old version and then modify the original.
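A minimal sketch of that rollback-journal idea, with hypothetical names and Python dicts standing in for the database and journal files (real SQLite works at the file/page level with fsyncs): save the pre-image to the journal before overwriting in place, and on recovery copy the journaled pages back.

```python
# Minimal sketch of a rollback journal; names and data structures are hypothetical.
class RollbackJournalDB:
    def __init__(self, pages):
        self.pages = dict(pages)   # page_id -> contents, stands in for the database file
        self.journal = {}          # page_id -> old contents, stands in for the journal file

    def write(self, page_id, new_contents):
        # Save the pre-image to the journal first, then overwrite the original in place.
        if page_id not in self.journal:
            self.journal[page_id] = self.pages[page_id]
        self.pages[page_id] = new_contents

    def commit(self):
        # Once the transaction is durable, the journal is no longer needed.
        self.journal.clear()

    def recover_after_crash(self):
        # A surviving (non-empty) journal means we crashed mid-transaction:
        # copy every old page back, undoing the partial writes.
        for page_id, old_contents in self.journal.items():
            self.pages[page_id] = old_contents
        self.journal.clear()
```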
To be clear, SQLite doesn't do this by default anymore. We're going to cover write-ahead logging next, and that's what SQLite actually uses now, because it's superior to this. As far as I know, there are only one or two systems that still do shadow paging: LMDB, and I think Couchbase or CouchDB does this. I actually don't know if they still do; I know LMDB does. So again, this dude, this is what I love about the internet: I put this on YouTube, and some random dude in his basement in India that I've never met and will never meet says, hey, you're wrong, here's the line in the documentation that says what it really does. That's awesome, I enjoy that. All right, so the other issue we're going to have with shadow paging is that these flush operations are expensive, because we're essentially doing a bunch of random writes to a bunch of random pages. As I showed before, after we swap the pointer in shadow paging we run garbage collection, and now we have a bunch of free pages scattered around that we want to reuse for the next shadow page table that comes along. So what happens is that all your pages end up completely scattered across your heap file on disk, and every time you go to commit a transaction, you're doing an fsync or a flush at all these different random spots. And as we said, on non-volatile storage like SSDs and spinning hard drives, sequential access is much faster than random access. So shadow paging is sort of the worst-case scenario, because it's doing random writes out to disk to make things durable. We want a better approach where we can convert all of these random writes into sequential writes, and that's what write-ahead logging is going to do for us. The basic idea of write-ahead logging is that we're going to write to a log file the changes that a transaction makes to objects in our database before we write the updates to the objects themselves. At the very beginning everything is in memory, so that's fast, but when we actually go to write things out to disk, we want to make sure that if we want to write out a page containing a modification from a transaction, the corresponding log record that describes that change has been written out to disk first. And now we have this nice decoupling: when a transaction commits, we don't have to write all of its dirty pages out to disk; we just have to flush the log. The log will contain enough information for us, if there's ever a crash, to reapply the changes we made to the pages. So this is an implementation of a buffer pool manager with a steal + no-force policy. Again, steal says that we're allowed to write dirty pages out to disk before a transaction commits, as long as we write out the log records for those pages first. No-force says we don't require all the dirty pages to be written to disk when a transaction commits; the only thing we have to force out is the log records, because again, the log contains enough information for us to recover those changes. So what happens is that while a transaction is running, it creates these log records and stages them into a log buffer, and the log buffer is usually backed by the buffer pool.
Then, at some point, these log records get written out to non-volatile storage, and that has to happen before we're allowed to overwrite the pages out on disk that the transaction modified. And the transaction is not considered fully committed until all of its log records, including a commit record that we maintain for it, have actually been written out to disk. So a transaction starts, and the first thing we do is append a begin record to our write-ahead log in memory that just says: hey, a transaction started, and here's its ID. Then, when the transaction finishes, we write a commit record that says: transaction T1 committed. Then we can flush that out, and because the log is sequential, every log record in between the begin and the commit has to be written out before we can write the commit record. And again, once that's all on disk, the transaction is considered fully committed, and we can acknowledge to the application that we're done. Each of these log records for inserts, updates, and deletes is going to contain all the metadata we need to be able to reapply the change. We'll have a transaction ID, a unique identifier that says which transaction made the change. We'll have an object ID, whether that's a tuple ID, a primary key, or a page ID, whatever you want, it doesn't matter, as long as it uniquely identifies what object is being modified. Then we'll have the before value, which we need for undo, and the after value, which we need for redo. Because what's happening is that we're now decoupling when pages are written to disk from when log records are written to disk, other than the rule that the log record that modified a page has to be written before the page itself can be written. So when we come back and replay the log, we don't know whether the page that a log record modified made it to disk before the crash or not. That's why we have to keep both undo and redo information. So let's look at our example. Now we have one transaction: write on A, write on B. In memory we have our buffer pool, and then we have our log buffer, the write-ahead log. The write-ahead log is append-only: you're always appending to the end of it, it's always growing in size. So T1 starts, and we add a begin record to our write-ahead log. Then it does a write on A, and the first thing we do is add a log record to the write-ahead log that says: transaction T1 is modifying object A, the old value, what's here in the buffer pool, is 1, and the new value we want to write is 8. Once that's been appended to the write-ahead log, we can go ahead and make the change in memory, down in the buffer pool. And the reason we want to do this, and we'll talk about it more next class, is that we're generating an internal ID called a log sequence number that tells us the order in which we're appending these log records. Down in the page in the buffer pool, we keep track of the last log sequence number that modified that page, and that way we'll know whether the log record that made a change has been written out to disk or not. Again, we'll cover this in more detail next class. Then I do the write on B, same thing: append a log record that says transaction T1 modified B, old value is 5, new value is 9. Then I do my commit.
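To make the record contents concrete, here's a minimal sketch (all names hypothetical) of an update record, a log buffer, and the write-ahead rule that a page may only be flushed once the log records up to its pageLSN are durable.

```python
from dataclasses import dataclass

# Minimal sketch of write-ahead-log records; all names are hypothetical.
@dataclass
class LogRecord:
    lsn: int                 # log sequence number: position/order in the log
    txn_id: int
    kind: str                # "BEGIN", "UPDATE", or "COMMIT"
    object_id: str = ""      # which tuple/page was modified (UPDATE only)
    before: object = None    # old value -> used for undo
    after: object = None     # new value -> used for redo

class WriteAheadLog:
    def __init__(self):
        self.buffer = []        # in-memory log buffer
        self.flushed_lsn = -1   # highest LSN known to be durable on disk
        self.next_lsn = 0

    def append(self, txn_id, kind, object_id="", before=None, after=None):
        rec = LogRecord(self.next_lsn, txn_id, kind, object_id, before, after)
        self.next_lsn += 1
        self.buffer.append(rec)
        return rec.lsn

    def flush(self):
        # Pretend to fsync the buffered records out to stable storage.
        self.flushed_lsn = self.next_lsn - 1
        self.buffer.clear()

    def can_flush_page(self, page_lsn: int) -> bool:
        # The write-ahead rule: the log record that last modified a page
        # must be durable before the page itself may be written out.
        return page_lsn <= self.flushed_lsn
```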
Now, at the commit, I do an fsync, a flush of the log buffer. I know the point in my log buffer up to which all the changes for this transaction have been written, where its last entry is, because I added a commit record. I write that out to disk, and at that point, because we know we've fsync'd the log, all the log records for this transaction have been written safely to disk, and we can tell the outside world that our transaction has committed. If at a later point we crash before we actually write out the dirty page in the buffer pool, we're fine, because everything we need to redo that transaction is now safely in our log out on disk. We'll cover this next class: we can essentially come back, replay the log, and reapply all those changes, and then everything is durable. Pretty straightforward, right? So again, the advantage we're getting out of this is that I no longer have to do random I/O to write the dirty pages out to different locations, because now I can do a sequential write to the log, flush that out, and that's going to go much faster. The other thing to point out is that because the log buffer is usually backed by the buffer pool manager, the buffer pool manager can decide: oh, I need to save space because I have other transactions and I want to reuse memory for new log records. There's nothing about the protocol that says you can't flush out the log at this point here, before we actually commit. So at this point we could write the three log records for this transaction out to disk and then trim the log buffer and start reusing it for other stuff. Just think about it: if I crash and come back at that point, what am I missing? The commit record, right? That's the marker that says, yes, this transaction actually did commit. So if I come back after a crash and that's all I have in my log, I know I don't want to keep any changes made by this transaction. And this is why, again, you have the undo information: I don't know whether, before the crash, the page this transaction modified actually got written out to disk or not. So if I need to reverse this transaction because it didn't commit, I have the original values that I can put back in place. Or if I do have the commit record, then I have the redo values to put everything in place. I have both, so I can go either way, right? Okay, so just to recap what we talked about: when should the database system write log records out to disk? Again, when a transaction commits, right? You have to guarantee that all the log records for that transaction have been flushed out with an fsync: you have to wait for the disk controller to say this thing has been safely written to disk. Now, fsync can be slow, because you're essentially blocking your thread, blocking your process, until the OS comes back and says your data has been successfully written. The way you overcome this, or amortize the cost of fsync, is what's called group commit. You basically batch together a bunch of transactions that want to commit around the same time, and then you do a single fsync for all of them. And now, again, you're doing sequential writes.
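Using the hypothetical `WriteAheadLog` sketch from above, the commit path for the example transaction would look roughly like this; the acknowledgement to the application only happens after the flush returns.

```python
# Commit path for the T1 example, using the hypothetical WriteAheadLog sketch above.
wal = WriteAheadLog()
wal.append(txn_id=1, kind="BEGIN")
lsn_a = wal.append(1, "UPDATE", object_id="A", before=1, after=8)  # then update A in the buffer pool
lsn_b = wal.append(1, "UPDATE", object_id="B", before=5, after=9)  # then update B in the buffer pool
wal.append(1, "COMMIT")
wal.flush()                          # fsync: all of T1's records, including COMMIT, are durable
# ...only now do we acknowledge the commit to the application.
assert wal.can_flush_page(lsn_b)     # the dirty page may be written out whenever convenient
```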
You're doing a lot of writes all at once, and that makes things go faster, because for a single fsync call you're writing out more data rather than having every transaction do its own fsync. For project four, you're going to end up having to implement group commit. The way you actually do it is that either the log buffer gets too full and you flush it, or there's a timeout that says: I haven't seen any more changes in the last five milliseconds, so whatever is in my buffer, I'll write it out now. And you essentially maintain two buffers: one where you stage all your writes, and one that's actively being flushed. After you flush the second one, it becomes the place where you stage writes, and then you flush out the first one, and you swap them back and forth. So when should the database system write dirty pages out to disk? You could do it every time you update a tuple or a page, but that's going to be too slow. You could do it whenever a transaction commits, and that could potentially be too slow too. It really depends; different systems do different things. But because the log write is decoupled from when we actually write the data to disk, we could have a background writer write out dirty pages, flip them from dirty to clean inside the buffer pool manager, and evict them at some later point. It doesn't matter: whatever we have on disk plus the log is enough for us to replay the transaction. So when we actually do this can depend on the implementation of the system and on what the application needs; it doesn't matter for understanding the basic write-ahead logging protocol. Another way to see why this is so great is to think about what happens if we defer the updates to pages. If we prevented the database system from writing any dirty records out to disk, this is the no-steal policy, if we don't let it flush any dirty data, then we never need to restore any original values, so we don't need any undo information. We can remove that entirely from our log records and only keep redo. Consider two scenarios: the first transaction, T1, runs and commits successfully, then we crash; a second transaction starts running and crashes before it commits. For the first one, all we need to do is replay the log and reapply the changes. For the second one, we ignore all of its changes when we come back after the crash, because we know none of its dirty pages have been written. The downside, again, as I stressed before, is that this won't work if we need to update a portion of the database that's larger than the amount of memory available to us. So in practice you can't use no-steal: you have to allow dirty pages to be written out before a transaction commits, and that means you also have to maintain the undo information. Nobody actually does redo-only logging like this, and this is why everybody that does write-ahead logging uses the steal policy.
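A minimal sketch of the group-commit idea just described, with two swapped buffers and flush triggers on size or timeout. The class name and thresholds (`GroupCommitLog`, 64 KB, 5 ms) are hypothetical illustrations, not the project's required API.

```python
import os, time

# Minimal sketch of group commit with double buffering; all names are hypothetical.
class GroupCommitLog:
    def __init__(self, path, max_bytes=1 << 16, timeout_ms=5):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.active = bytearray()     # buffer currently accepting new log records
        self.flushing = bytearray()   # buffer currently being written out and fsync'd
        self.max_bytes = max_bytes
        self.timeout_ms = timeout_ms
        self.last_flush = time.monotonic()

    def append(self, record: bytes):
        self.active += record
        # Flush when the buffer is full or nothing new has arrived for a while.
        if len(self.active) >= self.max_bytes or self._timed_out():
            self.flush()

    def _timed_out(self):
        return (time.monotonic() - self.last_flush) * 1000 >= self.timeout_ms

    def flush(self):
        # Swap buffers so new appends can keep going (concurrently, in a real system)
        # while this batch gets ONE write + fsync covering every waiting transaction.
        self.active, self.flushing = self.flushing, self.active
        if self.flushing:
            os.write(self.fd, bytes(self.flushing))
            os.fsync(self.fd)
            self.flushing.clear()
        self.last_flush = time.monotonic()
```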
So another way to differentiate between these different policies, shadow paging versus write-ahead logging, is in terms of runtime performance versus the time it takes to recover the system after a crash. No-force + steal ends up being the fastest buffer pool management policy for recovery purposes at runtime: I can flush log records sequentially, which is fast; the working set of a transaction can exceed the amount of memory available to me; and the write-ahead log flushes are decoupled from the page flushes. No-steal + force is going to be the slowest at runtime, because you have to make sure you flush all the pages modified by a transaction when it commits, and that's random I/O to different locations, different pages. For recovery performance, it's the opposite: no-steal + force, shadow paging, is actually the fastest, because I don't have to do anything after a crash. I come back, the database root pointer is pointing to the master page table, I don't see any partial updates from uncommitted transactions, so I'm immediately consistent, I'm correct. Write-ahead logging is going to be slower to recover, because now I have to come back and use the log to figure out what the hell was going on when I crashed and put myself back into the correct state; I essentially replay the log. Most database systems choose write-ahead logging because they want to be faster at runtime. Yes, it sucks that if I crash it takes me a long time to recover, but it's not like your system is going to crash every minute, all the time, right? If it does, you have other problems. So most systems make the trade-off of a faster runtime in exchange for a slower recovery time. Now, there are ways to make this better with checkpoints, which we'll see in a second, but again, almost every system makes this trade-off and chooses write-ahead logging. Another way to think about it: with no-steal + force, you do no undo and no redo, because you come back and everything's correct. With steal + no-force, you have to do both undo and redo, because you have to come back, resolve what was on disk at the time of the crash, and put yourself back into the correct state. So, for the write-ahead log, the contents of a log record that I showed before are essentially, at a high level, what you might call a value log. We had an object in our database, we didn't say what it was, whether it was a tuple, an attribute, whatever, and then we had the original value and the new value, the before value and the after value. In a real system, though, you're updating tuples, and tuples may have a hundred attributes. So what do you actually want to store in your log record to say: here's the change I made to a particular tuple? If you store exactly the modifications that were made, then you're doing what's called physical logging. That basically says: within this tuple, here are the exact changes I made to its attributes, here's the before value and the after value. You can think of it as storing a diff.
The dumbest thing you could do is copy the entire tuple, the old tuple and the new tuple, but nobody does that because it's too expensive, so everyone essentially stores a diff. But it's a low-level change to the physical bits or bytes of the object that was modified. So what's one downside of that approach? Again, always think of extreme examples. I have a table with a billion tuples, and I update all billion of them. How many log records do I need? One billion, right. So an alternative is what's called logical logging, where you store high-level information about the operation that was applied to the database. Using my example of updating a billion tuples: say that was a single update statement, UPDATE table SET value = value + 1 with no WHERE clause, so it updates everything. With logical logging, instead of storing all the individual updates to every single tuple, as I'd have to do with physical logging, I can just store the query itself. That's enough information, when I replay the log after a crash, to put me back in the correct state, because this is what I did before and this is what I'll do again. Logical logging requires much less data to be written to the log than physical logging, which is nice. The problem with logical logging is that it's actually difficult to implement a correct recovery algorithm on top of it, because there's a lot of randomness or non-determinism in the system that we would have to account for in our log records so that, if we replay the log after a crash, we end up with exactly the same state we had before. All the concurrency control protocols we talked about before are non-deterministic, meaning that if I replay the exact same sequence of transactions multiple times, I may end up with a different database state. It'll still be serializable, if I'm running at the serializable isolation level, but T1 might get scheduled before T2 one time, and T2 before T1 another time. And depending on how those operations interleave, one update query might appear to run before another, or, if I'm running at a lower isolation level, I may be allowed to read something from a transaction that hasn't committed yet or just committed. And I don't have any of that information with logical logging: it just says this query executed before that one, but they might actually have been executing at the same time, and one might see half the pages modified by the other query the first time you replay the log but maybe not the second time, because it depends on some weird race condition inside the OS or the hardware, outside the database system. So with logical logging, it's hard to implement this and have the database always come back to the same correct state every single time when you're using something like two-phase locking, timestamp ordering, or OCC for concurrency control, because, again, those are non-deterministic. If you do the partition-based, deterministic scheduling we talked about with VoltDB, you actually can do logical logging, because those systems always execute transactions in serial order: there's no transaction running at the same time at any partition. So they don't have this problem, but your MySQL, your Postgres, all your other systems do.
So one way to get around this is to do what's called physiological logging, where, instead of storing the low-level attribute changes at their physical locations as you do in physical logging, you group the log records that target a single page together, but you don't specify how the page is physically organized. You just say: hey, there's some tuple 123 inside this page, here are the changes I made to its attributes; I don't care where on that page tuple 123 is, you'll figure it out for me. This is actually the most popular approach, because you get this nice decoupling of how the data physically existed on disk the first time you ran from the second time you ran, and you don't end up with incorrect interleavings based on the race conditions of how transactions are replayed. Question: how do you undo if you're doing logical logging? So for an in-memory database it's easy, because nothing's written to disk, and the systems that do logical logging are in-memory systems, so they don't have this problem. But he brings up a good point: if it's on disk, depending on what the SQL query is, it could be hard to do. If you have a random function in there, that screws things up. For most things you can actually reverse it pretty easily: UPDATE table SET value = value + 1, the reverse of that is value = value - 1, so you just undo it, reverse it. For other things it's harder. He says: to make logical undo work, you'd have to rewrite the query to reverse the change it made when you ran it the first time. Correct, yes. And I'm saying for most queries you can do that, as far as I can tell off the top of my head; if there's a random value in there, then it becomes problematic. Question: say we have ten transactions in the log, and since the flushing of the pages is decoupled from the flushing of the log, how do you know how many pages made it to disk? So the log records are already on disk; the actual data pages may still be in the buffer pool or may have been flushed to disk. When you're recovering, how do you know which pages were flushed and which were not? Okay, so he's asking: say my transaction has committed and I flushed its log records, and the pages in memory that it modified may or may not have been written out to disk before the crash; when I come back, how do I decide whether I need to redo or undo? It's the same question: you need to figure out the right thing to do to put the database back into the correct state. We'll cover that next class. There's some bookkeeping you can do, but actually, the way you sort of cheat, and I'm jumping ahead here, is that you basically redo everything. And this is why you can do this with physiological logging and physical logging: I don't care what was there before, I'm just applying the change again. Logical logging gets really tricky for this, which is why I'm saying nobody actually does it.
Nobody does logical logging, as far as I know, in a disk-based system. It only shows up in in-memory systems that have deterministic scheduling algorithms for concurrency control, like VoltDB, because they know the order of things ahead of time. He says, when you're doing a redo, it's kind of like doing a blind write. Yes: I don't know what was there, I'll just overwrite it with what I know it should be. We'll see this next class. You do all the redos and then you go back and undo, back up to the point where you know a transaction did not commit successfully. So you essentially do three passes over the log to make physical or physiological logging work with write-ahead logging. Okay, we'll cover all that next class. Okay, so again, physiological logging is the most common approach. Let's look at an example of what these three logging schemes look like. So physical logging would be: I have a low-level diff of all the changes that I made, right? I'm actually storing, inside of this page, here's the offset of the value I want to change. I know exactly what the layout of the data on that page should be, and I just do my write directly into it. Logical logging just says: I store the query, and that's all I need to replay it. And to undo it, you could reverse it. And then physiological logging looks a lot like physical logging, but now instead of the offset, I'm storing an object ID. There's some object with this ID inside my page; I don't know where it is, but here are the changes I want to make to its attributes. You can think of this in terms of slotted pages, right? I don't care what slot the object I want to modify is in on my page; I can reorder the page any way I want, and I have that indirection through the slot array to tell me where to go find it. So we use this object ID, look in the slot array and say, show me the slot where object one is, and then that's where I apply the diff. We didn't talk about index logging, but you essentially have to do the same thing for the indexes on your tables, because you want to be able to modify them and have them written out to disk as well; they're backed by the buffer pool manager just like the table heap. So in addition to all the changes you make to tuples, you store the changes to the index pages and have those flushed out as well, okay? So what's one problem with write-ahead logging? We said it was great, right? You turn all your random writes into sequential writes. It has enough information for you to recover after a crash. But how do you recover after a crash? What do you do? You replay the log. How far back do you go in the log? You don't know, right? Say my database has been running for one year; when I come back, do I want to replay the log for one year? I have to, right, because I don't know what's actually been written out to disk; there's nothing in the write-ahead log that says whether a given dirty page made it out or not. The other thing to point out is that physical logging is actually the fastest to replay, because it's just blindly writing into the pages. Physiological is the second fastest; there's a little indirection to deal with, but it's still not that bad. Logical logging is actually going to be slow if those update queries take a long time to run.
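Here's a rough sketch of the physiological version, again with made-up names: the record identifies the tuple by an object ID, and the page's slot array supplies the indirection that maps that ID to wherever the tuple currently lives, so the page is free to reorganize itself without invalidating the log record.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Physiological logging: the record names a page and an object (slot) id,
// but not where on the page the object physically lives.
struct PhysiologicalLogRecord {
  uint64_t txn_id;
  uint32_t page_id;
  uint32_t object_id;           // e.g. tuple "123" on this page
  std::vector<uint8_t> before;  // old attribute bytes (undo)
  std::vector<uint8_t> after;   // new attribute bytes (redo)
};

// Toy slotted page: the slot array maps object ids to byte offsets.
struct SlottedPage {
  std::unordered_map<uint32_t, size_t> slot_array;  // object id -> offset
  std::vector<uint8_t> data;

  void Apply(const PhysiologicalLogRecord &rec) {
    size_t off = slot_array.at(rec.object_id);  // indirection through the slot array
    std::copy(rec.after.begin(), rec.after.end(), data.begin() + off);
  }
};

int main() {
  SlottedPage page;
  page.data.resize(64, 0);
  page.slot_array[123] = 16;  // tuple 123 happens to live at offset 16 right now
  PhysiologicalLogRecord rec{7, 1, 123, {0}, {5}};
  page.Apply(rec);            // replay works no matter which slot the tuple ended up in
  assert(page.data[16] == 5);
  return 0;
}
```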
So the way to think about this: say one of those queries took an hour to run. When I come back, maybe I can rip through the log that replays its changes at a low level on the actual pages in five minutes. But if my query took an hour to run the first time, it's going to take me an hour to run the second time. So if my log is essentially growing forever, and I have one year of logical logs, it's potentially going to take me one year to replay. So the way to overcome this is called checkpoints. A checkpoint is basically a marker we're going to put in our write-ahead log to tell ourselves that all pages that were modified by transactions committed up to this point have been written out to disk. So then when you come back, all you have to do is figure out when the last checkpoint was, and then just replay the log after that checkpoint. I'm being a bit hand-wavy about this, because what I'm saying is not entirely correct; we'll cover this more next class, but I just want to give you an idea of what this is actually going to look like, and then that sets us up for Wednesday to see how we do checkpoints correctly without having to stop the world when we take one. So we're going to write our checkpoints out to disk, just like the log, and any modified pages get flushed out. We don't have to evict them from our buffer pool; we just have to make sure that they've actually been written out so we can mark them as clean. And then we write a log entry that says we took a checkpoint, and that gets flushed out to the log too. So let's look at an example here. This is our log, and we have three transactions: T1, T2, T3. Down here at the bottom we're going to crash, right? So we want to figure out which transactions were running at the moment that we took a checkpoint. For this one, say we do a really simple checkpoint mechanism where we stop everything: we stop all transactions from modifying any pages, doing any updates, running any queries, while we take the checkpoint. And we only resume execution when we know all the dirty pages have been flushed. So at this point here we've got to figure out what was going on in the system at the moment that we took the checkpoint, and we can use that to figure out what has actually been written out to disk or not, right? Any transaction that committed before the checkpoint has been safely flushed. This arrow should actually point to here. So T1 committed before the checkpoint, so all of T1's changes we know are out on disk; we don't have to do anything with that. T3 and T2 started before the checkpoint, so we've got to figure out whether they actually committed or not. In the case of T2, we know that we're going to have to redo it, because after the checkpoint we see a commit record, and that record got successfully flushed, because this is what we're seeing on disk after the crash. So we see T2's commit record on disk, and we know, all right, we've got to bring that guy back and reapply all its changes. But for T3, we don't see a commit record before the crash, so we know that we need to undo any changes that it made, because those shouldn't end up in the pages that we wrote out at the checkpoint. And I shouldn't say checkpoint file: we're just overwriting our pages in the main primary storage on disk as we normally would. It's not like we have a snapshot that we're putting on the side somewhere else. We're flushing all our dirty pages from memory out to disk.
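A minimal sketch of the simplistic stop-the-world checkpoint described here, with completely made-up types and names: block all new work, flush every dirty page, then append a checkpoint record to the log.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

// Toy stand-ins for the moving parts in the example; nothing here matches a real system.
struct Page { uint32_t id; bool dirty; };

struct BufferPool {
  std::vector<Page> pages;
  void FlushToDisk(Page &p) { p.dirty = false; }  // pretend to write the page out
};

struct Log {
  std::vector<std::string> records;
  void AppendAndFlush(const std::string &r) { records.push_back(r); }
};

// Stop-the-world checkpoint: stall all transactions, flush every dirty page
// (the pages stay cached, they're just marked clean), then record the checkpoint.
void TakeCheckpoint(std::mutex &world, BufferPool &bp, Log &log) {
  std::lock_guard<std::mutex> stall(world);  // no transaction may update pages right now
  for (auto &page : bp.pages) {
    if (page.dirty) bp.FlushToDisk(page);
  }
  log.AppendAndFlush("<CHECKPOINT>");        // recovery only replays records after this
}

int main() {
  std::mutex world;
  BufferPool bp{{{1, true}, {2, false}}};
  Log log;
  TakeCheckpoint(world, bp, log);
  return 0;
}
```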
Now, in this example, I said it's really simplistic because we're stalling all transactions while we take the checkpoint, right? And the reason we do that is because the checkpoint is essentially doing a sequential scan over the pages that are in memory and writing them out one by one. Stalling ensures there isn't a case where a transaction is actively updating pages at the same time we're scanning them, where we'd see some of its updates and not others. And that's another good example of why logical logging is difficult: if I'm just recording the update query and it's updating every single tuple, and I take my checkpoint while it's running, I don't know at what point the checkpoint wrote out the changes for that update query. It may have gotten some of them but not the others, right? The other issue we have now is that, in this case, we basically have to go back in time up the log and figure out which transactions were running at the time I took the checkpoint. That way I know whether I need to look for stuff to undo later on, right? T3 could have made a bunch of changes, and then an hour later I took a checkpoint; I have to go back up and look at the last hour to see which transactions could have been active at the time I took the checkpoint. We'll overcome this next class, because we'll actually store in the checkpoint record the IDs of all the transactions that were running at the time we took it. So we'll know what else is out there and how far we have to go back in the log to figure out what we need to undo or redo. But for our simple example here, we don't have that. Another issue we're going to have is: how often should we take a checkpoint? There's no magic formula for how often you should do this, right? It depends on how comfortable the application is with a longer recovery time. The way to think about this is: if I'm taking a checkpoint once a day, then if I crash, I know I need to replay up to one day's worth of log. But if I'm taking a checkpoint every minute, then I only need to replay the log from the last minute or so, and I'll recover right away. Of course, taking a checkpoint is not free, because now I'm writing out pages to disk, and that's incurring disk I/O that I could be using to flush log records and read pages in and out for regular transactions. So it's a trade-off, again, of recovery time versus runtime performance. If I'm okay with a longer recovery time, I can run really, really fast. Hell, I can just turn off checkpoints entirely, right? If my recovery time is allowed to take a year, I can run really, really fast. But most people probably aren't okay with that. So most systems actually implement two policies: you either have a timeout that says take a checkpoint every five minutes, or you have one that says if I've added this amount of data to my log on disk, go take a checkpoint. Like, if I've written out 100 megabytes of log, that's a good time to take a checkpoint. I think the second approach is better; different systems do different things. And this is basically what I just said, right? How often you actually do this will affect your performance, and again, it depends on the application, okay?
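As a quick sketch, the two trigger policies just mentioned might look something like this; the five-minute and 100 MB thresholds are just the numbers used in the example above, and the struct is made up.

```cpp
#include <chrono>
#include <cstdint>

// Decide whether it's time to take a checkpoint, using either a timeout
// or a bound on how much log has been written since the last one.
struct CheckpointPolicy {
  std::chrono::steady_clock::time_point last_checkpoint;
  uint64_t log_bytes_since_checkpoint = 0;

  bool ShouldCheckpoint() const {
    using namespace std::chrono;
    bool timed_out = steady_clock::now() - last_checkpoint >= minutes(5);
    bool log_grew  = log_bytes_since_checkpoint >= 100ull * 1024 * 1024;  // 100 MB
    return timed_out || log_grew;  // either policy can fire
  }
};

int main() {
  CheckpointPolicy policy{std::chrono::steady_clock::now(), 0};
  return policy.ShouldCheckpoint() ? 1 : 0;
}
```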
So this was all the stuff you have to do at runtime, right? We have to maintain a log; or we could do shadow paging, but we can ignore that because nobody actually does it. We maintain our log, and the log is going to have undo and redo information, and that information can either be at the physical level, like the low-level bits at specific offsets, or at the physiological level: this record inside this page, here are the changes to it. And then we use checkpoints to be able to truncate the log to a certain extent, so we only have to replay what comes after the checkpoint to put the database back into the correct state. All right, any questions? Another way to think about this: we've been talking about unexpected crashes here, but this is essentially what the database system does when you do a clean shutdown too. On a clean shutdown, the system flushes the log for any active transactions and flushes out any dirty pages, right? That's why, if you're running MySQL or Postgres or whatever your favorite database system is on your own laptop, you always want to do a clean shutdown. You don't want to just do a kill -9, because then you have to do recovery when you come back, okay? All right, so next class is probably the worst class. I say this not because kids are crying in it, but because it's hard, right? ARIES is the gold standard of how to do database recovery. It covers all the possible corner cases you can imagine, all the different possible failures, except for, again, the machine catching on fire; we don't care about that. But it's going to be really, really hard to understand. So the next lecture is the hardest one, but it's also the most important, because we never want to lose data. This is why you never want your own application to start doing its own logging and recovery, because it's really hard to get right. ARIES is a technique developed at IBM for DB2 in the 1990s, and almost every single database system that does write-ahead logging does some variant of it. So we'll go through all the possible scenarios and describe exactly what we need to do to make this run efficiently and correctly, okay? All right, guys, see you on Wednesday.