All right, so let's jump right into this. So today we're going to talk about physical logging. So we're going to look at ways to write out information about what transactions you're doing at runtime, and do it in such a way that if we crash, we can come back and recover the state of the database. So the last couple of lectures have been about doing OLAP stuff, right? Processing queries, indexes, doing joins, and things like that. So now we're going back to OLTP and talking about the things we need to do to make sure that our database is durable. So for today's lecture, I'm going to start off talking about the different types of logging schemes you can have in a database system. And this is not specific to in-memory databases. These logging schemes can be done in disk-oriented or in-memory systems; it doesn't matter. And then we're going to do a quick crash course on crash recovery with ARIES, pun intended. So in order to better understand the decisions that the Silo guys are going to make in their recovery scheme, we want to spend a little time to actually understand how ARIES works, which is sort of the canonical way that a disk-oriented system does recovery. So if you've taken an introductory database course, you probably didn't go through all the nitty-gritty details of ARIES. We used to teach it here at CMU. I taught it about a year ago. But the kids were just sitting in the class as I was teaching it, having their eyes bleed, because it's so boring and dry and very complex. So we're not really even going to go through the entire ARIES protocol. I'm just going to give you the high level of what is actually going on. And then you'll be able to see why we want to do an in-memory-optimized version of logging and recovery in our system and not rely on what disk-oriented systems do. Then we'll talk about the paper you guys read, the SiloR stuff, and then we'll look quickly at an evaluation of two benchmarks using their protocol. So it sort of goes without saying, but logging and recovery are how the database system is going to ensure that all the changes that transactions make will keep the database consistent and maintain the atomicity and durability guarantees, despite the possible failures you can have. So obviously, if you have a transaction that executes and updates the state of some object in the database, and you send back an acknowledgement to our client that says our transaction succeeded, then no matter what happens later on, if the database system crashes, we want our changes to always still be there and persist. And this is especially important in an in-memory database system, because obviously if you lose power, everything's wiped out of memory, and you want to be able to come back and recover everything. So recovery algorithms essentially have two parts. The first part is what you do at runtime as you're processing transactions, during the normal execution of transactions. That's all the extra steps of maintaining log information and checkpoints and other things to make sure that, again, when you crash, you can recover. And then the second part is after there's a failure, or after there's a crash, you can look back at what you did at runtime and, again, restore the database to the state it was in right before the crash. So there are other types of recovery schemes in terms of replication, off-site, on-site, things like that. We're not going to cover that here.
And obviously if the data center burns down and melts all your machines, no database system is able to recover from a storage media failure like that. So we're really talking about if the machine crashes, like you lose power, or the hard drive dies, and you restart it; that's the kind of recovery we're focusing on here. So in general, there are two types of logging schemes you can have in a recovery algorithm: physical logging and logical logging. In the SiloR paper, they refer to these, I think, as operation logging and value logging. They're essentially the same thing, right? So operation logging would be logical logging, and value logging is physical logging. And the main difference between the two of them is that in physical logging, you're going to store in the write-ahead log the actual physical changes you made to the representation of the tuple in memory. So if you think about it, if a tuple has like five attributes and I update one attribute, increase the value by one, I would store in my log record the image of the attribute before and the image of the value after my change. You're getting down to the actual bits or bytes that you stored in the tuple, and that's what goes in the log record. In logical logging, you're going to store the high-level information about what the transaction was doing and not get into the low-level details of how it modified a particular tuple. So in the Silo paper, they refer to logical logging, or the operation logging, in terms of the actual SQL statements you executed. We'll see on Monday next week there's an even higher-level type of logical logging called command logging. You don't even have to store the individual queries. You actually just store what stored procedure you invoked, right? And so ideally, this is enough information to allow you, during the recovery phase, to just replay whatever the query was and restore the database back to the correct state. Yes. His question is why would you want to do logical logging over physical logging? So the obvious answer is that logical logging stores less data, right? So if I update a million tuples under physical logging, I have to have a million log entries, one for every single tuple I modified. In logical logging, it's just one update statement, and that's all I need to store. Your comment is that logical logging is where you could just store a single command that could update multiple tuples. Yes. Same idea. So, as I just said, you can update a million tuples and only have to store one small log entry about what you did. So this sounds amazing, and you would think, why doesn't everybody actually use this? Well, it turns out it's super hard to implement recovery when you have logical logging. In a disk-based system, it's probably near impossible. In an in-memory system, it's still very tricky to do. And part of it is that you're not just executing these queries in serial order one after another. You're executing them concurrently with other transactions. So if you're running at a lower isolation level than serializable, then you have to worry about what came before what. And I'll show an example of what I mean in a second. But even if you're running at the serializable isolation level, serializability permits different schedules of the same set of queries. So if you just flip the order of them, that's still a serial order, but now you may end up with a different state.
A different state of the database when you recover. The key is that whatever the state of the database was, and whatever we told the outside world about the changes we made, it needs to be exactly the same when we restore the database. And logical logging makes this difficult. So I'll show this example here. Say we have two transactions, and each of the transactions is going to execute one update query in our small little database. And let's say that these transactions are running under a lower isolation level than serializable. Say they're running under read committed. So what will happen is, say the first guy starts running, and he executes this query. So what we'll put in our logical log is just the exact same query that they executed. So then the cursor starts scanning through our table, applies the first change to the first record, and applies the second change to the second record. Now let's say there's a context switch, and the second transaction starts running. Same thing. It's going to execute one query that goes in our logical log, and then it applies the change to increase Joe's salary to 900. Now our context switches back, and our first transaction starts running again. He comes down here. He's allowed to read the change, because it's committed, and then he'll apply the change to increase it by 10%. So let's say now there's a crash in the system. All this is in memory, so this got blown away. Our log is durable on disk, because we flushed it when the transactions said they wanted to commit. But now if we come back, and we replay this in serial order as it exists in the log, we would end up with 900 instead of 990. Because we don't know that when we executed the queries at runtime the first time around, this guy read these tuples and this guy read that tuple. We're not capturing that information in there. And if we added additional information to say, all right, I executed my query on this tuple at this time, and on that tuple at that time, you're essentially doing the same thing as physical logging. You're storing the low-level information about what every query and every transaction is doing. So that's how I'm going to answer your question. Logical logging is phenomenal because you store less information, but you end up with inconsistency issues on recovery here. So the only database system that I know of that does logical logging is VoltDB. And VoltDB can do this because it uses that single-threaded H-Store concurrency control protocol that you guys learned about before. So now you have a deterministic system where you can guarantee the order of operations every single time you recover from the log. We're not going to talk about replication so much here, but using logical logging is also a big problem if you want to have master-slave replication. Because, ignoring recovering the database from the log later on, you have the same problem where if you're replicating the logical log to another node, the master node may execute it in one order, and then the slave node would execute it in a different order. And now you would have inconsistent, unsynchronized copies of the database across two different machines. So typically, everyone does physical logging because you avoid this problem, although you pay a penalty for having to store more information. Yes? The question is, if you use snapshot isolation, does this problem go away? Under snapshot isolation, this guy would abort because first writer wins.
So he would get here, realize that somebody already wrote it, and he would get blown away. My intuition is no, but I have to think about it. There are other anomalies that can occur too with logical logging. Yeah, we should think about that later. OK. So now I'm going to describe ARIES, which is the gold standard for doing physical logging and recovery in a disk-oriented database system. So ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics. This was developed in the late 1980s, early 1990s, at IBM Research in Almaden, which is their big database lab. You have to realize that before ARIES, there were existing databases that all had their own recovery methods and protocols and things like that. But this ARIES paper here is the first one that really laid down in methodical detail exactly how you deal with recovery in a disk-based database system. And so pretty much everyone today implements some variant of this ARIES paper. And it's an amazing piece of work if you have the free time to read it, but it's like 70 pages, it's super long, and that's why normally I wouldn't assign it to you guys to read. So the key thing about ARIES is that it's going to rely on using a steal and no-force buffer pool management policy. So steal, again, means that the buffer pool manager is allowed to evict dirty pages that were modified by uncommitted transactions out to disk while the transaction is still running. And no-force means that we don't require a transaction to flush all the dirty pages that it has in the buffer pool out to disk when it commits. Instead we'll flush the write-ahead log that contains all the changes that the transaction made. So ARIES is actually not just recovery. There's actually a series of papers that all fall under the ARIES moniker. The key-value locking stuff that we talked about before when we talked about index locking, that's from the ARIES paper called ARIES Key-Value Locking. So the lead guy that did a lot of this early work on ARIES was a researcher at IBM called Mohan. And he's now an IBM Fellow. So he's a very distinguished researcher at IBM. This is all his brainchild, all this ARIES stuff in the very early days. I know Mohan personally. He's super awesome and super friendly. So this is actually a picture of me and him hanging out in New York City a few years ago when I was still in grad school. He's a funny guy. He's super smart. But he likes to party. And in this picture here, we're hanging out at a bar and then we went somewhere else afterwards. And you know he's out ready to party because he has his shirt open and he's fluffed out all his chest hair. So now this is what I think of when I think of Mohan. So what are the main ideas of ARIES? Again, we're not going to go into excruciating detail about exactly how ARIES works. We're not going to talk about the recovery protocol of doing compensating log records and things like that. This is just going to be a 10,000-foot overview of how the protocol works. And we'll see what the main bottlenecks are, and we'll see how the Silo guys can avoid them because they're dealing with an in-memory database system. So the first key thing is that we're going to use write-ahead logging. And this means storing all the modifications that a transaction makes to in-memory pages into this log, as special log records.
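To make that concrete before we go further, here's a minimal sketch of what a physical (value) log record might look like next to a logical (operation) one. These structs are purely illustrative; the field names and layout are made up for this example and are not ARIES's or Silo's actual formats.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical physical (value) log record: it captures the byte-level
// before/after images of the attribute that changed, plus enough identifying
// information (table, tuple, offset) to redo or undo that change on recovery.
struct PhysicalLogRecord {
    uint64_t lsn;                        // log sequence number for ordering
    uint64_t txn_id;                     // transaction that made the change
    uint32_t table_id;                   // which table
    uint64_t tuple_id;                   // which tuple
    uint32_t attr_offset;                // byte offset of the modified attribute
    std::vector<uint8_t> before_image;   // undo information
    std::vector<uint8_t> after_image;    // redo information
};

// Hypothetical logical (operation) log record: just the statement (or stored
// procedure invocation). Far smaller, but replaying it in the right order is
// the hard part discussed above.
struct LogicalLogRecord {
    uint64_t txn_id;
    std::string statement;  // e.g. "UPDATE people SET salary = salary * 1.10"
};
```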
And then when a transaction commits, we are required to flush all the log records that correspond to that transaction out to disk before we can go back and acknowledge to the client that the transaction was committed. But again, because we're using a no-force buffer manager policy, we don't require it to flush the dirty pages, only the log records that tell us how we modified those pages. Then on recovery, the key thing is that we're going to be able to replay the log and redo any changes that the transaction made before the crash, so that we can guarantee that all of those changes get put in place. And unlike the Silo case, where they're going to go back in time and apply the changes, in ARIES you go forward in time. So you find the last checkpoint you took, and then you replay the log going forward in time. And then you'll do an undo and roll back the changes for transactions that haven't committed. But the key thing about what makes ARIES special is that they're going to be able to store in the log records that correspond to the recovery protocol itself. So as you're doing the recovery, if you crash, you don't want to come back and try to redo things multiple times and end up with a corrupt database. So as they do the recovery, they can store what they call compensating log records that allow them to say, I made this change during the recovery protocol. So in case I crash again, I know I don't need to do it again, or I know how to roll it back a bit to get back to where I actually want to be, without restarting the log recovery in its entirety. That's one of the big things that you have to deal with in a disk-based database system, because you're writing changes in memory and they may get flushed to disk, and you want to make sure that the things that are on disk are consistent with where you left off during the recovery process. So the way they're going to organize all these log records, and keep track of the correct order in which to apply them, is to use log sequence numbers. So at runtime, as we're logging, for any modification we make in the database we'll append a log record to the log tail in memory, and then when we commit, we'll flush it. And I guess we're skipping ahead. And then for checkpoints, we are going to use what are called fuzzy checkpoints, where we're allowed to write a snapshot of the database that may actually be inconsistent, meaning we may see tuples in pages that were modified by transactions that haven't committed yet, or we may see half of their changes and not all of them. So there's a bunch of extra metadata we're going to have to keep track of in our system to understand what the internal state of the database in memory was at the time we were taking the checkpoint. And we're going to use the checkpoint to speed up recovery, because what will happen is, after a crash, you load in the last checkpoint you took, and then you replay the log from that checkpoint going forward. Because otherwise, without the checkpoint, you would have to replay the entire log, which could take days or forever. So the main thing I just want to point out here is that as we take a checkpoint under ARIES, we have to keep track of what transactions are running at the time we're taking the checkpoint, and we have to keep track of all the different pages in memory that were dirty at the time the checkpoint started. Under Silo, you don't have to do this.
And the reason is because under ARIES, you have this issue that the database exists in two different places at the same time. You have pages in memory that can be dirty, that contain the latest changes made by transactions, and then you have the database on disk that may have flushed dirty pages from uncommitted transactions, or they may be committed. So we have to reconcile what was in memory and what was on disk, and therefore we have to keep track of what the dirty pages are that are sitting around in memory when we take the checkpoint. Silo doesn't have to do this, and any in-memory database doesn't have to do this when it takes checkpoints, because you lose everything in memory. Who cares? Then you just load the last checkpoint and replay the log. And that makes a big difference in terms of the amount of metadata and the amount of overhead you need to organize all these things. So I mentioned this earlier, but the way we're going to keep track of the order of the log records we put in the write-ahead log is through these log sequence numbers, or LSNs. These are going to be globally unique IDs that we're going to allocate and assign to every single individual log record. And the log records, again, aren't just "this transaction made this modification." It's all the additional internal operations that are occurring at the same time as well. Like when we start a checkpoint, when we end the checkpoint, and any time we're doing recovery, we log all the changes during the recovery process. When a transaction begins, commits, or aborts, all these things have to get a log sequence number. And we have to make sure we write these things out in the correct order. So a globally unique ID for every log record: what does that sound like in other parts of the database system that we talked about before? The transaction ID, right? So with transaction IDs, when we were doing concurrency control, maybe we were only running a couple hundred thousand transactions per second, and therefore the transaction ID allocator didn't really become a huge bottleneck. But under ARIES, if you now have a hundred thousand transactions running a second, they're all executing a ton of different operations per second, plus all the internal stuff you have to log, so the log sequence number allocation actually becomes a big bottleneck. It's actually a big problem in a disk-based system if you're trying to get really high throughput. This is part of the reason why the Silo guys avoid this entirely and use those batched transaction IDs through the epochs. So just to give you an idea of all the different ways we use the log sequence numbers, I want to show you an example real quickly. Not only does every log record have to have a log sequence number, but every single page in our buffer pool is going to have the log sequence numbers that correspond to the first and the last log records of the transactions that modified it. And then we'll also keep track in the buffer pool of the flushed LSN, the log sequence number of the last log record we wrote out to disk, as well as the master record on disk that says, here's the log sequence number to go back to for the last checkpoint. And so any time that we want to flush a page, we have to go check whether the page LSN is less than the flushed LSN, because we don't want to flush anything out to disk whose changes haven't already been logged. So again, this is just a quick high-level overview of what's going on in ARIES.
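Here's that flush check written out as a minimal sketch; the struct and field names are made up for illustration and are not ARIES's actual code.

```cpp
#include <cstdint>

// Illustrative sketch of the write-ahead rule for evicting a dirty page:
// a dirty page may only be written out to disk once every log record that
// describes its changes is already durable in the log.
struct Page {
    uint64_t page_lsn;  // LSN of the newest log record that modified this page
    bool     dirty;
};

// flushed_lsn = LSN of the newest log record that has been durably written
// to the log on disk (everything <= flushed_lsn survived the fsync).
bool can_write_page_to_disk(const Page& page, uint64_t flushed_lsn) {
    return !page.dirty || page.page_lsn <= flushed_lsn;
}
```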
This gives you an idea of how complicated it is. The first thing is that we have our in-memory tail of the log, and these all have sequentially ordered log sequence numbers. Then out on disk we have the rest of the log, and again they all have unique log sequence numbers. Then for our pages, we have the page LSN, which corresponds to the record that last modified the page in memory up here. Then we have the flushed LSN, which corresponds to the last log record that we flushed out to disk. And then we have the master record that says, here's the last checkpoint that we took. So now we want to decide whether we can evict this page. We have to check whether the page LSN points to a record that we know is less than our flushed LSN, and therefore we know the changes that modified it have been safely logged. And if so, then we can evict it. There's a bunch of other stuff we're doing as well, like in terms of checkpointing, keeping track of the dirty pages and the active transactions. I'm ignoring that in its entirety here. I just want to show you how many things we have to keep track of when we do disk-based recovery. And if you remember, back in the beginning of the course I showed you this pie chart here, where I was showing that the recovery mechanism of a disk-based database system is taking 28% of the time. This is not doing the disk flush, syncing out to disk. This is simply doing all that extra management of metadata and LSNs. So this is why, again, we want to avoid all of this when we build a high-performance in-memory database system, because we don't want to pay this overhead. So any questions about the disk-based stuff? We have to know what's in memory, we have to know what's on disk, and we have to make sure that we have log sequence numbers for everything. For recovery, ARIES basically has three phases. And you'll see how this differs from the Silo stuff later on. Essentially, you have to do three passes through the write-ahead log. We jump to the point in the log that corresponds to the last active transaction when we ran the checkpoint, and we scan forward and figure out what was going on. Then we scan through again, and for the changes that we know we want to apply to our database, we redo all of them from beginning to end. And then we go back in reverse order, and we roll back any changes from transactions that shouldn't have committed. And again, as we're doing all of this, we're going to log every single redo and undo step in these compensating log records, so that if we crash during recovery, we can still recover ourselves correctly. So again, this is way more complicated than what the Silo guys are proposing and what the VoltDB guys we'll see on Monday are doing. OK, so regardless of whether you have an in-memory database or a disk-based database system, the slowest part of using a logging scheme is always going to be the flushing to disk. It's always going to be when you call fsync to make sure that the changes you wrote in the log have been safely stored on your non-volatile media. And the reason we have to do it is that we want to make sure that if we tell the client that a transaction committed successfully, then if there's a crash, we can always come back to it. But there are a couple of ways we can get around this that have been proposed for disk-based database systems. And they're going to show up again, and we can reapply them when we talk about the in-memory stuff.
So the first idea is this thing called group commit. And you might have seen this referred to in the SiloR paper. But this is an old idea that came out in the early 1980s. And it's pretty easy to understand. If you have multiple transactions running at the same time, you buffer all their log records, then you batch them, do a single write out to disk, and do one fsync. As opposed to, for every single transaction, when it finishes, taking its log records and fsyncing for that one transaction while everybody else waits behind it. So this idea was originally developed in this extension to IBM's IMS system called Fast Path. And truth be told, Fast Path is sort of like their in-memory version of IMS that can do transactions very quickly, the same way that Hekaton is a table extension or storage engine extension to SQL Server. My understanding is Fast Path is an extension to IMS. But to be honest, I actually don't know the nitty-gritty details of what Fast Path actually does. There are all these white papers from the early 1980s that describe it, but they're very complicated to read, and it's a lot of mumbo jumbo. And when you go read IBM's web pages today about what IMS Fast Path actually does, it's all corporate speak and mainframe stuff that I don't fully appreciate. But my basic understanding is it's an in-memory transaction extension for IMS. So what group commit gives you is that you're going to amortize the fsync cost across multiple transactions. So it'll improve the average latency for your transactions, but it may actually hurt the 99th percentile latency. Because the first transaction that comes into the queue has to wait for some period before it gets flushed out and you send back the acknowledgment. So for the first guy, it's terrible. For the last guy in the batch, it's the same as if we had just fsynced right away. So the two basic ways you can implement group commit are with a timeout mechanism, where you say, basically, it's been 10 milliseconds, I'll take whatever's in my log buffer and flush that now; or you can say, when my log buffer gets full, no matter whether it's been 10 milliseconds or not, then I'll do my flush. The other optimization we can apply is called early lock release. And the basic idea here is that typically, when a transaction goes to flush its log records out to disk, it technically has not committed yet. It's not fully committed until we know the log is safe on disk. So rather than having the transaction hold all its locks while waiting for that fsync to complete, we can just release all the locks and assume that the transaction is going to commit successfully after it finishes the fsync. And therefore, we'll let other transactions start running, grab whatever locks the first guy was holding, and apply other changes. The key thing about this, though, is that we want to make sure that we don't expose any information about the transaction that's waiting to fsync to the outside world before we know it's fully durable. So let's say we have a first transaction that makes modifications and then goes to flush its log buffer, so it's waiting. We have a read-only transaction that then does a read, reads the changes from the first guy, and then when it commits, because it's read-only, it doesn't have to flush anything to the log buffer.
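(Stepping back to the group commit part for a second, here's a minimal sketch of those two trigger conditions, the timeout and the full buffer. The buffer representation, thresholds, and names are made up for illustration; this is not any particular system's implementation.)

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Illustrative group-commit trigger: flush the shared log buffer either when
// a timeout expires or when the buffer fills up, amortizing one fsync over
// every transaction whose records are in the batch.
struct GroupCommitLogger {
    std::vector<char> buffer;
    std::size_t capacity = 4 * 1024 * 1024;   // hypothetical 4 MB buffer
    std::chrono::milliseconds timeout{10};    // hypothetical 10 ms window
    std::chrono::steady_clock::time_point last_flush =
        std::chrono::steady_clock::now();

    bool should_flush() const {
        bool full    = buffer.size() >= capacity;
        bool expired = (std::chrono::steady_clock::now() - last_flush) >= timeout;
        return full || expired;
    }

    void flush() {
        // A real logger would write() + fsync() the buffer here, then send
        // acknowledgements for all the transactions in this batch.
        buffer.clear();
        last_flush = std::chrono::steady_clock::now();
    }
};
```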
Coming back to that read-only transaction: we don't want it to send an acknowledgement back to the client, because that may expose data or changes that haven't been fully committed yet. So for this, if we do the early lock release optimization, we have to track the dependencies between these different transactions, so that the read-only guy has to wait until he knows that the first guy in front of him completed successfully. OK. And again, we can apply these same techniques in an in-memory database as well, not just a disk-based database system. Yes. So your question is: I have my first transaction, he made modifications, I'm flushing the log, and you're saying the fsync fails, the disk dies, the whole thing's going to crash. Yeah. Your question is, the first guy wrote some stuff, we're going to fsync him, he releases his locks, the next guy comes along and he overwrites the thing the first guy wrote. So we make the assumption that when the transaction goes and says, I'm flushing my log, it's either going to succeed or the whole thing is going to crash. I mean, think about it. You wouldn't want this weird state of, eh, I sort of got some stuff in, but not really, right? It has to be all or nothing. OK. So now we can talk about in-memory database recovery. So the important thing to understand is that we don't have to worry about dirty pages, because we don't have a buffer pool. And we don't have to record anything in our log to do undo. So again, remember that in the ARIES case, you have to store the redo and undo information in your log, which basically means, if you update a tuple, the before value and the after value. Because a page may get flushed out to disk that has modifications from a transaction that hasn't committed, and we need to know how to roll that back if later on we find out that it aborted. But in an in-memory database, if a transaction commits and flushes its log records successfully, we're done. We never need to roll it back, because we've now told the outside world that this transaction committed successfully. So in our log, we can potentially store half the information that you would have to store under ARIES with physical logging, because you only need to store the new image and never the old image. Now, when you run transactions at runtime, you're still going to store undo information in memory in case you need to roll a transaction back. But then, again, you're just rolling back the contents of memory. We don't need to make any of that durable. But the main challenge we have to overcome in an in-memory database system is that even though our transactions are going to be super fast, because everything's in memory, we still have that same bottleneck when we do the fsync to the log that a disk-based database system has. So even though we're storing less information, even though we can run transactions more quickly, we're always going to be bottlenecked on that log. So there's not really that much you can do to get around it. The one thing I'll point out, though, is that there's a bunch of papers from the 1980s, from the early research that was done on in-memory database systems, where they assumed that there was going to be non-volatile memory that allowed them to get faster fsync times. So think of non-volatile memory as something that's like DRAM: it's byte addressable, and you can read and write to it just as if it were regular memory. But if you pull the plug, everything is still persistent and durable like an SSD.
So one way to approximate something like this today is battery-backed DRAM, where you have something like a supercapacitor on your DRAM so that when the power gets pulled, you have just enough charge in your capacitor to be able to flush everything out to stable storage. You can buy that today, but it's not really widely used. You can also imagine a UPS that recognizes the power's getting cut and sends a notification to the servers to tell them to flush things out right before everything goes down. But if you get an instance on Amazon EC2, you're not going to get anything like that. So we'll talk about non-volatile memory later on in the semester, because unlike in the 1980s, where if you read their papers they say, oh yeah, we think non-volatile memory is only a few years away, right? 30 years later, we still don't have it. But I think we're actually on the cusp of having this hardware available. I would say in the next five years, you can probably buy non-volatile memory and use it in production systems, and maybe it'll be something you can get on Amazon. Yes. The question is, even if it becomes available in five years, will it have the capacity we would need for an in-memory database? And I think, yes, from my conversations with the Intel guys and other places, this is finally happening for real. I remember the big announcement in 2008 was when HP announced that they discovered the memristor, and the Stanley Williams guy made a big speech at UCLA. And I remember him saying, it's going to be in production, you're going to have it in two years, right? And that was 2008. 2010 comes along, that's two years. 2012, that's two more years. It's always two more years. It's finally getting really close. So within this year, Intel's version of non-volatile memory, it has some name, 3D XPoint, that's coming out, but you're going to be able to get it as an SSD device, like a PCI Express card. So that's not anything special. That's not what we're talking about here. The real thing fits in your DIMM slot, it's non-volatile, and there's OS support to flush things out. I would say that's what's actually coming in the next five years. And I think it's going to have really large capacity. But we'll talk about how you use non-volatile memory in databases later on in the semester. Our intuition from the research that we've done is that we think it's going to require a pretty significant change to the database system architecture in order to use it correctly. And I think that in-memory database systems will be better positioned to use it well, whereas for the disk guys, if you just treat it like a faster SSD, you're not going to get the huge performance gains you could get. So that's sort of an aside. Basically, I just want to say that if you read the recovery papers from the 1980s for in-memory databases, they assume that there's going to be stable non-volatile memory, but it doesn't exist yet for us. So we're going to look at how to use the existing storage devices we have now, SSDs and spinning disk hard drives. OK, so now we get to SiloR, which is the paper you guys read about taking Silo and adding recovery mechanisms to it. So the key thing to understand is that the paper you guys read before on Silo's OCC protocol had logging in it, but it didn't have checkpointing and it didn't have the recovery protocol. So this paper expands the system to be able to support both checkpoints and full recovery.
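As a quick refresher before we dig in, recall Silo's epoch-based transaction IDs from the previous paper: the epoch lives in the high bits of the TID, so comparing TIDs also tells you which epoch came first. Here's a minimal sketch; the bit widths are made up for illustration and are not Silo's actual layout.

```cpp
#include <cstdint>

// Illustrative Silo-style TID layout: epoch number in the high bits, a
// sequence number in the low bits. Bit widths here are placeholders.
constexpr int kSeqBits   = 40;
constexpr int kEpochBits = 64 - kSeqBits;

uint64_t make_tid(uint64_t epoch, uint64_t seq) {
    return (epoch << kSeqBits) | (seq & ((1ULL << kSeqBits) - 1));
}

uint64_t epoch_of(uint64_t tid) {
    return tid >> kSeqBits;
}
```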
So the key thing to understand about how SiloR is going to work is that it's going to rely on that same epoch mechanism they used to assign transaction IDs in batches. And the other key aspect of it is that it's going to try to parallelize all aspects of the recovery protocol, both at runtime and when reloading things, by having things run on as many threads as possible and using as many storage devices as possible. Again, this is work done by Eddie Kohler. He's a beast. He's probably one of the best systems programmers out there today. He's not a database person, but we still love him nonetheless. OK, so we'll go through the logging protocol, then we'll go through how they do checkpointing, and then we'll talk about how they do recovery. And then we'll finish up looking at some benchmark numbers that they have. And I'll say to you that I'm not pointing to Silo's mechanism and saying this is exactly how you want to do in-memory database logging and recovery. This is just what I consider a state-of-the-art example of one way to do it. The HyPer guys do something different. Hekaton does something different, right? I'm not saying this is the only way you can do this. I just think that the paper lays out nicely all the different design decisions they had to make at different points in the system, and they discuss pretty nicely the different trade-offs and why they made the decisions that they made. So I think it's a nice paper that you could use as a blueprint if you wanted to build the same sort of thing. OK, so under SiloR, they're going to assume that for every CPU socket that you have, you're going to have a dedicated storage device that you're going to use for logging and checkpoints. So you're going to have one logger thread per socket that's going to be responsible for doing all the writing out to the log. And then you're going to have additional worker threads that are grouped together on the socket that also operate on the shared database, but they write all their log records just to that one logger thread that's local to them. So as the worker executes the transaction, of course, we're under OCC, so all our changes are going to be made in a private workspace and not exposed to the main database. But as we make changes in our private workspace, we append log records for them to a log buffer. And again, we don't need to store the undo information. We're only going to store the redo information. And they're doing value logging, or physical logging, which is just saying this attribute now has this value. So the logger thread is going to maintain a pool of log buffers that it can hand out to the worker threads, and it's going to recycle them as it flushes things out. So you hand out a log buffer, the worker thread gives it back to you, then the logger thread writes it out to disk, and when it's done, it clears out the log buffer and puts it back in the pool so it can be reused by another thread. So what happens is, when the worker's buffer is full, it has to give it back to the logger thread, which then writes it out, and then it tries to acquire another one. If there's no free log buffer available, then it just stalls and waits until it's notified that it can proceed. And they do this because it's a back-pressure mechanism to make sure that the logger thread doesn't get to a point where it can never keep up with the amount of log records that the worker threads are generating.
Because what you don't want to happen is for these buffers to keep growing indefinitely and then you run out of memory, or you have to stall the epoch transitions until you know that the logger thread has written everything out that it needs to write out. So by blocking the worker threads from acquiring a new log buffer, it allows you to add a governor on how fast they can produce updates. So now, out on disk, they're going to create new log files over time for each logger thread. And in this case here, they're going to do it after 100 epochs. There are other ways you could do this, like if the log file gets too big, you can rotate it, and things like that. But the idea is that when they create a new log file, they rename the old one and keep track of the highest epoch that that log file contains records for. And you do this so that it's easier to truncate files. So if you know that you don't need the log files from a week ago, you can easily go and prune them out. So you actually see this elsewhere; this is not a new idea. It's done in other database systems. This is actually a screenshot of a MySQL installation that I have access to. And you see right here, there are two file names, ib_logfile0 and ib_logfile1. And this is the same thing that Silo's doing. So every time you make a new log file, you just rename the old one, and you keep having history go back in time. So now, again, as long as I take a checkpoint and everything's safe, I can blow away the old file and save space. So this is not a new idea. They just do this to make it easier to manage the system. And then the actual contents of each log record are going to contain the ID of the transaction that modified the record. So remember, under OCC, you don't get a transaction ID until you start the validation phase. So in the beginning, as they're creating the log records, they just put a placeholder in, and they know that when the transaction goes to commit, they have to go back and update those records. And then there's also going to be a set of value triplets that keep track of the table that was modified, some kind of primary key identifier for the tuple that was modified, and then the attribute-value pairs of the things that were changed by the transaction. So a quick example here. Say we have a query that's doing an update on the table people, and it's going to update the tuples for Joe and myself. And so the log record would essentially look like this: we have some kind of transaction ID at the top, and then we have two entries for the people table, two internal record IDs for the tuples that were modified, and then the actual changes that were made. So this looks a lot like the delta versioning stuff we talked about in MVCC. It's the same basic idea. We're storing a subset of the changes that were actually made; we don't have to store the entire tuple. So let's go through a high-level run-through of all the different steps of how we're going to do logging. For this example here, I'm only going to have one storage device, and it's going to have a bunch of log files that the logger thread is maintaining. We'll have one logger thread because we're running on one socket, and it has two different buffer pools, one for the free buffers and one for the flushing buffers. And then we have one worker thread.
And of course we have our epoch thread that's running somewhere else in the system, but we don't really care about it here. So when a new transaction request comes in, it's going to get handed off to a worker thread, and it's going to start executing. So remember that Silo only executes stored procedures; I think they call it the one-shot API. A stored procedure, again, is just a function call that contains the program logic for some operation we want to perform in the database system, intermixed with SQL statements. So this thread starts running it. And as it sees all the changes it wants to make, it applies them in its private workspace, but it also starts appending them to the log record buffer that it got from the logger thread's free buffer pool. So when it goes to commit, it applies the changes to the shared database, and it then updates the transaction ID in the log records. And let's say, in this particular case, the log buffer is now full because all the records we made for this transaction took up the whole space. So then we hand it back to the logger thread, it goes into the flushing buffer pool, and we start executing another transaction. And we'll go get another log buffer and do the same thing. So let's say, again, this guy fills it up. If we change the epoch, then we have to notify this guy as well. We hand the buffer back to the logger thread, and it now goes into the flushing buffer pool. We then try to get another log buffer, but we see we're empty here. So our worker thread has to stall. In the meantime, the logger thread is going to flush all the changes it has in the flushing buffer pool, in the order that they were modified, out to these different log files. And once we know it's done, and it's fsynced, and it's durable, then we can put those buffers back in the free buffer pool. And we notify this guy that they're done, and he can pick the next one up and keep on running. It's pretty straightforward. But since we're going to have multiple sockets and multiple storage devices, we need an extra layer above this to coordinate exactly what the different changes were that we're writing out to these files. And this is because we're doing what's called distributed logging. So under the ARIES approach, the canonical approach, you have a centralized log. It's one log file that everyone has to write to, and therefore you don't have to worry about having some changes at some point in time in one file and some changes in another file. You can go back to that one log and know exactly that you're looking at everything in serial order. Under SiloR, since they want to have multiple storage devices, you need a way to keep track of the global state of the durable transactions, the durable epoch, across all the storage devices. And they're going to do this by keeping track of what's called the persistent epoch. This is going to be a special log file that contains the highest epoch that is durable across all the storage devices. So the contents of that file are simply just a number that gets updated over time. You flush that, and when you come back, that's the last epoch that's durable.
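Here's the basic rule as a minimal sketch: the persistent epoch is the smallest epoch that every logger has durably flushed, and a transaction can only be acknowledged once its epoch is at or below that. The names and structure here are illustrative, not SiloR's actual code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative persistent-epoch computation: each logger thread reports the
// highest epoch it has durably fsynced to its own log files; the global
// persistent epoch is the minimum across all of them (the straggler wins).
// Assumes at least one logger.
uint64_t compute_persistent_epoch(const std::vector<uint64_t>& flushed_epochs) {
    return *std::min_element(flushed_epochs.begin(), flushed_epochs.end());
}

// A transaction's result can be acknowledged to the client only after the
// persistent-epoch file covering its epoch has itself been fsynced.
bool can_acknowledge(uint64_t txn_epoch, uint64_t persistent_epoch) {
    return txn_epoch <= persistent_epoch;
}
```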
So what happens is, when a transaction completes and it hands off its modifications in the log buffer to its logger thread, the logger thread may flush that and it's now durable on disk, but we still cannot send an acknowledgement back to the application until we know that the persistent epoch file has been flushed as well, saying that the epoch that the transaction belonged to is now durable across all devices. Because, again, what you don't want to happen is for an epoch to complete on two out of the three storage devices, the last guy doesn't finish, then we crash, and now only part of the changes, one third of the changes that transaction made, get reapplied to the database. So, again, this is a global counter that keeps track of exactly where things left off across all the different log files. So let's expand the example we had before. Now we're going to have three different storage devices and we're running on three sockets, and on disk we're going to have the different log files. Then we have our three logger threads that are each assigned to one of these storage devices, and then we have our worker threads that are making changes and passing them off to their logger thread. And what we'll do is designate one logger thread to be the special one that handles the persistent epoch log file. And again, we just pick whatever storage device we want to store it on. It doesn't matter which one it is; we could replicate it if we wanted to, but we just need to know that it's there. So what happens now is, let's say the epoch changes, and all the logger threads start flushing their buffers out to disk, and as they do it, one by one they notify this thread here to say, I flushed epoch, in this case, 200. And once it gets an acknowledgement from all the threads, it knows that this is now the persistent epoch, and therefore it can write it to the persistent epoch log file. It does an fsync there, and once that's durable, it can notify all the logger threads and say epoch 200 is now considered to be persistent, and therefore everybody can release all their transactions and we can send back the acknowledgements. So what's one downside of this versus the ARIES approach from before, where we were just flushing the log records for those transactions? So the statement is, yes, you're relying on the slowest one. So if you have a straggler on an fsync, if it takes a longer time, then the whole process gets slowed down. Beyond that, you're also essentially doing two fsyncs for every transaction. My transaction updated things here, my logger thread did an fsync, but then I still have to wait for this guy to do an fsync as well. So you really need to be able to use really fast storage devices to make this work. If you're using a spinning disk hard drive that has maybe a 5 to 10 millisecond fsync latency, that's bad. If you have, say, an SSD where you hit a garbage collection pause while you flush, while it reorganizes the pages, that can take a long time. That's bad. So you'll see why they used the storage devices they used in the evaluation. Yes? Yes, so the statement is, I said it's two fsyncs per transaction; really it's two fsyncs per epoch. But the transaction belongs to the epoch anyway. So the transaction still has to wait for this guy to fsync and then this guy to fsync. So his statement is, would you have a lot of burstiness in this? Because you can imagine you queue a bunch of things.
They're all waiting for this guy to do an fsync. And then when it releases, all these log buffers become available, and then all the transactions start running again. So it's like all of a sudden the gates open and all these things come out. I want to say no, because in this case here, the logger thread can fsync any time it wants, right? When its buffers get full, it can just fsync right away. The transactions will get queued up, and you're still waiting for this guy to finish. Yeah, I guess you're right, you can imagine that. Because if this guy is fsyncing, and you're waiting for a straggler to come back up, then all of a sudden this releases, it says now you can push out the acknowledgements for all the other guys, and then everything comes all at once. They're doing this, though, with epochs of, I think, 40 milliseconds. And when we look at the evaluation, the average latency for transactions is, I think, around 100 milliseconds. That's not too bad. It's not like your database system does nothing for 10 seconds and then all of a sudden you get all of these acknowledgements, right? It's happening fast enough that it's not really going to be too bad. You're not going to have huge oscillations. Yes, the statement is, if this guy does an fsync and clears up the log buffers, but the persistent epoch hasn't been fsynced yet, can you release those buffers and give them back to the transactions? Yes, you can. You can't say the transaction is committed, you can't send the acknowledgement, but you can reuse the log buffer. And I think they even talk about in the paper that to reduce the burstiness, rather than waiting for the epoch thing to finish, it's better to just give back the log buffers right away. That sort of smooths things out. And they also recommend that the log buffers should be about 10% of the total amount of memory that's available to the system. I actually don't know what other systems do. Like when we built H-Store, we just took whatever memory was available and logged into it, although we were doing logical logging, so we didn't have to store as much. But I don't know what MemSQL or TimesTen do for this. So his question is, why do you have to persist this epoch here? Why couldn't you just go back and check to see what actually was flushed? You need this because you need to know when you can send the acknowledgments. And that's the key. The statement is, couldn't this just be stored in memory? Why does it have to be persisted to disk? Yeah, it's what you're saying, yeah. I don't remember what it was. I know there's an answer, but I forget why. It might be for recovery. Because you could just process everything and say, what's the max of everything that's already there? That's a good question. I don't know the answer. OK. So now let's talk about checkpointing. They're going to have one checkpointer thread per disk, the same way we had one logger thread per disk. And basically what's going to happen is they're going to use some approximate range partitioning across the tables in the database, and every checkpointer thread is just going to do a sequential scan over its range of tuples and write them out to disk. But one of the things they'll do to reduce the amount of data you have to write is, since they know the epoch when the checkpointer started, they can ignore any tuple they come across that has an epoch that's later than when the checkpoint started.
Because at that point, since the epoch has switched, you know that the log record for the modification to that tuple is durable on disk, so we don't have to write that tuple into our checkpoint; recovery can get it back from the log. So they say this reduces the amount of data they have to store per checkpoint by about 20%. The other thing they have to handle, though, is that these are again obviously fuzzy checkpoints, so we've got to be able to handle the case where a transaction has committed but had only applied half of its changes to the table when the checkpointer thread came through. You just have to be mindful during the recovery process of how to rectify that and make sure the transaction's changes are durable and atomic. So one aside real quickly, since we're running out of time: I'll just say that doing a checkpoint under MVCC is actually really easy, because you already have the older versions available. It's the same sort of thing as snapshot isolation. So if you have your checkpoint start like a regular transaction, you know that you can get a consistent view of the database as it existed at that particular time, and that makes it really easy to flush everything out in the snapshot. And we'll see on Monday that VoltDB is actually not, by default, a multi-version system. It does in-place updates, but when you take a checkpoint, it actually switches into this multi-version mode, or copy-on-write mode, where all the older versions of tuples that existed at the time the checkpoint started still sit around in memory so that the checkpointer thread can go through and find them very easily. But we'll see more about that on Monday as well. So another aside we can talk about is the frequency at which to take checkpoints. The reason why we take checkpoints, again, is to reduce the amount of time it takes to recover the database from the log. But if we're not careful and we checkpoint too often, then it may end up degrading the performance of the system. So a lot of times in database systems there's a trade-off between checkpointing all the time, because you want to make sure that if you crash you can come back up very quickly, versus not checkpointing that often so that you don't slow down the regular execution of transactions. In the case of VoltDB, when you turn on checkpointing and logging, it hurts performance by roughly around 15%. And in the Silo case, I think it's about 10%. So in the Silo system, they're basically doing checkpointing all the time. The checkpointer threads run, they create the checkpoint, and then when they finish, they wait 10 seconds before they do the next checkpoint. In VoltDB, for example, I think the default might be to checkpoint every five minutes. Some customers turn it down to every minute. Some customers only do it every hour. It depends on what you actually want. The other thing I'll point out, too, is that in the case of Silo, they're focused on being able to recover very quickly on a single node. And if you had what I'd call a mission-critical system, or a production application where you need high availability, you probably wouldn't need to checkpoint that often, because you would use a replicated setup. You would have your master node, and it would replicate to some other slave nodes, so that when the master crashes, a slave node can pick up where it left off and become the new master, and now your database is still up and running.
So you don't have to go through the recovery process at all. But again, there's a trade-off, as with many things in database systems. Running a replication scheme that guarantees you have strongly consistent replicas is probably just more expensive than doing checkpoints and logging. So there's sort of no way to get around paying this overhead of ensuring durability. You're either going to pay it in replication, or you're going to pay it in logging and checkpoints. But I don't think anybody's really done a study to see which one is actually worse. But again, in most production systems, you're going to run replicas, and you maybe won't checkpoint as frequently as the Silo guys do. So another observation I'd like to make is, if we assume that we're running in the environment we talked about for OLTP systems, where you're going to have some portion of the database that's hot and gets all the new updates, all the modifications, and then you have the other, larger portion of the database that's cold and doesn't receive as many modifications, then maybe instead of taking a complete snapshot of your database every single time, the way Silo does, what we want to do instead is just keep track of which blocks in memory actually got changed since the last time we ran the checkpoint, and only write those things out to disk in our checkpoint, because we know the other stuff hasn't changed. And instead what we'll do is have a pointer or some kind of marker that says, for these blocks here, if you go back to this older checkpoint on disk, you will find the latest version of them, because they didn't change since the last time I ran the checkpoint. So these are sometimes called delta snapshots or delta checkpoints. Can anybody take a guess why this actually might be a bad idea? The statement is, you can't do garbage collection on the checkpoints. Yes, but it's even broader than that. Garbage collection on checkpoints is easy, right? You can do something like, if I have 10 checkpoints and I create the 11th one, I'll delete the oldest one, because I know I'm still consistent. But a lot of times, for regulatory reasons, people keep around all these old checkpoints so they can roll back in time to get a previous version of the database. So that doesn't fully answer the question. It's not a scientific reason; it's a software engineering or management reason. There's this religious dogma in database systems where you want the things that get stored on disk to be independent of other things on disk. What I mean by that is you want the checkpoint you take to be self-contained and not rely on some other file or group of files in your system. Because what you want to avoid is, let's say you have this delta thing where the latest snapshot says, oh, if you want this version of this tuple, go to this older snapshot. And that one's going to have deltas against, let's say, the next snapshot. So now you have a bunch of files that all need to be linked together. If just one of them gets corrupted, then that fucks everything up that comes later in the delta chain. You can only recover back to that point. So a lot of times it's important to database administrators that the checkpoint be self-contained, so they can move it around freely and not worry about other things getting corrupted and messing up that file. You see this in other aspects of database systems as well.
So in Oracle, for example, they do dictionary compression for their pages, but they don't have a global dictionary for the entire table or the entire database. Every single page has its own dictionary that only compresses the data on that page. The reasoning is that with a global dictionary scheme, if the page holding the global dictionary gets corrupted, that corrupts the entire database, because you can't get back the original values. But if a single page with its own dictionary gets corrupted, you only lose that one page; you don't lose the entire database. So this is a big precaution we like to take in database systems to avoid catastrophic failures or complete losses from corruption, by trying to make things more self-contained and independent. This is why people don't normally do delta checkpoints. You could obviously use a file system that does its own deduplication and snapshots and things like that, but the database system itself is not going to do this. All right, so now we can talk about recovery. I'll go through this very quickly, because we're short on time. The basic idea is that we're going to load the last checkpoint, as you would in a disk-based system, and then we're going to replay the log. But the key is that instead of going in order from the oldest log record to the newest, we're going to go from the newest to the oldest. And we can do this because value logging guarantees that we always end up with the correct version of the database, and we can ignore anything further in the past. So for recovery, we're going to have multiple threads process the different checkpoint files on disk. This is just reading each file, scanning through it, loading the database, and rebuilding the indexes. Remember, in an in-memory database you don't log changes to indexes, and you don't write the indexes out to a checkpoint; you always rebuild them after a restart. For the log recovery, we're going to check our persistent epoch file to see what the most recent persistent epoch is across all the different storage devices that are available. When we then fire off all the threads to replay the log, we'll ignore anything that came after that persistent epoch. Then each thread is going to grab the newest log file that's available and replay it in reverse order. What happens is, as you write the changes into the database as you see them in the log file, you also record the transaction ID — and therefore the epoch — of the transaction that originally made the change back when we were running. So because you're going in reverse order, when you apply a change and then later see that another transaction modified that same tuple further in the past, you can ignore it, because you know its transaction ID, and its epoch, came before the one that's currently installed. This lets you skip a large part of the log, because any older modification to a tuple you've already installed a newer version of can just be ignored. Whereas, again, in Aries you have to go through everything from beginning to end in order to make sure things end up in the correct order. So this is a quick overview. Nothing really fancy here to explain, but we have our persistent epoch thread: it comes online and figures out what the latest durable epoch is. Then we fire off a bunch of worker threads, and they're going to parse through the different files we have in parallel. So this notifies them what epoch they should be looking at.
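The reverse-order replay idea can be sketched in a few lines. This is a simplification under the assumption that value logging stores after-images keyed by record and transaction ID; SiloR additionally splits this work across many threads and files, which is omitted here.

```cpp
// Rough sketch of backward log replay: walk the log from newest to oldest and
// install a value only if nothing newer has already been installed for that
// key. Names and structures are simplified / hypothetical.
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

struct LogRecord {
    uint64_t tid;       // transaction id; higher means later in serial order
    uint64_t key;
    std::string value;  // after-image written by the transaction (value logging)
};

struct Version {
    uint64_t tid;
    std::string value;
};

void replay_backward(const std::vector<LogRecord> &log,  // oldest first, newest last
                     uint64_t persistent_tid_limit,      // ignore anything newer
                     std::unordered_map<uint64_t, Version> &db) {
    for (auto it = log.rbegin(); it != log.rend(); ++it) {
        if (it->tid > persistent_tid_limit)
            continue;                        // epoch not fully durable; skip it
        auto cur = db.find(it->key);
        if (cur != db.end() && cur->second.tid >= it->tid)
            continue;                        // a newer version is already installed
        db[it->key] = {it->tid, it->value};  // first (i.e. newest) write wins
    }
}

int main() {
    std::vector<LogRecord> log = {{10, 1, "a"}, {20, 1, "b"}, {30, 2, "c"}};
    std::unordered_map<uint64_t, Version> db;
    replay_backward(log, /*persistent_tid_limit=*/25, db);
    std::printf("key 1 -> %s\n", db[1].value.c_str());  // "b": newest durable write
}
```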
When we first go through the checkpoint, that's basically just copying the file into memory, scanning it sequentially, and writing it out into the database. And then we do our log replay in reverse order. OK? So the one thing to understand, which I think is really cool about the SiloR stuff, is that you don't need log sequence numbers at all, because the order in which the transactions committed and executed corresponds to the serial order of the log replay. What I mean by that is, in the case of two-phase locking with Aries, you have this transaction ID that you're using to order transactions as they run, but that doesn't tell you exactly how you should order their log records when you need to replay them. That's why Aries has to use separate log sequence numbers. But in the case of Silo, because everything is sorted into these batches, these epochs, that's enough for you to figure out the serial order. You don't need a separate log sequence number; you just rely on what already exists. I think that's the novel aspect that makes what they're doing much more interesting than what Aries has done in the past. OK, so let's quickly run through the evaluation. They're going to compare Silo running with and without the checkpointing, logging, and recovery, using the YCSB and TPC-C benchmarks. Part of the reason I have you guys write down in your paper reviews what workloads they use in the evaluation is that when it comes time to do project number three, you want to pick, based on what your project is, a workload that stresses the portion of the system you're actually working on. YCSB and TPC-C are OLTP benchmarks, whereas TPC-H, for example, is an OLAP one. So when you read the paper, you may have skipped over a key aspect of how they get really good performance. They're running on a machine at MIT with four Intel Xeon sockets, where each socket has eight cores. So in total there are 32 real cores, and 64 with hyperthreading. And it has a quarter terabyte of DRAM. But you may have missed this part: they're running with three Fusion-io drives. These are super high-end SSDs that sit on a PCI Express card. This is like $20,000 worth of hardware; this is why they're able to do this. But apparently they only had three of them, so since you need one storage device per socket, the fourth one is a RAID 5 disk array. I guess that might end up being the slowest one. So this goes a bit beyond what most academic papers look like, and this is why Eddie Kohler is awesome. So for the first benchmark, on YCSB, they do 70% reads and 30% writes, and the writes are just in-place updates; they're not doing inserts. What they show here is a time series of SiloR, with logging and checkpointing; LogSilo, with just logging; and MemSilo, without any logging at all. And what you see is that MemSilo performs better than the other ones. And it's not just because it's not paying the penalty for logging; it's also because MemSilo can use all the cores available in the system to execute transactions, whereas SiloR and LogSilo have to dedicate some threads to logging and checkpointing. The other thing you see is that SiloR is not that much slower than LogSilo.
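One way to picture why the epochs are enough is to imagine the epoch living in the high bits of the transaction ID, so that comparing two TIDs as plain integers already respects epoch order. The bit layout below is an assumption made for illustration, not taken from the Silo source code.

```cpp
// Toy illustration of why no separate log sequence number is needed: if the
// epoch occupies the top bits of a 64-bit transaction id, ordinary integer
// comparison of TIDs orders log records consistently with the epoch order.
#include <cassert>
#include <cstdint>

constexpr int kEpochShift = 40;  // assumed layout: epoch in the top 24 bits

constexpr uint64_t make_tid(uint64_t epoch, uint64_t seq) {
    return (epoch << kEpochShift) | seq;
}

constexpr uint64_t epoch_of(uint64_t tid) { return tid >> kEpochShift; }

int main() {
    uint64_t t1 = make_tid(7, 123);  // committed in epoch 7
    uint64_t t2 = make_tid(8, 5);    // committed in epoch 8
    assert(t1 < t2);                 // a later epoch always compares greater
    assert(epoch_of(t2) == 8);
    return 0;
}
```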
So here you're getting checkpoints without paying a huge penalty. All right? And in the bottom graph you see the latency. Again, they run with 40-millisecond epochs, so transactions, if you were running without any logging and recovery, should ideally be able to complete in roughly 40 milliseconds. But they're hovering around 90 milliseconds across the board. This little blip here, I'm assuming, is some kind of flushing on the drives or garbage collection. So even though we're running 40-millisecond epochs, it still takes about 90 to 100 milliseconds for transactions to complete on average. Then we can look at TPC-C. TPC-C differs from YCSB in that it's very insert-heavy. It's sort of an order-processing system like Amazon, so you have all these new orders coming in from customers and you're updating stock information. This is just showing, across the board, the sustained throughput of SiloR, LogSilo, and MemSilo, and they're roughly the same. SiloR is running at about 548,000 transactions a second, versus 575,000 and 592,000 for the other two. This tells you that they're getting bottlenecked on other aspects of the system, like the concurrency control scheme, rather than the actual logging mechanism. I don't know why they didn't include the latency for the no-recovery case; I think that would have been useful to see how much better you can do without any of this. Then, to finish up, they have a table that shows the size of the different files on disk and how much time it took to recover from them. So in this case, the total size of the database to recover would be the checkpoint size plus the log size; that gives you the last number, and they can recover a 200-gigabyte database in about 200 seconds. In case you don't know, that's really fast. Under Postgres or another disk-based database system, depending on the hardware, reloading something like that might take an hour. So this is pretty fast. Any questions about this? So the main takeaway you should remember is that physical logging is a good, general-purpose logging protocol that can work with pretty much any concurrency control scheme. In this particular case they were doing physical logging with Silo, and they exploit the semantics of OCC to let them reuse the epochs as the log sequence numbers. But in general you can use physical logging with two-phase locking, MVCC, and all the other protocols. Whereas logical logging, as we'll see on Monday, can only really be used with what are called deterministic concurrency control schemes, like the H-Store one, where you can guarantee the order of transactions across different nodes or after restarts. So next class we'll talk about other checkpointing schemes for in-memory databases — there's a lot of work from the late 1980s that we can look at — and then we'll talk about logical logging, or command logging, in VoltDB, which is the assigned reading for you guys. And then we'll talk about a restart mechanism from Facebook that uses shared memory that I think is kind of cool. Any questions? In the back? Your statement is that the logger thread has to maintain three buffers, and you think that could be a race condition? What's the last word? Oh, free buffers — free, not three.
So his question is: could you have a race condition where multiple worker threads go to the logger thread to acquire a free buffer, and a buffer gets handed out to two threads at the same time? For that, you just use a latch. The follow-up is: would that latch be a bottleneck? It depends on how often you have to go back for new log buffers. So yes, it potentially could be. I actually don't know what they do in their implementation. But it's not like the log sequence number, where everyone is trying to get a new number all the time. I think it depends on what the modifications to the database look like, how big the log buffers are, and how often you have to reuse them. There are different ways to tune it to make it go better. If you imagine it's a queue, there's a latch at the front of the queue and you just pluck your buffer off, so it's not too bad, because you're not holding the latch for a long time; you just pop the queue and you're good to go. What's due a week from now? The Bw-Tree. If you have any questions, post them on Piazza; people will be around over the weekend, and I think there will also be office hours to help you guys out. OK? All right, see you next class.
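For what it's worth, the free-buffer hand-off described in that last exchange can be sketched as a queue guarded by a single latch, where the critical section is just popping a pointer. This is purely illustrative and not SiloR's implementation.

```cpp
// Sketch of a free log-buffer pool: worker threads pop an empty buffer from a
// queue protected by one latch (mutex); the logger thread returns buffers to
// the pool after flushing them. The critical sections are kept very short.
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

struct LogBuffer {
    std::vector<uint8_t> bytes;
};

class FreeBufferPool {
public:
    explicit FreeBufferPool(size_t n) {
        for (size_t i = 0; i < n; i++) free_.push_back(new LogBuffer());
    }
    ~FreeBufferPool() {
        for (LogBuffer *b : free_) delete b;
    }
    // Called by worker threads when they need somewhere to stage log records.
    LogBuffer *acquire() {
        std::lock_guard<std::mutex> guard(latch_);  // short critical section
        if (free_.empty()) return nullptr;          // caller must wait or retry
        LogBuffer *b = free_.front();
        free_.pop_front();
        return b;
    }
    // Called by the logger thread once a buffer has been flushed to disk.
    void release(LogBuffer *b) {
        b->bytes.clear();
        std::lock_guard<std::mutex> guard(latch_);
        free_.push_back(b);
    }

private:
    std::mutex latch_;
    std::deque<LogBuffer *> free_;
};

int main() {
    FreeBufferPool pool(4);
    LogBuffer *b = pool.acquire();
    if (b) pool.release(b);
}
```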