Okay, let's get started. There was a question on Piazza about one of the tests for the skip list. I think the person might actually be correct, so we're going to double check to make sure the test is actually correct. But everything else should be working, and Dana is going to set up Autolab either today or tomorrow, so that will open it up so you can start submitting and run the full range of tests that we have available as well. Okay? Any questions about the skip list assignment? All right, so remember, I think it's due Thursday next week.

All right, so today we're going to talk about logging and recovery. For this class and the next class, we'll be talking about how we actually store information on disk for our in-memory database to make sure that when we crash or stop the system, we can come back and make sure everything's still there. Today I'll start off talking about the different types of logging schemes you can have, and then we'll do a quick crash course on ARIES crash recovery (pun intended). We don't teach ARIES in the undergrad intro course because it's sort of complicated, so I'm not going to go really deep into the details of how ARIES works, but I'll go over it at a high level so you can understand the differences when we talk about physical logging or logical logging in either SiloR or in VoltDB. Okay?

All right, so this should be familiar, because everyone here should have the background from an introduction to databases course: we're going to use the logging and recovery protocols in our database system to make sure that all the changes transactions make are atomic, meaning they either all happen or none of them happen, and durable, so that if you stop the system and come back, you don't lose any changes. And we're going to show that the database is always going to be in a consistent state, which follows from the atomicity and durability guarantees, right? We don't want to see torn transactions; we don't want to see partial updates or things like that.

Every recovery algorithm in a database system is going to have two parts. The first part is all the actions that you execute at runtime while you execute transactions normally; these are the additional steps you take as you run queries to make sure that you can always recover the database if there's a failure. The second part is the things you do after a crash, or at least after a restart, to put the database back into the correct state based on the data you stored in that first phase.

In an in-memory database system, remember, we said that the primary storage location of the data is assumed to be in DRAM. So when we say that we're doing recovery, it's not always going to be due to a failure, right? If you stop the system, stop the database process, and then turn it back on, it has to suck all the data that was out on disk back into memory, because we're not using mmap, right? We have to read everything back in; we can't be guaranteed that everything will be in shared memory when we come back online. So typically what happens in an in-memory database is you tell the database system to go ahead and shut down, and it'll take a checkpoint or flush all the logs to write everything out.
And then when you come back, it's going to load the database back in using the same process it would use if it was trying to recover from a failure, right? The same mechanism that turns the database on is the same thing we'll use to recover the database after a crash. There's no real distinction here.

So there are essentially two types of logging schemes you can have in a database system. The first is physical logging, where you record in the log all the low-level physical changes that transactions make to individual records. An example of this would be: if I update a tuple and modify a value, I want to store in my log what the old value was and what the new value was, and I put that in my log record. The other type of logging is called logical logging, where instead of storing the low-level physical changes, we just record the high-level operations or the SQL queries that a transaction invoked to make the changes, right? So if you have an insert, update, or delete query, you can just store the raw SQL in the log, and that's enough information to come back after a crash, because you know how to re-execute those queries.

In the SiloR paper you read, this is, again, another example of how people that do systems research versus database research call things differently. The SiloR paper refers to physical logging as value logging, and to logical logging as operation logging. But in the database literature, it's always going to be either physical logging or logical logging.

So what are some of the trade-offs between these two logging schemes? Well, with logical logging, the nice thing is that you have to store less data than you would with physical logging. Say I have a database with a million tuples, and I have a single UPDATE query that's going to update all one million tuples. In logical logging, the only thing I need to store in the log is that one SQL query. Whereas in physical logging, I have to store entries for all one million records that got modified by that update. So with logical logging, you end up storing less data.

Now you may think this is amazing, right? Why would anyone want to do physical logging when logical logging is much, much faster? The reason why pretty much everyone uses physical logging, not logical logging, is that it's actually really difficult to implement recovery with logical logging when we have concurrent transactions. To be more specific: when you have concurrent transactions under a non-deterministic concurrency control protocol. I'll show examples of this later on. The challenge is that when we have concurrent transactions, the database system is interleaving the operations of those transactions in any way that it wants. And if you're not recording which query wrote what first when you have two concurrent queries, then when you recover the database, you may not be able to put it back in the exact same state. It may still be a serializable state, it may still be consistent and follow a serializable schedule, but it may not be the same state as it was before the crash.
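To make the distinction concrete, here's a minimal sketch, with hypothetical record layouts (real wire formats are system-specific), of what a physical log record versus a logical log record might carry for the same update:

```python
from dataclasses import dataclass

# Physical logging: record the low-level change to one tuple,
# including the before/after images of the modified attribute.
@dataclass
class PhysicalLogRecord:
    txn_id: int
    table: str
    key: int          # primary key of the modified tuple
    attribute: str
    before: object    # old value (needed for undo in disk-based schemes)
    after: object     # new value (needed for redo)

# Logical logging: record only the high-level operation that was invoked.
@dataclass
class LogicalLogRecord:
    txn_id: int
    sql: str          # the raw query text, re-executed on recovery

# One UPDATE that touches a million tuples produces a million physical
# records, but only a single logical record:
logical = LogicalLogRecord(txn_id=1, sql="UPDATE emp SET salary = salary * 1.10")
physical = PhysicalLogRecord(txn_id=1, table="emp", key=42,
                             attribute="salary", before=1000, after=1100)
```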
The other downside of logical logging is that it's going to take much longer to actually recover the database, because we're essentially re-executing all the queries over again. Take my example where I did one update on a million tuples. If that query was very expensive to compute and it took one minute to run, then when I recover the database and look at my log and see a single logical record that says "here's the update query," I have to run it all over again, and it's going to take that same minute. Just because we're in recovery doesn't mean the queries are going to run faster. With physical logging, it literally is just sucking the data in from the log, looking at where it goes, and plopping it down as a straight memcpy to the location where it should be. So in general, logical logging is faster at runtime but slower at recovery, and physical logging is slower at runtime but faster at recovery.

So let's look at an example of how recovering a database using logical logging can actually put us in an inconsistent state. For this I have two transactions. The first one wants to do an update on the employees table and increase everyone's salary by 10%, and the second one wants to update the employees table, go to my single record, and set my salary to 900. For this example, we'll assume the database system is running at the read committed isolation level. It doesn't matter whether we're doing MVCC or two-phase locking; the only thing we care about is that we're running with read committed.

So when my first transaction starts and invokes its query, I write into my logical log the exact string of the SQL query that was invoked. Once I do that, my cursor can do the scan on the table and start updating the tuples. Say it gets through the first two tuples, and then there's a context switch or a pause, and now my other transaction starts running. It starts off by writing to the logical log to say "I'm going to update the employees table and set the salary to 900," again a straight copy of the SQL string that the client sent us. It comes in, gets to this record here, and updates it to 900. It then commits. The other thread starts running again, comes down here, and updates my salary to add the 10% raise. This is still serializable; this is okay, because we're doing the update for this one before this one, so we're still following a serial ordering.

But now the problem is, if there's a crash and I have to come back and replay my log, and I replay the log in the order the records were inserted, I'm going to end up with this state here where I lose that 10% raise for the last tuple. And again, that's okay in the sense that it would still be considered serializable, but as you can see, when we ran the first time we had one state, and when we recover the database and come back, we have a different state. So if there was another transaction that read this value and gave it to the outside world before the crash, then after recovery, when someone comes and reads it again, they're going to get a different value. That's what I mean by having an inconsistent database after a crash.
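Here's a minimal sketch of that anomaly, with hypothetical names and the interleaving hard-coded, showing how replaying the logical log in insertion order produces a different (yet still serializable) state than the original execution:

```python
# Two transactions interleave under read committed: T1 raises every salary
# by 10%, T2 sets one salary to 900. The logical log records the SQL in the
# order the statements *started*, not the order the conflicting writes ran.
salaries = {"alice": 1000, "bob": 1000, "carol": 1000}

# Original interleaved execution:
salaries["alice"] = int(salaries["alice"] * 1.10)  # T1 updates first tuple...
salaries["bob"]   = int(salaries["bob"] * 1.10)    # ...and second, then pauses
salaries["carol"] = 900                            # T2 runs and commits
salaries["carol"] = int(salaries["carol"] * 1.10)  # T1 resumes: carol = 990
before_crash = dict(salaries)

# Recovery replays the logical log in insertion order: all of T1, then T2.
salaries = {"alice": 1000, "bob": 1000, "carol": 1000}
for k in salaries:                                 # replay T1's UPDATE
    salaries[k] = int(salaries[k] * 1.10)
salaries["carol"] = 900                            # replay T2's UPDATE
after_recovery = dict(salaries)

assert before_crash != after_recovery  # carol's 10% raise was lost on replay
```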
This is a very small example, but these are the same kinds of complications you can have when you have multiple queries running and you want to do logical logging. Now, if you run in serializable isolation or snapshot isolation, this problem would not occur, because when this thread got to this last tuple, it would see that my other transaction already updated it, and under first-writer-wins the other transaction would abort. But there are other scenarios where I think logical logging could have problems, and I will fully admit that no one has really thought through whether doing logical logging at the SQL level in an in-memory database is a good idea or not. We'll see how VoltDB does it, but they're doing a different kind of logical logging called command logging, where they're not logging individual SQL queries but instead logging stored procedure invocations. As far as I know, no in-memory database does what I'm describing here, and I haven't really read literature that thinks through whether it's even possible to do this or not. So we could end up in that inconsistent state, and that's bad.

So now we're going to spend time talking about how pretty much every single disk-based database system does logging and recovery. Again, this will set us up so that when we talk about SiloR, we'll see all the places where they deviate, because you can apply certain optimizations in an in-memory database that you can't necessarily do in a disk-based database system.

The canonical method for doing logging and recovery in a disk-oriented database is called ARIES. ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics. If you take the textbook for any database class, what it's essentially describing when it talks about logging and recovery is some variant of ARIES, and pretty much every disk-based database system that does logging and recovery is using something that looks a lot like ARIES. ARIES was invented at IBM Research in the early 1990s by this guy Mohan, who's now, I think, an IBM Fellow, a very distinguished researcher at IBM. He's an awesome guy; he's super fun.

I'll say one thing about the literature: I didn't have you guys read the ARIES paper that describes all this logging stuff, because it's a real chore to read. It's like 70 pages, and it's really in-depth because he's covering all the corner cases you have to care about to make sure that your database is always correct. One thing to be mindful of is that there's a bunch of subsequent papers that Mohan put out that all fall under this ARIES umbrella. The key-value locking that we talked about with indexes a few weeks ago comes from a paper called ARIES/KVL. He hit a home run with the logging stuff in ARIES and then used that moniker on a bunch of other papers for branding reasons. But when people say ARIES, they pretty much mean the protocol that I'm going to talk about here.

I'll also say, even though this paper came out in 1992, that's not to say that nobody was doing logging and recovery before this. There were certainly enterprise database systems running in production that could do logging and recovery. What this paper really does is lay out in excruciating detail exactly how you want to do it.
As I said before, everyone could sort of think about how to do it, but he really codified the steps. So since we're going back to discussing a disk-based database system, we have to put our disk-based database system hat on and start thinking about buffer pools again, which we got to ignore before because we were in memory. For ARIES, we're assuming a database system with a buffer pool manager that's using the steal/no-force buffer pool policy. As a quick reminder, what does steal mean? Dirty pages, meaning pages that have been modified by uncommitted transactions, are allowed to be flushed out of the buffer pool and written to disk. And what does no-force mean? Anyone? No-force means that we don't require a transaction to flush all its dirty pages to disk when it commits. We do have to flush all its log records, but we don't require it to flush the pages.

So the main ideas of ARIES that are relevant to our discussion here are these. First, we're going to use write-ahead logging to make sure that for any change we make to a page in memory, there's always a log record stored out on disk before that transaction is allowed to commit, and we always have to write the log record that modified a page out to disk before we can flush that page, because we need to know what actually modified the page when we write it out. Second, the way we're going to do recovery is that we're going to repeat history by replaying the log from far back in time going forward, redoing all the operations from transactions that committed since the last checkpoint, and then we're going to undo by going in the other direction, reversing the changes from transactions that did not commit before the crash. That means that in order to do these two steps, the records in our write-ahead log have to contain both the before image and the after image of every value that's modified. You need the before value to undo it and the new value to redo it.

So at runtime, what happens is that anytime a transaction modifies the database, you append a log record to the tail of the log, and when the transaction commits, we have to make sure we flush all the log records that correspond to that transaction out to disk. This log is written in sequential order, so there may be log records appended to the log from other transactions that came before our committing transaction, and those will also get flushed out to disk as well. This is why we have to do the undo phase: to go back and reverse the changes of transactions that got flushed out to the log but didn't actually commit.
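Here's a minimal sketch of those two write-ahead-logging rules, with hypothetical names and the actual disk I/O elided: records carry both images, and commit forces the log (not the pages) to disk first:

```python
# A sketch, not ARIES itself: log records carry an undo image (before) and
# a redo image (after); commit flushes the log tail before returning.
class WALRecord:
    def __init__(self, lsn, txn_id, page_id, before, after):
        self.lsn, self.txn_id = lsn, txn_id
        self.page_id = page_id
        self.before, self.after = before, after   # undo image / redo image

class LogManager:
    def __init__(self):
        self.tail = []            # in-memory log tail
        self.flushed_lsn = 0      # highest LSN known durable on disk
        self.next_lsn = 1

    def append(self, txn_id, page_id, before, after):
        rec = WALRecord(self.next_lsn, txn_id, page_id, before, after)
        self.next_lsn += 1
        self.tail.append(rec)
        return rec.lsn

    def flush(self):
        # write the tail to disk and fsync (elided in this sketch);
        # everything appended so far is now durable
        self.flushed_lsn = self.next_lsn - 1
        self.tail.clear()

    def commit(self, txn_id):
        self.flush()              # WAL rule: force the log before commit returns
```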
I'm not going to talk too much about checkpoints here because we'll do a whole lecture on that on Tuesday, but the one thing to be mindful of is that ARIES uses fuzzy checkpoints to flush the pages in memory out to disk at different points during execution, so that we don't have to replay the entire log when we recover. Fuzzy checkpoints mean that, rather than having a consistent snapshot of the database where we know there are no dirty pages from active transactions, we allow transactions to keep modifying pages while we're taking the checkpoint. That means the database system has to maintain some extra metadata in the checkpoint to know things like: what were all the transactions that were running at the time I was taking my checkpoint, and what are all the pages that got modified while I was taking it. On recovery, this requires you to look at this information and reconcile what the correct state of the checkpoint actually should be, before you can even start replaying the log. Again, we'll focus more on checkpoints on Tuesday and talk about different ways to do fuzzy checkpoints. The one thing I'll say now is that there are cases where multi-version concurrency control makes it easier to take checkpoints, because we can use snapshot isolation to be guaranteed that we have a consistent snapshot of the database, where we're not going to see any changes from transactions that have not committed yet. We'll also see on Tuesday how different database systems, even if they don't do multi-version concurrency control, may switch into a special mode whenever they take a checkpoint.

So, as I said, the log is essentially a sequentially ordered data structure, a list that keeps track of all the changes that transactions are making. Every single log record is uniquely identified by this thing called the LSN, the log sequence number, and this is essentially how the database system determines the serial order of log operations. This looks a lot like what we talked about in concurrency control, where we used timestamps to figure out the serial order of transactions, but in this case we're maintaining it just for the log. So there will be the transaction IDs that you assign to transactions, and a log sequence number that you assign to individual log records.

Now, before, I talked about how generating the transaction ID or transaction timestamp could be a bottleneck, and we talked about using atomic operations or batching or special hardware counters to make these things go faster. It's an even bigger issue if you take ARIES and try to use it in an in-memory system. Say you have a thousand transactions running at the same time: it's not that big of a deal to get a new transaction ID for those 1,000 transactions, but if all the transactions are updating the database and they each make a thousand updates, now you need to generate a new log sequence number for each of those 1,000 updates per each of the 1,000 transactions. So this can actually become a bottleneck in a fast multi-core system, and you'll see again why the Silo guys chose their persistent epoch batching technique to avoid this bottleneck. So the LSNs that the database system assigns to its log records are used all throughout the system to keep track of
almost everything that's going on: keeping track of what's in memory and what's out on disk, what's dirty and what's clean, and things like that. Every page has its own pageLSN that corresponds to the log record that made the most recent change to it. The database system keeps track of the last log record that it flushed out to disk, the flushed LSN. And then for every page that you need to flush out, you have to ask: has the log record that modified this page been successfully written out to disk? So there are all these different LSNs we need to keep track of throughout the system to know when it's safe to write things out.

Here's a quick high-level diagram of what ARIES looks like. This is a gross oversimplification, but for our purposes it's enough. Again, every single log record, whether it's in memory or out on disk, has a unique LSN. Inside every page, we keep track of the pageLSN that corresponds to the log record that last modified it. Then we have the flushed LSN that keeps track of the last log record we successfully wrote out. There's also a master record stored on disk that corresponds to the last checkpoint that was taken. So while our transactions are running, we're updating pages, we're pulling things in and out of the buffer pool from disk, and we have to make sure all our invariants are satisfied so that we don't write out dirty data when we shouldn't. Again, this is a super high-level view of what ARIES is doing; there's a lot more going on, but these are the main things we care about. In order to flush a page, I have to make sure that the pageLSN is less than or equal to the flushed LSN, because I don't want to flush a page where the log record that modified it hasn't been written out to disk yet.
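Continuing the LogManager sketch from earlier, that flush rule boils down to one comparison (names hypothetical):

```python
def can_flush_page(page_lsn: int, flushed_lsn: int) -> bool:
    # A dirty page may be written to disk only if the log record that last
    # modified it (its pageLSN) is already durable (<= the flushed LSN).
    return page_lsn <= flushed_lsn
```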
So I showed you this graph at the beginning of the course, from the study they did with the Shore system, where they measured how much time the database system was spending in its different components. Remember, I talked about how the buffer pool manager was 30%, locking was 30%, and logging and recovery was 28%. Now you can see why this is so high. This measurement is based only on CPU cycles, so it doesn't even count actually flushing out to disk or an SSD. This is literally just all that LSN management I talked about: keeping track of these log records and checking all these different counters to make sure that things are in the right order. That's where the 28% comes from. You're doing all this work because you need to make sure that the state of the database on disk corresponds to the correct state of the database in memory, right? When we talk about SiloR, we'll see that they reduce this number by not worrying about LSNs or individual pages; they worry about things on a per-transaction basis.

So for the recovery phase, real quick, here's what ARIES does. ARIES has three phases. In the first phase, you read the write-ahead log and figure out what dirty pages were hanging around in memory while you were running and what transactions were active when the crash happened, and you compare this with all the extra information stored in the checkpoint to know what was going on. Then you jump to some point in the log, and going from there to the end, you redo all the transactions; even for transactions that are eventually going to abort, you redo everything. Then you get to the bottom of the log, and you go back in the reverse direction and start undoing all the transactions that you didn't see commit: if they don't have a commit record in the log, you know they didn't actually finish successfully, so in the undo phase you reverse everything. So there are essentially three passes over the log: one to figure out what's even in there, then you jump to some point and start redoing things, and then you go back in the other direction (the undo pass is usually shorter than the redo pass) and make sure you undo the changes from transactions that didn't finish.

All right, we're not going to do a comparison of recovery times here, but this can be pretty expensive. I don't have exact numbers, but if you ever have to restore a very large Postgres or MySQL database, it's going to take hours, possibly days, depending on your hardware. So those are the high-level issues that occur in disk-based logging and recovery.

Now I want to talk a little bit about some optimizations we can apply to make transaction execution and logging go faster in this kind of system. The techniques I'll describe here are not specific to disk-based systems; we can also apply them to the in-memory systems that we talk about next. In the pie chart I showed before, again, that was only measuring CPU cycles, so it wasn't measuring the time it actually takes to flush data to disk. In actuality, that's always the slowest part of any database system: writing out to disk, unless you have a really, really nice device, is always going to be slow. The reason it's slow is that we have to do an fsync on the disk controller to make sure that any changes sitting in the disk controller's buffer are definitely written out to disk, because we don't want to crash, come back, and discover that something we told the outside world was safely committed is actually gone. That's why we do the fsync and wait until we get an acknowledgement before we can send the result back to the client.

So one standard technique to speed this up is called group commit. The basic idea is that you batch together a bunch of the log records that different transactions generate (whether it's physical logging or logical logging doesn't matter), and then you write them all out at once and do a single fsync. The idea is that you amortize the cost of the fsync across multiple transactions. That means if you're the first transaction that shows up, you'll wait a little bit longer, and if you're the last transaction that shows up, it's the same as if you did an immediate fsync; this evens out the average latency. There are two ways to trigger the flush. The first is that you have a buffer of a set size, and when the buffer is full, you do the fsync right there. The other is to have a timeout that says after 5 or 10 milliseconds, whether my buffer is full or not, I do the fsync anyway.
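Here's a minimal single-process sketch of group commit, with hypothetical names and thresholds; a real implementation would have worker threads block until the batch containing their records is durable:

```python
import os
import time

# A sketch of group commit: buffer log records and issue one fsync for the
# whole batch, triggered by the buffer filling up OR by a timeout expiring.
class GroupCommitLog:
    def __init__(self, path, max_records=64, timeout_s=0.005):
        self.f = open(path, "ab")
        self.buf = []
        self.max_records = max_records
        self.timeout_s = timeout_s
        self.last_flush = time.monotonic()

    def append(self, record: bytes):
        self.buf.append(record)
        if (len(self.buf) >= self.max_records or
                time.monotonic() - self.last_flush >= self.timeout_s):
            self.flush()

    def flush(self):
        if self.buf:
            self.f.write(b"".join(self.buf))
            self.f.flush()
            os.fsync(self.f.fileno())   # one fsync amortized over the batch
            self.buf.clear()
        self.last_flush = time.monotonic()
```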
This technique was originally developed in IBM's IMS Fast Path. Fast Path is, in the same way that Hekaton was an optimized execution engine for SQL Server, the optimized execution engine for IMS. There's some early literature from the 1980s that discusses what it is, but it's very archaic, and when you try to read what Fast Path is on IBM's website now, it's a bunch of corporate mainframe stuff that's hard to decipher. To the best of my knowledge, it's basically an in-memory execution engine for IMS. Now, obviously, batching together your transactions seems like an obvious idea, and pretty much everyone does this, but in the 1980s this was considered a breakthrough.

The other optimization we can apply is early lock release. The way to think about this is that, technically, when a transaction tells you it wants to commit, it hasn't truly committed until you know its log records are flushed to disk. So under a proper concurrency control protocol, you'd have to hold all the locks for a transaction while it's sitting in the queue waiting to be flushed out to disk. But when you think about it, that's kind of unnecessary, because you can just assume that it's going to be written out to disk successfully. So under early lock release, you give up your locks while you're waiting in the queue to get flushed, handing them off to other transactions and letting them continue to run, because otherwise they'd be blocked waiting on your fsync.

Now, this would obviously be bad if a read-only transaction came in behind a writing transaction that was waiting to get flushed, read its changes even though it wasn't running under read uncommitted isolation, and sent that data out to the outside world before the flush happened; if the flush then fails, we'd have a data leak that shouldn't have occurred. So the database has to maintain some extra metadata: when the transaction that's waiting to flush releases its locks, any transaction that comes along and reads the data that the waiting transaction modified has to wait until it knows that transaction has successfully flushed before it can commit itself. There's dependency information you have to maintain about who's waiting on what, and which locks you're allowed to grab, to make this work correctly. But again, this is sort of a no-brainer idea, and it's what a lot of systems do: you speculatively assume commits are going to succeed and release your locks early. And nothing about what I'm describing here is specific to a disk-based system, or even to two-phase locking; you can do this in some of the timestamp ordering schemes that we talked about before.
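Here's a minimal sketch of that commit-dependency bookkeeping; the class and field names are all hypothetical, and a real system would track this per lock or per tuple rather than per transaction:

```python
# A sketch of early lock release: a transaction drops its locks as soon as
# it enters the flush queue, but any later transaction that reads its
# not-yet-durable writes picks up a commit dependency and must wait for
# that flush before it can itself commit (or release results).
class Txn:
    def __init__(self, tid):
        self.tid = tid
        self.flushed = False       # set True once its log records are durable
        self.depends_on = set()    # writers whose flush we must wait for

    def read_from(self, writer: "Txn"):
        if not writer.flushed:
            self.depends_on.add(writer)   # speculative read of pre-durable data

    def can_commit(self) -> bool:
        return all(w.flushed for w in self.depends_on)
```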
Okay, any questions about disk-based recovery, about ARIES, or any of this before we switch over to the in-memory stuff? And again, pretty much every single database system out there is using some variant of ARIES. They may not call it ARIES, and it may be slightly different, but at a high level all these protocols are essentially the same.

All right, so now with an in-memory system, one of the nice things is that logging and recovery is actually easier for us in some ways, because now we don't have to worry about tracking dirty pages in a buffer pool, and we don't have to worry about whether, when I load my database back up and read my last checkpoint, there are a bunch of changes in it that I shouldn't have been allowed to see. And because of this, we don't have to store undo records in our log. If we're using physical logging, log records only need to record the new values, the after images; they don't need to store the before images, because we never have to undo. Once a transaction commits and we know its records have been flushed out to disk, we never have to reverse it. We may need to replay it if we have to recover after a crash, but we never have to roll it back. Now, that's not to say we won't maintain undo information in memory, because if a transaction gets aborted due to a conflict, we obviously need to roll back its changes. What I'm saying is that after the commit and the flush to disk, you can throw all that undo information away.

Our system is still always going to be slowed down, though, by the slow sync time of non-volatile storage. Even though we have a fast in-memory database system, we're always limited by the slowest device, the disk. One thing I'll point out: when you read some of the early papers from the 1980s and 1990s on logging and recovery in early in-memory database systems, they always make a statement like, "well, non-volatile memory is going to come out pretty soon, and we can use that to make our in-memory database super, super fast; it'll obviously be here in five years." And then 30 years later, we still don't have these devices. So in all the protocols we'll talk about here, we'll assume that we don't have non-volatile memory, and we'll talk about how to run the protocol on SSDs and HDDs. I'll talk later in the semester about some of the research we've been doing on non-volatile memory.

I would say that, you know, there's a paper from 1987 that says we'll have non-volatile memory pretty soon, and these early systems were maybe talking about regular DRAM with a battery backup or a supercapacitor, so that if the power gets pulled, it has enough charge to flush everything out to some kind of stable storage. You can buy that hardware today. It's not cheap, and it's certainly not commodity hardware; you can't go to Amazon EC2 and get an instance that has non-volatile memory. The newer technologies that are coming out are actually true non-volatile memory: not DRAM plus a battery, but special materials that can store data like DRAM, at byte granularity, where everything stays durable when you pull the power. Again, I'll give a whole lecture about this later in the semester. I'll just say that when I gave this talk last year, I said I thought non-volatile memory would be out in five years; now it's a year later and I still think it's four years away, although Intel claims they're going to have something out later this year. But it's not going to be a DRAM replacement; it's going to be more like a fast SSD that goes over PCI Express. Again, I'll talk more about non-volatile memory and its implications, but I stand by my statement: it's
probably four years away before you get something that can replace DRAM and be non-volatile, and in that lecture we'll talk about what you'd need to change in the database system to make use of it.

Okay, so the paper that you read was the logging and recovery version of Silo, which they call SiloR. You read the Silo concurrency control paper earlier in the semester, and part of the reason I had you read that paper instead of the Hekaton concurrency control paper is that it makes it easier to understand how they do recovery: once you understand how they do epoch-based OCC, you can see how that fits in with this logging scheme.

The key idea is that they want to do high-performance logging by parallelizing the logs they maintain. As we'll see in a second, every CPU socket gets its own dedicated physical storage device, so there are essentially multiple log files being generated at the same time, and they have to do some extra work to make sure they can coalesce these on recovery and always put the database back in the correct state. Whereas the ARIES scheme I was showing before assumed there was a single log file that all the different threads were appending to. And again, I love this paper; I think Eddie Kohler is one of the best systems researchers out there, and SiloR is well written and really lays out all the key ideas very well.

So the logging protocol for SiloR is this: the database system assumes there's a single storage device dedicated per CPU socket, and one thread on that CPU is allocated as the logger thread. That thread is responsible for writing all the log records out to the device, and all of the other cores are used for worker threads that are responsible for executing transactions. When a worker thread executes a transaction, it creates the new log records in a buffer, and then it hands that buffer off to the logger thread to write out. Again, because we're an in-memory database system, we don't have to record any undo information in the log; we only record the redo information, because that's enough to restore the database to the correct state.

The logger thread maintains a pool of log buffers that it hands out as needed to the worker threads. When a worker thread's log buffer is full, it hands that buffer back to the logger thread, which queues it up to be written out, and the logger uses group commit to batch a bunch of these buffers together, write them all out at once, and do a single fsync. The key thing, though, is the way they avoid overloading the system: the number of log buffers in the pool is finite. If a worker thread gives back a full log buffer and asks for another one, and there isn't one available, it has to stall and wait. The idea is that you don't want the worker threads to get too far ahead, executing more transactions than the logger threads can keep up with writing out. So the finite pool acts as a built-in governor and provides proper back pressure to make sure things are smoothed out nicely, avoiding the burstiness of all these transactions committing at once, then stalling for a while, then all committing again.
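Here's a minimal sketch of that buffer handoff, with hypothetical names and sizes; the finite free pool is exactly the back-pressure mechanism, since a worker blocks on `free_pool.get()` when the logger falls behind:

```python
import os
import queue

NUM_BUFFERS, BUFFER_CAP = 8, 1024   # hypothetical pool size / records per buffer

free_pool = queue.Queue()           # empty buffers the logger hands out
flush_queue = queue.Queue()         # full buffers waiting to be written
for _ in range(NUM_BUFFERS):
    free_pool.put([])

def worker_append(buf, entry: bytes):
    # Called by a worker thread for each log record it generates.
    buf.append(entry)
    if len(buf) >= BUFFER_CAP:      # buffer full: hand it off to the logger
        flush_queue.put(buf)
        buf = free_pool.get()       # blocks (back pressure) if logger is behind
    return buf                      # usage: buf = worker_append(buf, record)

def logger_loop(log_file):
    # Runs on the one logger thread pinned to this socket's storage device.
    while True:
        buf = flush_queue.get()
        log_file.write(b"".join(buf))
        log_file.flush()
        os.fsync(log_file.fileno())  # group commit: one fsync per batch
        buf.clear()
        free_pool.put(buf)           # recycle the buffer into the free pool
```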
So, in the log file, the logger threads append all these records to a single file, and each record has to contain the ID of the transaction that made the change, plus a triplet of the table that was modified, the key, and the value. The value could be a full tuple value, or it could be an attribute list of just the individual attributes that were modified. This is like the delta encoding scheme we talked about for MVCC, where instead of storing the entire tuple, you only store the individual attributes that were modified. The logger thread keeps appending to this file, and when it gets to about 100 epochs, it renames the file and creates a new one. The idea is that this makes disk management easier, because you now have all these individual log files that you can clean up and throw away once you know you don't need them anymore. It's a way to make the administrator's life easier: rather than having one giant log file that you always have to keep around, if you break it up into smaller chunks, you can throw away the older files more easily. And this is not an idea novel to Silo; pretty much every database system does this.

Here's a screenshot of a MySQL installation that I help manage. What you see is two files, ib_logfile0 and ib_logfile1. This is the same kind of log file rotation that SiloR is doing: every time you make a new file, you rename the old one and create the new one, and that means if I don't need the old history of my log records, I can blow the old file away easily without worrying about corrupting the current one. Again, this is not fundamental to the protocol; it just makes these installations easier to manage.

All right, so they have an example of what the log record format looks like. I'm going to do a single query: update the people table and set the isLame flag for Dana. In the log entry, you'd have the table it modified, then the key for the record, and then the update to the attribute. One key difference from what ARIES did is that there's no log sequence number. We're now using the transaction IDs to figure out the serial order of these log records, and that's enough to guarantee that when we restore the database, we always go back to a consistent, correct state. Now, in the case of OCC, because Silo is using OCC, recall that you don't actually get a transaction ID until your transaction finishes the validation phase. So we'll be appending these log records during the read phase of a transaction, as we're actually modifying things, and since we don't have a real transaction ID yet, we just put a little placeholder in, and then the worker comes back and fills in the correct ID when the transaction actually commits.
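Here's a minimal sketch of that record layout, with hypothetical field names, reusing the lecture's update-one-flag example:

```python
from dataclasses import dataclass
from typing import Union

# A sketch of a SiloR-style log entry: the transaction ID plus a
# (table, key, value) triplet, where the value is either the full tuple
# or a delta of just the modified attributes. No LSN anywhere: the TID
# (which encodes the epoch) determines the serial order on recovery.
@dataclass
class SiloRLogEntry:
    tid: int                      # Silo transaction ID
    table: str
    key: bytes
    value: Union[bytes, dict]     # full tuple, or {attribute: new_value} delta

# The lecture's example: update one flag on one row of the people table.
entry = SiloRLogEntry(tid=0xDEADBEEF, table="people",
                      key=b"dana", value={"isLame": True})
```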
Yes? [Student question] So your question is: why does this scheme allow for interleaving of operations from multiple transactions and still guarantee that the database ends up in a correct state? Because the transaction ID corresponds to the serial order of operations. Even though log records may appear in different orders, say there's a log record here for transaction 999 that physically comes after this one, when I do my recovery phase, I know this thing should come before that thing in the real serial order. Physically, in the log, the order might be switched, but logically the transaction ID tells me what the order should be. Sorry, say it again? [Student] So let's say there are two transactions: transaction 1 executes operation 1, transaction 2 executes operation 2, and then the thread switches back to transaction 1, and transaction 1 executes operation 3. All three operations have a transaction ID attached in their log records? Again, every log record has a transaction ID, and that's unique to the transaction that made the change, and these transaction IDs increase in an order that determines the serial order of transactions. So it doesn't matter that the interleaving when I ran it the first time might be different; the transaction ID guarantees that I know the true serial order. When I recover, I'm always going to end up in the correct state, because I go by these IDs. And again, the fact that we're doing these persistent epoch batches makes this a little easier, because within an epoch you know how to generate the order of these things. There's enough information in the log for you to reason about what the correct state should be when you come back, and these transaction IDs tell you what that order should be.

Okay, so let's look at a high-level example of what's going on. We have our in-memory database, our worker thread, and our logger thread with its free buffer pool and its list of buffers that it wants to flush, and we'll assume this is running on a single socket, so there's one storage device with one series of log files that we're maintaining for our transactions. Then we also have our epoch thread, which is responsible for updating the epoch every 40 or 50 milliseconds; it doesn't matter which thread in our socket this runs on, because it doesn't have to run that often, just periodically. So say the client invokes a transaction and it starts running on this worker thread. Remember that in Silo they talk about using a one-shot API, which is just another word for stored procedures: a self-contained function that has program logic intermixed with SQL queries, running all in one place on the worker thread. The transaction starts, and it wants to start updating the database, so we go to our logger thread and say, "give me a free log buffer." Once we get that, the transaction can start running, making modifications to the database and appending entries to the log buffer. Now, say that before this transaction commits, it makes so many changes that it fills up the log buffer. What it does is go back to the logger thread and say, "this one's full, go ahead and flush it," and then ask for the next free buffer. Same thing again: say it writes a bunch of stuff to that one and fills it up. Then the epoch thread updates to a new
persistent epoch, and that requires the worker to say, "all right, whatever I have going on now, I need to go ahead and write this out," because what we want is that anytime we change the epoch, any outstanding log buffers get written out, and you don't start another round of transactions until you know everything from the previous epoch is durable. So in this case, if this worker wants to start running again, it can't, because it's not going to be able to get another log buffer; it has to stall and wait, and in the meantime the logger thread will flush these buffers out. Once we know those are durable, the log buffers go back into the free pool, and this worker can get one and keep running again. And again, maintaining a finite pool of log buffers prevents us from overwhelming the system by running more transactions than we can write out. I think in the SiloR paper they say this pool is about 10% of the total memory size.

Yes, Dustin? [Student] If the worker thread gives a buffer to the logger to flush before it's finished committing, doesn't that violate the rule that we don't have to undo? So the question is: when I hand off this log buffer and tell the logger to go ahead and write it out, but the epoch hasn't finished yet, or the transaction hasn't finished yet, would that violate our guarantee that we never have to undo? And the answer is no, because, as we'll see in a second, they maintain a separate log file to keep track of the persistent epoch, a high watermark that says any transaction prior to this epoch has been safely committed. We haven't written that epoch marker out yet, so it's still fine to write the buffer: on recovery we'd come back and see that yes, there are some log entries for that epoch, but that epoch never committed. The thing is, they're doing everything in batches: once I know my batch of transactions has been flushed at every single log device, then all those transactions are committed, sort of like a super group commit across multiple sockets.

Excellent segue. As Matt pointed out, the logger threads can write things out incrementally over time, but a transaction truly is not committed until we flush out the persistent epoch entry. You can think of this as a separate log file where we keep track of the current persistent epoch: the highest epoch that has been successfully written out to disk by all of our logger threads. So we say that a transaction that executed in epoch E can only be truly committed, with its results released back to the application, once the persistent epoch for E has actually been saved to this special log device. The way to think about it is this: say we're now running on multiple sockets, say three CPU sockets. Each socket has its own logger thread, which writes out to its own dedicated storage device, which maintains multiple log files, and there's a bunch of worker threads that correspond to it. Then there's one special thread, the persistent epoch thread, that's responsible for maintaining this special persistent epoch file on some device; it doesn't matter which one it is.
When an epoch closes, once this thread gets an acknowledgement from all the logger threads that they have successfully flushed out, say, epoch 200, it then writes 200 to this file, and the system knows that everything that came before epoch 200 has successfully finished. So the idea is: this logger has written out 200, this one has written out 200, and this one has written out 200, so I update the persistent epoch file to 200. When I come back after a crash, I look at this file and say anything that comes after 200 has not committed. So even though a logger thread may have flushed out something at epoch 201, it's not going to get recovered, because it wasn't part of a batch that got successfully written out everywhere.

Now, we've had a discussion before about whether you actually need this separate log file, and the answer is actually no. It's only done for convenience: it gives you a single location to look at on recovery to figure out what the persistent epoch is. But all these different logger threads are essentially recording this information already. On recovery, you could go to the end of every single log file and compute the highest persistent epoch that was successfully written by all of the logger threads, which is the minimum across the files, and that's essentially the same thing. So you don't need this file for correctness to restore the database state; they just do it for software engineering reasons, because it makes things easier, and you're only writing out a small 8-byte persistent epoch number every 50 milliseconds, so it's not that big of an overhead.

To make this concrete with the example Matt brought up: say my log buffer gets full in this top logger here, and we write out something at epoch 201, but the transactions in it haven't committed yet. When I crash and come back, I see these other loggers only wrote out 200 while this one wrote out 201, so I can't allow epoch 201 to be restored, because all the other loggers didn't get to 201. [Student] How can one be at 201 and the other ones still be behind at 200? Isn't the epoch consistent across all of them? So when I say this one wrote 201, the other ones were going to; they just hadn't gotten to it yet. And again, you don't acknowledge to the client that the transactions have committed until you know that all the loggers have flushed that epoch. This is different than in ARIES: in ARIES you're flushing individual transactions and returning acknowledgements to the client for individual transactions, based on knowing that their log records have been successfully written out. In this case, because they want to avoid log sequence numbers, since generating those would be a bottleneck in a really fast system, they do these things in batches, and then you only have to coordinate across all the different sockets every 40 to 50 milliseconds.
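Here's the rule in miniature, a sketch with hypothetical names: the system-wide persistent epoch is the minimum over each logger's highest durable epoch, which is exactly why the separate epoch file is a convenience rather than a correctness requirement:

```python
# A transaction in epoch E is committed only once every logger has made
# epoch E durable, so the persistent epoch is the min over the per-logger
# maxima (recoverable by reading the tail of each log file).
def persistent_epoch(logger_durable_epochs):
    return min(logger_durable_epochs)

# e.g. loggers have durably written up through epochs [201, 200, 200]:
assert persistent_epoch([201, 200, 200]) == 200
# Records tagged with epoch 201 are ignored on recovery: that batch never
# became durable everywhere, so its transactions were never acknowledged.
```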
All right, so now let's talk about how to do recovery from this. In the first phase, you just load the last checkpoint that you took, and again, we'll talk more about what checkpoints look like next class. The one thing I want to point out, though, is that in Silo, and in pretty much every in-memory database that I'm aware of, you always have to rebuild the indexes when you turn the database back on. What I mean by that is that the database system does not log any changes you make to indexes, so when the system crashes, all that information gets blown away. When you come back and load the last checkpoint, since you're essentially doing a sequential scan over the checkpoint to reload the database, that's when you rebuild your indexes. And this is not that big of a deal, because as you'll see with your skip list, when you start really hammering it, you can build an index pretty quickly; you can process a couple million keys a second, so that's pretty fast. So they're willing to pay the penalty of rebuilding the indexes on recovery to avoid having to log changes made to indexes at runtime. As far as I know, every in-memory database makes this choice.

So once we've loaded our checkpoint, we replay our log. The difference from what ARIES did is that we only have to make a single pass over the log, and we're actually going to go in reverse order: we start from the end of the log and go backwards in time, whereas in ARIES, when you do the redo, you start from back in time and go forward. What happens is this: our system starts up, and we check our persistent epoch file to see what's the highest persistent epoch that was successfully flushed by all our devices; any log record that comes after this is ignored, because it didn't actually commit. Then we go to the tail of the log and walk backwards, and for each log record we apply its change to the database. Again, they're doing physical logging, value logging, so the exact data you want to put in a tuple is right there in the log record; you just do a straight copy into the database, and you never have to re-execute any query.

Now, because we're going in reverse order, the recovery thread needs to check whether the tuple already exists. If it doesn't exist, you go ahead and create it, because that's the same thing as inserting it. If it does exist, you check whether the transaction ID of the log record that last modified it is newer than the log record you're trying to apply. What I mean is: because we're going in reverse order, the first entry I see for a tuple as I scan backwards is the newest value it should have, and I can ignore any older entries for it that I encounter afterward. If you went in the forward direction, then if I update tuple 1, I'd apply my first update, and then later in the log file I'd apply another update over it to make sure I have the newer value. Because they go in reverse order, they know that the first entry they see for a tuple going back in time is the most recent version. This speeds up recovery as well. You can't do this in ARIES; there you have to apply all the changes and then go back and undo the ones that shouldn't be there. In their case, because they know that anything they read at or before the persistent epoch is successfully committed, they can just grab the most recent version. So this is another example of something you can do to speed things up in an in-memory database that you can't do in a disk-based database system.
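Here's a minimal sketch of that reverse-order replay, with hypothetical record fields; it assumes the records are stored oldest-to-newest, so scanning in reverse visits the newest write to each key first:

```python
# A sketch of SiloR-style replay: scan the log backwards from the tail;
# the first record seen for a key is the newest version, older records for
# the same key are skipped, and records past the persistent epoch are
# ignored because that batch never became durable everywhere.
def replay(log_records, persistent_epoch):
    db, seen = {}, set()
    for rec in reversed(log_records):          # newest-to-oldest
        if rec["epoch"] > persistent_epoch:
            continue                           # never truly committed
        k = (rec["table"], rec["key"])
        if k in seen:
            continue                           # a newer write was already applied
        db[k] = rec["value"]                   # straight copy, no re-execution
        seen.add(k)
    return db
```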
All right, so at a real high level, here's what recovery looks like. We have our special persistent epoch thread; it reads from the persistent epoch file what the current persistent epoch is. Then we have a bunch of recovery threads, because we can't start processing transactions while we're still doing recovery, and it tells them, "here's the persistent epoch you should follow." They all first load in the last checkpoint to restore the state of the database, then they look at the log files, going in reverse order, and apply all the changes. Once all these recovery threads are done, you start up the logger threads and the worker threads, and you start processing transactions like normal.

All right, and this is the point that was brought up before, and I'll say it again: we don't need log sequence numbers, because the transaction IDs that we generate at runtime are enough to guarantee that we can replay things in serial order. Remember, Silo is doing serializable OCC, so these transaction IDs are guaranteed to give us the correct serial order. As long as we put the database back in the same state based on these transaction IDs, we're going to be consistent from crash to restart, and we don't need separate log sequence numbers.

Okay, yes? [Student] The question is: why not use transaction IDs in a disk-oriented database? In the case of ARIES, it does not assume that you're running in a serializable order, so transaction IDs alone wouldn't guarantee the order, whereas here, because Silo is serializable, you can do that. [Student follow-up] Your statement is that transaction IDs don't guarantee a serializable order in a disk-based database system, and that's actually incorrect? So the question is: why can't you just use the transaction ID in a disk-oriented system instead of having to use log sequence numbers? Let me get back to you on that; I know this answer, I just can't pull it up right now. That's a good point, though.

Okay, so we're running out of time, so I want to go through this quickly. For a quick evaluation, they compare Silo against itself running the YCSB and TPC-C benchmarks, and they run this on a four-socket machine at MIT. It's a few years old now, but that's okay; it has a quarter terabyte of RAM. Now, one important thing you may have overlooked when you read the paper is that they're running this system with three Fusion-io cards. Again, this paper is four or five years old, so these are not as state-of-the-art as they used to be, but at the time this was pretty high-end hardware; each of these drives was something like $5,000. So they're basically using $15,000 of storage devices to make the disk latency of doing fsyncs go away. And because the box only has three PCI Express slots, so they can only have three of these cards, they also have to use a RAID 5 disk array to roughly match what these cards can do. Fusion-io cards aren't as exotic as they used to be; there are a bunch of different manufacturers that will sell you these PCI Express SSDs now.

So the first result I want to show is the difference between Silo running without any logging whatsoever, Silo with only logging, and Silo with logging plus checkpointing. The way Silo does checkpointing, it basically takes checkpoints all the time: it finishes a checkpoint, waits 10 seconds, and then starts another one. These gray bars here are when it's taking a checkpoint, and there's a small pause where it waits a bit before starting the next one. The main thing to point out is that when you don't have any logging, you're running about 10 million, almost 11 million, transactions a second, but when you enable logging, you're paying roughly a 10% penalty, which is actually really good, because normally, with a slower disk device, the flushing is always going to be the major bottleneck.
Now, if you account for just the CPU cost, as I showed in the pie chart at the beginning, logging was 28% of your overhead; but here they're showing that you're only paying a 10% overhead to do complete logging and checkpointing, and that includes flushing out to disk. So that's pretty significant; that's a big deal. Then they have numbers for TPC-C: here's logging plus checkpointing, here's logging only, and here's without any recovery at all. The latency when you're doing checkpoints is slightly higher, and in some cases you see little blips in performance, which could be due to the SSD doing garbage collection or reorganizing the actual flash cells. So again, they're doing almost 600,000 transactions a second on a pure in-memory database, and only paying about a 10% overhead for logging and checkpointing. We'll talk more about how you do checkpoints next class; how frequently you take checkpoints depends on how fast you want the system to recover. I want to skip this quickly, but the main thing to show here is that they can recover a 200-gigabyte database from a checkpoint and a log in roughly 200 seconds. That's pretty significant; you're not going to get those kinds of speeds using MySQL or Oracle, which are definitely much, much slower.

All right, so now I want to jump ahead quickly and talk about logical logging in VoltDB. One observation we can make is that the proponents of logical logging will say that database failures in a modern setting are actually kind of rare. If you're running a system that needs an in-memory database, and you need it to be really fast, then chances are you have the money to run a bunch of replicas. Therefore you want a logging protocol that is optimized for shipping things out to a replica and doesn't slow you down at runtime, because if there's ever a crash, you're almost never going to replay from the log; you just fail over to the replica. So it's better to optimize the runtime operations instead of the failure cases.

Command logging in VoltDB is a variant of logical logging where the database system only records the transaction invocation. Like Silo, VoltDB requires all transactions to be executed as predefined stored procedures, so the only thing we need to log is the name of the stored procedure, the input parameters to the function, and some additional safety metadata. For example, say I ran my transaction and it got logged last week, and then someone comes along and updates the stored procedure. If I need to replay the log and restore that previous transaction invocation, I want to make sure I use the old version of the stored procedure, not the new one, because you always want to go back to the same state. Another way to think about command logging is as transaction logging: you're logging transactions not on an individual-query basis but at the level of high-level stored procedure invocations.
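As a rough illustration of how small these records are, here's a sketch of a command-log entry in Python. The field names, the JSON encoding, and the procedure name are all hypothetical, not VoltDB's actual format:

```python
import json

def make_command_log_entry(txn_id, proc_name, proc_version, params):
    """Log the transaction invocation, not the data it changed.

    proc_version guards against the stored procedure being redefined
    later: replay must run the version that originally executed.
    """
    return json.dumps({
        'txn_id': txn_id,        # position in the serial order
        'procedure': proc_name,  # which stored procedure was invoked
        'version': proc_version, # which definition of it was current
        'params': params,        # the input arguments, nothing else
    })

# One short entry per transaction, no matter how many tuples it touched:
entry = make_command_log_entry(123, 'UpdateAllBalances', 7, {'rate': 1.05})
```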
This is only going to work if we have what is called deterministic concurrency control. We talked about this when we covered H-Store's concurrency control protocol before, but I didn't really get into the details of what makes it special or different from these other protocols. The unique thing about it is that it can guarantee that if you execute transactions on a given database state one day, you'll get the same result executing them on the same database state the next day. In MVCC and these other protocols, there's a timestamp or race-condition aspect, where one thread might come in before another one. That's still correct, because it still may generate a serializable schedule, but it may not be the same from one day to the next, or from one run to the next. The only way this actually works is if we require all the logic in our transactions to be deterministic.

So what do I mean by that? Let's say we have a database with a single record, A = 100, and I have three transactions that execute one after another in serial order: A = A + 1, A = A × 3, A = A − 5. These are all deterministic, meaning no matter how often I execute these transactions on this database state, I always end up with the exact same answer: today I get 298, and tomorrow I'll get 298. A non-deterministic transaction would be if I replaced A = A × 3 with A = A × now(), the current timestamp. Then if I execute it today, I get a different answer than I would tomorrow or yesterday. That's non-deterministic, so we can't allow it. In the case of VoltDB, for example, they have a bunch of checks to make sure you can't use timestamps fetched from the operating system, they try to prevent you from using random number generators, and they don't allow you to make calls to the outside world. Say your transaction ran for a little bit and then made an RPC call to some fraud-detection system or something; that could be bad, because running it today may give you a different answer than running it tomorrow. Everything has to be encapsulated inside the transaction logic and be deterministic.

All right, so I didn't really talk about the architecture of VoltDB at the beginning; we only talked about the H-Store protocol. I'll also say up front, full disclosure: when I was in grad school I was working on the team that built this thing called H-Store, which was later commercialized and became VoltDB. So in some ways I'm biased toward VoltDB, but I'll be upfront about where I think the problems are, because it's good at some things and not so good at others. The basic architecture is that we split our database up into disjoint subsets called partitions that live in memory; this is the same idea as the partitioned variant of Silo that we saw in the Silo concurrency control paper. Each of these partitions is assigned a single-threaded execution engine pinned to a single core that has exclusive access to the data in that partition. The way the concurrency control protocol works in H-Store is that it uses timestamp ordering to assign timestamps to transactions as they enter the system; when your timestamp is the smallest of any transaction in the system, you're given the lock and you can start running. It requires you to hold the locks for all the partitions you need before you're allowed to start running anywhere; you have to wait to acquire everything first, and then you can go. And just like Silo, it uses a stored procedure API: you pass in the name of the stored procedure and its input parameters, and inside you have a bunch of predefined prepared statements and a run method that invokes those queries.
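Here's a hypothetical sketch of what such a stored procedure looks like. The class and method names are made up (the real API is Java-based and differs), and a plain dict stands in for the partition's data:

```python
class AddToBalance:
    # Predefined prepared statements: the SQL is fixed ahead of time;
    # only the parameters vary per invocation.
    GET_SQL = "SELECT balance FROM accts WHERE id = ?"
    SET_SQL = "UPDATE accts SET balance = ? WHERE id = ?"

    def run(self, partition, acct_id, amount):
        # The run method invokes the prepared statements. The logic must
        # be deterministic: no clock reads, no random numbers, no calls
        # to the outside world; everything derives from the database
        # state plus the input parameters.
        balance = partition[acct_id]           # stands in for GET_SQL
        partition[acct_id] = balance + amount  # stands in for SET_SQL
        return partition[acct_id]

# The client sends just the procedure name and parameters, which is
# exactly what the command log records.
partition = {42: 100}
assert AddToBalance().run(partition, 42, 1) == 101
```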
You pass the input parameters into that run method, and it makes the invocations to those queries. Again, this logic has to be deterministic: you can't have random number generators inside it, and everything has to be based directly on the state of the database. So my transaction starts running, and I write out into the command log the transaction ID I was assigned, the name of the stored procedure I invoked, and its parameters. I write this out before I start running, and once I know it's flushed, I can go; when I finish, I send my result back to the application, and in the background we take additional snapshots. So in command logging, you're writing out to the command log before the transaction actually starts running, and that's different from what we saw in Silo and in Aries, where you write it either while the transaction runs or when it finishes. The reason we can do this before the transaction starts running is that we assign the transaction ID before it starts, and we want to get it into the log so we can document the serial order; that way, when we crash, we know what the ordering was, and we don't have to worry about some transaction that could have been running while we crashed that may have updated something.

The tricky thing, though, is if you have to touch multiple partitions. Wherever the transaction's home is, wherever the stored procedure is running, is considered the base partition, and that's the only place where you store this log record. Unlike Silo, where if you touched data managed by different sockets each of those sockets' logger threads had to write an entry about what you modified to its own log device, in command logging the entry for the transaction appears in only one location: the base partition.

Then, when you recover, you load the last checkpoint you took from disk (again, we'll talk about how to do that on Tuesday), and then you just re-execute the transactions in the order they were written out to the log since the last checkpoint. Essentially, you're doing the same thing you would do at runtime when you normally execute these transactions. This is an example of where logical logging is slower for recovery: if my transaction executed 100 queries when it ran the first time, I have to execute those 100 queries all over again, essentially replaying the stored procedure. The other downside is that because I'm writing the log record out before the transaction starts running, I don't know whether it's actually going to abort later on. So unless you store some extra metadata (which in H-Store we didn't; I don't know whether VoltDB does this now) that says, hey, that transaction 1-2-3 you saw earlier is going to abort, so you can skip it, you're going to have to re-execute it all over again and then abort and roll it back, because there's no information in the log saying that this transaction never actually finished; you just see the entry saying go ahead and invoke this.
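Putting the pieces together, here's a sketch of that recovery path under my assumed log-entry format from before; this is illustrative, not VoltDB's actual code:

```python
def recover(checkpoint, command_log, procedures):
    """Sketch of command-log recovery.

    procedures maps (name, version) -> a callable(db, params); the
    versioned lookup ensures a procedure redefined after the entry was
    written gets replayed with its original logic.
    """
    db = dict(checkpoint)  # restore state as of the last checkpoint
    # Re-execute every logged invocation in transaction-ID order.
    for entry in sorted(command_log, key=lambda e: e['txn_id']):
        proc = procedures[(entry['procedure'], entry['version'])]
        # Re-run the whole stored procedure: if it executed 100 queries
        # the first time, it executes all 100 again now. With no abort
        # markers in the log, a transaction that aborted originally gets
        # re-executed and rolled back all over again here too.
        proc(db, entry['params'])
    return db
```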
So the one big advantage that VoltDB command logging has over all these other protocols is that it makes it really easy to do replication, strongly consistent replication. Again, we're not going to talk about distributed settings in this class, but I just want to show you one key idea that is actually really interesting, and that you can only do with command logging. Say we have our application, and it sends a transaction request to the master node; again, it's just the stored procedure name and the input parameters. The master writes that out to its command log, but it also forwards the same information to its replica, the slave. Now, VoltDB uses what's called active-active replication, which means the transaction invocation is going to be executing at essentially the same time on these two different nodes. When the replica finishes, it just sends an okay message back to the master saying, yes, I completed this transaction you asked me to do, it came in the right order, and here's the result I got. As long as that response matches what the master saw, the master knows that everything is consistent, and it can immediately send a response back to the application saying the transaction finished. You don't need to do two-phase commit to ask, did you really get the right answer, and if yes, go ahead and actually commit; all you need is one round trip: execute this, okay, I did it. And you know that everything is consistent because your transactions are deterministic, meaning if you execute them here and you execute them there in the exact same order, you're going to get the exact same answer. You can't do this with physical logging. With physical logging, replication is called active-passive: the transaction runs on the master, which then sends the log update records over to the replicas, they apply them, and then you have to do two-phase commit to make sure everyone got all the log records they expected. So this is one of the big advantages that command logging gets you, and it only works if you have a deterministic concurrency control scheme, like the partition-based protocol we saw in H-Store.

All right, so what are the downsides? The downside is transactions that have to touch multiple nodes. Again, I don't want to get into too much detail about distributed databases, but if you have a transaction that touches multiple nodes, and one of those nodes goes down along with all its replicas, you basically have to crash the entire system and restart it, because when you replay the command log, you don't know how to tell the other nodes, don't actually do this update when I recover. Here's what I mean. Say we have a simple transaction that does a read on partition 2, and if the value is true it will update partition 2, and if the value is false it will update partition 3. Say my transaction is running here, and this node goes down. When I come back, the other nodes may be in a state farther in the future: after I executed this transaction, they continued to execute other transactions and updated their state. Now this node comes back, and that value may be different, so I may end up updating a different partition: before the crash I updated partition 2, but when I replay I update partition 3. That would be bad if, say, you're updating some kind of counter. So in command logging, if this one partition goes down and you don't have any other replicas, you have to knock everything down, everyone comes back, and you reload from your checkpoint. Under physical logging you don't have this problem, because you know how to restore just this node and all its changes without having to notify the other ones, whereas in command logging you have to keep everyone in sync.
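Here's a small sketch of that divergence; the partition layout and field names are hypothetical:

```python
def branching_txn(partitions):
    """The example transaction: read partition 2, then update 2 or 3."""
    if partitions[2]['flag']:            # read on partition 2
        partitions[2]['counter'] += 1    # true  -> update partition 2
    else:
        partitions[3]['counter'] += 1    # false -> update partition 3

# Original run: the flag was True, so partition 2's counter was bumped.
partitions = {2: {'flag': True, 'counter': 0}, 3: {'counter': 0}}
branching_txn(partitions)

# The surviving nodes keep running and flip the flag...
partitions[2]['flag'] = False

# ...so replaying the same logged invocation after a crash now updates
# partition 3 instead of partition 2. The old/new physical values were
# never logged, so there's no way to detect or undo this locally.
branching_txn(partitions)
```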
Right. So again, people would say you can avoid this by having multiple replicas of each partition on each node. All right, is that clear?

So what are the main takeaways from this? Physical logging is used almost everywhere, and it's really nice because it's general-purpose: it works with pretty much every single concurrency control scheme, and if you use some of the techniques that the Silo guys did, preparing these log messages and writing them out does not become a major bottleneck. But as we saw when we looked at that table, the log is usually going to be much, much bigger than the database itself, because you're basically recording every single update. Logical logging is faster at runtime, but it's not universal; it only works if you have a certain kind of protocol that can support it. And as I said, I think it's an open research question whether you can do logical logging at the SQL level in an in-memory database system and still guarantee correct and consistent restarts, but again, that's a research question I don't know the answer to.

Any questions about logging? They mentioned that their logging scheme would actually wear out the drives in a little over a year, so do commercial in-memory databases actually use this? So her question is: in the SiloR paper they were writing out log records pretty fast. I forget how long they ran this for, but the log is 180 gigabytes; they were basically writing as fast as possible, and at this rate you would burn out the SSDs, and these are really high-end SSDs, in about a year. So given that you're going to burn out SSDs, do in-memory databases actually use SSDs for logging? Absolutely. The speed difference of an SSD versus an HDD is so large that people are willing to pay the cost of replacing them every so often to get better performance than a spinning-disk hard drive. We'll also talk about non-volatile memory later in the semester; I don't have exact numbers, but those reportedly have longer wear-out lifetimes than SSDs, so you can get the same kind of throughput without burning them out as quickly. There are also different classes of SSDs you can buy: high-end enterprise ones that can tolerate heavy write wear, and cheaper consumer-grade ones that you can burn out pretty quickly. If you're running an in-memory database that needs to be super fast, you'll pay the money to get a really nice device, and it'll last you longer.

Any other questions? As I alluded to throughout this class, next class we're going to talk about checkpoints: different ways you can traverse the database and decide what data to write out during a checkpoint. We're also going to talk about a nifty idea that I like from the Facebook Scuba system that allows you to do fast restarts without having to reload everything from a checkpoint every single time you need to restart the system. Okay, any questions? All right, so again, keep plugging away at your skip list, and we'll open up Autolab either today or tomorrow so you guys can start submitting and testing. Awesome, guys. See you on Tuesday next week. Enjoy the weather.