All right, so this lecture is the second half of the material we're doing this week on recovery in our database system. Last class was everything about logging: how do you write out all of the changes that transactions are making at runtime. And then today's class is about taking checkpoints. We need to do this because we don't want to process the entire log whenever we turn the system back on; checkpoints allow us to take a snapshot and recover from that. So real quick, before we get started, there's three things coming up for you guys in the course. As I said last class, on Wednesday next week we will have the midterm exam in class. I sent out the announcement on Piazza about what's expected, and then I'll provide you guys with the study guide this weekend. The second thing is that project two is due on the Monday during spring break at midnight. As of at least yesterday, it didn't look like anybody had actually submitted anything on AutoLab, so I advise you to do that; don't wait until after the midterm to start. You're not going to have a good time. Do not disappear for spring break without doing the project. The first year I taught the course, somebody did that because their girlfriend broke up with them and they wanted to go to Miami for spring break. They failed the class. Don't do that. OK. And then the other thing is that I'll post the project page maybe this weekend or early next week, but you guys are going to have to do your proposal for project three on the first Monday after spring break. So in Monday's class next week, I'll spend some time at the end talking about the different project topics you could explore. Some of you have already started your projects by working with your group, and I've already thrown out some other ideas, but I'll go through in a bit more detail here's the things you could work on. And if you go back to the 2017 and 2016 versions of the course website, there's a project showcase link. 
You can see the type of projects that other students have done. OK. So for this, I'll post on Piazza a link to the spreadsheet; again, I'll make a spreadsheet in a Google Doc that says here's how to sign up for your group and what your project topic is. If I haven't talked to you already, you might want to send me an email and say, hey, look, I'm thinking about doing this, is this going to be OK, or something like that. Unless it's one of the topics I've given you ahead of time, you may want to check with me first before you propose something. And then on that Monday, everyone, I think, will have to spend five, six minutes, come up here, and give a pitch about what it is you're actually going to do. And these proposals should not just be like, hey, I think this is kind of cool, I think I want to do it. You're going to actually want to look in the code. Maybe you don't understand everything, but you should have a rough idea of: here's the files, here's where I think I'm going to have to make my changes, or here's some feature I'm going to need that doesn't exist yet. Any questions about any of these things? Blank stares. OK. Cool. All right, so let's start getting into the material for today. So today we're going to talk about checkpoints. And again, the idea here is that we want to take a snapshot of what's in memory and write it out to some location; it may not actually have to be disk. And that way, if we restart our database, we can load that back in instead of having to replay the entire log. So we'll spend most of our time talking about doing checkpoints for recovery. This was in the paper you guys read, and this is what most people think of when they think of checkpoints. But then I want to spend a little time at the end talking about a technique from Facebook, which I think is really clever, about using shared memory to write your checkpoints so that you can restart the server very quickly. OK. 
So as we said last class, the database system's logging scheme is necessary because we want to be able to recover the database after a crash. And to do this, unless we have checkpoints, we're basically going to have to replay the entire log. And as we said before, depending on what logging scheme you're using, physical logging versus logical logging, this may actually take a long time. So if you have a year's worth of log, and it takes you a year to process that log, then when you turn the system back on, you're going to have to take another year to put the database back into the correct state. And that's not good, because your system is essentially useless during that time. So what checkpoints are going to allow us to do is basically the system can say: I know I have a checkpoint that's on disk at this point in time, and therefore I don't have to replay anything in the log that came before this checkpoint, modulo maybe a few transactions. Anything afterwards, I do want to replay, because that was what the database was doing right up to the point I restarted. But the checkpoint basically allows me to truncate the log, and I can ignore everything before then. And that's going to dramatically speed up our recovery process. So for an in-memory checkpoint, at a high level, it's essentially the same thing that you would do in a disk-oriented database system, where you're, again, taking the contents of memory and writing it out to some set of files on disk. In a disk-oriented system, what they're going to do for checkpoints is essentially just write out any dirty pages in the buffer pool to the heap file for that database. But now for an in-memory database, we don't have a heap file that we're maintaining, we don't have a buffer pool, and we don't have any notion of dirty pages. So we're essentially going to write out all the data, all our blocks of tuples. 
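To make the log-truncation idea concrete, here's a minimal sketch, with hypothetical names, of redo-only recovery from a consistent checkpoint: the rebuilt state is the checkpoint image plus a replay of only the log records newer than the checkpoint.

```python
def recover(checkpoint_image, checkpoint_lsn, log):
    """Rebuild the database from the last checkpoint plus the log suffix.

    checkpoint_image: dict of key -> value as of checkpoint_lsn
    log: list of (lsn, key, value) redo records in LSN order
    """
    db = dict(checkpoint_image)
    for lsn, key, value in log:
        if lsn <= checkpoint_lsn:
            continue  # already reflected in the checkpoint, so skip it
        db[key] = value  # redo the change
    return db

# Without the checkpoint we would replay all four records; with it, only two.
log = [(1, "a", 10), (2, "b", 20), (3, "a", 11), (4, "c", 30)]
state = recover({"a": 10, "b": 20}, checkpoint_lsn=2, log=log)
```

Note this sketch has no undo pass at all: because a consistent checkpoint contains no uncommitted changes, redo of the log suffix is all that's needed.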
We'll see when we talk about delta checkpoints that writing out everything may not always be necessary, but in general, that's what you do. Basically, you have the contents of the data in memory, and then I'm going to plop it down into a file on disk. And how you're actually going to implement this is going to be really tightly coupled to what concurrency control scheme you're using. Or maybe I'll put it a different way: if you're already doing multi-versioning for multi-version concurrency control, whether it's timestamp ordering or two-phase locking, then you can rely on that in some ways to make your checkpoints go really, really fast. So just use that to take your checkpoint. If you're doing in-place updates, like in the H-Store or VoltDB scheme we talked about last time, then you're going to have to do some extra stuff. And essentially the way a checkpoint works is that you have some number of threads in your system, and when the checkpoint starts, they start scanning all the tables, preparing these blocks of checkpoint data, and then writing them out to disk. Now, this is unlike logging, where if the transaction committed we have to do an fsync: there we care about everything being synchronous, so we have to fsync and make sure our buffers get flushed and written to disk for real before we can send the acknowledgement back to the application that we successfully committed. In the case of the checkpoint threads, we don't have to do this. We can just tell the OS to do a flush and let the operating system or the disk controller decide when to actually schedule these writes, because if we crash in the middle of a checkpoint, the checkpoint's invalid anyway. On the very last write, you do want to fsync to make sure the checkpoint is truly written out, but as we're writing it, we don't have to fsync every single time. 
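As a sketch of that write discipline (file names hypothetical): flush each block so the OS can schedule the physical writes whenever it likes, and force to disk only once, at the very end.

```python
import os
import tempfile

def write_checkpoint(blocks, path):
    """Write checkpoint blocks without forcing each one to disk.

    Unlike a log commit record, an individual checkpoint block does not need
    its own fsync: if we crash mid-checkpoint, the whole file is invalid
    anyway. Only the final write is forced to stable storage before we would
    declare the checkpoint complete.
    """
    with open(path, "wb") as f:
        for block in blocks:
            f.write(block)
            f.flush()         # hand bytes to the OS; no forced disk write yet
        os.fsync(f.fileno())  # one durable barrier at the very end

path = os.path.join(tempfile.mkdtemp(), "ckpt.bin")
write_checkpoint([b"block0", b"block1"], path)
```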
So this makes it a little bit easier for the operating system to schedule when it actually wants to do writes to the physical hardware. So for the paper you guys read, and this other paper from the guys at Yale in 2016, they came up with some sort of edicts or properties that they want any checkpoint implementation or checkpoint scheme to have for an in-memory database. In some ways these are kind of obvious, but I want to go through them, because when we start talking about the different checkpointing schemes, we can use these to make decisions about whether a scheme was a good idea or not, or whether they made the right design choices in their implementation, all right? So the first one is that while we're taking our checkpoints, we don't want to slow down regular transaction processing. In some ways this is unavoidable; in the case of SiloR, when they were taking checkpoints with TPC-C, which is a real insert-heavy workload, they saw about a 10 to 15% drop in performance while taking a checkpoint. That's usually the range you want to be in, 10 to 15%. If it was like 50%, all of a sudden my machine's performance is cut in half because I'm taking a checkpoint; that's bad, so we don't want to do that. And related to this, we also don't want to introduce any unacceptable increases in our transaction latency or query latency while we're taking a checkpoint. This will come up when we start talking about the wait-free checkpointing schemes, which can avoid having to do long pauses that block all threads while we're taking the checkpoint. And the last one is obvious: because we're in an in-memory database environment, we want to avoid excessive memory overhead, meaning ideally we don't want to have to make multiple copies of the database while we take our checkpoint. 
Now, in the Ping-Pong protocol that you guys read about, they're making three copies of the database, and in my opinion that's unacceptable, and nobody actually implements that. In the worst case, Zig-Zag keeps two copies, but for the other schemes we'll see it'll be much, much less, right? Again, in multi-version concurrency control we're already making multiple versions anyway, so the nice thing about MVCC is that we're going to pay this penalty to make new versions regardless, but we can just use those versions to take our checkpoints. All right, so there's going to be three different categories of design decisions for checkpointing schemes that we can talk about. So the first one is whether it's doing a consistent checkpoint or a fuzzy checkpoint. In the paper you guys read, all those checkpointing protocols are what are called consistent checkpoints, and what this means is that, just like under snapshot isolation when we talked about timestamp ordering before, the checkpoint file that's been written out to disk is a consistent snapshot of the database at some point in time. Meaning, in our checkpoint, we will not see any changes from transactions that were actively running, had made modifications to the database, but had not committed yet while we were taking the checkpoint, right? So again, when you think of the context of snapshot isolation, think of it as being the exact same thing, but now we're writing it out to disk. And the advantage of this is that when we turn the system back on and we load this checkpoint in, we don't have to do any extra work to figure out which transactions may not have committed and therefore should have their changes rolled back, because their changes aren't stored in our checkpoint, right? Now contrast this with fuzzy checkpoints, where you do have this issue. 
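A minimal sketch of that idea, with a hypothetical begin/end-timestamp version layout: the checkpoint scan just emits, for each key, the version that was visible at the checkpoint's timestamp, exactly like a snapshot-isolation read.

```python
def snapshot_scan(version_chains, ckpt_ts):
    """version_chains: key -> list of (begin_ts, end_ts, value); end_ts is
    None while the version is still the live one. A version is visible at
    ckpt_ts if it began at or before ckpt_ts and was not superseded by then."""
    snapshot = {}
    for key, versions in version_chains.items():
        for begin_ts, end_ts, value in versions:
            if begin_ts <= ckpt_ts and (end_ts is None or end_ts > ckpt_ts):
                snapshot[key] = value
    return snapshot

chains = {
    "x": [(1, 5, "old"), (5, None, "new")],  # updated at ts=5, after the checkpoint
    "y": [(2, None, "only")],
    "z": [(9, None, "future")],              # created after the checkpoint
}
snap = snapshot_scan(chains, ckpt_ts=4)
```

Concurrent writers keep installing new versions while this scan runs; the scan simply never sees anything with a begin timestamp later than the checkpoint's.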
So fuzzy checkpoints are the most common checkpointing scheme used in disk-based systems. This is what ARIES does, which we talked about last class. The idea, again, is that when we start a checkpoint, we don't pause or stop any other transactions. We let them continue running, but then we may end up having updates from uncommitted transactions reflected in the checkpoint. And furthermore, for a transaction that does later commit, we may see only half of its changes in our checkpoint while the other half are only in the log; everything is in the log, but only half of the changes made it into the checkpoint, so we have to resolve that. So that's why, in the case of ARIES, they have to do multiple passes over the log: they have to deal with the fact that they have a fuzzy checkpoint. And what they end up doing to make this work is record some extra information in the log about when the checkpoint starts and when the checkpoint finishes; and if you're a disk-based system, you keep track of what the dirty pages are and what your actively running transactions are, and you can use that as a guide to figure out what was going on in the system while you were taking the checkpoint. In the case of a consistent checkpoint, you don't have to do this. You say: at this point in time I'm taking my checkpoint, and I don't see anything from transactions that were running at the time I was doing it. Yes? So the question is: for consistent checkpoints, do I always have to pause transactions? No. So again, if you're doing multi-versioning, think of the checkpoint as just doing a select-star query on a table as a read-only transaction. 
You get your snapshot, you get your timestamp, and now you're guaranteed you have a consistent snapshot. You don't see anything from uncommitted transactions, and you just scan through everything and write it out to disk. Now, what it does do is prevent the garbage collector from cleaning things up, because if your checkpoint takes an hour to write out, you have an open transaction for an hour. It won't block anybody else from reading and writing; it just blocks the garbage collector from cleaning things up. So you may have a spike in memory usage at that point. All right. Okay, so again, for fuzzy checkpoints, the one thing I want to point out, since I'm not showing it explicitly but I'm telling you this is how it works: when you take the checkpoint, you write a log entry that says I'm taking a checkpoint at this time. In the case of consistent checkpoints it's a single point in time; in the case of fuzzy checkpoints you record both the start and the stop time for the checkpoint. Yes? So the question is: do consistent checkpoints only work if you have MVCC? No, we'll see an example where you don't. All right, so the next issue we have to deal with is what we actually store in our checkpoints. So the most obvious option is a complete checkpoint, meaning every single table and every single tuple gets written out to my checkpoint. And typically the checkpoint will be a single file; if it's a single-node database, it's often a single file, and that file represents the checkpoint. So the advantage of this is that when you crash and come back, there's just one file that you know you need to read to load the last checkpoint back in. 
The downside is obvious: if your database is 100 gigabytes, and since the last checkpoint you only modified one gigabyte, you're going to write out all 100 gigabytes every single time, all right? The alternative is to do a delta checkpoint, which essentially solves the problem I just described: instead of writing out the entire database in your checkpoint, you just write out the things that actually got changed. You can keep something like a bitmap to track which things actually got modified, and only write those things out. I would say, of the in-memory databases that we'll talk about, the only one that actually does this is Microsoft Hekaton, the in-memory database engine in SQL Server. Everybody else pretty much does complete checkpoints. Part of the reason, I think, is that from a human standpoint, for the person who actually has to manage the software, it gives people a sense of satisfaction or comfort knowing: here's my single checkpoint file, and I can be guaranteed that this is my entire database. Whereas if you have deltas, then you've got to go back and figure out, all right, make sure I replay all my checkpoints and make sure I get everything. So, in the example I gave before, if I first load that 100 gigabytes of data in and then only make modifications to one gigabyte over time, I need to make sure that when I load all my checkpoints back in, I go back to the very first checkpoint and bring that in, because otherwise my data could go missing. And in the case of Hekaton, they do some extra work to merge these things so that you avoid having to go back every time. But again, a lot of people like the idea of having a single checkpoint file that you can ship around or save in a single location and be sure that everything is persistent. But there are obvious performance reasons why you'd want to do deltas. 
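A toy sketch of the delta idea (names hypothetical): write out only the dirty tuples each time, and note that recovery must then start from the full base checkpoint and apply every delta in order, or data goes missing.

```python
def take_delta_checkpoint(db, dirty_keys):
    """Write out only the tuples modified since the last checkpoint, then
    clear the dirty set. Returns the delta (what would go in the delta file)."""
    delta = {k: db[k] for k in dirty_keys}
    dirty_keys.clear()
    return delta

def restore_from_deltas(base_checkpoint, deltas):
    """Recovery starts from the full base checkpoint and applies every delta
    in order; skipping an old delta could lose committed data."""
    db = dict(base_checkpoint)
    for delta in deltas:
        db.update(delta)
    return db

base = {"a": 1, "b": 2, "c": 3}        # the 100 GB complete checkpoint, in miniature
dirty = {"b"}                          # only "b" changed since then
d1 = take_delta_checkpoint({"a": 1, "b": 20, "c": 3}, dirty)
restored = restore_from_deltas(base, [d1])
```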
Now, a research question would be: can you actually combine complete and delta checkpoints? And I don't know the answer. I mean, I'm assuming you could, but is it actually feasible? Like, could you take a complete checkpoint every 10 minutes and then a delta checkpoint every minute? Yes? So the question is: for a delta checkpoint, do you update the tuples in place, or do you just add a delta record somewhere? Yeah, so let me be clear here. Every time I take a new checkpoint, I write a new file, right? So in this case, for my delta checkpoint, I'll have the one for now, and then for 10 minutes later there'll be another directory and another set of files for the checkpoint I take then. And then, depending on how you configure the system, it can know that, all right, I don't need to keep my checkpoints from five days ago, so I can start pruning those. And in the case of Hekaton, they can be clever about merging things, so that, say, two 10-minute checkpoints, or a whole afternoon's worth, get combined into a single file. That eases manageability and also makes it faster to recover. Yes? So can we think of the delta checkpoint as just a batch of log records? Yes. Again, I guess I should have made a whole slide about how Hekaton does this, but basically they have a log file; everyone always has a log file. Then their checkpoints are actually two files: one is like another log file that says here's all the versions I've inserted, and the other is a log file that says here's all the versions I've deleted, and they know how to coalesce and reconcile them to put the data back in the correct state. Right, you can sort of think of it as a condensed log. Right, yeah. 
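The real Hekaton checkpoint files are more involved, but the coalescing idea can be sketched roughly like this: each checkpoint contributes a file of inserted versions and a file of deleted version IDs, and replaying the pairs in order yields the live data.

```python
def coalesce(checkpoint_pairs):
    """Reconstruct live tuples from (inserted, deleted) checkpoint file pairs.

    inserted: dict of version_id -> tuple payload written by that checkpoint
    deleted:  set of version_ids that checkpoint recorded as deleted
    Applying the pairs in order yields the set of still-live versions.
    """
    live = {}
    for inserted, deleted in checkpoint_pairs:
        live.update(inserted)
        for vid in deleted:
            live.pop(vid, None)  # a later delete cancels an earlier insert
    return live

pairs = [
    ({1: "alice", 2: "bob"}, set()),  # first checkpoint: two inserts
    ({3: "carol"}, {1}),              # second: insert version 3, delete version 1
]
live = coalesce(pairs)
```

This is why the instructor calls it a condensed log: insert and delete records are replayed, just with the redundant history already squeezed out.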
And if you have a separate machine, can it take the delta checkpoint and convert it into a complete checkpoint? Yeah, so his question is: can you have a replica, another machine, feed in not just the checkpoints but the log itself, replay the log, and reconstruct the database? Yes. This is another topic where I thought I should cover replication, but I don't think I could do a whole lecture on it; I guess I probably could. But essentially, how replication mostly works in these systems is that the primary just streams out the write-ahead log, and then the other machines, the replicas, are essentially always in recovery mode and always replaying it. And then you have to do a little extra work to know that, all right, if I committed my transaction and wrote these log records, not only do I have to make sure they're flushed to disk, but I also want to make sure they're flushed to my replica and I got a response. And there are different ways you can configure that, like how much really has to be installed on the replica, or is it enough that the replica received the packets. There are different settings you can use for that. Okay. All right, so the last issue we have to deal with is how often we should take checkpoints. This should be sort of obvious, right? If we take checkpoints too often, then this is going to cause our runtime performance to degrade. I think it's Hekaton, no, not Hekaton, sorry: in SiloR, they are basically taking checkpoints all the time; when a checkpoint ends, they wait 10 seconds and then fire up another one. But again, in the case of TPC-C, there was about a 10% overhead while taking a checkpoint. The alternative is to wait longer, but waiting too long is a bad idea, because now if you only take a checkpoint every day, then if you crash, it's going to take a long time for you to recover. 
Now, as I said last class, or just now: typically, if you're running an in-memory database and you're running like a million transactions a second, that's a serious OLTP application. So you probably have money, right? Why else would you be running that many transactions? So you can afford to have replicas; you can afford to stream out your write-ahead log, replay it on the replicas, and fail over to a replica whenever the master crashes, so that you don't have to reload the entire checkpoint and replay the log. But still, the whole data center could go down, and then you would have to do this, so you want to minimize your downtime. So there's no magic rule I can give you that says, hey, set your system to do this, right? It depends on what kind of performance guarantees you want and how much downtime you can tolerate, right? If you're paranoid and want zero downtime, then you're probably willing to pay a performance penalty by taking checkpoints more often. But if you're like, all right, well, we don't fail that often, and if we do, whatever, it's Saturday night, nobody's using the website, then you're okay with taking checkpoints less frequently, right? So in terms of when you actually want to take a checkpoint, or what the mechanism is to trigger a checkpoint, there are essentially two approaches that you can use at runtime to figure this out. And the last one I'll mention, checkpoint-on-shutdown, is something pretty much everyone does: if you send a shutdown command to the database server, the system says, all right, I'm going down, let me take a checkpoint, write all the contents of memory out to disk, and then I can actually finish, right? You don't want to do a kill -9, because that's going to ruin your day, right? The clean way to shut down is to take the checkpoint and then shut down. All right, so the two approaches are time-based or log-file-size-based. Time-based is obvious, right? 
You basically say, I take a checkpoint every 5 minutes, every 10 minutes, whatever you set it to be. And then log-file-size-based basically says: when I've written this amount of data out to my log file, my write-ahead log, then I'll go ahead and take a checkpoint. So for MemSQL, for example, if I write out 250 megabytes to my log, that will trigger a checkpoint; the next time I write that much again, I take another checkpoint. So this table here is a summary that I came up with of the major in-memory database systems out there that support checkpoints and how they actually do it. The first thing to point out is that pretty much everybody except for Hekaton, as I said, takes complete checkpoints, right? There'll be some file or set of files that is a complete copy of the entire contents of the database. In the case of Hekaton, as I said earlier, it's going to be: here's all the versions I've inserted and here's all the versions I've deleted. It's like two log files, and then they can combine them together. And then, in terms of when they actually take a checkpoint, you see that all the systems do different things, right? In the case of TimesTen, there's a consistent checkpoint that blocks all transactions when you shut down the system, but during regular runtime they'll do one that's time-based, and that will actually be a fuzzy checkpoint. Both of those are time-based; MemSQL is log-size-based; and HANA is time-based. 
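Those two runtime trigger policies can be sketched like this (the class and thresholds are hypothetical; the 250 MB figure matches the MemSQL example above):

```python
class CheckpointTrigger:
    """Fire a checkpoint every interval_s seconds, or once log_limit bytes
    have been written to the write-ahead log since the last checkpoint."""

    def __init__(self, interval_s, log_limit):
        self.interval_s = interval_s
        self.log_limit = log_limit
        self.last_ckpt_time = 0.0
        self.log_bytes_since_ckpt = 0

    def on_log_write(self, nbytes):
        self.log_bytes_since_ckpt += nbytes

    def should_checkpoint(self, now):
        return (now - self.last_ckpt_time >= self.interval_s
                or self.log_bytes_since_ckpt >= self.log_limit)

    def on_checkpoint(self, now):
        self.last_ckpt_time = now
        self.log_bytes_since_ckpt = 0

t = CheckpointTrigger(interval_s=300, log_limit=250 * 1024 * 1024)
t.on_checkpoint(now=0)
t.on_log_write(260 * 1024 * 1024)    # heavy write burst
fired = t.should_checkpoint(now=10)  # log-size rule fires long before the timer
```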
So, in terms now of the type of checkpoints: again, the disk-based systems are going to be doing fuzzy checkpoints, and this is sort of expected; same thing for TimesTen. But, as I've been saying so far, you would think that MVCC makes it really easy to do consistent checkpoints, because it's just snapshot isolation. So what's interesting to point out is that HANA is an MVCC system and they do fuzzy checkpoints, while VoltDB is not an MVCC system and they do consistent checkpoints. So you can still do consistent checkpoints even though you don't do MVCC. How VoltDB does this, we'll see in a second: they basically switch into a multi-version mode only while you're taking a checkpoint. It's not actually full multi-versioning, it's only two versions, but the basic idea is the same. Peloton does not do checkpoints; it's a work in progress. Hyper, as of this morning, they told me they don't do checkpoints; I asked because I couldn't find any paper that describes this. You need to have this in a commercial system; for an academic system you can get by without it, but we'd like to have it. Then there's sending a command explicitly, sorry, a manual checkpoint. So I looked in the documentation for Altibase this weekend, and I can't tell what the hell they're actually doing. There's always going to be some command that you can give, like in a SQL terminal, that says take a checkpoint; HANA calls it a savepoint, other systems call it a checkpoint, but basically you're telling the system: take a checkpoint now. So the Altibase manual describes how to do this as a DBA, but there's nothing that says how to tune it to happen every so often, so I don't know. Presumably, since they've been around for 20 or 30 years, they have something that's automated; time-based is the easiest one, every five minutes take a checkpoint. But I couldn't find anything that says how to do it; maybe I just missed it. So now I want to go through the different checkpoint approaches you can have. So the first thing I'll say is that all these 
implementations here are consistent checkpoints; in some ways this is easier for us to reason about, and really only Microsoft does delta checkpoints for their in-memory database. The second thing I'll say is that nobody actually implements these two, as far as I know, and it should be pretty obvious when you read the paper: like, wow, this is really complicated, right? Approach number two is going to be the most common one. And the third thing I'll say is that the paper you guys read is from SIGMOD, so it's a database conference, but they use terminology that doesn't exactly match up with everything we've said so far in the semester and everything we'll say for the rest of the semester, right? They talk about having to checkpoint "application state," and that seems, you know, not exactly what we've been talking about; but the application state is just the in-memory database, right? In their environment, they are imagining something where you have application state in memory, and it may be part of a larger database that's out on disk, and you want to take checkpoints for the in-memory part. For our purposes, the application state is just the in-memory database, and so we can apply all these same techniques. Okay. All right, so the first approach is naive snapshots, and basically how this is going to work is that we're just going to build the dumbest possible implementation of creating a consistent snapshot of the database: for every single thing we want to checkpoint, we block all transactions, make a copy of it, and then we write that copy out to disk; and once we have our copy, then anybody else can operate on the data again. Whether you block the entire system while you do the checkpoint, or you do it on a per-block basis, or maybe run it as a transaction that takes exclusive locks on the data you're trying to checkpoint, it doesn't matter. Actually, that 
wouldn't work, because you have to block everything to make sure that nobody updates anything. So there are two approaches to doing this. You can do it yourself, meaning you block the entire database, scan all your tuples, write them out to some region of memory, and then start writing them out to the checkpoint. This is obviously bad, because you're blocking everything while you do it. The other approach is actually a pretty clever idea, where you let the operating system do everything for you and write out your entire checkpoint. The issue, though, is that when you do it yourself, you can be careful to copy only tuple data; but when you let the operating system do it, the operating system doesn't know what's tuple memory versus what's indexes versus what's internal data structures, so it ends up making a copy of everything. So this is the approach that Hyper actually did originally. The version of Hyper we've been talking about so far, with the ART index, the query compilation, the multi-version concurrency control, that's sort of the second version of Hyper. The original version of Hyper was actually based heavily on the H-Store/VoltDB system that we were building in New England, and they sort of borrowed some ideas we had, where you have these single-threaded execution engines to execute transactions. But the problem with the single-threaded engine approach is that it's really fast for transactions that make small changes; if you have to do an analytical query that touches really large portions of the database, then that's going to be really slow, because you have to lock all the partitions at every single node, at every single core, then do your read, and then you're done; and you're blocking all the transactions from running at the same time. So the way the Hyper guys got around it was that they would fork the database process, and that would create a new child process that had a copy of the contents of memory from the 
parent process, and they could run their analytical queries on the child process, or take a checkpoint of the database and write it out to disk from the child process. So when you do a fork, in the operating system the child process needs to have an exact copy of the contents of memory from the parent process. But rather than doing that copy immediately when you do the fork, the OS actually does a copy-on-write: it maps the physical pages of virtual memory to be the same for the child process and the parent process, and only when one of those processes starts modifying a page of memory does it actually make a copy and create real separate pages. So what they did, which I think was really clever, is they basically figured out: if we fork the process, we now have a somewhat consistent snapshot of the database. There may have been transactions actively running when you did the fork, so they have to go look in the in-memory undo logs for those transactions, which are now technically aborted on the child process, roll back their changes, and now they have a consistent snapshot. They can then run the analytical queries that they want on the child process, or have the checkpoint thread scan through and write everything out. So again, the nice thing about this is that, other than this little extra work of rolling back transactions that were actively running on the parent when you did the fork, you don't have to do anything special: the operating system does everything for you. Of course, the downside is that since the parent process is still going to continue processing transactions, it's going to keep modifying memory, and the operating system is going to have to start making copies of those pages, because the child process shouldn't see the changes. And the changes the parent process makes are not just to tuple data; it's also the indexes, it's also things like networking buffers, everything 
that you need for your system to actually run is going to get updated, and the operating system has to start copying it. So again, I think this is a really nice approach, but they eventually got rid of it and switched over to using MVCC and managing everything themselves, because the overhead in the operating system was too much.

When I first started at CMU I thought this was a clever idea, and we actually tried implementing it in our own system, the system I built in grad school, the H-Store system. We had a master's student try it out. We weren't doing the checkpointing part; we were just trying to see whether we could do the forking and run the analytical queries on the child process. This is running the TPC-C workload on a single machine with eight warehouses; the blue line is H-Store running without the forks, and the red line is H-Store with the ability to fork and run queries on the child process. One big difference between H-Store and HyPer is that HyPer is written entirely in C++, while H-Store is a combination of Java and C++. All the front end was in Java: the networking layer, the catalog, the transaction processing. The back end, the execution engine and the storage manager, was all in C++, so we used JNI to communicate between Java and C++. This was before the off-heap stuff with Scala and Spark came into vogue; this was like 2007.

When you go read the manual for the JVM, they explicitly say do not fork it; you will have a bad time if you do. This is because when you fork, the only thread that's still alive in the child process is whatever thread actually called the fork. In the JVM, in a managed memory environment, there's all this other stuff that runs in the background, like the garbage collector and other event threads, and those things don't get spawned in the child process. So basically you have this JVM process that's like a half-dead zombie, because you have the one thread that you forked with, and everything else that you need is actually dead. This is why they say don't do this. The other tricky thing is that you have to make sure that no thread is holding a lock when you go to fork, because you're not going to be able to unlock it in the child process.

We did it anyway. What you see here is that there are two dips, corresponding to when we're running TPC-C normally and then an analytical query shows up. In the case of the blue line, the performance basically goes to zero, because we're not processing any transactions: we have to lock the entire machine, at every partition, in order to execute the analytical query, because that's how H-Store's concurrency control protocol worked. In the case of the fork-based snapshot, you still pay a penalty, but the parent process keeps processing transactions, so throughput doesn't go to zero; it's still alive trying to do stuff. But now, because the transactions are updating things, the JVM on the parent process is running the garbage collector, which starts reorganizing the layout of memory in the heap, and the OS starts copying all our pages. Even after the analytical query is done, it still takes a while to catch up. Eventually the pages have diverged enough that the OS has copied everything it needs to and stops copying things, so when the second OLAP query comes along, you don't have to pay that forking cost again, and it's actually able to sustain performance. In this toy example it looks okay, other than being really unreliable; you wouldn't actually want to run this in production. But in HyPer's case, they found it was an unacceptable slowdown, because the operating system doesn't know that, oh, this is something that I don't need to
propagate or make a new copy of the page for the child process.

All right, so the next approach, which is more common, could either leverage the fact that we're using MVCC, where we have multiple versions and we know how to take a snapshot of the database based on those versions, or we could do something manual if we're not in a multi-versioning system. Basically it's just multi-versioning: any time we start a checkpoint, when a transaction modifies data we're going to create new copies of it ourselves and write to those, rather than overwriting the tuple that the checkpoint thread is trying to read. It's basically the same thing as the copy-on-write snapshot with forking in HyPer, except here the database system is doing everything itself, which is always the better idea. The copies you generate can be at different granularities: it can be on a per-block basis or a per-tuple basis; typically everyone does it per tuple. The checkpoint thread is essentially going to run as a transaction too, so it's going to have a timestamp, a way to order itself with every other transaction running at the same time, and it just knows that anything created after it started can be ignored.

Question: right before the workload switched to OLAP, does it begin to do the checkpoint? No, it's always running TPC-C; it's just that at these two time ticks an OLAP query showed up and it had to execute it. When the query shows up, it forks the process, which is like taking a snapshot or checkpoint. The way HyPer did it was: if you have a read-only query and you can estimate how much data it's going to access, and it's better to run it on the child process, they would fork one if they didn't have one; or, if they already had a child process from a fork, they would just route the query to run over there. That's what we're doing here: the first time the query shows up, we do the fork, and that's the huge drop you see in the red line; then it takes a while for it to get back up. The second time a query shows up, we know we already have a fork, and it's okay for us to go run on it. Now, the forked process is not going to get any of the updates that the parent process is still applying, but in an OLAP setting that's probably okay: if I'm running my query 20 seconds later and it's reading 20-second-old data, that's probably fine. If I needed fresher data than that, then I'd pay the penalty of the performance drop.

Okay, so what VoltDB does. VoltDB, again the H-Store/VoltDB model, is a single-version system doing in-place updates, so it's really, really fast for transactions that don't have to touch multiple partitions, because when you're running on a single thread, nobody else is running at the same time; there's no concurrency control you actually have to do while the transaction is running. But when you want to take a checkpoint, the system tells every single execution engine: we're now taking a checkpoint, so switch into this copy-on-write mode, a copy-on-update. Then, any time a transaction updates a tuple, instead of overwriting the existing one it makes a new version of it. It doesn't have to create a version chain like you do in MVCC: all the pointers in your index will now point to the new version, and you'll never actually be able to see the old version. It's just that when the checkpoint thread does its sequential scan, it'll find the old version that it wants and can ignore the newer versions.

So in this environment it's essentially a two-version system: you have the version of the tuple that existed before the checkpoint started, and then the newer version created after the checkpoint started. A lot of tuples may never get modified, so you never have to create that second version, and the checkpoint thread can always find the old version; but any time I update something, I have to create a new one. The only thing you need to do is flip a bit in the tuple's header to say this thing was created after the checkpoint started. Then, as the checkpoint scan goes along, if it finds a tuple with the bit set to zero, it knows that's the thing it should read; if the bit is set to one, it knows it should ignore the tuple, flip the bit back to zero, and keep scanning. Because there's only one checkpoint thread doing the scan on one particular tuple at a time, it cleans up the old versions as it goes along. This is a really simple approach that solves the problem of having to lock the entire system while we take a checkpoint, and I think in VoltDB's environment this is the right way to do it.

Okay, so what are some of the issues with these approaches? Well, in the case of the naive snapshots, transactions may have to wait for the checkpoint thread to complete before they can actually start running; the naive checkpoint literally just locks the entire table, copies it out, writes it out, and only then can everyone else keep running. So that's bad. And in the case of copy-on-update, depending on the concurrency control scheme we're using, we may have to acquire latches in order to read something, but the checkpoint thread is going to acquire those latches too, so we could get blocked on that.

All right, so this is the problem they're trying to solve in the paper you guys read, with these two different approaches: wait-free zigzag and wait-free ping-pong. The first thing to point out is that when I say wait-free, I don't mean lock-free or latch-free; it really just means that the worker threads are never going to have to wait for the checkpoint thread to start or finish whenever they want to update the database. There will be some pauses, as we'll see in some of these cases, where we have to update some metadata about which copy of the database we're dealing with, when we flip from one copy to the next; but at runtime, as we're taking the checkpoint, we're never blocked by anybody.

In the case of wait-free zigzag, the trade-off they make is that instead of having a single copy of the database, or a multi-version database, they're always going to have explicitly two copies. When a transaction starts updating the database, we'll have some way to figure out which of the two copies it should write to, and the other copy is essentially what the checkpoint thread is reading and writing out to disk. The way they do this is with bitmaps that say, when a transaction shows up, which version, or which copy, of the database it should read and write from, right?
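To make the bitmap bookkeeping concrete, here's a minimal sketch of the zigzag state in Python. The class and field names (`read_bm`, `write_bm`) are my own for illustration, not from the paper; each "tuple" is just a number, as in the lecture example. Reads follow the read bitmap, writes go to the copy named by the write bitmap and then publish themselves by flipping the read bit.

```python
class ZigzagDB:
    """Toy wait-free zigzag state: two full copies plus two bitmaps."""

    def __init__(self, values):
        # Two explicit copies of the database (one "tuple" per slot).
        self.copies = [list(values), list(values)]
        n = len(values)
        # read_bm[i]: which copy holds the latest committed value of slot i.
        self.read_bm = [0] * n
        # write_bm[i]: which copy the next update to slot i must go to.
        # At checkpoint start this is set to the inverse of read_bm.
        self.write_bm = [1] * n

    def read(self, i):
        # Readers always follow the read bitmap.
        return self.copies[self.read_bm[i]][i]

    def write(self, i, value):
        # Writers go to the copy named by the write bitmap, never the one
        # the checkpoint thread is scanning, then publish for readers.
        self.copies[self.write_bm[i]][i] = value
        self.read_bm[i] = self.write_bm[i]

db = ZigzagDB([10, 20, 30])
db.write(0, 11)
print(db.read(0))  # 11, the new version
print(db.read(1))  # 20, unchanged
```

Note that after the write, `copies[0][0]` still holds the old value 10, which is exactly what the checkpoint thread wants to see.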
The argument they make is that yes, having two full copies sucks because you're wasting more memory, but this is better than doing copy-on-update, because it's not something you have to keep repeating over and over again: you don't have to malloc every single time you make a new version; you already malloced a giant block of space at the beginning.

So let's go through an example. Say we have our two copies of the database, and then we have our two bitmaps. For this example I'm just showing you vectors of single digits for simplicity; in reality these would be multi-attribute tuples, but for our purposes they can just be single numbers. The first thing we see is that we have our read bitmap and our write bitmap, and the offsets in these correspond to offsets in the fixed-length data arrays for the actual tuples. The read bitmap tells us which copy we should read from: if it's zero we read from the first copy, if it's one we read from the second copy. The write bitmap is the same thing: if it's zero we write to the first copy, if it's one we write to the second.

Say now, at the very beginning, we want to start taking a checkpoint. The checkpoint thread is going to look at the write bitmap and use it to figure out what it should read. Essentially it's going to invert the bitmap: for every single slot, if the value is one it goes to zero, and if it's zero it goes to one, and that tells it where it needs to read from. In this case, at the very beginning, the write bitmap started off as all ones, so when the checkpoint thread does the inversion everything goes to zero; therefore it should take the checkpoint by just scanning through the first copy. So again: we do the inversion, that tells us where we need to go, and then we start scanning through, and for every single element we find in our list we write it out.

Now let's say that while the checkpoint is running, a transaction comes along and wants to do an update. For this I'm ignoring concurrency control entirely; assume there's some higher-level scheme or protocol the database system is running that decides whether it's okay to read or write something. For our purposes here, two-phase locking or whatever doesn't matter. The thing we want to guarantee is that when we do our writes, we don't end up writing into the slots that the checkpoint thread is reading from, because the checkpoint needs to be consistent and shouldn't see any of our updates. So this transaction wants to update the first tuple and then the third and the fourth. The write bitmap is set to one, so we know we want to do our writes in the second copy. We go ahead and apply our updates, commit, and do whatever else we need to do, and we're fine. We also need to flip the corresponding bits in the read bitmap, so that anybody who comes along behind us knows to read the second copy now, not the first copy. Again, I'm being hand-wavy about the concurrency scheme, but you can think of this transaction as holding an exclusive lock on these tuples, which means no one can read them until we're done and release those locks; so it's okay that we're doing this in multiple steps. We can do the read here because we already have exclusive locks, so no one's going to flip this on us; we apply our update here and here, and it's all effectively atomic because there's some higher-level protection mechanism going on. All right, so the transaction commits, and then our checkpoint is done. Now we immediately want to switch over and start doing another checkpoint, and we just keep doing this over and over again.
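A hedged sketch of one full checkpoint round under the same toy model (my own naming, not the paper's code): at the start of a round the write bitmap is reset to the inverse of the read bitmap, and the checkpointer then reads each slot from the copy opposite its write bit, which is exactly the last committed version. The bitmap reset is the step that must happen atomically in a real system; here we're single-threaded, so it's trivial.

```python
def checkpoint_round(copies, read_bm, write_bm):
    """Run one zigzag checkpoint round over two copies of the database.

    Returns the consistent snapshot as a plain list.
    """
    n = len(read_bm)
    # Step 1: point writers away from the last committed versions.
    # This is the update that needs latch protection with concurrent writers.
    for i in range(n):
        write_bm[i] = 1 - read_bm[i]
    # Step 2: scan, reading the copy opposite each write bit. This is the
    # "zigzag": the scan hops between the two copies slot by slot, always
    # picking the value committed before the checkpoint began.
    return [copies[1 - write_bm[i]][i] for i in range(n)]

# Toy run: slot 1 was last written into copy 1, the others into copy 0.
copies = [[10, 99, 30], [0, 21, 0]]
read_bm = [0, 1, 0]
write_bm = [1, 1, 1]
snapshot = checkpoint_round(copies, read_bm, write_bm)
print(snapshot)  # [10, 21, 30]
```

Any writer running during the scan would go to `copies[write_bm[i]]`, so it never touches the slots the checkpointer is reading.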
So, same as before, the checkpoint thread is going to look at the write bitmap, and that tells it where it wants to read from. But before we do that inversion, we first want to take the updates recorded in the read bitmap and apply them to the write bitmap. In the last checkpoint round, the transaction updated these three tuples, so their read bits got set to one; now in the write bitmap we set the corresponding entries to zero, so that when the checkpoint inverts the write bitmap, it tells us to read those slots from the second copy, because that's where the updates from the previous round landed. Now the checkpoint thread scans through, and you see why it's called zigzag: for some elements it reads from one copy, and for others it reads from the other copy, back and forth. As it scans down, it knows how to zigzag between the two copies, and that guarantees it gets a consistent snapshot. Is this clear?

Okay, say another transaction comes along, same as before. We use the write bitmap to tell us where to write, we do our write, and we're done, and the next checkpoint will make sure it picks up our updates. The big problem we're going to have with this is that step back at the start of the next checkpoint, where we need to update the write bitmap atomically. Other transactions can be running at the same time and may be flipping these bits on us, so we need a consistent in-memory snapshot of what the correct bitmap should be, and we have to protect the whole bitmap with a latch. So there can be a long pause where no other transactions can run: even though they may have acquired locks for their tuples at the higher levels of the concurrency control system, we have to lock them out in order to do this bitmap update atomically. There will be this big pause after every checkpoint where we block all transactions in order to update the bitmap.

This is the problem that wait-free ping-pong is trying to solve. These guys make a trade-off: they're going to spend extra memory and CPU overhead to avoid that long pause at the end of a checkpoint. The way they do it is that they now keep three copies of the database. You have two copies that you flip back and forth between as the master and the shadow, and then you have a third copy, called the base copy, where you're always storing the latest version of the database. A pointer says which copy is the current master and which is the shadow, and at the end of a checkpoint you just flip that pointer with a compare-and-swap. That's the only update you have to do, and you don't have to block anybody to make it happen.

Let's go through an example of this. It's a little more complicated, but it should be pretty easy to follow. Again we have our base copy, then the two copies we're going to flip back and forth, and down below we have our pointer for what's considered the master and what's the shadow. At the very beginning the current master is copy one, and that's what we're going to use to record updates; the shadow copy is what we're going to take our checkpoint from, and it holds an exact copy of what was in the base copy. The bits alongside each copy tell us whether a slot was modified during the current checkpoint round. The checkpoint thread can just scan through the shadow and write it out, and nobody is going to be modifying it while we do this, so it's going to be consistent.

Now say a transaction comes along and wants to update these three elements. It's going to make the change in both the base copy and the current master copy, and when it makes the change in the master copy, it flips the slot's bit to one, to say I've modified this. If you're doing a read, you can always read from the base copy, because that's always guaranteed to be the correct version.

Now let's say our checkpoint thread finishes. Before we switch over to a new master and a new shadow, we need to go through the current shadow and flip all its bits to zero, to get it ready for the writes that will land there next round; I set a bit to zero so that when I do a write I can flip it to one and know that I actually wrote to that slot. I'm also showing that we're clearing out the contents of what was in there before; for simplicity that's fine, but you don't actually have to do it, because you're never going to read from the master or the shadow; you always read from the base copy, so it doesn't matter whether the slot got zeroed out.

Student: hearing this, it kind of reminds me of garbage collection. So the statement is, this reminds me of garbage collection. It's sort of like the epoch management stuff, where you're trying to be clever about what's visible and make sure people don't see things that they shouldn't see; that's certainly partly true. The main thing I want you to get out of this is that keeping three copies of the database is kind of a terrible idea, but here's a way to do checkpointing without blocking, wait-free; you just pay a big penalty because you make three copies. Follow-up: aren't checkpointing and garbage collection overlapping problems? The way to think about it is that when you take a checkpoint, if you're using MVCC, you essentially turn off garbage collection, at least for any versions the checkpoint still needs to see while it's running. You could ask: if my checkpoint is going to take an hour, and no transaction other than the checkpoint is reading the versions between the checkpoint's version and the latest one, could I prune those out? No one does that as far as I know. So you're right that there's a bit of a symbiotic relationship between the two, but they're usually treated as separate pieces.

Now, if I flip the pointer, copy two becomes the new master and copy one becomes the shadow. My checkpoint thread wants to scan the new shadow and write it out, but what's the problem here? It's missing data: when this copy was the old master, only three tuples were updated in it, so it doesn't have the old versions of everything else. You could say, all right, maybe I'll fetch the values I'm missing from the base copy. But someone could have updated the base copy: I'd check that the bit here is still zero, so I'd think this is the latest version, but between checking the bit and copying the value over, someone could update it. The only way to protect myself would be to take a latch, and I said we didn't want to do that, because we want this thing to be wait-free. So what they propose, and when I read the paper I did not see this coming, is that the way to fill in the missing values is to actually go back to disk, to the last checkpoint you wrote. Because it's a consistent checkpoint, you know it has all the values, so you look in the file you just wrote out, read back the values you're missing, and then write those back out to disk as part of the new checkpoint. It works. But again, you have to do this because otherwise you'd have to take locks to protect things, and
they didn't want to do that. So her question is: what about the first checkpoint, when you have nothing on disk? At that point you have the base copy, and the shadow copy starts out with an exact copy of everything, so nothing is missing. Alternatively, if you want to do delta checkpoints, you can just skip the unmodified slots; but as I said, you then have to manage that on recovery to put the database back into the correct state.

Another question: what if these copies are not all in memory on the same node, but are replicas on different nodes in a cluster, with the bitmap stored at some centralized query router or load balancer that can route you to the right copy; does that solve the copying problem? So his statement is: don't think of these copies as all being in memory on the same node; think of them as separate nodes in a distributed environment. That's actually a good point: if I'm in a distributed environment where I want to have copies anyway, can I just use this approach? I'd have to think about that. I don't think anybody actually does this; usually everyone just ships the update log over for replication.

All right, so for these different implementations, the paper lays out the actual low-level primitives you need in order to build any checkpoint mechanism in a database system. That's another reason I like this paper: even though the idea is a bit far-fetched, I think it does a good job of laying out what it means to actually build a checkpoint scheme in a database system. The four primitives are, essentially: you can do bulk state copying, as we saw in the naive checkpoint, where you just copy the data you need, make sure it's consistent, and write it out; you can use low-level locks and latches to isolate the checkpointing thread from the worker threads as they modify the database; you can use bitmap schemes, as in the zigzag approach, with bulk updates or bulk resets of the bitmap to keep track of the dirty regions in the database, resetting everything when the checkpoint finishes, which lets you figure out what was actually modified since the last checkpoint you took; and the last one, as we saw in the ping-pong example, is to maintain more copies and avoid having to synchronize the writes across those copies until you actually finish the checkpoint.

This table is a summary of all the different approaches we've seen. The main thing I'll point out is that the memory usage for the naive snapshot and for copy-on-update is listed as 2x, but think of that as 2x in the worst case. In the VoltDB scheme, where I switch into copy-on-write mode and make a new copy any time I update something, the worst-case scenario is that all 100 of my 100 tuples get modified during the checkpoint; then yes, it would be 2x, but in practice that's not going to happen. Question: with copy-on-update, if two transactions modify the same tuple, do you make two copies?
in the VOLT-B scheme so VOLT-B is single threaded accessing the tuple so if the first transaction updates the tuple and you make a new copy the second transaction doesn't run to the first guy to the first guy commits second guy runs, if he updates the same tuple you just overwrite the second copy right, so there's only over two copies and then the wait-free zigzag and the wait-free ping pong these are definitely 2x overhead, 3x overhead for each of these so in the remaining time I want to then talk about another idea of checkpoints again which I really like is using them to solve a different problem than just recovery so everything so far has been about if I crash, how do I load my checkpoint back in and what's in my checkpoint but maybe the case that we're not just restarting the data system because there was a crash or someone tripped over the power cord and it was like this all the time especially in production environments and you have to do things like update your operating system to put new security patches in if you buy new hardware like you can put more RAM in or upgrade your EC2 instance you have to shut the system down and bring it back up and it's also very common if you want to update the actual software itself you want to, you have to stop it and restart it so the the problem we're going to try to solve here is this last one but how can we deal with the case that we want to update the data system software which again when you shut the system down you have to take a checkpoint how can we have the restart time be really fast by trying to avoid having to go to go to disk and the reason why Facebook wants to solve this problem is because they have this sort of development philosophy it's an agile environment where they're always pushing out updates or new releases for their software services every two to three weeks and so if you have a database system and you're putting out updates every two or three weeks you have to restart your system but now if you're 
running on a system with a thousand nodes and it takes you two hours to restart a node and load the last checkpoint in then a large portion of your fleet will be down for extended periods of time every two or three weeks and so the way they're going to solve this is that they're going to essentially decouple the contents of the database in memory from the lifetime of the life of the database system process itself so again this should be operating system 101 I malloc a bunch of memory in my process my process ends the operating system takes that memory away if I come back and restart the exact same process the exact same binary the operating system doesn't know it's the same thing it's gone, you have to do it all over again so in their environment again if you have to take the checkpoint to stop the database system you want to turn it back on and when you want that in-memory database to still be there and so the way they're going to solve this problem is through shared memory so what's going to happen is they're going to have the database restored in shared memory and then they stop the process they write a little file that says here's the address location in shared memory then you turn the data system back on it looks in that file looks in the memory address sees it in shared memory see whether the data is still there if so now it has the database that the last process had and that's going to allow you to restart way way faster because everything is already in memory so now the question is how are we actually going to do this so the system they're doing this for is a system called scuba and I'm sure someone will email me and tell me that I'm wrong but as far as I know it's still used in production it's a distributed in-memory database system that they use for analysis of the data that they collect for all their services the idea here is that all their services and web apps that they have they're generating these log files they process the log files into these 
events and they write them out into scuba so now you're new tracing like for a single HTTP request what all the different layers of the application stack I had to go through and what was the timing for all of these so you can use that to figure out if I put out a new service is it slower today than it was yesterday and that's the problem scuba is trying to solve so as I said in the beginning this is not a course on distributed databases it's probably the only distributed database we actually really talk about I think it's worth mentioning a little bit because it'll get you to understand what they're actually trying to do the problem trying to solve a little bit better so scuba is going to have a heterogeneous architecture it's a shared nothing system it doesn't really mean like with any single cluster for a database instance some nodes will be tasked to do some things and other nodes will be tasked to do other things so some nodes will be called leaf nodes and this is where the data is actually stored in memory and these are in charge of doing the scans and the filters on the raw tables themselves and then they generate results and they push them up into aggregator nodes that will combine these results from multiple leaf nodes and then do whatever aggregation or whatever computation join you want to do on that and then shove up the answer to the client so it sort of looks like this again you have some root aggregator node and then you have middle aggregation nodes and then you have your leaf nodes so if I have a single query that needs to touch all these nodes I'll break it up into four pieces and these guys all push it down the leaf nodes do their scan generate their output shove it up to the aggregation node he then combines it and then moves it on to the next guy this is actually a very standard approach Facebook uses it for a lot of their distributed systems the founder of MemSQL after he spent some time at Microsoft seeing the SQL server stuff then he went to 
He spent a little time at Facebook and saw essentially how they do this, and this is essentially how MemSQL works as well. So again, what we're talking about here is the leaf nodes; the aggregator nodes are stateless, they have no data, so if those guys die, who cares? You just bring them back and they pick up where they left off. We want to be able to restart the leaf nodes very quickly without having to read the checkpoint entirely from disk, and there's two ways to do this. The first is essentially to write your own version of malloc that, instead of allocating memory from the private address-space heap of your process, allocates it in shared memory. Facebook hired the guy that wrote jemalloc, which is an awesome memory allocator (we use it in our system), and they had him try to figure out whether this would actually work. The jemalloc guy basically said that this won't work, because you have to make major changes to your memory allocator to use shared memory; you have to do things like subdivide memory for thread safety and scalability. There are other issues too: in shared memory, all the memory you allocate has to be backed by physical pages immediately. If you malloc in your heap, the operating system will say you have that memory, but it hasn't actually mapped that virtual memory to physical memory yet; in shared memory it has to do that immediately. So this would be really slow, because every single time you allocate memory, if I malloc two gigs but I only need 1 KB right now, you've got to actually get two gigs. So what they said was, instead of allocating in shared memory on the fly, you only do it at shutdown. Again, we're doing software updates, so we know that we're going to restart the system.
It's a coordinated process: we tell the system, hey, we're shutting down. When this happens, they're going to copy all the contents of the database's heap out of local memory into shared memory, and then when we restart, we know how to come back, find that data, and bring it back in. So again, the administrator says, I'm going to do a shutdown, and immediately the system stops ingesting new information. Remember, while it's doing queries for the analytical stuff, it's also taking in these streams of new updates, so we stop ingesting anything, and now we have a consistent snapshot of the database. Then we start writing all our data out, essentially just doing scans on our tables, and every single time we get a tuple we copy it into shared memory and then delete it from our local memory. When we're done, we restart the system, boot back up, and check the special file that tells us whether we have a copy of the database in shared memory. If so, then we just do the scan process again, copying the data from shared memory and putting it back into our regular database. There's some extra stuff you have to do to make sure that you're reading compatible data: if from one version to the next the memory layout of your data changes, you don't want the system to boot back up and think it has something it can comprehend. You'd want to throw an error, or just not recover from the shared memory, if you know it's an incompatible version. So there's a bunch of extra checks they have to put in place to make sure they know they're reading valid data. Again, I really like this because, like the hybrid example we showed before with the forking snapshots, it's relying on another feature the operating system provides to make our database life easier. I'm extremely interested in shared memory and in trying to use it for other things.
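As a rough illustration of that shutdown-and-restore trick, here's a sketch using Python's standard-library shared memory. Scuba does this in C++ with its own allocator; the table contents, serialization via pickle, and segment handling here are all made up for the example:

```python
# Hypothetical sketch of the shutdown-to-shared-memory flow: serialize the
# in-heap table into a shared memory segment at shutdown, then on "restart"
# reattach to the segment by name and copy the data back into the heap.
import pickle
from multiprocessing import shared_memory

table = [("2024-01-01", "web", 10), ("2024-01-01", "db", 5)]  # in-heap tuples

# --- graceful shutdown: stop ingesting, then copy the table to shared memory
payload = pickle.dumps(table)
shm = shared_memory.SharedMemory(create=True, size=len(payload))
shm.buf[:len(payload)] = payload
snapshot_name = shm.name   # in Scuba this is recorded in the special file
shm.close()                # the process could exit here; the segment persists

# --- restart: find the segment by its recorded name, copy the data back in
shm2 = shared_memory.SharedMemory(name=snapshot_name)
restored = pickle.loads(bytes(shm2.buf))  # pickle ignores trailing padding
shm2.close()
shm2.unlink()              # recovery done, release the segment
print(restored == table)   # True
```

The version-compatibility checks the lecture mentions would go where the data is read back: before deserializing, you'd verify a format version recorded alongside the snapshot and refuse to recover if it doesn't match.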
We talked about huge pages before, and I said it was a bad idea to run your whole database process with huge pages, because not everything needs one-gigabyte page sizes. But it turns out that with shared memory, you can have the shared memory use different page sizes than your regular heap memory. So maybe we can put some things in the database on huge pages in shared memory; it's not really shared, because no other process is going to access it, it's just another heap space allocated with different page sizes. So I think shared memory is actually really cool, and it's not something that people have really investigated for in-memory databases; we could look into that later. But anyway, the main thing here is, again, it's basic functionality that the operating system provides, and they're using it in a really clever way to solve a problem. Okay, so what are the main takeaways? In my opinion, and it should be obvious to you because I think I've said this multiple times today, copy-on-update checkpoints are the best way to do this, and this is pretty much what everyone does. If you're doing multi-version concurrency control, it's super easy, because you just treat the checkpoint as a transaction that has a snapshot in time, and you scan everything and write everything out. As for the question of whether you want to use delta checkpoints or complete checkpoints, I think that's actually an unsolved question. I think Microsoft does it a really good way, but I'm not sure why nobody else does this. In HANA, for example, those guys have oodles of money; they could have done delta checkpoints like Microsoft, but they didn't, so I don't know why that's the case. And of course, at the end, I've said shared memory has some use after all. All right, any questions about that? Yes? All right, good point. His question is: for all the checkpoint schemes we talked about here, do we need multiple passes over the log? No.
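A minimal sketch of the earlier point about MVCC making checkpoints easy, with a made-up version-chain layout: the checkpointer is just a reader pinned at a snapshot timestamp, writing out the newest version of each tuple that committed at or before that timestamp while normal transactions keep appending new versions.

```python
# Hypothetical MVCC version chains: key -> list of (commit_ts, value),
# ordered oldest to newest. The checkpoint "transaction" reads at a fixed
# snapshot timestamp; versions committed later are simply not visible to it.

def checkpoint(versions, snapshot_ts):
    out = {}
    for key, chain in versions.items():
        visible = [val for ts, val in chain if ts <= snapshot_ts]
        if visible:
            out[key] = visible[-1]   # newest version visible at the snapshot
    return out

table = {
    "a": [(5, "a1"), (12, "a2")],   # a2 committed after the snapshot below
    "b": [(8, "b1")],
}
print(checkpoint(table, snapshot_ts=10))  # {'a': 'a1', 'b': 'b1'}
```

Because the snapshot is consistent by construction, no uncommitted data can end up in the checkpoint, which is what makes the recovery discussion below so simple.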
Unless you're doing fuzzy checkpoints, you don't. If you have a consistent checkpoint, all you need to do is load the last checkpoint, then jump to that place in the log and replay: figure out what transactions were running at the time you took the checkpoint, maybe go back a little bit farther to find where they start, and then just replay everything after that. To do it the Silo way, going in reverse order, you still load the last checkpoint, then you go through the log in reverse order and replay all the changes, and you make sure that any transaction that committed after the checkpoint but started before it gets picked up too; you go back as far as you need to make sure you get them as well. And again, in an in-memory database we don't need to undo anything if we're doing consistent checkpoints, because there are no uncommitted transaction changes written to the checkpoint. Does that make it clear? Okay. Yes? The question is: does shared memory require locks, or can Scuba use it as a storage medium without locks? So shared memory is useful for sharing memory, as the name says, across processes; Postgres does this, and there you have to use locks to prevent other processes from making changes that someone else shouldn't see. Now in the case of the Scuba system, the system is shutting down, and it's a single process anyway; it's just piggybacking on the fact that it can restart the database process and have the shared memory still stick around. So it doesn't take any locks, because nobody else could be modifying the shared memory at the same time. Now, you may have multiple threads writing into shared memory regions in the same process, and that you have to protect with latches, but we're not coordinating any other transactions. In general, again: if you have multiple processes, then you have to put your locks in shared memory; if you're a single process writing to shared memory with multiple threads, then you just do the regular latching scheme that you did before.
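Going back to the recovery procedure described above, here is a sketch of replay from a consistent checkpoint. The LSNs, log format, and data are made up for illustration: load the checkpoint image, then redo only the writes of transactions that committed after the checkpoint, reaching back before the checkpoint LSN for transactions that were still running when it was taken; no undo pass is needed.

```python
# Hypothetical recovery from a consistent checkpoint: replay the changes of
# every transaction that committed after the checkpoint, including records
# written before the checkpoint LSN by transactions still in flight then.

checkpoint_lsn = 100
checkpoint_image = {"a": 1, "b": 2}   # database state as of the checkpoint

# log records: (lsn, txn, key, value); commit LSN per transaction
log = [
    (80,  "t2", "a", 1),   # t2 committed before the checkpoint: already in it
    (90,  "t1", "b", 7),   # t1 started before the checkpoint...
    (110, "t1", "c", 3),   # ...but committed after, so replay all its writes
    (120, "t3", "a", 4),
]
commits = {"t1": 115, "t2": 85, "t3": 125}

def recover(image, log, commits, ckpt_lsn):
    db = dict(image)
    for lsn, txn, key, value in sorted(log):      # forward replay by LSN
        if commits.get(txn, -1) > ckpt_lsn:       # committed after checkpoint
            db[key] = value                       # redo; never any undo
    return db

print(recover(checkpoint_image, log, commits, checkpoint_lsn))
# {'a': 4, 'b': 7, 'c': 3}
```

The Silo-style variant mentioned above would walk the log in reverse and keep only the first (newest) write it sees per key, but the set of transactions it replays is the same.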
did before. The next question is: since Facebook is trying to save on downtime, do they actually rebuild the indexes, or do they save those as well? Actually, I don't know; I don't even know whether they have indexes. But it's a time-series database, so oftentimes you want an index to be able to jump to different points in time, so they might have something like that. I don't know the answer; there's no reason they couldn't, it'd be easy. Any other questions? All right, cool. So next class will be a new lecture that I haven't given before. I'm super interested in networking protocols, because we support JDBC, and some of you people here in the audience have lost hair, girlfriends, boyfriends, whatever, trying to get JDBC to work correctly, so we'll spend some time actually understanding what it is. The paper you guys are going to read is from MonetDB; they basically show how JDBC, or the wire protocols for databases in general, are inefficient for moving bulk data around, but it gets you to understand what these protocols actually do. All right, any questions? All right guys, enjoy the warm weather and I'll see you on Monday.