All right, we can get started. So I'm going to pass around the sample final exam now. There'll be two versions of it: one with the solutions and one without. So if you want to practice without seeing the answers first, take both copies. This practice exam I'm giving you is just two questions. It shows you at a high level what the final exam will essentially look like. And as you can see, it's not true/false questions or multiple choice questions based on a particular lecture or a particular reading. The idea is that you want to synthesize the various topics that we talked about throughout the entire semester and be able to answer new questions or apply the knowledge to new problems. So the practice one I gave you has two questions; on the real final exam, there'll be four questions, and you'll have roughly an hour and 20 minutes to complete it. So the idea is roughly 15 minutes per question. Again, that'll be on Thursday in class next week. The other administrative stuff: the second round of the code review will be due next Thursday as well, at midnight. I decided that we'll just keep the same groups for everyone from the first code review. That way you're not reading a whole bunch of new code that you don't understand all over again. And then the final presentations will be in Wean Hall on the 9th at 5:30 PM. It'll be the same sort of setup as we had for the proposals and the project updates: you'll come present on stage the final outcome of your project. And there'll be pizza and t-shirts and prizes for everyone. So any questions about the final exam, or about what's expected for the final presentation or the code review?
So if you want to make this cleaner for the code review, we can close your previous pull requests on GitHub and you can submit a new one. That way it's not all dirtied up with the previous review, and it's more obvious what people should be looking at. It's up to you guys to decide what you want to do. OK? All right, so for today's class and the final lecture, we're going to be looking at what's called non-volatile memory. Unlike all of the lectures that we've done so far in this class, this one is actually looking at hardware that doesn't quite exist yet. Or at least isn't yet available to the general public where we can buy and use it. Everything else has assumed that the hardware we have is actually something you can use; this is looking at more future-looking devices. So we'll start off with some background about what non-volatile memory storage looks like. And then we'll cover the paper you guys read that we wrote here at CMU, where we evaluated the different design decisions or architectures you can have for your database system to see how they perform on non-volatile memory. Now, the term non-volatile memory can mean different things to different people. Technically, NAND flash is a type of non-volatile memory. But when I say non-volatile memory, I really mean a storage device or storage technology that's going to have the same speed, or almost the same speed, for reads and writes as DRAM, but that's going to be completely persistent and durable like an SSD. Sometimes in the literature you'll see these things called storage class memory or persistent memory or NVRAM. For our purposes, when I say NVM, I mean this special device with the unique property that it's like DRAM, but it's completely durable. So the first devices are actually coming out now; Intel announced theirs, I think it was last month or so.
The first devices will be completely block-addressable, meaning it's going to look like a NAND flash card that sits on the PCI Express bus, and you can read and write to it in terms of blocks. The protocol that you use for this is a new technology called NVMe. And now you see why this term NVM is overloaded; it gets used for a lot of different things. You can use the NVMe protocol for NAND flash, but that's not what I'm describing up here when I say NVM. As we'll see as we go along, these block-addressable devices aren't going to be that interesting for us from a database perspective, because they're just going to look like a faster, better version of the NAND flash technology we have today. It's the later devices, and I'll give you a prediction for when I think they're going to come out, that are actually going to be byte-addressable. And that's going to be the real game changer for how we design our database systems. What I mean by byte-addressable is that we can follow a single pointer to a tuple and do our reads and writes at the byte level. And as we'll see as we go along, it is my conjecture that you're going to have to redesign your database system architecture to be able to use this technology correctly. So to understand the background of how we got to the point where we are today, where we have this non-volatile memory, I actually think it's a really interesting story to go through how people discovered the modern incarnation of non-volatile memory and figured out that we can actually manufacture these things. Now, I'm not an electrical engineer, I've never taken an electrical engineering course, but the way we've understood passive circuits for almost 200 years is through three fundamental primitives or fundamental elements of circuit design. The first one you can have is the capacitor.
And this is essentially a battery: you can store some charge in it, right? This was first discovered, I think in England, in 1745, and it's the first passive circuit element that they invented. Then a little bit later, they came along with the resistor. The idea here is that it's a two-terminal device where you put in some charge, and then based on the property of the resistor, you can modify what the voltage is gonna be coming out on the other end. And then they came out with the inductor in 1831. I think of this as just like a heating coil: you put a charge in and the coil converts the voltage into heat. So if you take an ECE course, they'll basically describe these three fundamental elements to you as the basic building blocks for all circuits. What is really interesting, though, is that in 1971 there was a new professor at Berkeley named Leon Chua, and he worked out the math showing that there had to be a fourth element; the three elements that I showed you, the capacitor, the inductor, and the resistor, were not enough. In order to get the equations to balance correctly, there had to be this other fourth element. And the distinctive property of this fourth fundamental element is that its resistance could change based on how you applied voltage to it. And even if you then stopped applying the voltage, it would still remember what the resistance was. He claimed that this had to be a fourth fundamental element because you could not build this type of element using the other three primitives. So it had to be its own fundamental element in its own right. He published this paper in 1971 and no one read it. It was very esoteric, very mathematical, and no one thought much of it. What he was actually proposing was what we now call the memristor.
And again, the basic idea of the memristor is that it's like a resistor, but you can change what its resistance is by applying a voltage to it. And when you take that voltage away, it maintains, or remembers, that resistance. That's obviously gonna sound like something we can use later on to store ones and zeros, and that's how we're gonna build non-volatile memory. So Chua's paper came out in 1971 and, as I said, no one read it. Fast forward to the early 2000s: there was a team at HP Labs in California led by a researcher named Stanley Williams. He was leading a team trying to develop new nanoscale devices to build sort of self-assembling circuits. And what they found was that they kept seeing this weird property in the devices they were generating in their lab, and they didn't actually understand what was going on. Again, if you put a charge into it, it would remember its resistance; you'd take the charge away and it would still be there. And they went on like this for a couple years not realizing what they actually had. By pure accident, pure luck, they ended up stumbling upon Leon Chua's 1971 paper about the memristor and realized what they had actually invented, right? Because for years they had been trying to figure out why these nano circuits they developed had the properties they were seeing. There's a really great article that came out in 2008 called How We Found the Missing Memristor, from Stanley Williams, that talks about how they stumbled upon this. Now, if you're interested in learning more about Stanley Williams, when you Google him you're gonna come across two people. When we first started this project, I had Joy reach out to Stanley Williams to try to contact him and talk to him. And Joy inadvertently contacted this other guy.
This is the Stanley Williams who's actually on death row in California for founding the West Side Crips, right? This is not the Stanley Williams that invented the memristor; it's this other guy here. And there's a great talk from him in 2008, when he was at UCLA, that talks about how they discovered this, and it's laid out in this paper here. So if you Google for Stanley Williams, you want this one, not this one. And actually he might be dead now too. Right, so the unique property they found when they were building these circuits could be graphed in this loop here. This is called a hysteresis loop. The idea here is that along the current axis and the voltage axis, as you change the voltage, it modifies the current, and you get this sort of loop, right? And it's a weird property, because the other fundamental circuit elements don't exhibit this behavior. What's really kind of cool is that after they figured out, oh, we actually have a memristor, and this is the tell-tale characteristic of what you see when you run the circuit, they went back into all the old publications and annals of scientific journals from the last 200 years. And what they found was that people kept coming across this property and showing graphs that look a lot like this, but not realizing what they had actually invented. There's another great paper called Two Centuries of Memristors, published in Nature, where they look back at all these old papers and see people drawing the same thing. So this is a paper from 1948 about people doing experiments with vacuum tubes, and again, you see more or less the same hysteresis loop. And you read these papers and they essentially say: we see this property, we don't know why it's doing it, we just think it's kind of interesting. So here it is, right?
I think this one's from 1948. There are other ones that go back to the 1920s and the early days of circuits and things like that. Again, people were essentially inventing memristors but not realizing what they actually had. So now we can go through and talk about the actual technologies, the main ways we're actually going to manufacture these non-volatile memory devices. What I'll say upfront is that what's kind of confusing about the terminology is that when HP invented their memristor device, their non-volatile memory device, they called it the memristor. The name of the fundamental circuit element is the memristor, but HP is also marketing their thing as the memristor. All of the technologies I'm going to tell you about here are technically classified as memristors, but HP calls their thing a memristor and nobody else does. Basically, they're the same thing. So we'll go through phase change memory; resistive RAM, which is what HP markets as the memristor; and then spintronic magnetoresistive RAM. And the idea here is not so much to understand at a really detailed level how these storage mediums are manufactured and how they work at a really low level, but just to give you a high level idea of how they work. Then we can talk a little bit about the implications when you build a real computer, a real database server: how these things can fit into the storage hierarchy. So for the longest time, it seemed that the most promising technology for non-volatile memory was this thing called phase change memory. The basic idea of how this works is that you have this special material here, the chalcogenide, and you can think of it as a little crystal. It's an oversimplification, but the basic idea is the same. What happens is that when you heat up this special material, it becomes opaque.
And that changes the resistance of the current that you run through it. So think of it like there's a little line going into it, and if you wanna make it a one, you send a big charge through it and it becomes opaque. Or if you apply a slower voltage to it, then it becomes clear, and that represents the zero. It's sort of like it has a little heater: not literally a little heater, but you heat it up a little bit and it changes its property. So for the longest time, it seemed like this was gonna be the most promising technology. And actually, I think maybe five or six years ago, you could actually buy phase change memory in a very limited amount. Like, for something you might wanna embed in a cell phone, you could buy 128 megabytes of it. The reason why this never really took off is because of having to send this current into it, which generates heat. So if you wanna have a lot of these things, maybe stacked together, it's gonna generate a lot of heat, and that limits what you can actually do with it. We won't talk about this much in this class, but later on, with the new CPU designs that are coming out, they're talking about doing 3D stacking of actual memory on top of the CPU itself. And if you're using something that generates heat, like DRAM or phase change memory, then you're not gonna be able to do that, whereas with the memristor ones we'll see in a second, you can. So for the longest time, this seemed like the first technology that was gonna come out. IBM was talking up how they had phase change memory, and I think Intel for the longest time talked about how they would have phase change memory. But it doesn't seem like the first NVM stuff that's coming out uses this at all.
All right, so now what I actually think is going to be the prominent storage medium going forward for NVM is what's called resistive RAM. What I'm describing here is actually what Hewlett-Packard put out when they announced the memristor; I'm gonna describe how their device works. Intel has their new 3D XPoint that's actually on the market today, and they've marketed it as resistive RAM. I don't know how it actually works, since they haven't really disclosed anything. I think it works essentially the same way, because they talk about it being 3D XPoint, and that terminology matches up with the crossbar technology that HP put out in their memristors. But I'd guess it's gonna be different, because they want to avoid all the patent mess that HP has for memristors. So the way the Hewlett-Packard one works is actually really kind of cool. You have two layers of platinum, and this is obviously done at a really small nano scale. And then you have two layers of titanium dioxide, where the top layer is gonna have missing electrons and the bottom layer is gonna have all the electrons. The idea is that as you put a voltage through this, electrons move up and down, and that ends up changing the resistance of this particular circuit. So then you can measure zero and one. The cool thing about memristors, and I guess full disclosure: when Stanley Williams announced, oh, we invented the memristor and it's gonna come out, I totally drank the Kool-Aid, because I was like, oh, this is awesome, this is totally gonna be real. The problem with HP is that whatever they announce, it's always two years later. In 2008 they announced they found it and said, oh yeah, we're gonna be selling this in two years. And then 2010 came along and it was two more years, and it keeps going on like that. It never actually comes out.
But what's really kind of fascinating about the memristor stuff is that this titanium dioxide is super cheap and super common. Titanium dioxide is the same thing they use in white house paint, and it's the same material they use in sunscreen. So it's not like a rare earth metal that's hard to find; there's tons of it. The problem, though, is that manufacturing this at the nano scale, to have the property that you want at the density you want, is, I think, what they've been struggling with. Another really interesting thing about the memristor, which again we won't talk about in this class, is that they claim you can actually turn the storage material, the cells that are storing the zeros and ones, into executable logic gates. So you can think of it almost like an FPGA: you have your memristor device, and you can turn one half into actual storage, and the other half can be executable gates that actually run programs or compiled queries and stored procedures. What's really kind of cool about this is that unlike the CMOS circuits we have now, the way these memristor logic gates work is actually based on another type of logic called material implication, developed by the famous philosopher and mathematician from England, Bertrand Russell, back around the 1920s. So it's crazy when you think about it: we're talking about 21st century technology using 1920s math and theory to actually execute things. And in these papers that came out about the memristors, they were talking about how, oh, we'll be able to turn these executable logic gates into neural networks, and we'll be able to build, purely on memristors, models of the human brain. This is how we're gonna get true artificial intelligence. And it never panned out.
Like, you know, HP kind of dropped the ball on this, so I don't know what's going on, but they keep pushing this out farther and farther into the future, and it never looks like it's gonna actually become available. This is actually a screenshot from one of HP Labs' product showcases in 2010. As you can see here: 2006, HP proves the fourth fundamental element of electronic circuitry. 2008, they claimed it was development-ready, right? And then two years later, in 2010, it was still two years away, two years away, two years away. And then I think it was 2014 when they announced their moonshot idea that was gonna save HP: they were gonna build this new type of computer called The Machine, which is a terrible name, and I realize this is on video, but I'll fully admit this: I'm not ashamed of claiming that naming your computer The Machine is terrible. And one of the things that was gonna be in The Machine, that made it different from anything else that was ever invented, was that it was gonna have memristors. And as of last year, I think The Machine is still supposedly on the way, but they no longer claim it's gonna have memristors. It's just gonna have a lot of CPU cores, right, and some kind of fast fabric to communicate, with a certain amount of flash or DRAM. So I love the idea of the memristor, but they haven't stepped up to the plate and actually put anything out. All right, the last technology that I think is actually kind of cool, but is much, much farther away, is called magnetoresistive RAM, more commonly called spintronics. The idea here is that you're gonna have two magnetic storage elements that we can use to measure the polarity and decide whether you have a zero or a one.
So the idea is that at the top layer you have a magnet that's fixed in one polarity, and at the bottom layer you can flip it to go one direction or the other based on what charge you put into it. And then you can measure that to be zero or one. I think Samsung is one of the early players actually working in this area, and maybe some of the other manufacturers too. But this is much, much farther away. And actually, this is really cool: I think this is better than 3D XPoint or the memristor, because you can get to much, much smaller scales than you can with those other devices or storage mediums. And it's supposed to be really, really fast, almost the speed of your L1, L2 cache. Whereas resistive RAM is probably gonna be four to eight times slower than DRAM, which is still really, really fast, but not as fast as what spintronics supposedly can do. That sound about right, Tony? Okay, he can't say anything. All right, so now, given that I just told you that HP keeps claiming this is gonna come out and it never does, right? And we've been thinking about non-volatile memory for a long, long time; there are some early papers from the 1980s in databases where they talked about having battery-backed DRAM, but that never really went anywhere. The reason why I would argue that this is actually happening for real this time, like real real, not fake real, is that there have been three major changes in the landscape of computer science and operating systems that are actually gonna make this happen now. The first is that the industry has now agreed on the standard terminology, the form factors, the expectations, and the protocols for these non-volatile memory devices.
Before, a bunch of one-off companies each made their own thing; now there's an industry consortium that says this is what the standard will be. That happened last year, and there's a newer update coming in 2018 for the NVDIMM that goes in the DIMM slot with persistent memory, which I'll talk about in a second. The other big change, earlier this year, actually last year, was that both Linux and Microsoft added support for non-volatile memory in their kernels. It goes under the codename DAX, for direct access. The idea here is that you now actually have support in your operating system to say: yes, here is truly persistent memory. Because up until now, we've been assuming we're under the von Neumann model, where you have volatile DRAM and persistent storage, whether that's a spinning hard drive or an SSD. And now we actually have in the operating system the notion that, oh yes, there actually is byte-addressable memory that can be persistent. This doesn't mean that with this new kernel support, if you pull the plug on your computer and then plug it back in, everything magically comes back as it was before, because obviously there are things that are still gonna be volatile, like all your registers and SRAM, your program counters. But this is now gonna allow you to write applications that know they're writing to these NVDIMMs with non-volatile memory directly. And then the last one is that in 2017, Intel added new instructions to the Xeon ISA to do cache line flushes directly to NVM.
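To give a feel for those instructions, here's a rough sketch of a "persist" helper built on top of them. This is illustrative, not Intel's actual API: the `persist` name and the structure are my own, and it uses CLFLUSH (available on every x86-64 CPU) rather than the newer CLWB, which needs CPU support.

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence (SSE2, any x86-64) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64

/* Flush every cache line covering [addr, addr+len) out of the CPU
 * caches, then fence.  After this returns, the stores have left the
 * caches, so a database could safely acknowledge a commit.  Real code
 * would prefer CLWB (write back without evicting the line) where the
 * CPU supports it; CLFLUSH is the lowest common denominator. */
static void persist(const void *addr, size_t len) {
    uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end  = (uintptr_t)addr + len;
    for (; line < end; line += CACHE_LINE)
        _mm_clflush((const void *)line);
    _mm_sfence();   /* order the flushes before any later stores */
}
```

On a machine with NVDIMMs mapped into the address space, calling `persist` after writing a log record is what gives you the durability guarantee; on ordinary DRAM it still runs, it just flushes caches into memory that happens to be volatile.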
Because, as we'll see when we walk through how you actually use this in a database system, the big problem is that if you just had NVM without the CPU or the operating system being aware that it's actually persistent, you would do writes to cache lines and have no guarantee that they'd actually been written out to NVM, because the CPU decides how it moves things up and down the hierarchy. So now there are these new instructions, CLFLUSH and CLWB, that can say: yes, block my process, block my thread, until this cache line has been safely written out to NVM. You obviously can't build a database system using NVM without this, or really without all of these things. So this is why I think this is happening for real now. So to understand what the consortium has agreed upon for what these non-volatile memory devices will look like, think now not of the new Intel device sitting on the PCI Express bus, but of something that's gonna go into the DRAM slot, the DIMM slot, on your motherboard. That's what we're talking about with this new NVDIMM stuff. They've announced that there are basically three types. First you have NVDIMM-F, and this is where it looks like DRAM but it's actually just NAND flash, right? And obviously you still need to have actual DRAM in another DIMM slot, because this would be too slow to use for your operating system. The next technology is NVDIMM-N, and this is where you have flash and DRAM together on the same DIMM, and it's gonna appear to the operating system as volatile memory, but with a larger capacity than you would normally get with regular DRAM. So think about it: if this single DIMM has one gigabyte of DRAM but 10 gigabytes of flash, it'll appear as 10 gigabytes to the operating system.
And then the last one, which is again what I think is the big game changer, is NVDIMM-P, and this is where you're gonna have truly persistent memory using one of the three technologies that I just talked about before. There's no DRAM or flash whatsoever on this device. The idea is that you would use this in conjunction with the new kernel support I talked about before, DAX, so you can read and write to these things and have it be persistent, and the operating system can help you know where to find the data you're looking for when you restart the server. The first two are available today; this is the one that I think will be a big game changer for us going forward. All right, so since this is a database class: how do we actually use this? What do we have to change? As I said in the beginning, if it's block-addressable, it's not that interesting to us; it's just gonna appear to us as a fast SSD, like the Fusion-io card that we saw in the SiloR paper, or some of the newer high-end flash devices you can get from Samsung and others, right? In that case, we don't think there's gonna be any major difference. But if things are byte-addressable, then we'll be able to get better performance than we could otherwise, but it's gonna require some work on our end as database developers, as the people building the internals of the database system, to make sure we use these things correctly. And it is my belief that when these NVDIMM-P devices actually come out, the in-memory database systems that we have today are gonna be better suited to be converted over to use non-volatile memory than the traditional disk-oriented, block-based systems, right? The Oracles and Postgres and MySQLs of the world.
And the reason is that those guys have this huge buffer pool that they have to maintain, and all these other architectural components to deal with the distinction between volatile DRAM and non-volatile SSDs or spinning hard drives, whereas the in-memory guys are already assuming an architecture where the system can have pointers directly to tuples and access them directly. So it's just a little bit of extra work to convert them over to recognize that they're doing reads and writes to non-volatile memory, that it's actually persistent. I can't prove this, but in talking with people in industry, I'm getting the sense that there are companies actually looking at building new engines that replace the old disk architecture they have with an in-memory architecture that can then be converted over to use NVM. I can't say names because we're on video, but it's not hard to think about which ones we're talking about, right? So I think this is true, but we can't prove it, because it's a software engineering argument. All right, so the paper you guys read was our first foray into exploring non-volatile memory database systems here at CMU, where we were looking well forward into the future and asking: how would you actually want to design a database system when you only have non-volatile memory? This was sort of my way, as a new professor in my first one or two years here at CMU, of saying: all right, I can look 10 years into the future; what should a 10-years-out system look like? And we said, oh, DRAM is gonna go away, which I actually don't think is true anymore, but at the time I thought it was. So we asked: when you assume that all memory is non-volatile, byte-addressable, and persistent, how would you actually want to design your database system?
And so for this, we built a separate prototype called N-Store, which essentially got rolled into and became Peloton, but this was our early prototype. The idea was that we were gonna take the standard textbook database system architectures, run them on non-volatile memory, see which parts are actually suboptimal, and then come back and say: well, how can we tweak them? How can we change these architectures to use non-volatile memory correctly? Not only to get better performance, but also to reduce wear on the device. This is another thing I didn't really talk about: it's expected that these non-volatile memory devices are not gonna be infinitely writable the way SRAM and DRAM are. It's not gonna be as bad as an SSD, where if you write to a cell too many times it gets burnt out and you can't write to it anymore; NVM is supposedly gonna be a bit more durable, but we still have the same problem that we could burn out the device if we write to it too much. And this is a big problem now: if we wanna do byte-addressable loads and stores to single cache lines, and we're not careful about how we design our database system, we could burn out a single cell fairly quickly. So the things we now need to talk about are the building blocks we're gonna need before we even get to designing our database system. What are the things we're gonna need at the operating system level, at a systems level, in order for the database to use non-volatile memory and have it be persistent and durable? At the time, one of the big problems we were facing was with the existing programming models that were out there; this was before Linux and Microsoft added DAX to their kernels.
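For contrast, here's roughly what that programming model looks like today with kernel support: you map a region and store to it with ordinary pointer writes. Since real NVDIMM hardware isn't assumed here, this sketch maps a regular file and uses msync() where a true DAX mapping would instead rely on cache-line flushes; the `persist_record` name and path are made up for illustration.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file, write a record through the mapping with a plain memcpy
 * (byte-addressable access, no read()/write() block I/O), and force it
 * to stable storage.  On a DAX filesystem backed by an NVDIMM, the
 * msync() below could be replaced by flushing just the touched cache
 * lines plus a fence. */
static int persist_record(const char *path, const void *data, size_t len) {
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, 4096) != 0) { close(fd); return -1; }

    char *base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }

    memcpy(base, data, len);        /* store through the mapping */
    msync(base, 4096, MS_SYNC);     /* make it durable */

    munmap(base, 4096);
    close(fd);
    return 0;
}
```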
At the time, all of the programming infrastructure that was available assumed that all your writes to memory were volatile. And then we had another problem: the CPU decides when it's going to move data from the L1, L2, L3 caches out to DRAM. And that's bad, because if we want to do a write and treat it like a log record and make sure it's actually durable, we don't want to roll the dice and say, yeah, it'll eventually make it to NVM, but we could crash and lose stuff. So we needed a way to ensure that if we write anything to our CPU caches, we can be guaranteed it made it out to NVM before we send back an acknowledgment that a transaction actually committed. Think of it like this, right? Here's my database system. I do a store and write something to my CPU cache, and then I want to make sure it makes it out to NVM. This is where the new instructions that Intel put out solve the problem, because you can do a cache-line flush that says: block my process until I know the data has made it out there. So that's the first thing we had to deal with: synchronization. The second thing we had to deal with is that if our database system crashes, we want to be able to come back and make sure that all the pointers in our various data structures are pointing to valid locations of the same data they pointed to before. We're not going to be able to recover program counters and low-level registers; this is strictly about the high-level data structures that we build in our database system. We want to make sure those are all going to be persistent. So think of it like this: say we have our index, and it has pointers to stuff in our table heap, and this is all going to sit in NVM, right? 
And then internally in our table heap, we can also have pointers to all the versions of the same tuple, right? Because say we're doing MVCC and we need a version chain. But now if we crash and come back, we want to see the exact same state as before. Even though the physical location of this data may move in memory, we want all our pointers into virtual memory to still be valid. So we essentially had to write our own memory allocator that could guarantee this for us. And in the same way that you flush the actual data out to NVM to make sure it's persistent, you can flush the allocator's pages out to NVM and make sure they're persistent. And then on top of that, you can build these non-volatile memory pointers that guarantee you're always going to have consistent data after a restart. So at the time, we actually had to build our own memory allocator because this didn't exist. Now Intel has a library, pmem.io, that provides this functionality for you. Oracle has their own thing; I think Microsoft has another one. But the basic idea is that they're providing you this guarantee that your pointers will be consistent even after your system restarts. Right, this is essentially what I was saying. So we built our own NVM-aware memory allocator, and the way we got synchronization is that we rely on CLFLUSH to flush out the cache line, and then we issue an SFENCE instruction to make sure we wait until we know our data is actually durable. And then for ensuring that we have consistent data after restarts, we wrote our own allocator that ensures that all the virtual memory addresses assigned to some memory-mapped region for our process will be the same no matter how many times the operating system or the database system restarts. 
And again, you don't get the program counters, you don't get registers. Think of it as: you malloc a bunch of data, you can have pointers inside of it, and if you restart your process, everything still points to the correct location. All right, so now, using this as a building block, we can ask how we actually want to design our different database system architectures. The three standard architectures you can have are in-place updates, copy-on-write, and log-structured. So in-place updates is the standard architecture we talk about in the introduction class. This is basically where you have a table heap and you write to tuples, overwriting the old value with the new value, and you maintain a write-ahead log and a snapshot on disk or on non-volatile memory that says: here are all the changes I made, and here are the consistent checkpoints of my database over time. This is the same architecture that's used in VoltDB and H-Store. The next architecture is copy-on-write. This is essentially the shadow paging approach that we talked about in the intro class, which IBM first developed in System R in the 1970s. Here the database is organized as a tree structure, and we make a shadow copy of pages in the table whenever they're updated, and we only flip pointers once we know our transaction has committed. For this one, you don't need a write-ahead log at all, because the master version of the database is always consistent. If you crash and restart, you just throw away the shadow copies that never got installed. And the last one is the log-structured approach, where the log essentially is the table heap and you append entries for all the changes that you make. This is the same architecture that's used in LevelDB and RocksDB. 
I'll also say that for this evaluation, since we only wanted to look at storage and recovery mechanisms, we ran all of these under the same H-Store-style concurrency control scheme, where you have a single-threaded execution engine that can only run one transaction at a time per partition, and a transaction has to acquire the lock for its partition before it gets to run, so nobody else is running and you don't need low-level latches on any data structures. So this is really looking at how the storage architecture of the system is affected by NVM, without worrying about higher-level concerns like concurrency control. All right, so let's go through each of these. We'll first see what the textbook or canonical version looks like, and then we'll see how we can improve it by being cognizant that we have non-volatile memory, where writes to memory end up being persistent. So in the standard in-place updates engine, what happens is that when you want to modify a particular tuple, the first thing you have to do is put the delta, the physical change you made to that tuple, out into the write-ahead log. Then you actually apply the change to the tuple in the table heap, and then eventually, at some later point, you write out a snapshot recording the version of the tuple at a particular point in time. And so for this, you see that we're essentially doing three writes for one update, because in the original design of this architecture, the table heap was assumed to be volatile. It's actually even worse in other systems like MySQL, because MySQL will do four writes: it has a doublewrite buffer where it stages writes first, before they get written out to the flushed pages and the snapshot. 
All right, so the problem here is that we're duplicating data: again, we're doing three writes for one update, and it's going to make our recovery latency really slow, because we have to replay the write-ahead log and apply all the changes back into memory. Even though, when you think about it, if I knew this transaction was committed and I did my update here, when I crash and come back, this memory is actually still going to be there. If I treat it as volatile memory, then I have to load the old snapshot. But if I know it's non-volatile, then as long as I'm careful about which changes are actually visible, this actually would be the correct version of the database when I come back, and I don't need to replay the log or load the snapshot. So this is what I mean by an example of being smart about how we use non-volatile memory. So what we're going to do now is leverage the fact that the allocator can generate non-volatile pointers for us. That means the only thing we need to record in our log is a pointer to the thing that changed, rather than the data that was changed, rather than the delta record. And then the only thing we have to maintain is an undo log, a transient log where we keep track of the old versions of tuples, so that when a transaction aborts, we know how to roll it back. We can throw that away when the transaction commits. And as we do this, we never need the redo part of the write-ahead log, because when a transaction finishes, we flush all its changes out of the CPU caches from the table heap and make sure they get written out to NVM. And because we now have a log that says, here are the pointers to the things that got modified, when we come back we just make sure the pointers reflect the changes from transactions that actually committed. 
So if you have a pointer that points to something that was modified by a transaction that didn't finish, you can go ahead and undo it. So for this, again, we now have an NVM table heap and NVM storage, and we don't need the snapshots anymore. So when we update tuple one here, all we have to do in our write-ahead log is flush out the pointer to the tuple that got modified, and then we can go ahead and modify the tuple. As long as the log record makes it out first, before the tuple write, we know it's safe and we won't lose any changes. So now we only have to do two writes instead of three, or even four, for these in-place updates, because we're using NVM correctly. All right, so now copy-on-write. Again, this is just shadow paging from the intro class. How would it normally work? You organize the database as a tree structure, and the leaf nodes in the tree point to slotted pages, just like we discussed in the second lecture. So when I want to update one tuple in a page, I have to copy the whole page over and apply my change there. Then I create this dirty directory that points to the unmodified leaves and the updated leaf. And then I have to flip my master record to point to this dirty directory when the transaction commits. So the big bottleneck here, the big issue with this approach on NVM, is copying the slotted page into the shadow copy for the dirty directory. Because if I'm organizing this as four-kilobyte blocks and I'm only updating one tuple, I'm copying four kilobytes just to change one tuple. And again, that'll wear down the device if you're doing it over and over. So we have expensive copies and we want to get rid of them. So what we're going to do now, instead of organizing the tuples into slotted pages, is just have pointers to the tuples in memory. 
So now it's more fine-grained: when we want to update a single tuple, we only have to copy over the new version of that tuple and then modify the pointer to it. So then we just go ahead and flip these pointers. We're always going to have to create the new version, because you always have to create the new tuple, but you don't have to copy the entire page when you do this. And now you're just moving around 64-bit pointers to update the dirty directory. So the main benefit you're getting here is removing the need to copy that entire page. Yes? What was that? So for this architecture, the leaves can have multiple tuples. In this diagram it's just one, yeah. But even then, you're only updating single pointers. All right, so the last one is the log-structured engine, which again is from LevelDB or RocksDB. In this one you have an in-memory MemTable, which I think is organized as a skip list, and then you have the write-ahead log, where you append all your changes as things get modified. And then over time you move everything out to an SSTable, where there's a Bloom filter; I don't think there's usually an index on top of it, it's usually just a sorted run. So when I want to modify a tuple, I have to append the change to my write-ahead log with the delta, and then eventually this gets flushed out to an SSTable, where we go ahead and write it again. So now when you think about this for non-volatile memory, this whole MemTable is going to be persistent. The whole idea in a log-structured merge tree is that the MemTable sits in volatile memory and the SSTables are immutable out on disk. But now, if the MemTable is actually persistent itself, you can just get rid of the SSTables entirely, and then you don't have to duplicate any more data and you don't have to do any more compactions. 
So essentially what you do is just throw the SSTables away, and then you only have the MemTable. When the MemTable fills up, you just set it aside and start a new one, but you don't need to do compaction or reorganize it as an SSTable. How it sits in memory is how it's going to be persisted going forward. This removes all the overhead of copying things down through the different levels. So the high-level takeaway from what I've just shown you is that we're going to use the fact that we have persistent, byte-addressable memory to avoid extra data duplication and extra writes, whether to a write-ahead log, snapshots, or SSTables, in all these different architectures. And the other key thing is that by recording only pointers to the things that got modified, when we crash and come back, we just have to check whether each pointer should actually be there or not, based on whether its transaction committed. And we can use our non-volatile memory pointers to build non-volatile data structures, so we can guarantee that the internal components of these systems will be consistent after a restart. So to finish up quickly, we'll go through a short evaluation that shows the benefit you can get from these architectures. And as I said before, we don't want concurrency to be a bottleneck in any of our measurements, so we use the H-Store-style concurrency control that doesn't have any low-level latches that can interfere. And for this, again, since non-volatile memory doesn't actually exist yet, the way we ran all this is through a hardware emulator that Intel Labs provided us, where you can tweak things to have DRAM slow down as if it were actually NVM. 
So again, it's still all DRAM, but the way it works is that they put extra microcode inside the debug hooks of the motherboard, so that when you do a read or write to DRAM on one socket, there's a little busy loop that spins for a bit and slows down your loads and stores. And you can go down into the kernel and actually specify what you want the latency, the slowdown, to be. So again, it's still just DRAM; you pull the plug and everything gets blown away. But this is the best approximation of what NVM would actually look like, and I think it's much better than what any of the other papers out there do. A lot of other people do simulations, or they run things in Pin, where they slow things down. This is nice because you just take the binary you wrote on your laptop, run it on this emulator, and nothing has to change; it works as if it had DRAM. And so for this we use YCSB with a rather small database, but we do 10% reads and 90% writes with high skew. The idea is that we're doing heavy writes because that's where you'll see the major benefit of the NVM-optimized architectures versus the canonical implementations. All right, so first we just measure the runtime performance of the architectures. Here the gray bars are the traditional, textbook implementations, and the red ones are the NVM-optimized ones. When we first started this project, we thought we could maybe make the copy-on-write architecture work the best, because we had this fanciful notion that we were going to take shadow paging, from IBM in the 1970s, and if we tweaked it very carefully and used NVM correctly, we could make 1970s concepts perform really well on 2015, 2016 hardware. 
It didn't turn out to be the case at all. And it's obvious in retrospect that in-place updates was always going to be the fastest, right? Because you don't have to prepare anything when transactions commit, you don't have to switch any pointers; you just do all your updates directly on the data you want to modify. So again, across the board you see that the NVM-optimized engines always do much better. You see the most benefit in the copy-on-write case, but it's never going to be as good as in-place updates, which is always going to be faster. The next thing we measure is how many store operations we actually issue to the storage device. Again, this matters for wear: if your database system writes to the storage medium too much, you'll wear it out quickly over time. And part of the reason you see better performance for the in-place updates engine on the last slide is simply that it does many fewer writes than the others, right? You don't have to prepare copies of things; you just write directly to the data you need. So here you see that you get the biggest reduction for in-place updates, but the others drop down quite a bit as well. So the last experiment measures the recovery time of the database system. The idea is that because we're now storing pointers instead of actual data in our write-ahead log, when we recover we just need to make sure our pointers are valid, rather than applying changes back to the database as you would in normal write-ahead log recovery. The first thing is that for copy-on-write there's no recovery needed; recovery is instantaneous, because all you do is come back and the master record points to the correct, consistent snapshot of the database, right? 
You just throw away all the shadow copies. So there's nothing to measure for recovery there. But what you see is that for the traditional architectures, as you increase the number of transactions you need to replay in your log, they get progressively slower. That makes sense, because you're replaying more of the log and applying more changes back into the database. But in the NVM-optimized case, you don't see much of a performance difference at all, because the cost of going and checking whether your pointers are valid is really, really small. That's why you don't see those bars stepping up like the traditional ones. Yes? But you still keep an undo log for in-place updates, so as you have more transactions, don't you have more undo log to replay? The statement is that you have to maintain the undo log for in-place updates, so as you add more transactions you have more to undo. No, because the undo log is only needed if a transaction never committed before the crash. If you get the ordering correct, you see the commit message, and therefore you know that everything that comes before it in the write-ahead log for that transaction, all those pointers, will be valid. So you don't have to undo anything; it just gets thrown away, right? So when we say recovery, I don't mean, you know, here's a thousand transactions that didn't commit correctly, let me abort them. It's: how many transactions do I have to look at in my log to guarantee that they're correct, right? Okay, so what are my parting thoughts about this? Again, I think our study, and later papers we've done, shows that you have to redesign your database system to use NVM correctly in order to get the better performance and reduce the wear. 
If you just take MySQL and run it on NVM, it actually doesn't perform well. We have an earlier paper that shows this: if you just take H-Store and MySQL and run them on non-volatile memory, they more or less become the same, right? But they're still not as good as the optimized versions we showed here. And when byte-addressable NVM comes out, supposedly in 2018, and I can talk offline more about that if you want, I think that's going to be a big deal, and it's going to require all the database companies to rethink how they design their systems. And supposedly this NVM stuff is going to be much cheaper than the early NAND flash was when it came out. I don't know whether that's true. The memristor guys were talking about petabytes in a square centimeter; of course, that obviously never panned out. But if you look at the new Intel Optane devices that came out, they're like a thousand or two thousand dollars; they're not that expensive, right? And so I think that if this byte-addressable NVM is actually cheap and fast enough, it could be a big deal, okay? All right, any questions? All right, so we'll finish up. This is the last lecture. In Tuesday's class, I'll start off the first five or ten minutes with a high-level overview of what to expect on the final exam. If you came in late, copies of the sample final exam were passed around, two versions: one with the solutions, one without. We're also having Marcel Kornacker come on Tuesday to give a guest lecture on Cloudera Impala and Kudu. Marcel was Mike Stonebraker's PhD student at Berkeley in the 2000s. He worked on Google F1, and he later left that to go build something similar at Cloudera with Impala. I think that's everything I wanted to cover. Right, and then the exam will be Thursday next week. Okay, any questions? 
Oh, real quick, now I remember: please fill out the faculty course evaluation. That's actually really important. I realize that when I was in school I never took it seriously, but it really is important, because the feedback you guys provide is very helpful for deciding whether the readings are the right amount, whether the lectures made sense, whether you wanted more of something or less of something else. I do take it into consideration, and it actually helps me make the course better. If you don't like my hygiene, just put that in there too, I don't care. Be brutal, right? I actually do read them and I do take them seriously, okay? And I will pester people until we get 100% of them filled out. Do I have to send you the email about this or no? Yeah, you can. Okay, yeah, so please do it, okay? All right, it's been an awesome class. I'll see you guys on Tuesday.