Hey, what's up guys? This is 15-721, Advanced Database Systems, at Carnegie Mellon University. I'm Andy Pavlo, the professor for this course. As I said on Piazza, right now I'm in Reno, Nevada with the TA, Lin Ma. We had to take care of a bunch of stuff, some money issues, but whatever. So I can't be on campus right now, but I still want to get the first lecture started, start talking with you about what the course is going to be about, and get the ball rolling on the main topics.

For today's lecture we'll start off talking about the logistics of the course and what's going to be expected of you as the semester goes on, and then we're going to jump right into the topic itself, which is in-memory database systems. That's going to be the focus of this semester. We'll finish off talking about some interesting early, notable in-memory databases that I think are worth discussing a little bit. We're not going to focus too much on them during the semester, because they are not as widely used as the ones we'll be talking about later.

To jump in really quickly and help motivate why I think you should take this course: the main thing that I say every year is that there is such demand for people who can build database management systems. These are complex pieces of software, and there are a lot of unsolved problems that people have to deal with in this area. Frankly, industry and academia cannot hire fast enough. So I would say that if you're good enough to write code in a database management system, which I hope you will be by the end of the semester, then you'll be trusted to write code in almost any other type of software, whether you're building an operating system or something for an embedded device. A lot of the things we'll talk about in this class are relevant to other areas.

I'm also a little proud of the success we've had in having people who take this course go off and do other great things. This is a list of some of the former students in this class and in our research group who have gone off into industry and are now building some of the most interesting, state-of-the-art database systems that are available today. I don't want to name names, but these are some of my best students.
They've gone off and done great things. So again, if you get through the course, if you're able to follow along with everything we're talking about, you can go off and do great things at these places as well.

The high-level objective of this course is to help you understand how people build modern database management systems: what are the systems programming techniques you need to build high-performance software that is actually able to process lots of data. My goal is that by the end of the semester you become proficient at writing correct, high-performance, low-level systems code. But beyond that, the goal is also that you end up writing documentation and test cases for everything you write along the way. It's also going to give you a lot of experience working in a large code base with a team, and understanding what you have to do to get along and work on a big piece of software with a large group. That means doing code reviews, writing documentation, and writing specifications for what we're doing. So it's more than just describing these techniques and concepts in the context of databases (that's what I care about in my life); the high-level skills can be applied to other areas of computer science and wherever you go in your career.

The main topic of the course will be the internals of in-memory database management systems, and specifically we're going to focus on single-node systems. These are databases that are deployed on one machine. We're not going to talk about distributed databases at all in this class, primarily because that's not where my research interest is right now. But a lot of the same techniques and concepts we talk about for a single-node system can then be elevated and applied to distributed environments. So even though we won't talk about distributed databases, it doesn't mean that what we cover couldn't be used in those settings. The way I think about it is this: unless your single-node system actually works really well and correctly, scaling out and becoming distributed is not going to magically make your system better. You're going to have other problems to deal with on top of the single-node problems.

Another aspect of this course is that we're going to focus entirely on state-of-the-art concepts and techniques. That means this is not a course on classical database management systems. We will mention System R, we will mention Ingres, but we're not going back to the annals of the database systems literature to look at how they built systems in the 1970s. We're really focusing on how people are building modern systems today. The classic database system architectures are the kind of thing you would cover in an introductory or advanced undergraduate course, and that's not what we're doing here.
This is very cutting-edge. The kinds of things we'll talk about through the semester are concurrency control (how do you run transactions or queries at the same time while updating the database?), plus modern techniques for indexing, storage models, compression, join algorithms, and networking protocols. These are all things you need inside a database management system to actually be able to run in the real world. We're going to go step by step and describe the architecture of how to build out a full system. The goal is that by the end of the semester you'll understand all the different components you need to build a complete system.

The background I expect you to have in order to take this course is some kind of introductory course on database systems. If you're at Carnegie Mellon University, that would be 15-445/645, or any other introductory course at your undergraduate institution. The reason you need this background is that when we start talking about things like join algorithms, I'm going to assume you understand the classic implementations described in textbooks; now we're talking about how to do them on modern hardware, with parallel cores, vectorized instructions, and things like that. So the things we are not going to cover are how to write SQL, serializability theory for transactions, relational algebra, the basic algorithms for joins, and the data structures for indexes. I'm going to assume you understand all of these things, and that way we can jump right into the more advanced topics.

For the course logistics, I'm going to go over really quickly what's expected of you during the semester. The course policies and the full schedule are available on the course website. The one recurring thing I'm going to bring up as I go along in this introduction is asking you not to plagiarize, because this is a really big deal. If you're unsure what it means to plagiarize, you can go look at the university policy page, or if you're really unsure, just ask me. I'd rather have you ask me, "is it okay for me to copy this piece of software I found on the internet that solves the problem I want to solve?" Ask me first before you do anything, and that way we don't get into the issue of whether you plagiarized or took something you shouldn't have taken. Again, if you plagiarize in this course and we catch you, we will report you. That's just the way it is.

I have office hours during the semester on Monday and Wednesday, the hour before class, in my office (GHC 9019) in the Gates Hillman Center. We can talk about whatever you want: implementing the projects, the papers you're reading and discussing, we can talk about how...

What? What? What do you want? Who are you?
"I'm here for my money."
What are you talking about?
"Fat Face Rick sent me out here and said you have my money."
I have no idea what you're talking about, I'll be honest.
"You're lying. You have my fucking money."
I don't know who you are.
"Yes, you do. You know Fat Face Rick, right?"
I know Fat Face Rick, yes.
"Good. Then you know you have my fucking money. Give it to me."
I don't... lady, I don't know what the hell you're talking about.
"Give it to me. I'll have to fuck you up, man."
You're crazy. I do not have your money. I have no idea what the fuck you're talking about.
"You owe me 200 fucking dollars."
I don't owe you...
"I need that money right now. Your student Lin is inside the casino talking to Fat Face Rick."
Yes, I know Lin.
"He said you're out here in your car doing some stupid database thing. Rick told me to come out and get it from you. So where's my fucking money?"
I have no idea what you're talking about.
"Give me my fucking money. Are you gonna give me my money?"
I'm not gonna give you anything. I have no idea what you're talking about, lady.
"You'll give me 200 dollars. That's really..."
Get the fuck away from me.
"Sorry. Are you gonna give me..."
I'm not giving you anything.

Yeah, Reno is kind of crazy. I don't know what she wants. I have no idea what that is. All right, so, yeah, the TA for the class, Lin: again, he's in the casino right now with Fat Face Rick. That's weird. Anyway, he's awesome. He is smarter than I am. He's the lead architect for all the self-driving components we have in the system we're building here at CMU. You can go talk to him if you have questions.

All right, for the course, what's expected of you are the following five things: the readings, two programming projects, two exams, and an extra credit assignment. Let's go through all of these. For every single class we're going to have a mandatory reading; on the schedule on the website there will be a little star next to it, and this is what you're required to read. For every reading you have to write a short synopsis, just five sentences, that describes what the paper is about, and you have to do this before class. There's a link where you can submit it online through a Google form. The idea here is that you read the paper before you come to class, so that you can ask questions about the technical aspects of the paper and not ask trivial things. The synopsis needs to have three sentences that provide an overview of the main idea of the paper, then one sentence about what system they used for the evaluation and how they modified it for the experiments, and then one sentence about the workload they used and how they evaluated it. The idea behind that last sentence about the workload is that when it comes time to do your final project, you can go and say, "this paper describes the thing I'm working on, and I know what workload they used, so I can do the same thing." So again, please do not plagiarize these. You cannot just find text on the internet and copy it in there.
You can't copy from your friends. If we catch you plagiarizing, we're going to have to report you.

The other major part of the grade, actually the bulk of the grade, is going to be the programming projects. In the old versions of the course we had a system called Peloton that we were building at Carnegie Mellon, and all the projects were based on it. In this class we threw away all that code, and I'll describe this later on, but we're starting...

What? Look, I don't have... That was... that was...

Yeah, so the first programming project is going to be on our new system, as I said. This will be done individually, and the idea is that everyone works on the same part of the system for the project: you're going to learn how to build out and optimize one of the pieces of the storage component, and then we'll teach you how to do profiling, running Valgrind, perf, and callgrind, to understand where the bottlenecks are in the implementation and how to improve them.

The second project will be a group project of three people. This is where you pick some topic that is relevant to the things we're discussing throughout the semester, and you implement it in the system. This is the bulk of the grade for the course, so we expect a significant programming effort from everyone in the group, and obviously you can't have two groups working on the same thing. The final topic you pick has to be approved, but I'll of course propose a bunch of different ideas of things you could pursue. We won't worry about this yet; we'll worry about it later on, when you come back from spring break. Of course, again, same thing: please don't plagiarize. You can't just copy code off the internet. If you're not sure, please ask me.

More than just writing a programming project and throwing away all the code at the end, this second project is really about going through the process of building a piece of software in a real system and having it be usable by other people. So at the end of the day, you don't just turn in your code, get a grade, and be done. There are a bunch of steps along the way, things you have to turn in to prove that you've thought through and understand what you're actually proposing to implement. There will be a proposal stage where you provide a specification of your project and how you think you're going to implement it, and then there will be checkpoints along the way until the end of the semester where you'll need to show that you're actually making progress toward your proposal. There will be code reviews where you'll be assigned to another group: you'll review their code, and they'll review yours.
Then you have to address their requests, you have to do performance analysis and rigorous testing to prove that your implementation is working correctly, and there will be a final presentation at the end of the semester, sort of a party where everyone can talk about the things they actually implemented.

The last piece, though, is very important: the code drop. The way it works is that you won't get a final grade at the end of the semester unless your code can cleanly merge into the master branch. So it doesn't mean you just go fork our code and make whatever changes you want; it actually has to be able to go back into the full system, because again, this is part of the engineering effort of working on a real system. I would say our success rate in previous semesters, projects that we actually took from the class and merged into the master branch, has been about 50%, which I think is actually pretty good. So about 50% of the projects end up being something we put into the full system. That would be a lofty goal to set for yourself.

There are two exams for the class. There's a midterm exam that will be in class on March 6th, and then there's a final exam, a take-home exam, that will be given out on the last day of classes. I'll have more information about what's expected of you on these exams as we get closer to them.

The last piece is an extra credit assignment. This is completely optional, but if you want a 10% boost in your final grade, you can write an article for the encyclopedia of database systems that we've been building here at CMU. You basically pick a system that no one else has studied before at CMU, and then you write a Wikipedia-style article about it, with citations and attributions for the information you collect. It's a bit different from how Wikipedia does things: instead of just having freeform text that describes what the system is doing, it's actually structured. You can say, for example, "the concurrency control scheme they're using is two-phase locking with deadlock detection, and here's how it's actually implemented." This is the third semester we've done this, and there are still many systems people can pick from. The way to think about it is: if you pick an obscure system, there may not be a lot of information, and it's going to take you more time to find it; that's one kind of effort. But if you're doing a system that's widely used and well known, then the expectation is that you have to be very thorough in your article. So again, this is entirely optional extra credit, and I'll discuss it more later in the semester. And again, please, please do not copy anything in this extra credit article. If you copy things from Wikipedia, stupidly, and we find out, you will be reported for plagiarism. You just can't copy text from other people thinking that they wrote it better than you could and that you'd therefore be honoring them by copying it. Do not copy. Do not plagiarize. Okay?
All right, so again, the grade breakdown for the semester is as follows:

- Reading reviews: 10% (due every class)
- Project 1: 20%
- Project 2: 50%
- Midterm exam: 10%
- Final exam: 10%

That means 70% of the final grade for the course is the programming projects.

The course mailing list is entirely on Piazza, so please use it if you have any technical questions about the projects. If there are deeper conversations you want to have about personal issues or other problems you may be having, please don't post those on Piazza; send those directly to me. The idea is that if you have a question about something in the course or in a programming project, post it on Piazza so that everyone can see it. Don't send personal emails, because that way we avoid people emailing the same question multiple times.

The last piece: I want to thank SAP for sponsoring the course and helping out with course development. Because of them, we have been able to hire an extra TA who is going to help us write the lecture notes and hopefully build out something better than what we had before, so we're really appreciative of them. We're going to have somebody come at the end of the semester and give a guest lecture about SAP HANA, which is the in-memory database they've been building for a while now. And we'll see as we go along that a couple of papers from the SAP HANA team will come up that are relevant to the various topics we're talking about.

All right, so with that, that's the overview of the course. Let's now get into the course material, the actual meat of things. I want to start off by talking about in-memory databases: what makes them interesting and unique, why we're going to spend the entire semester talking about databases in the context of in-memory database systems, and how that may be different from what you learned in an introductory database course.

When you look at the history of database management systems, going back to the 1960s, a lot of it has to do with systems having to overcome the deficiencies of hardware resources. The design of what I call traditional disk-oriented database systems is predicated on assumptions about the hardware that existed way back then. When you take an introductory database course (most introductory database courses, anyway), it's tailored toward architectures that were developed back in the 1970s, when the first database systems like System R and Ingres were created. The hardware they had back then looks a lot different from the hardware we have today. In particular, these were giant monolithic machines with a single-socket CPU with a single core, a uniprocessor. The amount of RAM they had was extremely limited; we're talking kilobytes. Because of this, you had to store the database on disk: if you wanted to store anything of meaningful size for an application, you had to go to disk, because RAM was just too small and too expensive. But disks were significantly slower than they are today.
I mean, they're slow now, but it was even worse back then, especially when you're dealing with mechanical devices and not solid-state storage. So the architecture of these systems is really predicated around having to deal with moving data in and out of slow disks.

Where we're at today, in the current state of affairs in hardware, is that DRAM has gotten large enough and cheap enough that most databases can fit entirely in main memory. Certainly there will be outliers: there are data warehouses in the petabyte range that you wouldn't want to put entirely in DRAM. But most structured databases are small enough, on the order of gigabytes or low terabytes, that you could fit them in a database system that runs entirely in main memory. When people think of "big data," really, really large data sets, a lot of the time these are what are called unstructured or semi-structured databases. Unstructured would be something like images: Facebook has a repository of all the photos people have posted, and that's unstructured data, just the raw bytes of the images; you can't really run queries on them directly. Semi-structured would be something like log files generated by web services or applications: sort of human-readable text fields. You can obviously convert them to be structured, but a lot of the time you just leave them as they are. We're going to be focused on the first category, the structured data sets, and these are, for the most part, going to be small enough to fit in main memory.

Before we get into discussing what an in-memory database looks like, you may be thinking: well, I took an introductory database course, I know how those disk-oriented systems are designed, and they have a buffer pool that they use to cache pages fetched from disk. So if most databases could fit in main memory, couldn't I just give a traditional disk-oriented database system a really large buffer pool cache? Would that be sufficient to get the kind of performance gains I would want from in-memory execution? To understand why this is not the case, let's discuss what a disk-oriented database actually means.

The definition I like to give for a disk-oriented database system is one where the primary storage location of the database is on non-volatile storage: a spinning-disk hard drive or a solid-state drive. That means the system assumes that any time it needs to read data, that data is not going to be in memory; it's going to be out on disk. The architecture is predicated on the assumption that at any time you do a read for a tuple, or a read for a page, for any piece of data, it's not going to be in memory.
Therefore you're going to have to go out to disk and get it. The database will be organized through what are called slotted pages, which are just a way to pack tuples into a single page, and as we get the data we need from disk, we bring it into our in-memory buffer pool, which acts as a cache. We have to do this because we can't operate on the data directly on disk; we always have to bring it into memory, process it or do whatever manipulation we want on it, and then write it back out to disk. So the database management system architecture is, in some ways, designed to move this data back and forth between disk and memory, and to ensure that any time I try to read something that's not in memory, I know how to go out and get it.

Again, the core piece that does all this for us is the buffer pool. As I said, when a query goes to access a page, maybe to read a tuple, the database checks whether that page is already in memory in its buffer pool. If it's not, then it goes out to disk, copies it into a frame in the buffer pool (a free location that can store a page), and then hands off a pointer to whoever asked for that page so they can do their processing on it. Of course, the tricky thing is that memory is finite and limited, so it may be the case that there's no free frame for the new page we want to bring in. Then we have to run an eviction algorithm; now we're on the other side: what's the best page to remove, the one least likely to be used again in the near future? And if we choose a page to evict from the buffer pool and that page has been modified by a transaction, we can't just drop it; we have to go and actually apply that change back to disk. Once we've done this and found a free frame, we can put the page we want in there, and then we update whatever indirection layer we have that translates the on-disk location to the in-memory address for that page.

Let's visually walk through what this looks like. Say we're running a query that does a lookup on an index to find a particular tuple. We traverse the index and reach the bottom, the leaf node (assuming it's a B+tree), and we get a pointer to the tuple we want. This pointer is a page id and a slot number. In order to translate this page id and slot number into an actual memory location we can access, we have to do a lookup in our page table. The page table is basically a mapping from page ids to frames in our buffer pool. Let's say in this case, when we do the lookup, we see that the page we want is actually not in memory; it's out on disk. That means we have to copy it from disk and bring it into the buffer pool. But right now our buffer pool is filled up: we only have three frames and everything is being used, so we have to decide which one to evict in order to pull the new page in. During this process, we have to take a latch on the entry for our page in the page table to make sure that nobody else comes along and tries to do the same thing at the same time. We don't want somebody else to say, "I want this page;
it's not in memory, it's on disk, so let me also go try to fetch it separately."

So let's say now we run our eviction algorithm; this could be some variation of CLOCK or LRU. We decide that page two, for whatever reason, is the one we actually want to evict. But say page two is dirty: it was modified by a transaction. Then we have to make sure we write its changes out to disk before we can remove it, and of course we've pinned this frame, taken a latch on it, to make sure that nobody else tries to do the same thing we're doing at the same time. Once page two has been evicted, we can copy page one into the frame and then update the page table to point to it. Now anybody that comes along and wants to do the same lookup on page one will see the memory location in the buffer pool that has the data they want. And once we're done with everything, we release all our latches.

So the key thing to point out here, the question we're trying to answer, is this: if we just give the database system a large amount of memory for its buffer pool, so that we never have to evict anything, would that be good enough to get good performance? The answer is going to be no, because of all the steps we had to do just to see whether the page we wanted was in memory or not. This is what I was saying: in a disk-oriented architecture, the system makes the primary assumption that the primary location of the database is on disk. It doesn't assume the thing it needs is going to be in memory. That's why it always goes to the page table to see where the page is actually located, and takes latches along the way to make sure you don't have issues with other threads trying to do the same thing at the same time. That means you're always doing this translation from the record you need to its memory location, and you're pinning things as you go along, even though everything could always be in memory. And the reason systems do this is that just because things fit in memory now, they don't know that tomorrow you're not going to come in and load a bunch more data, and things start getting swapped out to disk. Think about it: if I had a machine with 100 gigabytes of RAM, and I give 100 gigabytes to MySQL and load a 90-gigabyte database that fits entirely in main memory, MySQL is still going to treat the data as if it could be swapped out to disk, because it doesn't know that tomorrow you're not going to come in and load another 100 gigabytes of data.
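To make those steps concrete, here is a minimal sketch of the fetch path in a disk-oriented buffer pool. All the names (BufferPool, FetchPage, PickVictim) are hypothetical, not from any particular system, and the eviction policy and I/O routines are stubbed out; the point is only to show that every access, even a hit, pays for the page-table lookup and the latch.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

struct Page {
    int64_t page_id = -1;
    bool    dirty   = false;
    char    data[4096] = {};
};

class BufferPool {
public:
    explicit BufferPool(size_t num_frames) : frames_(num_frames) {}

    Page* FetchPage(int64_t page_id) {
        std::lock_guard<std::mutex> latch(table_latch_);   // latch the page table
        auto it = page_table_.find(page_id);
        if (it != page_table_.end())
            return &frames_[it->second];                   // hit: already in memory

        size_t victim = PickVictim();                      // eviction policy (LRU/CLOCK)
        Page&  frame  = frames_[victim];
        if (frame.dirty)
            WritePageToDisk(frame);                        // flush dirty victim first
        page_table_.erase(frame.page_id);

        ReadPageFromDisk(page_id, frame);                  // blocking disk I/O
        frame.dirty = false;
        page_table_[page_id] = victim;                     // fix up the indirection layer
        return &frame;
    }

private:
    // Stubbed out: a real system would implement CLOCK/LRU and do pread/pwrite.
    size_t PickVictim() { return next_victim_++ % frames_.size(); }
    void   ReadPageFromDisk(int64_t id, Page& p) { p.page_id = id; /* pread() */ }
    void   WritePageToDisk(const Page&)          { /* pwrite() + fsync() */ }

    std::mutex                          table_latch_;
    std::unordered_map<int64_t, size_t> page_table_;   // page id -> frame index
    std::vector<Page>                   frames_;
    size_t                              next_victim_ = 0;
};
```

Notice that even if the database fits entirely in memory, FetchPage always does the hash-table probe and takes the latch on every single access. That overhead is exactly what the instruction-breakdown study discussed below quantifies.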
Now, as part of this, the next issue we face in a disk-oriented system is its concurrency control scheme. Concurrency control is the protocol the system uses to allow transactions, or queries, to run simultaneously and modify the database at the same time without causing logical errors in the database. In these older disk-oriented systems, the concurrency control scheme was designed around the assumption that at any time, a transaction could be accessing a piece of data that is not in memory, so it would have to go out to disk and get it, and therefore that thread would have to stall. So what they wanted was that while one thread is stalled doing disk I/O, other transactions can run at the same time (assuming their data is in memory), and a lot of them can still make forward progress. The way they do this is by setting locks and other constructs that provide the ACID guarantees for transactions. And because you always want these locks stored in memory, so you can always keep track of which transaction holds which lock on which data, the locks are stored in separate data structures, separate hash tables, to avoid them being swapped out to disk. We'll see later on, when we talk about in-memory databases, that since a tuple can never be swapped out to disk, they can actually store the locks in the same location as the data, and that reduces the number of lookups you have to do to see whether you can acquire a lock on something.

The next bottleneck is the logging and recovery mechanisms in the system. Again, from the intro course: a disk-oriented database system uses the steal / no-force buffer pool policy. Steal means that a page modified by a transaction that hasn't committed yet is allowed to be written out to disk. No-force means that when a transaction commits, you don't have to flush all its dirty pages from the buffer pool. The way they implement this is through the write-ahead log. The tricky thing with the write-ahead log is that it gets flushed out to disk separately from the data that has been modified in the database. The write-ahead log contains entries about how transactions modified pages; those log records get flushed out, and then the database pages that correspond to those log records get flushed out separately. Because you now have these separate mechanisms for how things get written out to disk, you need to keep track of whether the log record that modified a page has been written to disk before you can write out the page it modified. The way they do this is through log sequence numbers, or LSNs. The concept of LSNs permeates all throughout the system, to make sure the system knows which log record was responsible for modifying which page, and how far along the write-ahead log has been written out to disk.
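As a rough illustration of how LSNs enforce that ordering, here is a minimal sketch (hypothetical names, not any specific system's recovery code) of the write-ahead rule: a dirty page may only go to disk after every log record that modified it has been flushed.

```cpp
#include <cstdint>

// Hypothetical sketch of the WAL rule that LSNs enforce.
struct DirtyPage {
    int64_t page_id;
    int64_t page_lsn;   // LSN of the last log record that modified this page
};

class LogManager {
public:
    int64_t Append(/* log record payload */) { return next_lsn_++; }
    void    FlushUpTo(int64_t lsn) {            // fsync() the log tail in reality
        if (lsn > flushed_lsn_) flushed_lsn_ = lsn;
    }
    int64_t flushed_lsn() const { return flushed_lsn_; }
private:
    int64_t next_lsn_    = 0;
    int64_t flushed_lsn_ = -1;
};

// Called by the buffer pool before evicting a dirty page.
void WriteBackDirtyPage(LogManager& log, const DirtyPage& page) {
    if (page.page_lsn > log.flushed_lsn())
        log.FlushUpTo(page.page_lsn);   // write-ahead: flush the log first...
    // ...and only now is it safe to write the page itself out to disk.
}
```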
Given these three key concepts, the question now becomes: which of these causes the most slowdown in a disk-oriented database system for an in-memory workload? There was a study done in the H-Store project, which I was a part of when I was in grad school back in 2008, where they measured how many instructions were being executed in these different components of a database management system. This was done on a disk-oriented database system called Shore, an early storage engine out of the University of Wisconsin. The way to think about Shore is that it's like BerkeleyDB or RocksDB: it wasn't a full SQL engine, but it had a core storage manager with all the features you'd expect from a modern database management system. They instrumented the code to let them measure how many instructions were executed in different parts of the system, for a database that fit entirely in main memory. So they gave the system enough RAM, loaded the entire data set into the database system, ran a transactional workload, and measured where the time went. The categories are buffer pool management, latching, locking, logging, and comparisons of the B+tree keys; the last piece is the amount of "real work" being done, the work to actually execute queries. Roughly, what they found was:

- about 34% of the instructions were spent in the buffer pool. This is the part we just talked about: doing the lookup in the page table and setting latches along the way so you're protected;
- another 14% was spent in the latching mechanisms for the internal data structures of the system. Think of this as taking a latch on the lock table in order to execute transactions;
- 16% of the instructions were spent on the locking mechanisms;
- 12% was spent in logging: preparing the LSNs and the log records;
- and about 16% was spent doing comparisons of the B+tree keys.

That leaves only about 7% of the total time where the system is actually doing real work to execute queries, which is not a lot. The way to think about this is that all the parts that aren't the 7% are the parts of the database management system you have to have, because the system assumes that at any time, the data you need to access might not be in memory but on disk. And this is why we have to start over from scratch.
This is why we Want to build a new in-memory database management systems That don't assume the data is on disk and therefore it can take advantage and apply different optimizations For because we know everything's going to be in memory and that this this sort of pie pie chart really lays out what the whole point of this course is how do we actually Uh, make that seven percent be much larger by assuming data is in memory and execute execute things more efficiently So an in-memory database is one where the system assumes that the primary source location of the database is in memory So as I said before this means that when a transaction or query does a look up and says I want this tuple The system is not going to assume. Oh that thing's out on disk. Let me go prepare to go fetch it from disk It says it's in memory. I I know I can go access it efficiently And so this is going to allow us to make a bunch of different design decisions Uh, that are going to be optimized for this environment So in memory databases aren't new Uh, they go back into like the 1980s. There was a lot of early work done at the University of Wisconsin I'm building the sort of first prototypes of of these types of systems in the 1990s We'll see in a second they some of the first commercial in memory databases came came about But it's really probably been in the last decade that these types of systems have have really started to take off As I said, this is because the the price at DRAM has dropped enough and the capacities are large enough that Most if not all of the databases in the world can fit entirely in main main memory So but just because you're in memory doesn't mean magically everything's going to go faster Just like a you know, if you have a database and you make it distributed, it's not going to be magically Uh become super high performance There's a bunch of other bottlenecks and other issues that we're going to have to deal with So basically because you remove disk from the equation except for for logging and recovery There's some other things now rise to the top that we're going to have to deal with And again, that's what the some of the major points We're going to talk through the semester how to how to deal with these issues So locking and latching doesn't you know, still problematic that can limit concurrency Cache line misses are actually going to become a big deal Um Pointer chasing, you know doing multiple layers in direction to find the data that we need is going to kill us Doing a predicate evaluations if you say you're running a query on a billion tuples The where clause can actually become a big bottleneck Moving data around copying data between different layers as we issue and then networking will be a problem for the The communicating between the the application and the database management itself And this is why people usually you know look to using store procedures Um to overcome this issue So some core numbers important numbers to keep in the back of your mind As we go along throughout the semester Is the storage action agencies of the different levels of Devices or storage mechanisms in in a system And again on the one end of the spectrum if you look at like they're spinning this hard drives, you know Every disc seek and and and right it's going to be You know in the the low number of milliseconds up, you know up to 10 milliseconds for for older drives But now when you start going closer and closer to the cpu the the the latency increases a lot Or sorry increase latency increases By a lot Right 
All right, so one of the other things we'll talk about through the semester is how storage changes in an in-memory database. First of all, we get rid of the idea of slotted pages, because we don't have to worry about packing data into pages on disk. But we are going to change slightly how we organize the data itself. Tuples don't need to be physically close to each other in memory in the same way they have to be on disk. Instead of a record id, we can use direct memory pointers. In many cases we can split up the fixed-length and variable-length data, and we may want to use checksums to ensure that buggy software doesn't trash our data. That's not to say we won't still want to organize the database in blocks or pages (the OS is still going to do this), but the way we do it will be slightly different.

Using the same high-level example: say we have the same database and the same query as before, and we want to do a lookup on the index. Now, in our index, instead of getting a page id and a slot number, we get a block id and an offset that we can use to jump directly into memory and find the data we want. There will be two sets of memory pools here. The first one is for fixed-length data: think of things like 32-bit integers, 64-bit integers, and so on, which will all be contiguous with each other in memory, assuming we're using a row store. But any time we have a variable-length attribute, like a VARCHAR or a large text field, instead of storing the data inline with the tuple in the fixed-length data blocks, the tuple will just have a 64-bit pointer to the data out in some variable-length data block. In a disk-based system this might be a bad idea, because when you do a lookup, if you've split up the variable-length data, then to grab a tuple you may have to do two fetches: one to get the page with the fixed-length data and one to get the variable-length data. But in an in-memory database we don't care about this; it's just a pointer dereference to go find the data that we want.
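Here is a minimal sketch of what that split layout might look like in code. The struct layout and field names are illustrative, not any specific system's format:

```cpp
#include <cstdint>

// Variable-length values live in a separate pool; each entry is
// length-prefixed. The checksum field ties in the corruption-detection
// idea mentioned above (optional in this sketch).
struct VarlenEntry {
    uint32_t size;       // length of the payload
    uint32_t checksum;   // optional: guard against stray writes trashing data
    char     data[1];    // payload bytes follow (classic varlen trick)
};

struct Tuple {
    // Fixed-length attributes stored inline, contiguous in the block:
    int64_t id;          // e.g., an INTEGER column
    int32_t quantity;    // e.g., an INTEGER column
    // Variable-length attribute stored out-of-line: just a 64-bit pointer
    // into the variable-length data pool.
    VarlenEntry* name;   // e.g., a VARCHAR column
};

// In memory, reading the VARCHAR is a single pointer dereference; there
// is no second page fetch the way a disk-oriented system might need.
inline const char* GetName(const Tuple& t) { return t.name->data; }
```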
You may also be thinking: why even bother with a storage organization for an in-memory database? Why not just let the OS manage this with mmap'd (memory-mapped) files? The answer is that you can kind of get close to what we want by using hints like madvise and msync, but in practice this is not going to work well. Now, underneath the covers, when we malloc a big chunk of memory, the OS runs mmap for us to do that; but we don't want the OS to be responsible for writing things out to disk for recovery. All of that we want to manage ourselves.

There are some notable mmap databases in the wild, in the real world, that people have developed. Probably the most famous one is MongoDB: before they bought WiredTiger a few years ago, they had a storage engine that was entirely based on mmap. MonetDB is an academic analytical system out of Europe, a column store that uses mmap. LMDB is the Lightning Memory-Mapped Database, a sort of in-memory key-value store like BerkeleyDB, that uses mmap. And MemSQL uses mmap for their column store; as far as I know they're still doing this. What I'll say is that if you have to write data out to disk and you want to run transactions that update the database, you don't want to use mmap, because the OS is not going to understand what the hell you're actually trying to do with the database, and you want all the eviction and caching policies to be managed by the database, because it knows what queries you're executing. Systems like MonetDB and MemSQL use mmap for read-only or read-mostly data, where the OS is not going to get in the way too much.

I would say this is an ongoing passion in my life: proving that mmap is a bad idea for your database system. This is something we're working on here this semester, so hopefully I'll have more to say about it. Basically, for analytical workloads, mmap might be enough; for transactional, operational workloads, you don't want to use mmap. And think about this: MongoDB raised a ton of money, they went IPO a few years ago, they hired awesome engineers. They could have done anything in the world in a new system to make mmap work better, and they decided to ditch mmap and go with WiredTiger. If anything, that's anecdotal evidence that mmap is a bad idea.

Again, the main takeaway is that if you let the OS manage memory for you at the level we need in a database system, you're going to run into so many problems, because you're basically giving up fine-grained control of memory. We'll see as we go along through the semester that there's more to this than just allocating memory and deciding when to flush out to disk; the OS can also be used to decide where to actually place memory, and again, we'll see that the database can do better than the OS. The database system is always going to know best for your application, because it knows what queries you're executing and it knows what your data looks like. So this is something that we just want to manage ourselves.
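For illustration, here is roughly what the mmap approach looks like: a minimal sketch using the standard POSIX calls, with the caveats in comments. This is the approach being argued against, not a recommendation:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a database file directly into the address space and let the OS
// handle paging. The DBMS can only offer hints, not exercise control.
char* MapDatabaseFile(const char* path, size_t* out_size) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return nullptr;

    struct stat st;
    fstat(fd, &st);
    *out_size = static_cast<size_t>(st.st_size);

    void* base = mmap(nullptr, *out_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after close
    if (base == MAP_FAILED) return nullptr;

    // Hints only -- the kernel is free to ignore them:
    madvise(base, *out_size, MADV_RANDOM);   // "expect random access"
    // msync(base, *out_size, MS_SYNC);      // flush, but with no WAL ordering

    return static_cast<char*>(base);
}
```

The problem is visible right in the code: madvise and msync are hints and requests, not guarantees, so the DBMS cannot control which pages get evicted when, nor enforce write-ahead ordering between the log and the data pages.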
We don't want to give up control over this.

All right, so now, going back to which aspects of an in-memory database will differ from what's done in a disk-oriented system: the first thing to talk about is concurrency control. Because we now assume data is in memory, the cost of accessing a piece of data, like a tuple, is known in advance; it's bounded by an access to memory. The cost of doing a lookup to acquire a lock for that tuple is the same as the cost of accessing the data itself, so we might as well just go access the data ourselves. In the disk-oriented system, again, there was a separate hash table for the locks in the lock manager; but in some of the protocols we'll talk about this semester, systems actually store the locks directly in the tuple itself. How an in-memory database detects conflicts can also change: we can do more fine-grained locking, as we'll see in the HyPer system, which allows better concurrency; or we can have coarse-grained locking, as in VoltDB or H-Store, where we have fewer locks, so there's less overhead to maintain them, but it limits concurrency. We'll see as we talk about concurrency control that there is this trade-off in deciding at what granularity we want transactions to incur conflicts and reconcile them. Again, as I said before, we can now store the locks directly in the tuple itself, and this will improve cache locality. And we'll see that we can use compare-and-swap techniques to acquire locks efficiently, in ways we couldn't with a separate lock manager hash table.

The main thing we'll see with an in-memory database and its concurrency control schemes is that the issue is no longer that a transaction stalls because it has to go out to disk for something; it ends up stalling because another transaction running at the same time is trying to access the same data, and it has to wait for that transaction to finish. This is a combination of everything being in memory and of having way more cores on a single box than the older systems ever had.
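As a rough sketch of that idea, the lock living inside the tuple and being acquired with an atomic compare-and-swap instead of a lock-manager lookup, consider the following. The field layout and names are hypothetical:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch of embedding the "lock" in the tuple header itself.
struct TupleHeader {
    std::atomic<uint64_t> txn_id{0};   // 0 = unlocked, else the owner's txn id
    // ... tuple attributes follow in the same cache line(s) ...
};

// Try to lock the tuple by installing our transaction id.
bool TryLock(TupleHeader& t, uint64_t my_txn_id) {
    uint64_t expected = 0;
    // A single atomic instruction; no latch on a shared lock table needed.
    return t.txn_id.compare_exchange_strong(expected, my_txn_id);
}

void Unlock(TupleHeader& t) { t.txn_id.store(0); }
```

Because the lock word sits right next to the tuple's attributes, checking the lock and reading the data touch the same cache lines, which is the cache-locality benefit mentioned above.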
For indexes, we're actually going to change how we design them to be more high-performance for in-memory databases. We'll see some examples from the 1980s, when DRAM speeds were roughly the same as CPU speeds, where people designed specialized in-memory indexes that could beat the traditional B+trees. But we'll also see that the B+tree still works really well: there are some techniques and tweaks we can apply to make it work better for in-memory databases, but its high-level architecture stays the same. And actually, the modern lock-free (latch-free) data structures that we'll cover later on don't perform all that well, because the overhead of redoing operations that fail due to conflicts tends to crush performance. So again, a lot of the techniques we learned in an introductory disk-oriented database class will still be relevant for in-memory indexes, but we're going to tweak some of the assumptions we make about them.

The other thing I'll point out is that in a disk-based system, when transactions modify tables, and those modifications in turn modify the indexes, the system actually stores the updates for those indexes on disk. You want to make sure that if you crash and come back, you can rebuild; you can have the index ready to go without having to re-read the entire database. In an in-memory database, systems are actually not going to log any updates to the indexes, because if you crash and come back, or restart the system and come back, you're going to load the database entirely into main memory all over again, and therefore you might as well just rebuild the index as you read the data. That avoids the runtime overhead of logging little changes to the indexes.

For query processing, some of the strategies we use are going to be slightly different. Sequential scans are not significantly faster than random access, at least not by the orders of magnitude you see on a spinning-disk hard drive doing random I/O versus sequential I/O. This means we'll have to change some of our approaches for how we access data. Certainly, for long scans over columns, the sequential scan is still going to be better, because we can read a bunch of data, possibly compressed, at the same time. But one thing that will be different is that instead of the tuple-at-a-time approach used in a traditional database system (the Volcano model: calling Next(), Next(), Next() on the query plan), we can use different approaches designed for the type of workload we want to execute. This is because, if we're accessing a lot of data, it's actually the function calls in the software itself that become the bottleneck, not disk I/O, because we don't have that anymore. We'll see this as we talk about query processing strategies; there are different approaches we can use because we're in memory.
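To make the contrast concrete, here is a minimal sketch of the two processing models. The interfaces are hypothetical, and batching is just one example of the alternative approaches covered later:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Tuple { int64_t vals[4]; };

// Volcano model: one virtual call per tuple. In memory there is no disk
// stall to hide, so this indirect call becomes the dominant cost.
struct Operator {
    virtual const Tuple* Next() = 0;       // returns nullptr when exhausted
    virtual ~Operator() = default;
};

int64_t SumColumn0(Operator& plan) {
    int64_t sum = 0;
    while (const Tuple* t = plan.Next())   // ~1 indirect call per tuple
        sum += t->vals[0];
    return sum;
}

// Batched alternative: one call produces many tuples, amortizing the
// call overhead and keeping the inner loop tight for the CPU.
struct BatchOperator {
    virtual size_t NextBatch(std::vector<Tuple>& out) = 0;  // 0 when exhausted
    virtual ~BatchOperator() = default;
};

int64_t SumColumn0Batched(BatchOperator& plan) {
    int64_t sum = 0;
    std::vector<Tuple> batch;
    for (size_t n = plan.NextBatch(batch); n != 0; n = plan.NextBatch(batch)) {
        for (size_t i = 0; i < n; i++) sum += batch[i].vals[0];  // tight loop
        batch.clear();
    }
    return sum;
}
```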
The last one is logging and recovery. We still need the write-ahead log, because when the system crashes we lose all the contents of memory. But there are techniques we can apply to be more lightweight about what we actually have to log, because we know we're in memory. For example, we may not have to store undo information: we never have to undo a transaction, because no dirty pages got written out to disk before the transaction actually committed. So the write-ahead log only needs to store redo information. And because we no longer have disk pages that could get written out to disk, we don't need log sequence numbers, so we don't need to maintain them all throughout the system. Just as with concurrency control, we get more lightweight schemes because we're in memory. The same goes for checkpoints: there are different techniques we can use, which we'll cover throughout the semester, and the main idea is that because there won't be any dirty pages, we can essentially stream data directly out to disk. We have to make sure we're not seeing torn pages or picking up updates from uncommitted transactions, so we'll be a little careful about how we actually do this, but again, it's a more lightweight scheme than what you'd need in a disk-oriented system.

One other thing I'll mention quickly (we'll have a whole lecture on this): for most of this semester, we're going to assume the entire database is in memory. Obviously that's more expensive for some workloads, actually for most applications, because DRAM is expensive, and you have to maintain power to the machine in order to retain any data. So there are some techniques we'll look at in one class for how to bring the disk back in a way that doesn't slow us down, without having to re-architect the entire system to assume the database is on disk. This is something we discussed in previous semesters, but I'm pushing it closer to the beginning of the semester, when we talk about storage. The idea is: if we assume the hot data is in main memory, and we push cold data out to disk as it becomes less likely to be updated or read by queries, can we do this in a way that doesn't slow down the entire system the way we saw in that pie chart? Again, we'll discuss this more; for the bulk of the conversation this semester we're not going to assume this, but I think it's important to at least think about and understand as we go along.

All right, so with that, I want to finish up by talking about three in-memory databases that I think are notable to discuss. These were the early pioneers in this space; they're not systems we'll talk much about during the semester, but I think it's interesting to bring them up now and understand how we got to where we are. We'll talk about TimesTen; Dalí, later renamed DataBlitz, from AT&T; and P*TIME. The other ones here, HANA, VoltDB, Hekaton, Silo, HyPer, MemSQL, BLU, and Geode, are all in-memory databases that we'll talk about all throughout the semester. I'll describe the techniques that have been proposed in the context of these systems; some are academic systems, some are commercial systems, but they've all been developed or worked on in the last ten years, whereas the three I'm highlighting here go back to the 1990s.

Probably the most famous in-memory database, at least for the longest time (when you say "what's an in-memory database?", the one everyone names) is TimesTen. TimesTen was developed at HP Labs in the 1990s; it was originally called Smallbase. HP Labs spun it out as a separate startup, and then Oracle bought them in 2005. So you can still get it today.
You can still operate it as a standalone database management system, but Oracle now also sells it as a cache, a front-end in-memory cache for Oracle's flagship database management system, the Oracle database. What's sort of interesting about TimesTen is that it was a multi-process, shared-memory database management system. We're not going to see very many shared-memory multi-process systems this semester, and that's because threading packages have now gotten good enough and standardized enough that there's no need to worry about how you're going to run on different variants of Unix. Everyone sort of targets Linux, and the Linux threading model is good enough for what we need. But back in the 1990s, threading packages, pthreads or POSIX threads, weren't as standardized as they are today, so to make sure your system could run on different variants of Unix, you did forking with multiple processes to get different workers. They were using two-phase locking in a single-version system. And what was really interesting is that they recognized back in the 1990s that memory is expensive and memory is limited, so they were doing dictionary-encoded columnar compression for the database, even though it was meant to be an operational database system. When we talk about compression later in the semester, we're actually not going to see very many systems, at least on the OLTP side, that do compression even though they're in memory; the compression stuff is mostly going to come up for the column stores, the OLAP systems. (There's a little sketch of the dictionary-encoding idea below.)
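Just to show the basic idea, here's a minimal dictionary-encoding sketch for a single string column. This is my own toy illustration of the general technique, not TimesTen's actual format.

```cpp
// Minimal dictionary encoding for a string column: each distinct value
// is stored once, and the column itself holds small integer codes.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct DictColumn {
  std::vector<std::string> dict;                    // code -> value
  std::unordered_map<std::string, uint32_t> codes;  // value -> code
  std::vector<uint32_t> data;                       // encoded column

  void Append(const std::string& v) {
    auto [it, inserted] = codes.try_emplace(v, dict.size());
    if (inserted) dict.push_back(v);  // first time we've seen this value
    data.push_back(it->second);
  }

  const std::string& Get(size_t row) const { return dict[data[row]]; }
};
```

A scan can then compare fixed-width integer codes instead of full strings, and the encoded column takes far less space, which is the part that mattered back when DRAM was really scarce.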
So the next interesting system is Dali, which was later renamed DataBlitz. AT&T developed it in the 1990s for telecom systems, so it had to be high performance to handle incoming data from phone calls. I think it's still around; the website might be down, and it might still be used in production, but I don't think it was ever really commercialized. It wasn't something you could download and buy from AT&T; I think it was shipped inside their telecom gear. Like TimesTen, it was a multi-process in-memory system, and it actually used memory-mapped files (mmap) for storing its data. What I like about Dali is that they were paranoid about the database being corrupted by bad software written by other people. It was a storage manager that you would link into your application as a library, like SQLite or RocksDB, and you could make requests to it, and it would process your data, but it was still in the same address space. So if you had crappy application software using their storage manager, and that crappy software trashed random locations in memory, it could end up corrupting the data itself. So they would take checksums for each page and make sure on every single read that the checksum was correct, and if they found it was incorrect, they would go out to the log and recover the page from the log. There's a lot of overhead in doing this, but they were trying to be super cautious about their data, and I find that quite admirable. (I'll put a little sketch of this idea below.)

All right, the last one is P*TIME. I met the guys who invented P*TIME last year; I always thought it was pronounced "P star time," but they said it's "P-time." This was an in-memory database out of South Korea from the 2000s. The South Koreans are very good at building in-memory databases; Altibase is another interesting one that came out of that country. This was one of the early in-memory databases from the 2000s that recognized you could redesign the system to get better performance than a disk-oriented system for transactional workloads. It had a lot of interesting aspects that are still relevant today: they did differential encoding for log records to reduce the amount of data you have to store any time a transaction updates things (there's a sketch of this below too); they supported hybrid storage layouts, so you could have a row store and a column store in a single system; and they could also write cold data out to disk, which again is the larger-than-memory stuff we'll talk about later in the semester. P*TIME wasn't around for very long. They had some early benchmark numbers in a white paper that were pretty impressive, even still for today, for what they were able to achieve. They were bought by SAP in 2005, and the code base was merged into SAP HANA.
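Here's a rough sketch of that Dali-style defensive read path: every page carries a checksum that gets verified on each access, and a mismatch triggers recovery of that page from the log. The checksum function and the recovery hook here are stand-ins I made up, not Dali's actual code.

```cpp
// Toy sketch of checksum-verified page reads. FNV-1a stands in for
// whatever checksum the real system used; recovery is a placeholder.
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Page {
  std::vector<uint8_t> bytes;
  uint64_t checksum = 0;
};

uint64_t Fnv1a(const std::vector<uint8_t>& bytes) {
  uint64_t h = 1469598103934665603ULL;
  for (uint8_t b : bytes) {
    h ^= b;
    h *= 1099511628211ULL;
  }
  return h;
}

void RecoverPageFromLog(Page& page) {
  // Placeholder: a real system would replay this page's log records.
  (void)page;
  throw std::runtime_error("page corrupted; replay not implemented");
}

const std::vector<uint8_t>& ReadPage(Page& page) {
  if (Fnv1a(page.bytes) != page.checksum) {
    // A stray write from the co-located application trashed this page;
    // rebuild it from the log instead of returning corrupt data.
    RecoverPageFromLog(page);
  }
  return page.bytes;
}
```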
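And here's the differential-logging idea from P*TIME, at least as I'd sketch it: instead of logging both the before-image and the after-image of a change, you log a single XOR diff, which can take you in either direction. This is a toy version of the general technique, not their implementation.

```cpp
// Toy XOR-based differential log record: one diff instead of separate
// before- and after-images, roughly halving what has to be logged.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

Bytes XorDiff(const Bytes& before, const Bytes& after) {
  assert(before.size() == after.size());  // fixed-size tuple images
  Bytes diff(before.size());
  for (size_t i = 0; i < diff.size(); ++i) diff[i] = before[i] ^ after[i];
  return diff;
}

// Applying the diff to the before-image redoes the change; applying it
// to the after-image undoes it. XOR is its own inverse.
Bytes Apply(const Bytes& image, const Bytes& diff) {
  Bytes out(image.size());
  for (size_t i = 0; i < out.size(); ++i) out[i] = image[i] ^ diff[i];
  return out;
}
```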
So when you execute transactions on SAP HANA today, some portion of that system is actually the original P*TIME code from way back in the 2000s.

All right, so to finish up: this was kind of a whirlwind lecture, so I apologize for that, but the main takeaway I want you to get from all this is that the entire semester is really about looking at how database systems can be designed differently if you assume everything is going to be in memory. And we're at the point, it's 2019, where this isn't an issue at all anymore. Ten years ago it sort of was; people were not comfortable with the idea of having your database run entirely in main memory. But now there are enough commercial products and enough buy-in to this concept that it's not an issue. The other thing we'll also talk about is that mmap is a terrible idea. If I die, you can put this on my tombstone: never use mmap for your database. We'll discuss throughout the semester why this is the case, and it's ongoing research for us at Carnegie Mellon.

All right, so I gotta go take a shower and clean myself up; it's a mess. Next class we're going to talk about more complex topics in transactions than what was probably covered in your introduction class. We'll talk about different transaction programming models, more than just the flat transactions we normally talk about when we discuss this; last semester that discussion was mostly about isolation levels. And then we'll talk a little bit about modern concurrency control implementations. The paper you've been assigned to read is a survey paper I did a few years ago looking at how modern concurrency control protocols scale to really high core counts, to identify what the bottlenecks are on future hardware. It's sort of an extreme case, but the idea is to begin to think about how these systems could actually perform when you have a lot of simultaneous operations running at the same time. As I said, also make sure you submit your first reading review on Wednesday before class, and I'll post that lecture later on. So at this point, we've got to leave Reno; we're going to head out of town, and at whatever motel we end up at next, I'll film the next lecture. Okay. All right guys, so with that, take it easy. See you.