Today's lecture is the second part of our discussion of data structures in databases. Originally I had these next two lectures titled "order-preserving trees," and I decided to rename them "tree indexes," because what we're really going to focus on today is using these trees as indexes. We can use them in other parts of the system, like we talked about before, but our primary focus is going to be tree indexes.
So this is the same slide I shared last time, when we started talking about hash tables and the different parts of the database system where we can use them. We said there's internal metadata: you guys are doing this for your buffer pool project, where you have to build a page table that maps pages to frames, and that's done as a hash table. That's an example of an internal data structure used for metadata. We can use them for the core storage of the system, so a tree or a hash table can be the actual place where you store tuples. In the hash table example we had a key mapped to a value, and the value was a pointer to a tuple. Instead of storing the pointer, we just store the entire tuple inside the bucket in our hash table, and we can do the same thing in trees. We can use them for temporary data structures: when we're running a query, maybe we want to build a tree or hash table on the fly just for that one query, populate it with the data we need, do whatever we need to do to process that query, and then blow the whole thing away. In some cases that's actually much faster than doing a sequential scan over the table over and over again. And as I said, what we're going to focus on today is really using them as table indexes, right?
Because the B+ tree, which is actually the only thing we're going to talk about today, is what you're almost always going to get when you create an index in your database.
So we never really defined what an index actually is. Most of you, from the first homework assignment doing SQL or from some basic knowledge of databases, probably know what an index is, but now we can formally define it. A table index is a replica, if you will, of a subset of the table's columns, kept in an auxiliary data structure. And we organize it in such a way that it's efficient for us to access the data by those columns. Again, the relational model says that tuples can be stored in pretty much any physical form, but for certain queries, having the right physical form, meaning the right ordering of our columns and the right sorting of the data, can make things more efficient. That's what table indexes are going to provide for us.
The easiest metaphor for understanding a table index is the textbook for the class. It has a couple hundred pages. If I want to find every page where the term "B+ tree" is listed, I could just look at every single page one by one until I find all the entries that I want. That's the equivalent of a sequential scan. Or I can go to the back of the book, where there's a glossary, find the keyword "B+ tree," and it gives me a list of the pages that contain that keyword. That's essentially what the index is going to do for us. It makes it more efficient to find data on a particular set of columns.
So it's going to be the database management system's job to make sure that the contents of the table and the contents of the index are always logically in sync.
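To make the textbook metaphor concrete, here is a minimal sketch in Python, not how any real system stores its indexes. The table contents and the helper names (`sequential_scan`, `build_index`, `index_lookup`) are made up for illustration. The "index" is just a sorted copy of one column paired with row positions, which is enough to show the difference between checking every row and jumping straight to the matches.

```python
import bisect

# Hypothetical table: (id, name, year) tuples stored in no particular order.
table = [
    (7, "carol", 2016),
    (3, "alice", 2018),
    (9, "dave", 2017),
    (1, "bob", 2018),
]

def sequential_scan(rows, column, key):
    """Look at every row one by one, like reading every page of the book."""
    return [pos for pos, row in enumerate(rows) if row[column] == key]

def build_index(rows, column):
    """Copy one column into a sorted list of (key, row position) pairs."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))

def index_lookup(index, key):
    """Binary-search for the first entry with this key, then collect matches."""
    i = bisect.bisect_left(index, (key,))
    matches = []
    while i < len(index) and index[i][0] == key:
        matches.append(index[i][1])
        i += 1
    return matches

id_index = build_index(table, column=0)
year_index = build_index(table, column=2)

# Both approaches find the same rows; the index just avoids the full pass.
print(sequential_scan(table, 0, 9), index_lookup(id_index, 9))
print(sequential_scan(table, 2, 2018), index_lookup(year_index, 2018))
```

The sorted list stands in for the real data structure here. A B+ tree gives the same kind of ordered lookup while also staying cheap to update.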
And I'm using the word "logically" on purpose here, to say that physically they may not actually be in sync. We might delete a key from our index without removing it from the actual data structure just yet, but logically, when I run a query against that index or look in that index, my application doesn't see it. So it's the job of the database management system to keep these two things in sync so that you don't get spurious reads, like false negatives and false positives, where you see things that you shouldn't see or you miss things that should be there. And the database system is going to manage all of this for you automatically underneath the covers.
Sometimes, in tutorials for more primitive NoSQL systems that don't support indexes as a first-class component of the database system, you see people writing application code to maintain a separate table that acts as an index for the application. And it's up to the application to keep those two things in sync. That's a bad idea. We should let the database system do this for us, because it can do it much more efficiently. It knows, when things change, which indexes it should update, and it just does it automatically for you.
We won't talk about this today, but it will come up later on when we talk about query optimization or query planning: it's up to the database management system to figure out which is the best index to use to execute our query. As the application programmer, I can come along and call CREATE INDEX as much as I want. I can create redundant indexes. I can make indexes on the same columns, just in different orders. The database system will pretty much let me do whatever I tell it to do. But it may not actually be a good idea. For some queries, some indexes are going to be better than others.
And it's up to the database management system's query planner, or optimizer, to figure out which index to use for a particular query. We'll see this later on. Basically, what's going to happen is that the database system maintains some additional statistics about these indexes. Based on what your query wants to do and which indexes are available, it can try to figure out which one it wants to use. And if all else fails, you just go back and do a sequential scan on the entire table. That's always the default option. But we can do better with indexes.
So there's a trade-off between having an index on every single thing we could ever possibly need, versus not paying the maintenance overhead or storage overhead of having every possible index. As an application designer, when I create my database and define all my indexes, I need to be aware that indexes aren't free. We'll see as we go along why they're not free: you have to maintain them, and we'll discuss today how you actually maintain them. So just because you add indexes doesn't mean it's going to magically make your application run faster.
This problem of how to pick the right indexes for your application is an old problem. My advisor's advisor actually wrote one of the first papers on this in, like, 1976. He's dead, right? So people have been doing this for a long time. Since maybe the 2000s, there have been some tools developed that can automatically pick indexes for you. I know Microsoft and Oracle have this feature now in their cloud services, and this is something we've actually been looking at here at CMU in the database system we've been building with my research group. So I'm happy to talk about how to automatically pick indexes, but for our purposes here in this lecture, we'll just assume that someone picked the right indexes for us. We don't care how. So today is really going to be focused on B+ trees.
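The point that indexes are not free can be sketched with a toy table that keeps its own indexes in sync. Everything here is an assumption made for illustration: the `Table` class, its method names, and the `index_updates` counter are invented, and a plain Python dict stands in for whatever data structure a real system would use. The thing to notice is that every insert and delete touches every index, and that a lookup on an unindexed column falls back to a sequential scan.

```python
class Table:
    """Toy table that updates every index on each write (illustrative only)."""

    def __init__(self, indexed_columns):
        self.rows = {}          # row id -> tuple
        self.next_id = 0
        # One dict per indexed column: column value -> set of row ids.
        self.indexes = {col: {} for col in indexed_columns}
        self.index_updates = 0  # counts index maintenance operations

    def insert(self, row):
        rid = self.next_id
        self.next_id += 1
        self.rows[rid] = row
        for col, index in self.indexes.items():
            index.setdefault(row[col], set()).add(rid)
            self.index_updates += 1
        return rid

    def delete(self, rid):
        row = self.rows.pop(rid)
        for col, index in self.indexes.items():
            index[row[col]].discard(rid)
            if not index[row[col]]:
                del index[row[col]]
            self.index_updates += 1

    def lookup(self, col, value):
        if col in self.indexes:
            return sorted(self.indexes[col].get(value, ()))
        # No index on this column: fall back to a sequential scan.
        return sorted(rid for rid, row in self.rows.items() if row[col] == value)


students = Table(indexed_columns=[0, 1])
a = students.insert(("alice", 2018))
b = students.insert(("bob", 2018))
print(students.lookup(1, 2018))   # both rows, found through the index
students.delete(a)
print(students.lookup(1, 2018))   # the index no longer returns the deleted row
print(students.index_updates)     # 2 inserts + 1 delete, each touching 2 indexes
```

With two indexes, three writes cost six index updates. Add a third index and the same writes cost nine, whether or not any query ever uses it.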
So we'll start off with an overview of what a B+ tree is, and we'll do a simple demo. Then we'll go over some design decisions you have to think about when going beyond the basic B+ tree we talk about at first. And then we'll finish up with some optimizations that are used in practice in real systems to make these things go faster. So the way to think about it is: the first third is just understanding what a basic B+ tree is. The second third is, all right, if you're actually going to build one, what are the things you need to think about? And the last third is, all right, if I actually want to make this thing usable and fast, what are some optimizations I can apply? I'll say that for the second project in the course, after the buffer pool, you guys will be building your own B+ tree. You can get by with just the first two parts of these. The third is nice to have, but not necessary for correctness, OK? OK, I forgot to open up the browser. That's all right.
So the first thing to address is the confusing naming of these trees. I don't remember whether the textbook calls them B+ trees or B-trees. There's a class of data structures called B-trees, and then there's a specific data structure called a B-tree. So now, in 2018, people loosely say "B-tree" for what, by the canonical or formal definition, is actually a B+ tree. The B+ tree is one member of that class, but within that category there's also a data structure called a B-tree. People are sort of loosey-goosey with the term these days. For example, if you look at Postgres' documentation, or if you look in the code, they describe their data structure as a B-tree. But from the best I can tell from actually looking at the code, it's not a B-tree. It's actually a B+ tree. And MySQL refers to theirs as a B+ tree.
But I think SQL Server might refer to theirs as a B-tree. So the main takeaway is that whenever anybody says a modern database system has a B-tree, I can almost guarantee that what they really have is a B+ tree. As far as I know, nobody actually implements the real B-tree. What's even more confusing is that there's the B+ tree as defined in the 1970s, but the ones we use today actually borrow some ideas from these other trees. So there's the B-tree and the B+ tree. We'll cover those. But then there's the B-link tree, which was invented here at CMU in 1981. And then there's the B* tree. There are some elements of the B-link tree in the modern B+ tree. So for our purposes, we'll just say it's a B+ tree, and I'll define what it is. I think the textbook might be a bit pedantic about this as well. But again, when you go look at other systems, they may say they have a B-tree, but they almost always have a B+ tree.
So a B+ tree is a self-balancing tree data structure. The B in B+ tree stands for balanced. What it's going to allow us to do is have efficient searches, sequential accesses, insertions, and deletions, with all of our operations taking O(log n) time, where n is the number of keys that we're storing. Contrast this with the hash table. What was the hash table's asymptotic complexity to do a lookup? Anybody? What's that? O(1), right? I hash something and I jump to it. Worst-case scenario, it's O(n), yeah. So the B+ tree is always going to be O(log n), and that's pretty good, that's pretty fast.
As for the definition, there's actually no one paper that describes the B+ tree. The one that everyone usually cites is "The Ubiquitous B-Tree," which came out as a survey paper in 1979. But this is actually not where it was invented.
This 1979 paper goes on about how great B+ trees, or B-trees, are, and how every single database system is going to be using them because this is the greatest data structure ever for relational databases. But at the point where they're talking about how ubiquitous it is, it's only about six or seven years after it was actually invented. So even early on, people figured out that the B+ tree was a really good idea. In this 1979 paper, they refer to an IBM tech report from, like, 1973, which seems to describe what is now known as a B+ tree. So again, the main takeaway is that people recognized early on that the B+ tree was a really good idea for databases. And we'll talk about this in the next class, but even today, in modern in-memory databases or with really fast SSDs, the B+ tree actually does pretty well. It's actually much better than, like, modern lock-free data structures.
The thing that was super useful and made B-trees widely used back then was that they were originally designed for hardware where sequential access for reads and writes was much faster than random access. By organizing our nodes in these blocks, and the larger the better, because you can read more data with a single seek on the disk, we can get much better performance. And this would be better than a sort of generic binary tree, which we'll see in a second. So again, the main takeaway is that even though B+ trees were designed in the 1970s, when the hardware was much different from what we have now, they're still really awesome and very few things can actually beat them.
So more formally, a B+ tree is an M-way search tree, where M is the maximum number of keys you can have in a single node. And it's going to have the following properties.
The first is that it's going to be perfectly balanced, meaning every leaf node, every node at the bottom of the tree, will always be the same number of levels away from the root. You won't have one leaf node be two hops away and another one be 10 hops away. They always have to be the same. Then every inner node, and we'll define what that is in a second, but it's essentially anything that's not a leaf node, is going to be at least half full. Again, we can have M keys in a single node, so it's always going to have at least half of that, up to my max amount. And then every inner node, if it has K keys, will have K plus one non-null children. So for every key I have, I have to have pointers to children, OK?
So let's look through an example. This is a really simple three-way B+ tree, so I can have three keys per node. The top part is often called the root, but this is an inner node because it's not a leaf node. The leaf nodes are at the bottom, because they don't have any children. And in addition to the pointers from the inner node to the leaf nodes, I can also have pointers between the siblings. The original B+ tree did not have these pointers. The B-link tree does, but a lot of people actually implement B+ trees with these pointers now. Notice also that there are no pointers going in the opposite direction. There's no pointer from the leaf node to the inner node, the parent. Now, there's nothing about the B+ tree that says you can't do that, but nobody actually does it, because maintaining these pointers becomes more difficult if you have multiple threads trying to access things.
Yes? Can you go back one slide? One slide? Yeah. I didn't get the last one. So if my node has K keys, then I have to have K plus one children. So if I go back here, this top node is an inner node, the root. It has two keys. So it has to have two plus one children.
So it has three children, right? This is what makes sure it's balanced. Right, and again, we have these sibling pointers. We're going to use these along the leaf nodes to do certain things. If I want to do a range scan and find all the keys that are greater than three, I can use my index to get down to the leaf node and then just scan across the leaves, going from one page to the next. This is what makes B+ trees really fast. Without these sibling pointers, I may have to go back up and traverse down again, and that's sort of doing random access into the file on disk. Whereas if I'm smart about how I lay out these leaf nodes, I can maybe prefetch a bunch of them ahead of time, because they're all going to be contiguous, bring them into my buffer pool, and just rip right across without having any page faults.
Now, the way we're going to do searches in the tree is through these separator keys in the inner node. For every key, it's going to have two pointers to its children, one on the left and one on the right. For any key that is less than the key in the inner node, you go down the pointer on its left, and for any key that is greater than or equal to it, you move on to the next one. So for any key less than five, you go down here, for any key less than nine, you go down here, and then anything greater than or equal to nine goes here. This is also where things get confusing. I think I'm following what's in the textbook, but different textbooks do different things. You could have this be less than or equal to five, then less than or equal to nine here, and then greater than nine there. It doesn't matter. Nothing about the data structure or the algorithms for searches, inserts, and deletions changes.
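Here is a small Python sketch of the example tree, to make the separator keys and sibling pointers concrete. It is a hand-built, read-only tree under stated assumptions, not the data structure from the course project: the `Node` class, the function names, and the stored values are all made up, and the leaf contents are chosen only to be consistent with separator keys 5 and 9. It uses the "less than goes left, greater than or equal goes right" convention described above, and its range scan includes the lower bound itself.

```python
from bisect import bisect_right

class Node:
    """One node of a toy B+ tree (hypothetical layout for illustration)."""

    def __init__(self, keys, children=None, values=None):
        self.keys = keys
        self.children = children  # inner nodes only: len(keys) + 1 children
        self.values = values      # leaf nodes only: one value per key
        self.next = None          # leaf nodes only: pointer to the right sibling

    @property
    def is_leaf(self):
        return self.children is None

def find_leaf(root, key):
    """Follow separator keys from the root down to the leaf that covers key."""
    node = root
    while not node.is_leaf:
        # key < keys[0] -> child 0; keys[i] <= key < keys[i+1] -> child i + 1
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    """Point lookup: return the value stored for key, or None."""
    leaf = find_leaf(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None

def range_scan(root, low):
    """Yield every (key, value) with key >= low by walking sibling pointers."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k >= low:
                yield k, v
        leaf = leaf.next

def leaf_depths(node, depth=0):
    """Check the k keys / k + 1 children rule and collect leaf depths."""
    if node.is_leaf:
        return {depth}
    assert len(node.children) == len(node.keys) + 1
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths

# Build the example by hand: a root with keys 5 and 9 over three leaves.
leaves = [
    Node([1, 3], values=["v1", "v3"]),
    Node([6, 7], values=["v6", "v7"]),
    Node([9, 13], values=["v9", "v13"]),
]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([5, 9], children=leaves)

print(search(root, 7))            # found in the middle leaf
print(list(range_scan(root, 6)))  # one descent, then a walk across the leaves
print(leaf_depths(root))          # a single depth means the tree is balanced
```

The range scan descends the tree once and never goes back up to the root, which is the benefit of the sibling pointers. Switching to the other separator convention would only mean replacing `bisect_right` with `bisect_left` and building the tree to match.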