I love it. All right, let's get started. We'd love to keep chatting, but let's get right into it. So again, thanks to DJ Drop Table for holding it down; that was awesome. How are you? Are you still having that police problem, or is everything okay? Do you need a good lawyer in town? Because I have one, and I can give you the contact info. When my PhD students got in trouble for, you know, whatever minor thing, we had somebody, so we can help you out. Okay, no, you're not wrong. Okay, good.

All right, so let's talk about databases. Real quickly: we have a talk today in Gates at 4:30 p.m. One of the head engineers from Vertica is coming to give a talk. Vertica, believe it or not, has a small development branch here in Pittsburgh, and so Steve is going to come talk about some of the things they've been working on. Vertica is a disk-based, distributed, column-store database. That's a bunch of words we've been throwing around throughout the entire semester, but it means it runs on multiple nodes, it's a column store, and it assumes the primary storage location of the database is on disk. It's not the first, but it's one of the first column stores that came out in the mid-2000s and began the wave of column-store databases. Column-store databases aren't really that rare now, but back when Vertica came out, in 2006 or so, that was considered a major technological breakthrough. So Steve is a good guy; come hear him talk about the kinds of things they're working on. And for that one there will be fruit, not pizza, so plan accordingly.

All right, so today we want to continue our discussion of tree indexes. I want to spend a little bit of time in the beginning doing some demos and discussing more about B+ trees, to finish up the things we left out last class. Then we'll talk about more ways you can use indexes, beyond the straight key-mapped-into-a-data-structure approach we've covered so far. Then we'll talk about an alternative to B+ trees: tries, or radix trees. We'll go through what makes them unique, what makes them different, and how they're better or worse than B+ trees. And then we'll finish up really quickly with a sort of brain dump on inverted indexes. I'm not going to teach you how they work; we have courses here at CMU that can do that. This is just so you know these things actually exist, so when you get out in the real world and realize the thing you want to index can't be indexed in a B+ tree, you know to use one of these inverted indexes.

All right. So last class we had a couple of questions about how we're actually going to handle duplicate keys in our B+ tree index. I showed you how we would handle duplicate keys inside of a node: we can duplicate the values in the node, or just have a mapping from a key to a value list inside the node. What I realized I left out was discussing, at a higher level, how we actually maintain these duplicate keys within the tree itself.
There are two approaches to this. The first is that we make every key unique automatically, by appending the corresponding tuple's record ID to the key we're inserting into the index. So instead of just storing the key, the copy of the attribute that's in the table, I also append the record ID for that tuple as a suffix, and now that makes every single key automatically unique.

The reason we can do this and it still works is because we're using a B+ tree. Remember, I said that with a B+ tree we can do partial-key lookups and still find the things we want. If I have an index on attributes a and b, and I want to do a lookup on just a, I can still do that without having b. So in our case, even though we won't have the record ID at lookup time, we can just do the regular lookup as we would with the key, and then scan along the leaf nodes until we find all the matches for that given key. You can't do this in a hash table: for a hash table, you have to have the entire key. To make this approach work with a hash index, you'd have to know, when you do a lookup, both the key and the record IDs it corresponds to, which would be stupid, because if you already had the record IDs, why would you use an index to look them up?

Yes? His question is: what is this record ID? It's the page ID and offset we talked about in the very beginning, the unique identifier for the physical location of a tuple. Now, it may change, and therefore we have to deal with that. Postgres is the most famous system where this is a problem, because it moves things around; when we talk about multi-versioning, we'll see why this is an issue for Postgres but works in other systems. For now, assume the record ID is a page ID and an offset, or, in the case of SQL Server and Oracle, something more complex, like file number, object number, page ID, and offset.

The other approach is to somewhat violate the sanctity, if you will, of the B+ tree design, and store the duplicate keys in overflow leaf nodes. So instead of expanding the leaf nodes horizontally to accommodate new entries, we expand them vertically: a given leaf node gets overflow pages, almost like the chained hash table we talked about before, and we add all the duplicate keys down there. As we'll see in a second when I give an overview of what this looks like, this approach is more complex, because now we have to handle the case where I'm scanning along my leaf nodes and have to know how to follow down into the overflow pages; and if I'm scanning in the reverse direction, where do I start my scan when I jump back the other way?

The first approach has the advantage that we don't have to make any major changes to our data structure: whether it's a unique or a non-unique index, everything still works the same. The downside is that we're now storing the record ID as an additional element of our key, and that increases the size of the index, the amount of data it takes to actually store it. With overflow pages, we're not storing any redundant information just to make things unique, but now we have this management issue. So let's go through both of them.
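To make the partial-key point concrete, here's a minimal sketch in SQL; the table and index names are illustrative, not from the lecture:

    -- A composite B+tree index on (a, b):
    CREATE INDEX idx_foo_ab ON foo (a, b);

    -- This query only supplies the leading attribute. A B+tree can still
    -- use the index as a prefix search and scan the leaves for matches:
    SELECT * FROM foo WHERE a = 123;

    -- A hash index could not serve this query, because hashing needs the
    -- entire key. The same logic is why (key, record_id) index entries
    -- still support lookups on just the key.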
So this is the simple B+ tree we talked about before, and the first approach, again, is to append the record ID. I'm showing just the key values; let's assume there's an attribute a, and here are all its values. In actuality, what the database system is storing is the combination of the key and that record ID.

So now say I want to insert key 6. At this point I can do a prefix search in my B+ tree. Actually, I take that back: I do have a record ID as I'm inserting, but I'm not going to find an exact match for it. So I traverse down and I land at this page. The real thing I'm sorting on is the page ID and offset, but I land here, and now I know I want to go into this page. Because I don't have overflow pages, entries have to go in exactly sorted order. So assume that whatever the original 6 is, its record ID is less than the one I'm inserting, so the new one needs to go between the 6 and the 7. I just do the normal split process we talked about before: I slide everybody over, 7 and 8 move here, I update pointers, and now 6 goes in here. It works exactly the same way as before. If I want to do a lookup on 6, again, I just do the prefix search: I look at only the first element of the key, the 6, and I find it down here. Then I scan along my leaf nodes until I find all the matches I want.

The other approach is overflow pages. So now, again, I want to insert 6. I know I want to go into this node, but I don't want to split like I did before, with 7 and 8 moving over. I want to go in this page here, but I can't because it's full. So I just add an overflow page, insert my new 6 there, and keep a pointer down to it.

Now, remember I said before that in most textbook definitions of a B+ tree, you assume the keys are always sorted within the node. In this case we could do that, we could sort them, but it's not actually wrong to leave the overflow entries unsorted. We just have to accept that when we're looking for an element, we can't use binary search to jump around; we have to do a linear search to find what we want. So now let's say I insert 7: same thing, 7 goes down there. I insert another 6: same thing, 6 goes here. It's unordered, and that's okay.

So here's what needs to happen. Physically, this is stored across multiple pages. Logically, from the index's perspective, this just looks like one giant leaf node. So if I'm scanning across, I do the same thing: I follow this pointer, I land here, and now instead of jumping over to the next leaf node, I know I need to follow my overflow pages and keep looking down there. Eventually, if I find what I'm looking for, I'm done; if not, I go to the next leaf page by following that pointer over there.

So now you're thinking: shouldn't this last overflow page really be pointing at that next leaf node? That would actually be correct. But the problem is that every single time I add a new overflow page, not only do I need to update the pointers internal to these two nodes over here,
I also need to go update this one as well. But if I just leave that pointer alone and let it point to the beginning of the chain, the topmost leaf node in this vertical tower, then I land there and say: oh, I'm doing a reverse scan, so I really need to jump to the end of my overflow chain and work backwards. There's a bunch of extra logic we have to do to accommodate this.

Yes? Sorry, say that first part again? You're asking: for key 6, do we need to fetch all three of these pages back? So his question is: if the database system is using the index and we're trying to find the tuples that have the value 6 for this key, what is the index returning? It returns the record IDs. You basically have an iterator: I traverse down, get to my leaf node, and now I'm calling next, next, next on this iterator, looking at every single element until I find the rows that I want. The iterator knows when to stop, because it knows I'm looking for everything where the key equals 6: as soon as I see, say, a 9 over here, I can stop. But the point is that if this is unsorted, then I have to scan to the end of all my overflow pages, because the last 6 might be down here. If I instead want to keep them sorted, then when I insert this 6 here, I have to go update this page and this page, whereas if I just append, I'm only updating one page.

So this is a really good example of why trying to understand these data structures in the context of a real, full system is important. An algorithms class will teach you: yes, this is the way to build a B+ tree. But because we're inside a database system, we know we have these things called record IDs, and we can exploit them to facilitate operations that would not otherwise be easy to do.

Yes? The statement is: presumably, if this thing overflows, we just keep chaining more overflow pages. Correct. And presumably there's a certain point where you actually want to rebalance. Correct, that could be a criterion: if I go beyond some number of overflow pages, then do a split. But if these are all 6s in a single page, you can't easily split them without appending the record ID to differentiate them.

All right, cool. So let's do a demo, because we didn't really get to do this last time. We're going to use Postgres, and I want to show the difference between a B+ tree index and a hash index. All right, let me turn this off. Let me log in over here; I'm doing this live. Yeah, okay, so this is Postgres.
I'm going to have a table of email addresses. It's a simple table with an auto-increment ID and then a bunch of email addresses. This is a file you can find on the internet: a list of 27 million email addresses from a site that, if you don't know what it was, think of it as Tinder before Tinder. It was an adult hookup site in Canada that got hacked, and eventually people released the email addresses. So this is real data. It shouldn't take that long to load, but I should have done this before class; that's okay.

So what we're going to do is create two indexes. We're going to create a hash index; in Postgres you can actually say, when I create an index, that I want it to be a hash index. And then we'll create a B+ tree index. Of course... well, isn't this a great demo, right? Okay, so maybe we'll come back to this. Let's go over to MySQL while that loads; sorry about that. Okay, same thing here, this is MySQL, so I can do SELECT COUNT(*) FROM emails.

All right, Postgres finished; let's go back to Postgres since it's done. That took a minute. So I told you I inserted 27 million email addresses. The first thing I'm going to do is create a hash index. By default, in most systems, if you just create an index you're going to get a B+ tree, or some sort of tree data structure. In Postgres I can say USING hash, which forces it to use a hash index. And then we'll see how, if we try certain queries, it won't be able to find the things we want. Let me load all this in; sorry, pg_prewarm. This is that same function I used last time, just to warm the cache. And now I'll call CREATE INDEX. In theory this should be faster because everything's loaded in memory. Then we're going to run queries and see how the query planner cannot pick the index for some queries but can pick it for others. With a hash index, again, you have to have the entire key, or an entire element of the key: you can't do partial lookups, and you can't do range scans.

Sorry... well, this sucks. All right, while this is going on, I'll create the B+ tree index in the background too. I had everything working, and then I dropped the table right before class started and forgot to rebuild it. All right, we're back. Okay, so that took 50 seconds.

So now I can run queries like SELECT *. Let's find a user first: let's find the minimum email address, SELECT MIN(email) FROM emails. There we go; whatever that is, someone correctly used a fake email address. So if I want to do SELECT * FROM emails WHERE email equals this thing, and I add that EXPLAIN keyword, Postgres will tell me what it's going to do.
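Before looking at the plans, here's roughly the setup the demo ran. The name idx_emails_hash shows up later in the EXPLAIN output; the other names are my guesses:

    CREATE TABLE emails (
      id SERIAL PRIMARY KEY,  -- auto-increment key
      email TEXT
    );
    -- ...load the 27 million addresses...

    -- Warm the buffer pool so the index builds and queries are not I/O-bound:
    CREATE EXTENSION pg_prewarm;
    SELECT pg_prewarm('emails');

    -- Postgres-specific: force a hash index with the USING clause.
    CREATE INDEX idx_emails_hash ON emails USING hash (email);
    -- The default index type is a B+tree:
    CREATE INDEX idx_emails_btree ON emails (email);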
So Postgres tells me: hey, I have this thing called a hash index, and I can use it to do a lookup, because I'm looking for something exactly equal to that text. We'll ignore what a bitmap scan or a heap scan is; that'll come later on. But we know this is going to be fast, because it goes and finds exactly the one we're looking for.

But now say I want to do something like WHERE email LIKE this thing, with a wildcard at the end. Can it use the hash index? No, right? Because you have to have the entire key. I'm not going to run it, because it would probably take the rest of class, but when you ask EXPLAIN, it tells you it's going to default to the sequential scan. Remember, the sequential scan is always the default access method for the database system: if it can't find what it wants using an index, it always falls back to a sequential scan.

So while this is going on, let's build the tree index. But we can see some other things too. Say we want to count all the email addresses that are greater than this value. Can the hash index do that? No, again, because we can't do anything that isn't an equality predicate; it always has to be an exact match.

It can do some things, though. Let's see whether this other address exists... I didn't find that one. Actually, that was pretty fast for a sequential scan. No, wait, sorry, that's an index scan. Let's find another one: somebody whose email starts with an 'a', so 'a' and a wildcard, and we'll add LIMIT 1, which basically says: find me the first one that matches. Right, so there's somebody's email address that looks like that; okay. And we can do other things, like this predicate OR email equals that, and it was actually able to do two index scans. Notice it has that OR clause: it does one probe on the index to find what it wants, does another probe for the other match, and then combines them, and that's what the BitmapOr means. Basically, the bitmap here means it finds all the matches, and instead of returning record IDs one at a time, it maintains a giant bitmap and sets the bit at each matching record's offset; then it combines the bitmaps and produces the output. That's why it does this lookup and then another lookup over here.

All right, so now our B+ tree is done, and we can come back to our original query here. Yes? Your statement is: didn't I say last class that it would do the search once and then reuse it? It would traverse down through the B-tree, right, but this is a hash index; this is not a B+ tree, sorry, this is a hash table. Okay. So, going back... well, now I just added the B+ tree, but look right there: this is EXPLAIN telling you what the query plan is going to be, and it says I'm going to do an index scan using idx_emails_hash.
That's the name of the hash index I created, okay. So now I also have my B+ tree index. If I do something like this, the same lookup we had before on that fake email, it tells me it wants to use the hash index. But as soon as I add this range predicate down here, is it going to use the hash table? No. It actually chooses a sequential scan. All right, this is another good example: this value is the smallest key we have in our index. So Postgres knows that if it used the index, all it would really be doing is jumping to the far-left point and then scanning along every leaf node, and therefore the traversal of the index would be a waste of time; it's better to just do a sequential scan. But say I change that to a bunch of z's. Now it says: all right, if I use my index, I'm going to skip over a lot of data, so I can use that B+ tree to jump down to the right side of the tree, get a starting point, and then scan along the leaf nodes. This is an internal cost model inside Postgres, which we'll talk about later, that lets it decide when it's the right time to do these things.

So is this clear? For equality predicates, the hash index is going to be pretty good. For range predicates, if the starting point is in the right location, and again the system knows something about the distribution of values, then it will choose to do an index scan along the leaves.

So now, going back here: this query was doing the index scan, the exact point lookup. If I drop that hash index, DROP INDEX on the emails hash index, it comes back right away, and now when I run that equality predicate, it's smart enough to know: oh, I no longer have that hash index, but I do have this tree index, so I can use that for this equality predicate. Is this clear? All right.
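Summing up the planner behavior we just saw; the string literals here are placeholders for what was typed in the demo:

    -- Equality: with the hash index present, EXPLAIN shows an index scan.
    EXPLAIN SELECT * FROM emails WHERE email = 'some@fake.address';

    -- Range starting at the smallest key: the index would visit every
    -- leaf anyway, so the cost model picks a sequential scan instead.
    EXPLAIN SELECT COUNT(*) FROM emails WHERE email > 'some@fake.address';

    -- Highly selective range: the B+tree jumps to a starting point on
    -- the right side of the tree and scans along the leaf nodes.
    EXPLAIN SELECT COUNT(*) FROM emails WHERE email > 'zzzzzzzz';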
The other thing we talked about briefly last class was table clustering. Table clustering is where we use the index to enforce the sort order of the table itself, the tuples themselves. Remember, Postgres is unsorted; or rather, the relational model is unsorted. As we insert things into Postgres, it essentially puts them in the order it was told to, as we do the inserts, and we saw examples where updates and deletes can reshuffle things, depending on what free slots are available in a page.

So if I run a SELECT query here, SELECT * FROM emails, and just say give me the first one, all right, we get some random Gmail account. But now, if I call this CLUSTER command, and this will take about a minute, what this command does is force Postgres to potentially re-sort the entire table based on the sort order defined by this index. But it's a one-time operation. As I modify the table afterward, things can get out of order again, and it won't match what it was when I first set it up. Some systems, like MySQL, SQL Server, and Oracle, let you say: I want an index-clustered table, clustered on this index, and then they ensure that no matter how you insert values into your table, in whatever order, the underlying physical storage will be sorted. In some cases this lets you do binary search directly on the table itself, which is still log n, without having to go through the index at all.

So this is going to take a while; we'll just let it go. I quickly want to jump back over to MySQL. Again, the same email addresses are loaded into MySQL, and I can do the same kind of queries: SELECT * FROM emails WHERE email equals that fake one. The MySQL EXPLAIN output is not as good; it's not a tree structure, you read it across here. It says: here's the index I could possibly use, so it knows I have a hash index and can use it. Then if I change that to be greater than this value, it still lists the index, but, and it's kind of hard to see because it's small and wrapped around over here, there's this column called rows. That's MySQL telling you how many rows it thinks it might have to read, and that's 27 million wrapped around here. So it could use this hash index, but it's going to fall back and do a simple sequential scan, whereas the query above it could do an index probe to find exactly what it wanted.

All right, so Postgres is now done; we're coming back here. Now if I do LIMIT 1, I get that first tuple we had before, right? This says: give me the very first tuple you find, and it's the minimum one we found before, because the table is now sorted. Now, if I delete that email address, DELETE FROM emails WHERE email equals this thing, and ask for the first one again, I get a different fake email address. But let me insert the deleted one back in: INSERT INTO emails VALUES, with DEFAULT for the auto-increment key, and my guy is back in. But it's no longer in sorted order, because the cluster operation only ran once.

Any questions? The question is: to enforce the sort order, would I have to run CLUSTER over and over again? For Postgres, yes. For other systems you do not have to do that; you can say I want the table automatically clustered by the index.
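The Postgres command in question, using the B+tree index name from my sketch above:

    -- One-time physical re-sort of the heap in the index's key order:
    CLUSTER emails USING idx_emails_btree;

    -- Postgres does not maintain this order as rows are inserted, updated,
    -- or deleted; you have to re-run CLUSTER. Systems with index-organized
    -- tables (MySQL/InnoDB, SQL Server, Oracle) keep the order automatically.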
The next question is: does the column need to be indexed before we cluster? In MySQL, the table is always sorted by the primary key; the leaf nodes of the index are actually the tuples themselves. So as MySQL moves things around, splitting and merging, the leaf nodes are always in that sorted order, and if I want to do a sequential scan on the table, I'm basically just following leaf nodes. In other systems, like Oracle, SQL Server, and DB2, you can say: create this table and sort it by these columns, and the system does it for you.

Yes? The question is: I said we could use the physical location, the page ID and the slot number, as the record ID to make the tuples unique. Why not instead use the timestamp of when the tuple was inserted? So, in Postgres this would be an issue; in MySQL it wouldn't. What's the problem with a timestamp? He said that in a distributed setting the clocks aren't going to be synchronized. Yes, and there are even simpler problems: leap seconds, leap years. When a leap second repeats, what happens? I insert something, and now two tuples have the same timestamp. Or clocks can drift: my machine's clock is horribly inaccurate, so I run NTP every so often, and it slews the clock slowly, but occasionally it has to take big steps, and you might repeat a second.

Yes? His question is: what's the difference between using the record ID versus milliseconds versus some big unique value? Yeah, he's right that milliseconds since the Unix epoch would take care of the leap-year problem, but it won't help with drift: if the clock has to get stepped backwards, you repeat seconds. Nobody uses physical timestamps that way. You can use logical timestamps, which we'll talk about later on; you almost never want to use hard physical clocks except in conjunction with other things.

Yes? The question is: when would you want to use a clustering index if you already have an index? So again, in the case of MySQL, as I showed in the example, it's always a clustering index when I call CREATE TABLE;
it's always clustered on the primary key. But there are some cases where, for certain queries, you can be smart about it. Say I'm clustering on a logical timestamp of when the data was inserted, where the application tells me the timestamp. Then maybe I can say: take the last day's worth of data and put it on the fast disk, and put the older stuff on slower disks. There are ways to do partitioning that way, and the database system can enforce all of that for me underneath the covers.

Yes? In MySQL, you said all the tuples are sorted by the primary key. Yes. So suppose I'm clustered on email, I have an index on email, and my primary key is id. If you have a primary key, and we'll see this in a second, you always have an index on that attribute. So then you have to update the other index too? Yes, and it depends on how you store your indexes; we'll get to that when we talk about versioning. The pointer in a secondary index could be the primary key, or it could be the record location, the record ID. Different systems do different things: Postgres stores record IDs, so it has to update them all the time; MySQL stores primary keys.

All right, so we can actually poke around in Postgres real quickly and see roughly what the tree looks like. There's an extension for Postgres that lets you get information about what's in the tree. I can take this B+ tree index and say: give me information about it. It tells you how many levels it has, how many elements it's storing, and the root block. Then we can go even further and actually inspect the contents of the tree with this command here.
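The extension is pageinspect; a sketch of the calls behind this output, using my earlier index name:

    CREATE EXTENSION pageinspect;

    -- Index-wide metadata: root block number, number of levels, etc.
    SELECT * FROM bt_metap('idx_emails_btree');

    -- Stats for one page of the index, then its raw items (keys in hex):
    SELECT * FROM bt_page_stats('idx_emails_btree', 1);
    SELECT * FROM bt_page_items('idx_emails_btree', 1);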
The actual details don't matter, but there's a bunch of hex stuff. This is the root node, and we can go a bit deeper now and show some information about a single node. That's all hex too, but we can decode it, and in here, just to prove it's actually storing these emails: this says here's a record at offset three in this particular node, here's the page number and offset where the tuple is located, here's the hex form of what's being stored, and then there's the actual email address. So again, the database system stores entire copies of these keys inside the index.

All right, we're going to stop there, because we have a bunch of other stuff to get through. But that was just to show you that, by default, you're always going to get a B+ tree, and in some systems you can force a hash index, and there are different trade-offs in doing that.

All right, so now, related to the point raised earlier about the primary key versus the clustered index: if you create a primary key, the database system will automatically create an index for you. Actually, any time you declare an integrity constraint, it will automatically create an index, and if you think about it, it has to, because otherwise the only way to enforce the constraint is with a sequential scan. Take my auto-increment key: if I had to enforce the primary key's uniqueness every single time I inserted a tuple, and I didn't have an index, I'd have to scan every single tuple just to make sure nobody has the same key. So every database system does this automatically for primary key and unique constraints. When I create a table with a primary key or a unique constraint, it's the same thing as creating the table and then running these CREATE INDEX commands.

For foreign keys, it doesn't actually do this. If I create a new table here, called bar, with a foreign key referencing this value over here, every database system I've ever tried this on will throw an error, because it doesn't have a way to enforce this referential integrity constraint without an index. You'd think it could automatically create one, but it doesn't, because the referenced column has to be unique. So instead you replace that with an ADD UNIQUE clause here, and that builds an index automatically, which the system can then use to enforce the constraint. The way foreign keys work is that every single time I insert a new tuple into bar, I have to have this id value, and to make sure it matches a tuple in the referenced table, the system does a lookup in that index to see whether there is a matching parent.
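A sketch of that sequence; the table names follow the slide, and the error text paraphrases what Postgres reports:

    CREATE TABLE foo (
      id SERIAL PRIMARY KEY,   -- implicitly builds a unique index on id
      val INT
    );

    -- Fails: the system has no index it can use to enforce the
    -- referential integrity check ("no unique constraint matching
    -- given keys for referenced table"):
    CREATE TABLE bar (
      id INT REFERENCES foo (val)
    );

    -- Adding a UNIQUE constraint builds the index, and then the
    -- foreign key declaration above succeeds:
    ALTER TABLE foo ADD UNIQUE (val);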
So now I want to talk about different ways to actually use indexes, beyond copying the whole key as we've done so far today. The first thing we can do is what's called a partial index. Normally, when you call CREATE INDEX on a table, it does a sequential scan across the entire table and looks at every single tuple. But in many applications, you don't need an index over the entire table; you only ever care about some subset of the data. That's what a partial index is: you modify the CREATE INDEX command and add a WHERE clause at the end that says which tuples have to match in order to be put into the index.

So now suppose I do a lookup like this: SELECT b FROM foo WHERE a = 123 AND c = 'WuTang'. I've built the index on a and b, and my WHERE clause references a, so I can still use this index. But the system can also look at the query and say: oh, it has c = 'WuTang', so this is exactly the index I want to use. The index is leaner, because I'm not storing the information for all possible tuples, only whatever matches my WHERE clause. If the query somehow doesn't have c = 'WuTang', I can't use that index. This is very common when people want separate indexes for different date ranges, say one index per month, so lookups on that month's orders are fast. And again, I'm trying not to pollute my buffer pool with a bunch of data I don't need; with a partial index the height is lower, and I can quickly find the data I'm looking for.

Now, in this particular example, we're doing a lookup on a, filtering on c, and returning b, and it turns out that all the data we need is in the index itself. Remember, I said that normally the index, for a given key, produces a record ID, which you then follow into the table heap to get the tuple you're looking for. But for this particular query, we don't ever need to look at the tuple: we need a to do the lookup, and that's in there; we need b, and that's in there; and c is already handled by the partial index's WHERE clause. To answer this query, we only need to look at the index; we never need to touch the underlying tuple in the table.

This is what's called a covering index: all the fields necessary to produce the required result for the query can be found in the index itself. You don't declare an index as a covering index; this is something the database system figures out automatically. It knows what your query is, knows what's in your index, and sees that everything it needs is in there. The advantage is one less page-table lookup, and maybe one less disk I/O, because we never go look at the underlying tuple. A bunch of systems support this: all the commercial guys do, Mongo does. I don't think MySQL and Postgres do; I may be wrong about that. But this is a big win: if you can do this, it's a huge deal. Actually, I don't think Postgres can do this, for reasons we'll talk about later.
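In systems that support both features, the running example looks like this; the names come from the slide:

    -- Partial index: only tuples matching the WHERE clause get indexed.
    CREATE INDEX idx_foo ON foo (a, b)
        WHERE c = 'WuTang';

    -- Uses idx_foo: a is the leading search key, and the c predicate
    -- matches the index's WHERE clause. Since only a and b are needed
    -- for the result, the index also covers the query, so there is no
    -- heap lookup at all.
    SELECT b FROM foo WHERE a = 123 AND c = 'WuTang';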
So for this simple example, this is great: I need a and b, and both can be found in the index. But what if I have another attribute that I want to retrieve in my query, but I don't actually want to build the index on that attribute? Say my table has columns a, b, and c; maybe I don't want to index on c, but it would still be nice to have a covering index and not have to go look at the tuple. This is what include columns allow you to do. An include column basically says: along with all the keys I'm storing, my leaf pages, my leaf nodes, should also include these additional attributes. In this case I'm building the index on a and b: all the inner nodes only have the keys a and b, and when I do lookups I only examine a and b, but when I land in the leaf nodes, I can also get the c value for every single entry in there.

So if I go back to my other query, SELECT b FROM foo WHERE a = 123 AND c = 'WuTang': I do my lookup on a, follow it down, and then, as I'm scanning along the leaf nodes, I look at the c values packed into the leaves, evaluate my predicate, and produce my output.

This one is rarer than covering-index support. A lot of systems support partial indexes, slightly fewer support covering indexes, and this one is even rarer. Postgres 11 is adding it, or has it now; SQL Server has it; but MySQL doesn't support it, and Oracle doesn't support it. The key thing is that although we can check c in our WHERE clause, c is not in the inner nodes, so we're not greatly increasing the size of the overall index.
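The syntax, as it appears in PostgreSQL 11 and SQL Server; same illustrative names as before:

    -- Keys are (a, b); c is carried only in the leaf entries, so the
    -- inner nodes stay small:
    CREATE INDEX idx_foo_inc ON foo (a, b) INCLUDE (c);

    -- Can still be answered entirely from the index: look up a,
    -- check c in the leaf entry, return b.
    SELECT b FROM foo WHERE a = 123 AND c = 'WuTang';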
The last kind of index I'm going to talk about is functional, or expression, indexes. Everything we've shown so far, any time we declare an index, creates an exact copy of the key that's in the tuple and puts it in the index. But there are some queries where we don't want to look up the exact value of a key; we want to look up some value derived from the key.

Here's a simple example: I have this users table, and I want to find all the users that logged in on a Tuesday. This EXTRACT function takes the timestamp, and you pass in which element of the date or timestamp you want; DOW means day of week. So this extracts the day of the week from the login timestamp field and finds the ones where it equals two: Sunday is zero, Monday is one, Tuesday is two. Now, if I create an index on the raw column, as we've shown so far, this won't work. Why? Yes: he says you'd have to know which timestamp ranges, in whatever storage format, actually correspond to Tuesday. Correct. You could try to be smart and say: my query looks like this, so here are the ranges of timestamps where Tuesdays can be found; but as far as I know, no system actually does that. So instead, you create a functional index, or expression index, where the thing you're indexing on can be an arbitrary expression; anything you can put in a WHERE clause, you can build an index on. Now, when I want to do a lookup for this predicate, the system knows exactly how to satisfy it: it scans along here and finds all the twos that match what we want.

What's another way we could speed this query up, using something we've already seen? The partial index, right: instead of creating an index on the extracted day of week, I use that as my WHERE clause, so the index only contains the tuples where the extracted value is two.

So let's do a demo of this in Postgres. Postgres has partial indexes; it doesn't have covering indexes; and the version I have here is 10, so it doesn't have the INCLUDE clause. But we can play around with partial indexes versus functional indexes. For this we're going to create a table; make sure parallel workers are turned off and timing is on. The table has an id field and a login timestamp, and this INSERT query inserts a bunch of records with unique timestamps, from 2015 to now, at one-minute intervals; that generates on the order of two million records. pg_prewarm everything, so now everything's in our buffer pool.

So say this is the query we want to run: get the average id of users where the day of the week they logged in is Tuesday. When we run EXPLAIN, it has to do a sequential scan, because there's no index. The first index we can build is the expression index; this shouldn't take long. Now, when we run EXPLAIN, we see it's able to pick out and use that expression index we just built. The way it did that: it said, oh, I know you're trying to do a lookup on this EXTRACT function where the output is two, so I just need to probe the index and find all the values where it equals two. Then we can add the partial index. Again, this creates a smaller index that only contains the records where the EXTRACT function equals two. Now I go back to my query, and I see it actually prefers that partial index, because it's smaller: the tree has a lower height, and the system can find exactly what it wants immediately. So again, the database can figure out on its own which of all these different choices is the best access method.
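A plausible reconstruction of what the demo built; the names are mine, and I'm assuming login is a TIMESTAMP without time zone, since EXTRACT on a timestamptz is not immutable and couldn't be indexed:

    CREATE TABLE users (
      id SERIAL PRIMARY KEY,
      login TIMESTAMP
    );

    -- Expression index: stores the derived day-of-week value, not login.
    -- (Note the extra parentheses around the expression.)
    CREATE INDEX idx_users_dow ON users ((EXTRACT(dow FROM login)));

    -- Partial-index alternative: only the Tuesday rows are indexed.
    CREATE INDEX idx_users_tue ON users (login)
        WHERE EXTRACT(dow FROM login) = 2;

    -- The query both indexes can serve (Sunday = 0, so 2 means Tuesday):
    SELECT AVG(id) FROM users WHERE EXTRACT(dow FROM login) = 2;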
Any questions about this? Yes. Okay, so the question is: what happens if my expression is based on some difference involving the current timestamp? Isn't that going to change every single time I run it? So let's try it out; let's go back to the expression index, where is that... Yeah, so he's saying: take login, and again, I can put anything I want in here as long as it's a valid expression, so take my login and subtract out the current timestamp. It doesn't let me do that. I forget how to do subtraction in Postgres... can you do that? That works, okay. Oh yeah, because you have to do this. Okay. Nope. No, let's do something simpler: take login and subtract a hundred from it. It didn't like that either.

All right, so basically, here's what would happen: when we call CREATE INDEX, it would run the now() function once, and whatever that timestamp is, that's "now" from then on. Later on it doesn't change; it's not dynamic; it's fixed as the index is built. So if I then insert something, in theory the query should use the correct current "now"; if the system were smart, it could remember what "now" was at the time the index was built. I don't know whether it does that or not. But the point is you can't do certain things: I can't build an index on the constant one, but I should be able to do id plus one. Wait, what am I missing here? Oh, I think it's this... there we go, that's what it was: double parentheses. And now it won't let me do the timestamp one at all... let's try another example... "functions in index expression must be marked immutable." So there you go. But I should be able to do this, right? No? No, time's up. All right, anyway, you get my point. There's this thing called a snapshot, which we'll cover later, and there's the "now" at the time the query runs, and it has to be guaranteed consistent for the snapshot, including for the index itself. Yes, kind of.

Next question: when we create an index, what kind of index is it going to be? By default, it creates a B+ tree. If I add that USING clause from before, and this is a Postgres idiom, it's not in the SQL standard, then USING hash tells Postgres to make me a hash index. By default, you're usually going to get a B+ tree.

Your question is: when you add an index, why is it faster? We could measure the performance, but... okay, so I think your question is: if I have an index and I do a lookup, then for some queries I don't have to look at the actual tuples at all.
Right, I get everything from the index. And to be clear: you were calling the tuples in the table heap leaf nodes, but they're not leaf nodes; the tuples are in the table heap, and the leaf nodes are in the index. I always have to look at the index, because in a B+ tree I always have to go to the bottom. But for some queries, if I can do a covering-index lookup, I never have to look at the table, the tuple itself: I can get all the information I need to compute the answer from the index alone. Not all database systems support that, though.

For everything we've talked about so far, other than covering indexes, the idea is that we can quickly find the tuples with the keys we're looking up, without doing a sequential scan. A sequential scan is O(n); a hash index lookup is O(1); a B+ tree is O(log n). The idea is cutting out as much data as you can, to avoid wasted work. Correct: if we create a table and there's no index, we always have to do a sequential scan. We saw that in the very beginning.

Yes? The question is: if I create an index, where does it live? Again, if it's backed by the buffer pool, it gets written out to disk, and I want that because my index might be larger than the amount of memory available to me. I could instead have an ephemeral data structure that lives only in memory and gets blown away; MySQL does that for its hash indexes, because they have to be in memory. But the B+ tree is backed by disk, so as it gets too large, I page things out. Is it a separate buffer pool? It could be the same buffer pool, or different buffer pool instances; it depends on the implementation. Again, the buffer pool manager doesn't know what's inside the pages. It just says: you want page 123? Here it is. Whoever is accessing it is responsible for knowing how to interpret those bytes; the buffer pool manager doesn't know and doesn't care. In the high-end systems, you can say: here's the buffer pool for indexes, with one replacement policy, and here's the one for tables, with another replacement policy. But for Postgres and MySQL, it's all the same.

Okay, so let's now jump back and finish up with tries. In all the examples I've shown for the B+ tree so far, the inner nodes and the leaf nodes always had an exact copy of the keys. Yes, you can do prefix compression or suffix truncation, as we talked about last time, but in general we have the entire key replicated multiple times throughout the tree structure. The other issue with a B+ tree is that, in order to determine whether a key exists in my table, I always have to get to the leaf node; I always have to traverse all the way to the bottom. That's because the inner nodes may contain copies of keys that no longer exist: when I delete a key from a leaf node, depending on how I split and merge, I may leave its guide post up above. So to know for sure whether a key exists,
I always have to go to the leaf node. So again, it's log n instead of the O(n) of a sequential scan, but it's still not great, and depending on how much memory I have and how I'm using my buffer pool, I may take a page miss, a lookup on disk, for every single node as I traverse down. For some applications, it would be nice if we could figure out, near the top of the tree, whether our key exists, without having to go all the way to the bottom.

This is what a trie does for us. Quick show of hands: who here has heard of a trie before? Okay, perfect. Who here has heard of a radix tree? Fewer; excellent. Okay, so a radix tree is just a specialization of a trie, and nobody uses plain tries; everyone uses radix trees in databases. We'll go through this. A trie is a tree data structure where, instead of storing entire copies of keys in the nodes of the tree, we store digits of the key. By digits I don't necessarily mean Arabic numerals; I mean some atomic subset of the key, like a byte, or even a single bit. What happens is that we decompose all our keys into their digits and store them level by level, one digit at a time. And because multiple keys can share the same digit at a level, we only need to store that shared digit once at each level.

A really simple example is a trie with three keys: hello, hat, and have. At the first level, the root node, all three keys begin with the letter h, so I store h once, with a path down to the second level. At the second level I start distinguishing between them: hello has an e, while hat and have both have an a as the second digit. So I have separate entries for those, and then separate paths down to handle each unique suffix of the key. Now, if I want to look up hello, I decompose the key into its digits: I look at the h, I have a match here, I find the e, and then I traverse down through the l-l-o. At the bottom, just like in our B+ tree, there can be a record ID that points to the actual tuple we're looking for.

Tries are old; tries are older than B+ trees. B+ trees were invented around 1973 at IBM; tries were invented in 1959 by a French researcher, and he didn't have a name for them. Then another computer science researcher, the famous Ed Fredkin, proposed the name "trie" a year or two later. It's short for "retrieval tree," and he used that spelling to distinguish it from a regular tree data structure. That's why they're called tries. And apparently this Fredkin guy is actually CMU faculty: if you go look at the CS website directory, he's listed there. He's super old; I've never seen him at any faculty meeting, I don't know who he is, and I don't know whether he's actually still here, but that's the guy who invented the term "trie."
He's supposedly here at CMU. Sometimes you also see these things called digital search trees or prefix trees; as far as you're concerned, these are all the same thing.

So tries are really interesting in the context of databases, especially now that you understand B+ trees. The first thing that's super interesting about them is that their shape depends only on the distribution of the key space and the lengths of the keys. What I mean is that a trie is a deterministic data structure: no matter what order we insert the keys, we always end up with the same physical shape. That's not true of a B+ tree: if I insert keys in one order, then shuffle them and insert them into another tree, then depending on how the splits and merges happen, I may end up with different node layouts; the same keys might end up in one node versus another. In a trie, it's always the same.

The other thing is that tries don't require any rebalancing like we had in the B+ tree. As we'll see, there is some rebalancing we could do vertically, but horizontally we never rebalance. And unlike a B+ tree, where all the operations were O(log n), in a trie the operation complexity is O(k), where k is the length of the key. That's totally different from a B+ tree. Going back to the example: if I look up hello, I keep going down to the bottom, so the number of steps I take depends on the key I'm looking up. But say I look up "andy": the first letter is a, I check the root node, I see it only has an h, and I immediately stop. I know the thing I'm looking for can't be anywhere in the tree, and I never had to traverse to the bottom.

Yes? Your question is, for the e here: if this node is super wide, do I have to scan across the entire thing? The digits are pre-sorted, so you can binary search to find the one you're looking for. And when we see how this is actually done with bytes, you can jump to exactly the position you want, and the entry is either there or not there. This isn't really a physical diagram of how it's actually stored; it's just a high-level overview.

Okay. So again, this is super interesting, because the complexity depends on the length of the key we're looking up. It's also interesting because we're no longer storing an exact copy of the key directly in any single node.
The key is implicitly stored by the path. If we want to reconstruct hello, we traverse down, keep track of our path on a stack, and that's how we put the key back together. This makes sequential scans more difficult, though: although the trie is in sorted order, I have to backtrack, going back up and then back down, unlike in a B+ tree where I can just scan along the leaf nodes. So tries are going to be faster than B+ trees for point queries, but slower for scans.

All right, so now let's get a bit more formal about the definition of a trie. We use the term "span" similarly to a B+ tree: the span of a node is the number of possible outgoing branches, which is essentially the number of digit values we represent at each node, at each level. If a digit exists in the corpus at a given level, we store a pointer to the next branch; if it doesn't exist in the corpus at that level, we just store a null, a null bit or something. The span determines the fan-out, just like in a B+ tree, and that in turn determines the physical height of the trie. In the usual parlance, an n-way trie is one with fan-out n: n paths coming out of each node, which determines the size of the digit you store at each level.

The simplest trie you can build is a one-bit trie: at each level, I discriminate on a digit of a single bit. So let's say I want to store these three keys: 10, 25, and 31. It's a one-bit trie, so at each level we look at one bit. I'm showing the keys in binary form; normally these would be 32-bit or 64-bit values, but for simplicity I'm showing them as 16 bits.

So the trie will look like this, and I'll go through it level by level. At the root node, we examine the first digit position, the first bit, and since the span is one bit, it's either zero or one. In this first position, all three keys have the bit set to zero, so at zero I have a path going down, and the entry for one is null, because no key matches it. Then I go down to the second level, and for simplicity, think of this repeating about ten times: it's the same thing, all my keys have a zero at every one of those positions, with a path going down at zero and nothing at one. But when I get to this position here, now I see there's a difference. For key 10, the bit at this position is zero, so there's a path going down this side; for the other two it's one, so there's a path going down here. Now, if I look at the remaining part of key 10, it's a single path going down: one, zero, one, zero. And it's the same pattern throughout: if the bit is not set at a position, the entry is null; otherwise there's a path going down.
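For reference, here are the bit patterns being walked, written out in the 16-bit form used on the slide:

    10 = 0000 0000 0000 1010
    25 = 0000 0000 0001 1001
    31 = 0000 0000 0001 1111

The keys agree on the leading zeros; 10 splits off at the position where 25 and 31 have their first 1, and 25 and 31 diverge from each other two bit positions after that.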
The leaf node is just a record ID that points to the corresponding tuple. Same thing on the other side: at this point the two remaining keys are the same, then they split here, and from there each has its own path for its remaining bits.

So we can do this with one bit, two bits, eight bits, sixteen bits; we can pick different granularities. Now, what's one simple optimization we can make here? There are actually two. How can we reduce the size of this trie? Yes?

Exactly. He says we don't need the entries marking zero and one. Think about what each entry is saying: here's the value of the digit at this position, and here's the pointer for it. The digit value is redundant. All I really need to store is the pointers, because if the bit is zero I go to offset zero, and if the bit is one I go to offset one. This is horizontal compression: it reduces the size of each trie node.

What's another way we could compress this? In the back? He says don't repeat the zero ten times, and that's close: once I get down to these lower parts, there's no other key that matches this prefix. Up top we had to keep those nodes because there's a split further down, so we need to know how we got here. But below the split we don't need to store anything more: at this level, if the bit is zero, only one key matches, so I can just store the tuple pointer for that key directly. Same thing over here. This is vertical compression.

Yes? So your question is: instead of always examining the digits in order from left to right, couldn't you store a little marker in each level saying which bit position that level discriminates on, so you could check, say, the second bit and then the fifth, and end up with something more balanced? On the lookup-cost side, you don't need that: there are CPU instructions that, over a bitmap or bit sequence, find the value at the offset you want in a single instruction.
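For the point about not iterating across a node: here is a sketch of the bitmap-plus-popcount layout that compressed trie nodes commonly use. The class and method names are invented for the example, and int.bit_count() (Python 3.10+) stands in for the POPCNT-style CPU instruction being described:

```python
# Instead of a full 256-slot pointer array per 8-bit trie node, keep a
# 256-bit occupancy bitmap plus a dense array of only the non-null children.
# Finding a child is one bit test plus one population count -- no scan.

class CompressedNode:
    def __init__(self):
        self.bitmap = 0     # bit i set  <=>  a child exists for byte value i
        self.children = []  # dense array of children, ordered by byte value

    def find_child(self, byte):
        if not (self.bitmap >> byte) & 1:
            return None     # this digit never occurs in the corpus here
        # The rank of this byte among the set bits is its slot in the array.
        slot = (self.bitmap & ((1 << byte) - 1)).bit_count()
        return self.children[slot]

    def add_child(self, byte, child):
        slot = (self.bitmap & ((1 << byte) - 1)).bit_count()
        self.children.insert(slot, child)
        self.bitmap |= 1 << byte

node = CompressedNode()
node.add_child(0x68, "child for 'h'")
node.add_child(0x61, "child for 'a'")
print(node.find_child(0x68))   # child for 'h'
print(node.find_child(0x7A))   # None
```

The node only spends space on digits that actually occur, yet the bitmap plus one popcount still gives a direct jump to the right slot.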
So you don't have to iterate. Similarly, there's a CPU instruction that counts the number of ones in a bit field for you. These instructions make this go really fast; it's not like you're sequentially scanning over the node in a for loop. It's not as bad as you think it is.

Okay, so that's low-level bit manipulation, but it shows the extreme case. In practice you wouldn't actually want a one-bit trie; usually you store the span as eight bits, a single byte. To me the one-bit version is the easiest way to understand it, and with eight bits it's the same idea: for every position I either have a pointer or I don't, and I can jump directly to the offset I want.

So this is all fine and dandy if everything is static, but how do we actually modify this thing; how do we do inserts, updates, and deletes? There is no standard way to maintain a trie the way there was for a B+tree; different implementations do different things. So I'll show one brief example. I'm not saying this is the only way to do it, but these are the kinds of things you have to be mindful of if you actually build one.

Say this is the "hello", "hat", and "have" key set we had before, and I insert a new key. I just traverse down, find the right slot, and insert it there. Now say I want to delete "hat". That's here, so I go ahead and delete it, and rather than reshuffling everything, maybe it's okay to leave an empty space, because then I don't have to trigger compaction. But now say I also delete "have". I remove it, and this node down here is left by itself, so to find the remaining key I'd have to take an extra hop down to it even though there's no other possible match. You could decide to roll everything up a level instead. Again, different implementations do different things; if you take the advanced class, we cover a bunch of these.

Yes? So the question is about what we get once we apply these compressions to the trie. Yes, and maybe to be clear: a radix tree is a trie that has been vertically compressed. I should have said that more carefully. There used to be a slide here defining a radix tree and I don't know what happened to it, sorry. Oh, no, here it is; this is the radix tree. It's what you get when you do the vertical compression to remove any node that has no differentiating path below it. Sometimes it's called a Patricia tree, but usually people say radix tree. And again, it's a subset of a trie.

Okay. So we covered modifications, and I wanted to briefly talk about how comparisons are done, but in the interest of time I'm going to skip that. The last thing I want to talk about is inverted indexes.

Again, tries are super interesting, but as far as I know no commercial database system supports them out of the box. HyPer, a system from Germany that Tableau bought and that now runs inside Tableau (it's Postgres-compatible), is all-in on tries, and we've done some research here on tries as well.
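Since the radix-tree slide went missing, here is a small sketch of what that vertical compression leaves you with: every single-child chain collapsed into one edge labeled with the whole substring. The layout and names are illustrative assumptions, not any particular system's design:

```python
# A path-compressed (radix) tree for the keys "hello", "hat", "have".
# Each edge carries a string fragment instead of a single character.

class RadixNode:
    def __init__(self, record_id=None):
        self.edges = {}             # edge label (string fragment) -> child
        self.record_id = record_id  # set on leaves

def radix_lookup(node, key):
    if not key:
        return node.record_id
    for label, child in node.edges.items():
        if key.startswith(label):
            return radix_lookup(child, key[len(label):])
    return None                     # no edge matches the next part of the key

# Shared prefix "h" is one edge; the unbranching tails "ello", "t", and
# "ve" are each collapsed into a single edge down to a leaf.
root = RadixNode()
h = RadixNode()
root.edges["h"] = h
h.edges["ello"] = RadixNode(record_id=0)
a = RadixNode()
h.edges["a"] = a
a.edges["t"] = RadixNode(record_id=1)
a.edges["ve"] = RadixNode(record_id=2)

print(radix_lookup(root, "have"))  # 2
print(radix_lookup(root, "hat"))   # 1
print(radix_lookup(root, "hi"))    # None
```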
They're super interesting, but right now the B+tree is still the dominant data structure that everyone uses, even though there's a lot of interest in tries.

All right, so, real quickly. Everything we've talked about so far with these indexes has been about satisfying point queries and range queries. If I want to find the people that live in zip code 15217, that's an equality predicate that finds exactly the thing I'm looking for. If I'm trying to find things within a date range, that's a range scan; again, I can use a B+tree for that.

Where the B+tree and hash tables are not good is keyword search. For example, say I have the entire corpus of Wikipedia and I want to find all the articles that contain the keyword "Pavlo". I can't use a hash index for that, and I can't use a B+tree index for that, because I'm trying to match a sub-element of an attribute's value. For a hash table you have to have the exact key. For a B+tree, I can do a partial key lookup if the key is composed of multiple attributes, but within a single attribute I have to have its entire value; I can't match on, say, just the first 16 bits of it.

So that's the problem we're trying to solve here. Quickly, to remind everyone what the Wikipedia schema looks like: the main thing we care about is the revisions table, which has the text field, and we want to be able to find all the matches for "Pavlo" in it.

If I try to create a regular index on that content field, it's going to be a really bad idea, because it takes the entire contents of the attribute as the key and builds a B+tree on that. In the case of Wikipedia this would be really stupid: some articles are many kilobytes, so now I'm storing entire articles as keys in my index, and the only lookup I can do is one where someone hands me an entire article, which is useless.

So instead of an equality predicate, I want a keyword search: a LIKE clause with wildcards that finds all the rows where the keyword "Pavlo" appears somewhere inside the text. And actually even that query isn't quite right, because it will also match anything that has "Pavlo" as a prefix, like "Pavlov", the famous Russian scientist, when I want to find exactly where my name is being used. So this predicate by itself isn't going to be that useful either.

This is what an inverted index does for us. An inverted index maps words — words as we understand them in English or other natural languages, not byte sequences in the processor — to the records that contain them, and then lets us do lookups on that index: find me all the records that contain this keyword, or that use this keyword with some property.

These are sometimes called full-text search indexes, and just like when I created an index and told Postgres I wanted it to be a hash table index, in some databases you can do the same thing here.
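Here is a minimal sketch of that word-to-records mapping, with made-up example rows and a naive whitespace tokenizer (real systems add stemming, stop-word removal, and so on):

```python
# Build an inverted index: each word maps to the set of record ids whose
# text contains it, so a keyword lookup never scans the documents.

revisions = {
    1: "andy pavlo teaches the databases course",
    2: "pavlov studied conditioned reflexes",
    3: "the course covers tries and inverted indexes",
}

inverted = {}
for rid, text in revisions.items():
    for word in text.split():
        inverted.setdefault(word, set()).add(rid)

print(inverted.get("pavlo", set()))    # {1}    -- exact word match
print(inverted.get("pavlov", set()))   # {2}
print(inverted.get("course", set()))   # {1, 3}
```

Unlike the LIKE predicate, the lookup on "pavlo" doesn't accidentally pull in "pavlov", because whole words are the keys.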
You can say: create this index, and I want it to be an inverted index, a full-text search index.

In the theoretical literature these are sometimes called concordances. That's because there was a lady in the 1800s who sat down for 16 years and built an inverted index mapping every single word Shakespeare used in his entire body of work. But nobody calls them that; everyone calls them full-text search indexes or inverted indexes.

All the major database systems support some variant of this internally. As I said, when you call CREATE INDEX you can ask for a full-text search index, and systems vary in the sophistication of their indexes and in what kinds of queries you can run on them. There are also a bunch of specialized database systems that are sold or marketed as full-text search databases. The most famous one is probably Elasticsearch, which is built on top of Lucene. Lucene is a library, written by the guy who invented Hadoop, that does the actual indexing and search; Elasticsearch provides a server interface on top of that index. Others use Lucene as well, and there's also Sphinx, a standalone library that does full-text search indexing; people use it because it's better than MySQL's built-in full-text search. The distinction is that these systems are external to something like Postgres or MySQL, whereas a database's own full-text index is built inside the system itself.

We're not going to have time to discuss implementations, but basically all the hash table and B+tree material we've covered so far is what you would use to build one of these full-text search indexes. The core lookup — find me all the records that contain this word — I could build as a hash table or as a B+tree. But then I augment it with additional metadata that provides context about how that word was used in the tuple.

That metadata enables the kinds of queries you can't do on a plain B+tree. With a full-text or inverted index you can do phrase searches: find all the records that contain the word "Pavlo". You can do proximity searches: find all the records where "Pavlo" appears within, say, five words or three words of "criminal" or "alcoholic" or something like that. That works because I'm maintaining context about how the word was used. You can also do wildcard searches that are more powerful than LIKE: regular expressions or complex pattern matching to find what you're looking for.

The thing we do care about slightly is how you actually build one of these, and again different systems do different things. Where they vary the most is in what they actually store — that context information about how the word appears in the attribute. In the simplest form you just map the word itself to a record ID, but you can also include which other words are around it and how many positions away they are, and that determines how complex the queries you can support will be.
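Here is a sketch of that positional metadata: if the postings store word positions, a proximity query becomes a postings intersection plus a distance check. All names here are invented for illustration:

```python
# A positional inverted index: word -> {record id -> [positions]}.
from collections import defaultdict

postings = defaultdict(lambda: defaultdict(list))

def index_doc(rid, text):
    for pos, word in enumerate(text.split()):
        postings[word][rid].append(pos)

def within(word_a, word_b, n):
    """Record ids where word_a occurs within n words of word_b."""
    hits = set()
    for rid in postings[word_a].keys() & postings[word_b].keys():
        if any(abs(pa - pb) <= n
               for pa in postings[word_a][rid]
               for pb in postings[word_b][rid]):
            hits.add(rid)
    return hits

index_doc(1, "andy pavlo teaches the databases course")
index_doc(2, "the databases course does not mention pavlo until the end")

print(within("pavlo", "databases", 3))   # {1}: positions 1 and 4 in doc 1
```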
The other tricky thing is when you actually update these things. If the index is built inside the system, you could in theory update the search index on every write. If it's external, you have to run updates as a cron job or push them to it. A lot of the time people stage updates in batches and apply them every so often, because updating the inverted index can be super expensive.

And again, I realize I'm going over this super fast. I just want you to be aware that beyond the B+trees and hash tables we've covered here, there's a whole bunch of other database indexes available that can do things beyond the point queries and range queries we've looked at.

The other class of indexes we didn't talk about is geospatial tree indexes: things like R-trees, quadtrees, and k-d trees. These let you do multi-dimensional lookups in geometric spaces, and they're very common now in video databases and image databases. There's a whole class on this: Christos Faloutsos, the other database professor here, teaches 15-826. He teaches it in the fall, and if you're interested in this kind of stuff, I'll be teaching in the spring.

The main takeaway from all of this is that most of the time the B+tree is what you want. It's the go-to index; it's resilient and it solves most of the problems people have in databases. If you want to go into more detail on inverted indexes, there's a whole other class — I think it's LTI 11-442 or 11-642, called Search Engines. A search engine, underneath the covers, is basically an inverted index: same technology, same methods.

Okay. All right, so next Wednesday we're going to look at how we actually make our B+tree thread-safe. We've sort of glossed over that so far, or rather avoided the problem: how do we allow multiple threads to update the index at the same time? We'll spend that class talking about it.

Okay, any questions?