So today we are going to cover query execution. We also have to discuss B-trees and hash tables, which we didn't finish last time. Before that, let me briefly discuss the homework and the overall structure of the remaining lectures. You have two more homeworks to turn in. Last night, I hope most of you turned in the homework on database tuning and data privacy; I hope you enjoyed the data privacy question. Of the remaining two homeworks, one is on XQuery: you have to write ten simple XQueries and have fun executing them on a geographic database. The last one is going to be to run a Pig Latin program; that's the name of the query language. It's a high-level query interface over MapReduce, that is, a query language that runs on top of a MapReduce framework. For that, you will get access to Amazon accounts. This is going to happen over the next few days, and you will run these programs over there. The two or three queries that you have to write are extremely trivial, but plan ahead, because this homework will take you a lot of time just to set up the environment, to connect to Amazon Web Services, to learn Pig, and to learn how to deal with the data. The purpose is to expose you to a technology; we don't have time to go too deep into data analytics. So those are the two remaining homeworks. Lectures: today we are going to discuss query execution, probably not finish it, but finish early next week. Next week, I'm going to discuss query optimization. The following week, we will discuss parallel databases. By this I mean both parallel operators, parallel joins, and how they are implemented on parallel machines, and also the very modern framework of MapReduce and the query languages that have been developed on top of it, in particular Pig Latin, which is what we will use in the last homework. After that, we have the final.
Most of the class will take the final during the weekend, the 3rd and 4th of December, whichever is Saturday and Sunday. Then we have one more lecture, which I call a lame-duck lecture, because nothing in that lecture will be on the final; the final is already past by then. But I encourage you to come, because I'm planning to cover some very interesting novel topics. I'm planning to discuss data provenance. There are two hot topics today in data management and databases: one is MapReduce and parallel databases, and the other is data provenance. People want to keep track of where their data is coming from. I also want to discuss data privacy, since I know some of you had a lot of fun with the homework; I can tell you the major concepts in data privacy. If I have time, I'd like to talk a little bit about probabilistic databases, which is what I like to do, and which combines data provenance with probabilistic inference. That's the plan for the remaining lectures. Now, for the remainder of this class, let me switch the slide deck. I'm going to start by discussing B-trees and hash tables, which we did not have time to discuss last time. B-trees are used for indexes. They are the most prevalent data structure for indexing. The other possibility is a hash table, and there are other index structures, like bitmaps, which are less frequently used. B-trees are the workhorse for indexing data. A B-tree is a search tree, but it's not binary. Instead, the nodes are much bigger: they are padded so that each node is as big as a block. The idea is that when you access a node, you read an entire block from disk, and you would like to use as much of that information as possible, since you have already paid the price of accessing the block. So that's the idea in a B-tree: the nodes are big, and therefore they have many keys and many pointers to the next nodes. They look like this. A B+ tree is a B-tree with two changes. One is trivial: the leaves are chained.
They are chained, and we already discussed last time why they are linked: because we want to do range queries. When we want to find all values k between, let's say, 100 and 200, we go down to 100 and then follow these links, reading all the leaf nodes up to 200. The other change in B+ trees over B-trees is that all the keys are on the leaves. That is where the keys live. Everything else inside the B+ tree is just for navigation; it is there only to help us go left or right whenever we need to search for a key. In a standard B-tree, that's not the case: there, every internal node holds keys and values. So let's review the B+ tree basics. Because of this convention that all the values are on the leaves and not on the internal nodes, the structure of the internal nodes and the structure of the leaf nodes differ. The tree is parameterized by a number D, the degree. Every internal node has between D and 2D keys. So it looks like this: it has a number of keys, and it has a number of children, which is one plus the number of keys. Think of it this way: the keys divide the space of possible key values into intervals, and each child is the root of a subtree that contains keys from one of those intervals. For example, if we go down this link, we will find values that are between 30 and 120; all the keys in this subtree are guaranteed to be between 30 and 120. You need to break ties in some way. I'm going to break ties by assuming that 30 goes here, always on the right. So if you look for 120, you should always go right. Leaf nodes have a similar but different structure. They also have keys, but these are real keys in the data.
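Before we look at the leaf payloads, here is a minimal sketch of what the leaf chaining buys us for range queries. This is purely illustrative Python, not code from any real system: `Leaf`, `range_scan`, and the key values are all invented for the example.

```python
# Hypothetical sketch: a range scan over chained B+ tree leaves.
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys          # sorted keys stored in this leaf
        self.next = next_leaf     # pointer to the right sibling

def range_scan(start_leaf, lo, hi):
    """Yield all keys k with lo <= k <= hi, starting from the leaf
    that a tree descent for `lo` would have reached."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return            # past the upper bound: stop
            if k >= lo:
                yield k
        leaf = leaf.next          # follow the sibling link

# Example: three chained leaves holding 100..240
l3 = Leaf([220, 240])
l2 = Leaf([160, 180, 200], l3)
l1 = Leaf([100, 120, 140], l2)
print(list(range_scan(l1, 100, 200)))  # [100, 120, 140, 160, 180, 200]
```

The point of the sketch: after one descent to the lower bound, the scan never touches the internal nodes again; it just walks the sibling links. Now, back to the structure of the leaves themselves.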
And instead of pointers, they might have the data itself, if it's a primary index, or they might have pointers, no longer to nodes in the tree, but to actual records in the file. That's the only difference. So let me show you an example; I think we saw this last time. Here is a B-tree. Keep in mind that the structure of the internal nodes is slightly different from the structure of the leaves, in the sense that for the internal nodes, these are pointers to nodes, but for the leaves, these things might be data records, if the index is a primary index, or pointers to data records, in which case this might be the primary key or a record ID. OK, so how do we find the key 40? I have my little animation here; I think you saw it last time. We need to find 40. It's to the left of 80, so we go left. Now it's between 20 and 60, so we go in the middle. And here we have 40, so here it is: we find the record containing 40. Good. And we discussed last time that if the index is a B-tree, it can be used either for point queries, queries that have an equality predicate on the index attribute, or for range queries, queries that impose a range condition on the index attribute. So again, the terminology is point queries and range queries; remember, we discussed this last time. Good. Now, we can also have indexes on pairs of attributes, and here we have to be careful: we must have an equality condition on the first attribute, otherwise we can't use the second attribute. Remember we had a discussion about the fact that this query cannot be answered using the index on name and zip code. So now let's look a little deeper into the design of B-trees, and then into how we do insertion and deletion in a B+ tree. The simple question is: how large do we choose the degree of the tree? The book calls this the order, so in the book you will see it called the order of the B-tree.
Some other books call it the degree. It's very easy: you choose D as big as possible, such that the node is as big as a block, but not bigger. That's the danger: you don't want to make it bigger than a block, because then you have to read two blocks for one node, and we don't like to do that. A little bit of math helps you determine D. Assume that the size of a key is 4 bytes; your key is maybe an integer. The pointer to the next node is maybe 8 bytes, and the block size is 4KB. Keep in mind that the largest node will have 2D keys and 2D + 1 pointers; that's how big a node can be, and this has to be at most the block size. From here you solve the inequality and you get D = 170. This is how you design the order, or the degree, of the B-tree. Any questions about this? OK, now the book has these numbers. I don't know where they got them from, but they claim these are typical statistics in practice. I haven't seen any other book with these numbers, but they're fun to read. They claim that the typical order is 100, and the typical fill factor is 67%. What is the fill factor? Each node may contain anywhere between all the keys and as few as half of the keys; we'll get to this on the next slide. The fill factor is what percentage of the node is occupied, what percentage of the node is full, and they claim the typical value is 67%. Therefore the average fan-out is 133: a typical node has about 133 children. Now, the rest is really instructive, because it gives you a sense of how well B-trees work as indexes. Think about a B-tree that has height 4. At height 4, you can store 133 to the power 4 distinct keys, and that turns out to be about 312 million records. What about depth 5? Well, times 133: that's around 40 billion records. So you realize that by the time you reach depth 6, there are more keys than you can ever imagine in the entire world.
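The degree and capacity math above is worth checking once by hand. Here is the computation with the numbers from the lecture (4-byte keys, 8-byte pointers, 4KB blocks, and the book's claimed fan-out of 133); the code is just arithmetic, no real system's API:

```python
key_size, ptr_size, block_size = 4, 8, 4096

# The largest node has 2D keys and 2D + 1 pointers, and must fit in a block:
#   key_size * 2D + ptr_size * (2D + 1) <= block_size
# Solving for D:
d = (block_size - ptr_size) // (2 * key_size + 2 * ptr_size)
print(d)  # 170

# Capacity at a given height, with the book's "typical" fan-out of 133:
fanout = 133
print(fanout ** 4)  # 312900721  (about 312 million keys at height 4)
print(fanout ** 5)  # about 41.6 billion keys at height 5
```

So D = 170 is exactly the inequality on the slide, and 133^4 is where the 312 million figure comes from.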
Not even Google has that many distinct things to index. Now you turn this around and ask: how many reads from disk does it take to get to a particular value, to access a particular record? It's exactly the depth of the tree, and that depth cannot be bigger than four or five, or maybe six if you have an enormous amount of data. On the other hand, there is enough main memory around that we can store the first level, meaning the root, or even the first two levels of the B-tree, in main memory. So you only need three or four reads from disk to access any data item, and that's amazing. It doesn't matter how much data you have: in three or four disk accesses you get to your data. That's the power of B-trees. I know it's very simple, but it is actually amazing if you think about those arrays of disks that hold your data, and with only four reads you get exactly to what you need. Good, so let's see how they actually work. We saw how to access a key in a B-tree. The difficult questions are: how do we insert a value, and how do we delete a value from a B-tree? Let's see insertion first. You get a new value to be inserted into the B-tree. By the way, when does this happen? When do we have to insert a value into a B-tree? Yes? Insert a new record in the table? Yes, you insert a new record in the table. But there are other cases, when you don't necessarily insert. Think about a common operation, a standard SQL command. One is insert, but there is a relative, the good friend of insert: modify, update. You update an attribute value. What did we have? I don't remember... the age. You update the age from 25 to 26. So now you have a new value: you have to delete the old value, and you need to insert the new value.
And that's what you need to do whenever an update occurs on an attribute for which we have an index. Good, so here is the problem: we are given a key K and a payload, and we need to insert this new key and payload into the B+ tree. The first task is to find the leaf node where K belongs, and there is only one. You search for it, or rather pretend to search for it. If you find it, there is an error. If you don't find it, that's good, because now you are looking at the leaf where it should be, and you just store it there. The problem is, if you don't have enough room in that leaf node to store the new key, then you need to split it. So here is an example. In all my examples on the next slides, D is going to be equal to 2, which means we can store between 2 and 4 keys. Did I actually say this? I don't think I ever defined it: every node can have between D and 2D keys, except the root. The root can have between 1 and 2D keys; the root can be much emptier than the other nodes, for reasons that will become clear in a few minutes. OK, so suppose the node is too full. In our case, we can have between 2 and 4 keys, and now we have 5 keys; not good. So what do we do? We split it in two, and one of them will contain K1, K2. But we need to keep all 5 keys, and actually there is a mistake on this slide, which makes it much more interesting. What's the mistake? What is the mistake on this slide? This is a leaf node, and a leaf node must contain all the keys. Once we insert, the leaf conceptually has 5 keys; you can't actually have 5 keys in a leaf node, this is just a conceptual picture. But conceptually we have 5 keys, they don't fit, and we need to split the leaf node into two leaf nodes. Which keys do you expect to see in these two leaf nodes? All five, because all keys must be present in the leaf nodes. So that's the mistake: K3 is missing.
The concept here is that we split the leaf node in two: we store half of the keys in one, half of the keys in the other. But because it's a leaf node, and because in a B+ tree leaves behave differently from internal nodes, all the keys must be present. We can't simply push K3 up. We will push K3 up, but we must also keep K3 in a leaf. My question is: where? In the left node or in the right node? According to our convention, remember how we handle ties: K3 will go to the parent node, and K3 will distinguish between left and right. So where does K3 belong? To the right, exactly. Which is harder for me to draw, because it belongs here, and its payload is... How about the left? No, it has to go on the right, because of our convention. Where is our convention? Right here; let me erase this. The convention is that when you look for 30, in a tie, you go right. So, coming back to our example, K3 will be sent to the parent, where it distinguishes between these two children, and therefore the value K3 must also be in the right leaf. But I'm still confused about what happens to the payload, because we must have... Oh, I'm sorry, that's another mistake of mine. The leaves, of course, are different from the internal nodes: they have exactly as many payload records as keys. For every key there is a record, so there is no P0 here; the payloads are in one-to-one correspondence with the keys. And similarly here. OK, this will become clearer in my picture here, where the correspondence is correct. So let's insert 19. Where does it go, on which leaf? On the first leaf, right here, and its payload will be right here. And let's take a leaf that I didn't touch.
Here we have two keys and two records, which is different from the internal nodes, where the number of pointers is one plus the number of keys. OK, but let's focus on insertion. So, yep, here is 19 and here is its new record. Now let's insert 25. Where does 25 go, which leaf? The second leaf. So I would like to put it right here, but there is no room for it. So if I put it here, it's only conceptual; let me just do this. Notice that the number of payloads is equal to the number of keys; that was one of my two mistakes on the other slide. What do we do now? We split. This node can't exist like this. It's nice conceptually to think about a node which is too full and now needs to be split, but in practice it can never be too full; it can never get into this state. So we split it in two, and we have a good reason to split it right here. But now the parent will no longer have three children. How many children will it have? Once we split one of its children in two, we get four children. So we need a new key in the parent. Which key will that be? 30, exactly. 30 is the key that goes to the parent, and 30 will distinguish between these two nodes that we just split. So let's see this. Now we have to split; here it is, we split. And now let's verify our convention: we had to keep 30 here on the right, because our convention is that when you search for 30, in the case of a tie, you go to the right. Good. So that was insertion. Now, what happens if this node also becomes full? What happens then? Then we do one more split. Here it's slightly different, because now the key in the middle really disappears: we remove it and send it up to the parent, and we split the node in two. But what happens if the parent becomes full? The parent here is really the root. It's going to take a while to become full, because right now it's rather empty, but at some point it will become full. What do we do then?
Sorry? Add a level. So imagine that this node, after many, many insertions, became full. Say we have 80, 40, whatever, 90 and 100, and now you want to insert something new, like 85. Now it's too full. What do we do? Exactly: we split it in two. Let me show it here. So now we have two nodes, and there is a key that goes up. So we create a new root. How many keys will there be in this new root? Just one; there is nowhere to get more keys from. And each of the two children will be exactly half full, because the rest is empty. Yes? Do you rebalance the tree at all when you're doing this? Do we have to rebalance the tree during this process? Did we ever create an imbalance? Nope. As you saw, we work level by level. We never created a new level, except when we created this new root. That's the beauty of B-trees; this is why they work so beautifully. It's actually amazing. Let's step back and emphasize this. The search trees that you learned about work nicely as long as they are balanced. But are binary search trees always balanced? No. We went through great pain, actually you went through great pain when you took the data structures class (I don't think you took it with me); you have to work hard to balance them. So how can you balance a binary search tree? There are several techniques; which ones do you remember? Yes, they have a certain name if you use rotations: they're called AVL trees when they use those rotations, and red-black trees use another kind of rebalancing technique. So there are several techniques by which you can rebalance binary search trees, and even when you work that hard, they are never perfectly balanced. Now fast-forward to B-trees: they are always perfectly balanced. And look how easy it is. You just insert, and the only thing you need to worry about is to split a node when it's too full.
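The leaf-split step we just walked through can be captured in a few lines. This is a toy sketch under the lecture's conventions (D = 2, separator stays in the right leaf and is copied up); `insert_into_leaf` and the example keys are invented for illustration:

```python
import bisect

D = 2  # every leaf holds between D and 2*D keys

def insert_into_leaf(keys, new_key):
    """Insert new_key into a sorted leaf. On overflow, split and
    return (left_keys, right_keys, separator_copied_up); without a
    split, return (keys, None, None). Tie rule as in the lecture:
    the separator stays in the RIGHT leaf and is copied to the parent."""
    bisect.insort(keys, new_key)
    if len(keys) <= 2 * D:
        return keys, None, None           # no split needed
    left, right = keys[:D], keys[D:]      # D keys left, D + 1 keys right
    return left, right, right[0]          # copy-up; key is kept in right

# The lecture's example: inserting 25 into a full leaf splits it,
# and 30 is copied up to the parent.
left, right, sep = insert_into_leaf([20, 30, 40, 50], 25)
print(left, right, sep)  # [20, 25] [30, 40, 50] 30
```

Note that the separator is copied, not moved: 30 appears both in the parent and in the right leaf, exactly as on the corrected slide.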
And then you push up a key and two children. One detail that we need to emphasize, and that you should keep in mind, is the math of splitting. A node can have up to 2D keys. If it is too full, once it overflows, how many keys does it have? 2D + 1. Now, when we split an internal node, one key goes up. So how many keys remain in each of the children? D and D, which is exactly the minimal number of keys allowed in a B-tree: in a B-tree we can have between D and 2D keys. The math works out beautifully. In the case of B+ trees, the leaves need to be treated slightly differently, but it still works out: one node will have D keys, and the other will have D + 1 keys. Any questions about insertions in B-trees? Because now we move to a harder topic, which is deleting from a B-tree. So how do we delete 30? We find 30, and what do we do to it? We just remove it, it's simple. We can simply remove it from here, and we don't have to worry about anything, because there are still two keys left in this leaf node, and therefore it satisfies our constraint that there must be between two and four keys. Good, so this worked nicely. But notice something here. What do we do with this 30 up here? Do we have to change it, say to 40? No, we don't have to change it. And that's the distinction between B+ trees and B-trees: there is a distinction between the leaf nodes, which hold all the data values, and the internal nodes, which only hold keys that allow us to navigate left and right. So that 30 is harmless; it can stay there. I think I changed it to 40 on the slide... no, I left it as 30. Good, next task: let's delete 25. Now we're in trouble, right? Because this leaf becomes too empty. What would you do? We cannot leave the leaf less than half full; that's not allowed.
Actually, why isn't it allowed? Why not leave nodes less than half full? We can repair the node, but why not just ignore it? Why not just move on with our lives, leave the leaf with fewer than D keys, and just continue operating? What would happen? It's a form of imbalance. Imagine inserting many keys into a B-tree and then deleting most of them. We expect the tree to shrink, but if we violate the constraint that every node has at least D keys, then it's no longer balanced: some branches will go deep while holding only one key each. This is why we insist on having at least half of the keys present in each node, except for the root. So what do we do here? How do we rebalance this node? What do you think, is it too trivial or too hard? Do we change the 40? No, actually we don't touch 40. We do something else: we borrow a key from the left sibling, because the left sibling has an extra key that it can lend; even if we remove a key from it, it's still not underfull. This is called a rotation, and it's represented like this. It's like taking a key from the left, and it has to be the biggest key. You take it and you move it here, but as you move it, you move it through the parent, because now the parent needs to know that this is the new key separating the left from the right. So let's see what happens. Here it is: 19 has ended up here. We have a missing key here, but that's OK, because that node had a key to spare, and 19 has appeared at the top. So that's what we do: if a node becomes too empty, we look at its left and its right sibling, and whoever has a key to spare, we borrow from there; this is called a rotation. OK, but now let's delete 40. We have a similar problem: this node becomes underfull, but now there is nowhere to borrow from.
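The repair logic for an underfull leaf can be sketched as a small decision procedure. This is a toy illustration, not real database code: `delete_from_leaf` is an invented name, the leaf contents are made up, and the merge case (which we turn to next) is only named, not implemented.

```python
D = 2  # every non-root node must keep between D and 2*D keys

def delete_from_leaf(leaf, key, left_sibling=None):
    """Remove `key` from a sorted leaf. If the leaf becomes underfull,
    try a rotation: borrow the largest key of the left sibling, which
    conceptually travels through the parent separator. Otherwise a
    merge is needed. Returns a string naming the repair performed."""
    leaf.remove(key)
    if len(leaf) >= D:
        return "ok"                          # still at least half full
    if left_sibling is not None and len(left_sibling) > D:
        borrowed = left_sibling.pop()        # biggest key of left sibling
        leaf.insert(0, borrowed)             # parent separator becomes it
        return f"rotated {borrowed} from left sibling"
    return "merge with a sibling, pulling the separator from the parent"

# The lecture's example, with invented leaf contents: deleting 25
# leaves the leaf underfull, and 19 is rotated in from the left.
leaf, left = [25, 30], [17, 18, 19]
print(delete_from_leaf(leaf, 25, left))  # rotated 19 from left sibling
print(leaf, left)                        # [19, 30] [17, 18]
```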
The left sibling has nothing to offer; it's already at the limit, and there is no right sibling. We can't borrow across two levels. So what do we do now? Merge. We merge them, and again the math works beautifully: if both of them are half full, then when you merge them, you can never overflow. So this is what will happen: we merge, and as we merge, we need to remove the middle key, the 30. So the merge actually goes like this: we remove a key from the parent, and the result is this. Now, the parent happens to still have two keys, but what happens if the parent becomes underfull? What happens if you end up merging these two nodes, and now the parent needs to give up another key, and now it is underfull? What can we do? Repeat the process at that level. In that case we can do a rotation, and if no rotation is possible, then what do we do? We merge, and we pull a key down from the root. So let me ask you this: when does the height of the tree decrease during deletion? When we merge at level two, we pull a key down from the root; if the root had more than one key, then we just remove one key and the root keeps fewer keys. But if that was the root's last key, then we end up with a root that has only one child and no keys. That's redundant; that's something we can remove, and the height decreases. Good, so this is how B-trees work, and those are all my slides on this. I really insist that you understand the mechanics of B-trees; it's a fundamental thing in computer science. You can use the slides, you can read the book. I'm sure most of you have already seen B-trees in one form or another. OK, so let me discuss some higher-level aspects of B-trees. In databases in particular, we often have to index not on numerical attributes but on strings, and B-trees have a particular optimization for strings called key compression. The idea is very simple.
Strings are ordered lexicographically, and the idea in key compression is that you only store the difference, the characters that differ from the previous keys, since those are what give you the navigation. Now, as I look at my example here, I realize that these are not alphabetical, but I hope you'll forgive this little annoyance of mine. Imagine that these keys are in one of the nodes of the B-tree, and its parent is, what can it be? Joan. So it says: if you're bigger than Joan, then go right, and the next one is Kate. So that's the parent: Joan and Kate. If you're bigger than Joan, go right; if you're bigger than Kate, then go here. Here you find Jonathan, Joan, and so on. What they do is that they don't store this "Jo", because it comes from Joan. They say: the first two characters in all of this node are borrowed from my parent, just continue with the other characters. So this is what they store. This is called key compression. It's a major saving, because if the strings are long, then essentially you need to store each new character only once. You don't have to repeat the prefix, and that results in significant savings. It's described briefly in the book. The actual theory behind it is of course much more elaborate, but we won't go into detail on key compression. Another practical aspect that all database systems should support is bulk insertion. Here is the idea. Suppose you do the normal thing: you import data from a file, you upload it into the database system, or from wherever. Now you have a table, and now you say: create an index. So you have the table already, and it's the first time you tell the database to create an index. What can the database do? Well, it can iterate over all the records in the database, and for every record, insert the corresponding value into the index, and we have seen how insertion goes.
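Back to key compression for a second, before we continue with bulk insertion: the flavor of the idea can be shown in a few lines. This is a simplified sketch, not how any real engine stores keys; a real implementation compresses relative to the parent's separators and is subtler, while here we just factor out the prefix shared by all keys in one node.

```python
import os

def compress_node_keys(keys):
    """Simplified prefix compression for one node: keep the common
    prefix once, and store only the suffix of each key."""
    prefix = os.path.commonprefix(keys)
    return prefix, [k[len(prefix):] for k in keys]

prefix, suffixes = compress_node_keys(["Jonathan", "Joan", "Joseph"])
print(prefix, suffixes)  # Jo ['nathan', 'an', 'seph']
```

With long string keys, each character of the shared prefix is stored once instead of once per key, which is where the saving comes from. Now, back to bulk insertion.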
But that's very inefficient, because each insertion is a heavy-duty operation where you have to traverse the entire tree down, and then back up to do the splitting. It's much better to do something called bulk insertion. But what does that mean? I can tell you what the operation should achieve: it should construct the entire B-tree for the entire table much more efficiently than inserting every key one by one. How can you do this? Yes: you can start with the leaves, assigning between D and 2D keys to each leaf. That's exactly what you do. But before you can construct the leaves, what do you need to do? You sort all the values, and this gives you the leaves. The leaves of a B-tree are always sorted, and here we have all the values in sorted order. Then we just do the right math and construct the upper levels by grouping the nodes below, and we move up that way. So let's do some quick math. We didn't discuss this, but suppose there are n keys, and D is the order, or the degree, of the tree. What is the time for retrieving a key? Let me write this down: the time for finding a key, as a function of n. What is the asymptotic running time to find a key? It's big O of what? Log base D of n. Because the fan-out is D: at each step, every time we do a read, we divide the entire data set into D chunks, and we only continue into one of the chunks. We can only divide like this log_D n times. What is the time to insert? It's the same, because we need to go down and then up, and the same for deletion. So what is the time for creating a new index? It's n log n, exactly: n log_D n. What is the time for a bulk insert? That gets a bit tricky; it depends on whether the records are sorted or not. Let's assume they are sorted; somebody sorted them for us.
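Under that assumption, the leaf-packing idea can be sketched directly. This is a toy illustration with simplified node layout (internal nodes just list their children, and separator keys are omitted); `bulk_load_levels` is an invented name, not a real API:

```python
def bulk_load_levels(sorted_keys, d=2):
    """Pack sorted keys into leaves of 2*D keys, then build each
    upper level by grouping up to 2*D + 1 children per parent.
    Returns the levels from the leaves up to the root."""
    max_keys = 2 * d
    level = [sorted_keys[i:i + max_keys]
             for i in range(0, len(sorted_keys), max_keys)]
    levels = [level]
    while len(level) > 1:
        # each internal node gets up to 2*D + 1 children
        level = [level[i:i + max_keys + 1]
                 for i in range(0, len(level), max_keys + 1)]
        levels.append(level)
    return levels

levels = bulk_load_levels(list(range(1, 41)), d=2)
print([len(l) for l in levels])  # [10, 2, 1]: 10 leaves, 2 internal, 1 root
```

Every node is created exactly once as we sweep upward, which is what makes the whole construction linear in the size of the tree.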
So then imagine this process where you line up the leaves, and then you construct the levels one by one. What is the running time? Well, every node in the tree needs to be touched only once, or rather twice: once when you construct it, and once when you read it for its parent. So it's O(n); it's proportional to the size of the tree. Think about it this way: it's essentially a linear scan of the leaves, plus the bulk of the tree, the internal nodes. One thing to keep in mind is that every such tree has the property that the number of internal nodes is no bigger than the number of leaves, so you don't spend more time processing the internal nodes than you spend on the leaves. That's a major advantage. To this you have to add the sorting time, which also happens to be n log n, but sorting can be implemented much more efficiently than inserting into a B-tree. We will discuss sorting, possibly today, but more likely next week. Good. Here is another very important practical aspect. Actually, let me not do it this way; let me do it this way. So the B-tree looks like this. Imagine a database that is an OLTP database, which means online transaction processing, where the workload consists of many updates: simple reads and simple updates or simple insertions. Like an accounting database or a financial database, where you read an account, update the account, and place an order. These are point queries; such workloads are called OLTP, online transaction processing, workloads. We have discussed concurrency control. Imagine locking: every time a customer wants to update their account, you need to lock the customer's record, then update, and then release the lock. If many thousands of customers want to access the database, things look good, because each of them accesses their own record, and there is very little contention. How could you even get contention in such a system?
Only if the same customer happens to access his account from two different clients, which is very rare. However, once we introduce an index, we have a big problem. What is that big problem? This nice picture that I described, in which thousands of customers access the database but each of them locks only their own record, suddenly breaks. There is a huge hotspot where all these customers want to lock the same thing. Yes: the root of the index. They are all in love with the root of the index. Why would they lock it? Yes: locks are acquired in a waterfall fashion, going down towards their node. Exactly. So imagine the typical update, or the typical insertion. You want to update, say, the total number of orders for that customer. You go down the tree and find the leaf, but then you may have to start doing the splitting. So you need to come back up, and you don't know how far: in the worst case, you have to come all the way back to the root and split it. Because of this worst case, you cannot release a lock, because then it's no longer serializable, and then the entire tree breaks down. If you have a non-serializable schedule doing these low-level updates to the B-tree, then it's no longer a B-tree, and we're in big trouble; nobody can recover from that. So we have to keep the root locked. Suddenly, our database that was supposed to work wonderfully is going to be a nightmare, because everybody wants to lock the root. What is the solution? How do we solve this? The book doesn't talk about this; I don't know why, it's such a simple solution. Yes? Give up. That's one solution. Any other suggestions? It's actually an amazing idea. What they do is split full nodes proactively. The idea is that you would only need to split a node on the way back up if it was full and you were pushing a new key into it from below. So what they say is: look, if it's full, let's split it anyway, proactively, on the way down. A typical insertion therefore goes like this.
If the root was full, you split it. And now you continue your search. If this node was full, then you split it as well. But now you only touch this node here. And you continue your search. If this node is not full, then you don't split it. How many locks do you need? And at the end, when you insert in the leaf, you don't need to go back and do the splitting along the path all the way up to the root. So how many locks do you need to hold at any one time? Only two. One for the parent node and one for your current node. So for example, you have split this already. So now you examine this node. And if it is too full, then you split it, and it still needs the lock on the parent. But once you've done your split, or you decided you want to move further, then you can release the parent's lock and acquire the next lock. This locking protocol has a name. What's the name? A simple name. And we discussed it. It's called the tree locking protocol. And what is remarkable about the tree locking protocol is that it is not a two-phase locking protocol. It's not strict two-phase locking. It's not two-phase locking at all, because we release the lock on the parent all the time. We keep these two locks, release the lock on the parent, acquire the lock for the next child, and go on. However, even though it's not a two-phase locking protocol, it still guarantees serializability, for a simple reason. It's something that you can check, or you can read the proof in the book. Yes, and now I hope that things fit together. We discussed the tree locking protocol without too much motivation. That's why we needed it: to avoid the hotspot at the top of the tree. And this is used in conjunction with aggressive, proactive splitting of the nodes. Actually, it works for both deletion and insertion, both splitting and merging. Good. So that was my last practical aspect. So I discussed what is on this slide, how to improve concurrency control on B-trees.
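The "hold at most two locks" descent can be sketched as follows. This is a toy illustration of the lock-coupling order, assuming a simplified node layout; `proactive_split` is only a placeholder, since a real split would move half the keys to a new sibling and push a separator into the still-locked parent.

```python
import threading

# Sketch of lock coupling ("crabbing") with proactive splitting.
CAPACITY = 4

class Node:
    def __init__(self, keys, children=None):
        self.lock = threading.Lock()
        self.keys = keys
        self.children = children or []   # empty list means leaf

def choose_child(node, key):
    # Pick the child whose key range covers `key` (simplified).
    for i, sep in enumerate(node.keys):
        if key < sep:
            return node.children[i]
    return node.children[-1]

def proactive_split(node):
    # Placeholder: a real split creates a sibling and updates the
    # separator in the parent, which we still hold locked.
    node.keys = node.keys[:len(node.keys) // 2]

def insert(root, key):
    parent, node = None, root
    node.lock.acquire()
    while node.children:
        if len(node.keys) >= CAPACITY:
            proactive_split(node)        # split while the parent is still locked
        child = choose_child(node, key)
        child.lock.acquire()             # lock the child first...
        if parent:
            parent.lock.release()        # ...then release the grandparent
        parent, node = node, child       # at most two locks held at any time
    if len(node.keys) >= CAPACITY:
        proactive_split(node)
    node.keys.append(key)
    if parent:
        parent.lock.release()
    node.lock.release()

leaf1, leaf2 = Node([1, 2]), Node([6, 7])
root = Node([5], [leaf1, leaf2])
insert(root, 3)
print(leaf1.keys)   # [1, 2, 3]
```

The key point is the release order: a lock on a node is released only after the lock on its child is acquired, which is exactly why the protocol is not two-phase and yet, by the tree locking argument, still serializable.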
OK, so to summarize, B-trees are wonderful. They are the default index structure used by most database systems. They are effective both for point queries, which look for exact equality between the indexed attribute and some value, and also for range queries. What they can't handle is multi-dimensional range queries. If you have a query like this, what kind of index would help you find all the products that have price between 50 and 100 and quantity between 2 and 20? Yes. Pick one of the indexes? Yeah, that's the only solution. So this is what database systems do. You probably have an index on price and you have an index on quantity, but you need to pick one of them. And if you have statistics, then you will pick the one that has the best selectivity, that returns the fewest records. But what B-trees don't help you do is this: you can't have a B-tree on both price and quantity and answer this query more efficiently. This is called a multi-dimensional query, because what it does is that in this space of quantity and price, it wants to find a rectangle. And B-trees cannot handle rectangles in two-dimensional space, or parallelepipeds in higher-dimensional spaces. If this is what you need, then the structures that help you are called R-trees. They are designed to support queries like this. So an R-tree would be an index on the pair price and quantity that allows you to answer queries where you have a range for both price and quantity. R-trees are much messier. They are not as elegant as B-trees. The splitting and merging criteria there are much more ad hoc and heuristic. But sometimes you need them, and some database systems, especially if they offer support for spatial data, have R-trees to speed up such queries. Good. So that's the last slide on B-trees. Any questions about B-trees? Good. The next topic is going to be hash tables.
Everybody here, I assume, knows the basics of hash tables. Still, I hope I'm going to surprise you. I'm going to show you two interesting hash table structures that probably few of you have heard of before. So let's review the basics. We have keys, k, and the essence of a hash table is a function, called f, or sometimes h, that takes a key, maybe a string, and returns a number between 0 and n minus 1. And n is typically medium size, in the range of hundreds or thousands, or maybe hundreds of thousands, or even millions. It's an array that you can store either in memory or on disk. That is a main-memory hash table. A secondary-storage hash table is exactly the same thing, but now a bucket, what you store at f(k), is an entire block, and we store as many keys there as we can fit. So let me show you an example. Here is an illustration of a hash table with four buckets. And I assume that every block can store two keys. So for example, in block one, we have b and f, and nothing else fits here. And in block zero, we only have e. Using the hash table is straightforward. If you're looking for a certain key k, how do you find k? What do we need to do to search for k? We hash it. We compute the number f(k). And now we have direct access. We assume that these blocks are stored in an array, such that if you have the number of the block, you can access it with one single disk operation. And then you need to do a local linear search in the block to find the actual key. Good. OK, so I actually have an example here. Search for a: compute the hash value. I knew that I was going to call the hash function h in the end. It's not f; it's typically denoted h. So in this case, it sends you to bucket three. You read bucket three, and there is a. Let's insert. If you have space in the block, then you just insert. If there is no room in the block, then we create overflow blocks.
And if you remember, there are lots of ways to deal with overflows in hash tables. But in the context of databases, we only use this one. What's the correct name? I always get confused. This is chaining, right? This is the separate-chaining hash table. We always create a list of these overflow blocks. OK, what is the performance? If there are no overflows, performance is great. If the hash table has n entries, the time to search is O(1). It's just one single read. But once we start having overflow blocks, the size of that list can grow in an uncontrolled way. And then you can get performance that is as bad as O(n), if most of the entries in the database collide into a single bucket. Good. So that's the basics. Now, how can you make it better? Unless you have seen this, you probably imagine you cannot improve hash tables. But amazingly, there are two techniques, called extensible and linear hash tables, that were developed in the context of databases and that do precisely this: they cope with collisions. They allow a hash table to grow. How many people have heard of extensible hash tables or linear hash tables? OK, so it's a great opportunity to learn something new. And actually, they're quite fun. So here is how extensible hash tables allow you to grow and to avoid these overflow blocks. First of all, they start with a hash function that returns a huge number, a number between 0 and 2 to the k minus 1. Think about this as being like 2 to the 64 minus 1. So it returns 64 bits, let's say. But you're not going to create a hash table with 2 to the 64 entries. Instead, you're going to create a much smaller hash table, which has, say, 2 to the 7 entries. OK? And it's very easy to use a hash function that returns 64 bits, because you're just going to look at the last seven bits and ignore the others. So let me show you how this works. I have much smaller numbers here.
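Before moving to the extensible scheme, here is a minimal sketch of the chained version just described, with each bucket modeled as a chain of fixed-capacity "blocks". The capacities and bucket count are illustrative, not from the lecture slides.

```python
# Sketch of a disk-style hash table with separate chaining: each bucket
# is a chain of fixed-capacity blocks, mimicking overflow blocks on disk.

BLOCK_CAPACITY = 2
NUM_BUCKETS = 4

table = [[[]] for _ in range(NUM_BUCKETS)]   # each bucket: a list of blocks

def h(key):
    return hash(key) % NUM_BUCKETS

def insert(key):
    chain = table[h(key)]
    if len(chain[-1]) >= BLOCK_CAPACITY:     # last block full: add an overflow block
        chain.append([])
    chain[-1].append(key)

def search(key):
    # One block read if there is no overflow; O(chain length) otherwise,
    # which is the degradation to O(n) mentioned above.
    return any(key in block for block in table[h(key)])

for k in ["a", "b", "c", "d", "e", "f"]:
    insert(k)
print(search("e"), search("z"))   # True False
```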
So the hash function returns numbers of up to 4 bits. But we are actually only using one bit. So for example, the key 4, which in binary is this, will be mapped to 0, because we only look at the very last bit. And the key 7 will be mapped to 1, because we only look at the very last bit. The notation that we use is here. The keys that sit in these buckets have 4 bits, because the hash function returns that many bits, 4 in our case. But we don't care about the first three. We only care about the last bit. Good. So how do we retrieve? Suppose we are searching for the key 4. How do I find it? We apply the hash function. In this case, it's just 4, which gives me 0, 1, 0, 0. So where do I look? I ignore the leading bits, and I just look at entry number 0. Something needs to tell me how many bits to inspect. And this is this global information i that sits around and tells me we need to look at one bit. There is also local information here that will become clear in a second. But now let's insert. We insert 13, and that goes straight into this bucket. And the interesting thing is what happens if we insert and we have an overflow. The rule in extensible hash tables is: no overflows. Whenever you don't have room to put a new key, you will double the size of the hash table. And in essence, what you do is that now you're going to use one more bit, the last of the bits that you have ignored. So think about it this way. Here we have three keys. The first was 0, 1, 1, 1. So far, we only used the last bit. But now, we are going to use the last two bits. So this will end up in a different bucket from 1, 1, 0, 1, where the last two bits are now 0 and 1. So let's do the work. We need to double the size of the hash table. Initially, it was of size 2, right here. Now we're going to double it, because we are going to use one more bit. Initially, it was 2 to the 1. Now it's 2 to the 2.
And the rule is that only this block, the overflowing block, needs to be split. We don't want to touch the other blocks. That would be too expensive. So how do we split it? Well, it's kind of obvious. The first key goes to 1, 1. So the first key goes here. The second key goes to 0, 1. So here it is. And the next key goes also to 0, 1. So here it is. For the other blocks, we don't worry about splitting. We just put a pointer in 0, 0, for example, which will point essentially here. And the pointer in 1, 0 will also go here. Let me show you one more example. So now let's suppose we insert these two new values. And now we want to split this block. So what do we need to do now? We don't need to double the hash table. And the reason is that for this block, we have only examined the last bit. So now we can afford to examine the last two bits, because we are still within the two bits that are currently inspected by our hash table. So let me backtrack a little bit, because I see some confused faces. Look at this number here. What does this number say? The number of bits. The number of bits that we inspected for that block. Now what's the relationship between this number and the red number? Always less than or equal. You cannot inspect more than two bits. But maybe in this case, we inspected only one bit. So now if this block overflows, we can expose the next bit, because we still have one left. We expose the next bit. And now it's kind of obvious. We just start redistributing the keys. I'm pretty sure they went in the right place. OK, I don't have another slide, but let's try to go together over a scenario in which we insert a key. I want to put it here. What color is good? Let me use black. 1, 1. No, 1, 1, 0, 1 we had already. 1, 0, 0, 1. OK? So now the hash table has to grow. So we need to double it. I want to use a different color here. So now I'm going to add the extra number, the new bit. OK, so let's worry about the overflowing block first.
Where do these elements go? So this will increase to 3. Where do they go? This one stays. Oh, right, so I actually need to create a new block, right? And this will point here, while 1, 1, 0, 1, which is this one, will point here. OK? And here we also have 3 bits. So 1, 1, 0, 1, this key, where does it go? To the old block or to the new block? To the new block. So I'm going to delete it from here. Here is 1, 1, 0, 1. 0, 1, 0, 1, where does it go? To the new block. 0, 1, 0, 1. 1, 0, 0, 1 stays here. This stays put. OK. OK, this pointer, what do we do with it? The pointer from 0, 0, 0, where does it point? To exactly the same place. Do we do anything to this block? No, we don't split it. And same for the other pointer. Let me use blue for these old pointers. Now this pointer here, 1, 0, 0, where does this point to? To which one? Nowhere? No, it has to point somewhere. Where? To the first block, exactly. It will point right here. And the reason is that if you're looking for 1, 0, 0, you will be sent to this first block. But there you will notice that only two bits have been exposed. 1, 0, 0 is right here, and so on. The next one, 1, 0, 1, already points to the new block. That's the one that we split. But the others are old blocks. So they will point here and here. Yep, does it make sense? OK, so then let's review this. So we insert. And my question is, since these blocks need to split, we get worried. We only want to touch a single block. What is the worst-case scenario in the case of an extensible hash table? How many blocks do you need to touch? How many blocks do we need to manipulate in the worst case during an insertion? Just one. Well, one or two. If you have to split it, then you need to manipulate two. But even if you double the size of the hash table, all the other blocks you just keep around and point to them. This is what makes it so attractive.
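The whole walkthrough above can be condensed into a small sketch. This is a simplified extensible hash table under assumed parameters (bucket capacity 2, keys hashed to their own bit patterns); `Bucket.depth` plays the role of the local bit count written next to each block, and `global_depth` is the red i.

```python
# Sketch of an extensible hash table: no overflow blocks; on overflow,
# only the overflowing bucket is split, doubling the directory if needed.

BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth: bits actually examined
        self.keys = []

class ExtensibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return key & ((1 << self.global_depth) - 1)   # last global_depth bits

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(key)
            return
        if bucket.depth == self.global_depth:   # no spare bit: double directory
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1                       # split only this bucket,
        new_bucket = Bucket(bucket.depth)       # exposing one more bit
        mask = 1 << (bucket.depth - 1)
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (new_bucket if k & mask else bucket).keys.append(k)
        for i in range(len(self.directory)):    # redirect half the pointers
            if self.directory[i] is bucket and i & mask:
                self.directory[i] = new_bucket
        # Retry; exact duplicates can force repeated splits, the
        # disadvantage discussed below.
        self.insert(key)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

ht = ExtensibleHash()
for k in [4, 7, 13, 9]:
    ht.insert(k)
print(ht.global_depth, ht.search(13))   # 2 True
```

Note how an insertion manipulates at most two data blocks (the split pair), while a directory doubling only copies pointers, which is exactly the trade-off discussed next.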
You only manipulate, you only split the block that is overflowing. But now, if you really have to double the size of the hash table, then all the new entries in the directory need to be updated, of course. So the answer to the second question is that half of the entries might need to be updated. So what do you think about the extensible hash table? How fast is lookup? When you search, what is the time for search if you have an item in the hash table? It's O(1). There are no overflow blocks. We are happy. So is it just magic? What is the disadvantage of an extensible hash table? All hash tables suffer from the fact that they cannot answer range queries. But try to compare it to the separate-chaining hash tables, the standard ones, where we simply chain the overflow blocks. Sorry? Limited by the maximum value of i? Yes, i can only grow up to k. Oh, I see. So at some point, they would get stuck, because there are no more bits left to expose. This is true, but you can prevent this. You can start with 128 bits, and you never grow to an array of size 2 to the 128, or even 2 to the 64. But in terms of performance, there is a problem with them. What is their performance problem? The directory gets huge and keeps growing? Exactly. The size of the hash table might grow uncontrollably. But even if it grows in a controlled fashion, when it has to double, that's a major disruption. That operation is no longer O(1), because you need to fill in half of the entries in the bigger hash table. Something else can go wrong. This hash table might fit in main memory. But after one insertion, it doubles in size, and now it might no longer fit in main memory, just because it doubled. There is another interesting phenomenon. Let me actually go back here. So here we inserted 1, 0, 0, 1. But what happens if instead I had inserted one more copy of 1, 1, 0, 1? We can have duplicates. What happens?
So we clearly need to double, right? But then when you split this block, what do you notice? Where do these elements go? To the same block. So we have not solved the overflow problem. So what do we need to do? We need to split again to expose another bit, hoping to divide the elements in this block in two. So you might even have to split repeatedly after a single insertion, which is a major disruption. That's a major disadvantage of extensible hash tables. Good. So we discussed this. The next hash table idea that I want to show you tries to alleviate this problem. It's going to allow the hash table to grow, maintaining the same principle of avoiding those long overflow lists. But it's going to make the hash table grow one entry at a time, as opposed to doubling in size. These are called linear hash tables. So here is the idea. We can no longer double. Instead, we grow one by one. And therefore, we cannot fully use all the i bits that are currently in play. Instead, we still inspect i bits, but the size of the hash table is somewhere between 2 to the i minus 1 and 2 to the i. So the idea is that if the last i bits of a key point past the end of the table, then what we do is simply flip the leading bit, make it 0, ignore it if you want, and then we are back in range. So let me show you how it works. In this example, we have n equals 3. So the entry 1, 1 is missing. That's maybe one way to think about this: 1, 1 is missing here, which means that if you have a key which wants to go to 1, 1, then we flip its bit. What did I do here? We flip its bit, and then instead of 0, 1, 1, 1, it's going to be treated as 0, 0, 1, 1, and that belongs here. Should we take another example? Let me take a bigger example. Let's suppose i is 4, and we have all the entries from 0, 0, 0, 0 up to some arbitrary point, say 1, 0, 1, 0. There is no more room after that. So now, where would we send the key for which the hash function of the key is 0, 1, 1, 1?
Where do you send this? Simple case. Exactly where it belongs, right? Exactly to 0, 1, 1, 1, because it's right here. But where do we send 1, 0, 1, 1? Well, if you look for it, it's after the end of the hash table. I can't put it there. Instead, where do we put it? We flip the bit, make it 0, and now it's clear: it goes exactly into 0, 0, 1, 1. That's the idea of the bit flip. Good. So now let's see how insertion works. The idea in linear hash tables is that we will still have overflow blocks. So now the system has essentially two knobs to control. It can allow overflow blocks to accumulate, and these lists can grow. Or it has the option of extending the hash table. So in this case, the overflow block is accepted. But when the system thinks the overflow is too big, it will simply decide to increase n by 1, to create one more entry in the hash table. So let's see that process, how this extension happens. This is illustrated here. On the left, we see the old hash table. And this hash table has to grow by 1, right here, which is essentially this entry. So that means this entry will get a new block, and some of the keys will need to end up here. Which keys? How do you find the bucket from which we take keys to put here? Yes? You flip the leading bit. Exactly. You flip the leading bit, which gives you 0, 1. So this is exactly the only block that you need to inspect to decide where to send the keys. Some of them will be sent to the new block, and others will stay behind. And of course, here you also need to increment n. So now, have we exposed a new bit here? No, this is fine. The number 2 remains everywhere, because in each case we have looked at only 2 bits. The case where we need a new bit is slightly more complicated. Let me see if I have that slide.
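Here is a small sketch of the scheme just described: grow by one bucket at a time, flip the leading bit for indexes past the end, and rehash only the bit-flip partner of the new bucket. The growth trigger (average occupancy above the block capacity) is one possible policy, invented for the example; real systems tune this knob.

```python
# Sketch of a linear hash table: tolerates overflow within a bucket's
# list, and grows one bucket at a time instead of doubling.

BLOCK_CAPACITY = 2

class LinearHash:
    def __init__(self):
        self.n = 2                # current number of buckets
        self.i = 1                # bits in use: 2**(i-1) < n <= 2**i
        self.buckets = [[], []]

    def _index(self, key):
        idx = key & ((1 << self.i) - 1)      # last i bits
        if idx >= self.n:
            idx -= 1 << (self.i - 1)         # flip the leading bit to 0
        return idx

    def insert(self, key):
        self.buckets[self._index(key)].append(key)   # overflow is tolerated
        # One possible policy: grow when average occupancy is too high.
        if sum(len(b) for b in self.buckets) > BLOCK_CAPACITY * self.n:
            if self.n == 1 << self.i:
                self.i += 1                  # need one more bit
            self.buckets.append([])
            self.n += 1
            # Only the bit-flip partner of the new bucket is rehashed.
            source = (self.n - 1) - (1 << (self.i - 1))
            keys, self.buckets[source] = self.buckets[source], []
            for k in keys:
                self.buckets[self._index(k)].append(k)

    def search(self, key):
        return key in self.buckets[self._index(key)]

lh = LinearHash()
for k in [4, 7, 13, 9, 6]:
    lh.insert(k)
print(lh.n, lh.search(13))   # 3 True
```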
Well, I think that's the only slide I have for this rather more complicated operation, when we need to add one more bit. So now, if you want to extend this hash table, we need to add another entry, which is 1, 0, 0. In essence, we need to increase i from 2 to 3, which means that now we need to expose one of the other bits. So which are the keys that need to, let me erase it. Which blocks do we need to inspect to find the keys that go into this new block, the block 1, 0, 0? Just 0, 0, right? Why does it say here that we need to touch every single block? Clearly the last two bits have to be 0, 0. No, that is wrong. We only need to touch this one. That's the only one we need to touch. So it's not that complicated. The book actually has material on linear hash tables, and I invite you to read about them in the book. They are used quite extensively in database systems for indexing, because they do have good performance and because they offer this nice tuning mechanism: a trade-off between how often you extend the table and how many overflow blocks you tolerate. Any questions about linear hash tables? Then I'd like to stop here the discussion from last time and finally move to our new lecture. So where are we? We are discussing implementations of database systems. We discussed so far indexes. And we had some discussion about database tuning, how to improve the performance of database systems by reorganizing the physical schema. Today we are going to start discussing the query processor itself. How do database systems take a SQL query and end up evaluating it? Where do they start? For that, we will discuss the relational algebra, which is covered in an early chapter, chapter 4.2. Then comes a rather confusing discussion of query evaluation, which, no matter how you look at it, needs to describe various pieces that interact, and they only become clear when you see how they all fit together. So I apologize.
This is always a little bit confusing. And then we will start a systematic discussion of particular implementations of relational operators. They're all simple, but they are basic, and it's important that we are aware of them. So let me start by motivating the relational algebra. Think about database systems as transforming the what into the how. The what is what we write in SQL. You write a SQL query; this is where you say what you want to get from the database. We say that SQL is a declarative language. But the problem is, this cannot be evaluated directly. The database system needs an algorithm to evaluate the SQL query. And that is the how. When it decides how to evaluate it, this is when it needs to take into account the physical schema, all the indexes, and how the data is stored, and make up a plan for how to evaluate it. But I realize, looking at the clock, I completely forgot we need to take a short break, right? So let me see where is a good place to stop. Let's stop right here. Let's take a break until eight o'clock, and then I'll discuss relational algebra and move on. OK, so we're still missing a few people, but I think I can start. I have some very fluffy material at the beginning, just to get excited about this idea that SQL is a declarative language. That's what we want, and somehow we need to find the magic to translate it into a concrete algorithm. So this is an example of a SQL query that you know and love. It does a three-way join on product, purchase, customer. And here are the join conditions. And it selects only products whose price is greater than 100, and only the customers who live in Seattle. But now if you look at it, and if you imagine three big files, the product, the purchase, and the customer, and I ask you, please evaluate this query, how would you proceed? It's not clear at all. The only thing that we discussed is the nested loop semantics.
And even then, that makes a choice about the order of the tables, in which order to iterate over them. And it's not clear which order is the best. The formalism in which the database engine expresses the how is the relational algebra. The relational algebra is a language, an algebra, that tells you in which order to execute the operations. And here is an example: the same query as before, expressed in the relational algebra, which we will study in detail in the next 25 minutes or so. This is a join. It says: start with the product table and with the purchase table and join them on the PID, and call the result T1. The result of this join is going to be a new table called T1, which has some attributes. Then the plan says: join this table with customer. Now, you may wonder why we joined product with purchase first and then with customer, but that's not my point. My point is that in this formalism, we can express that order. If we want the other order, then we write a different algebra expression. And then all my symbols are screwed up. So this here should be sigma. It's a selection, and I hope it's not going to be like this for the other slides. This is a selection, and the next is a projection. And finally, I don't know what this last symbol turned into. Oh, thank you. Right, that's delta, duplicate elimination. So this is the formalism in which we express what operations are to be executed in what order. So let's go over the relational algebra. Before I start, I need to warn you about the distinction between set semantics and bag semantics. We said that the relational model is based on sets. However, tables in SQL are actually implemented as bags, and query results in SQL are bags. Bags are multisets; it's just another term for a multiset. Here is a bag in which A occurs twice. You can have a bag with very many occurrences of just one single element.
So when we discuss the relational algebra, we need to pay special attention to the distinction between set semantics and bag semantics. So here is the relational algebra at a glance. Actually, I'm not going to introduce it like this. I'm going to introduce it as follows. There is a core of the relational algebra, and I want you to remember that there are five operations in the core: union, difference, selection, projection, and join. Then there are some operations that are kind of auxiliary, that just do bookkeeping, like renaming; then intersection and complement, which are derived; and various flavors of joins. Think about these as being still in the set framework, but more like extensions or derived operations. And then there are strict extensions that only make sense over bags. Those are duplicate elimination, group by, and sorting. Actually, sorting doesn't even make sense over bags; it makes sense over sequences. So let's go slowly over the core, the five operations: union, difference, selection, projection, join. You know what union is, right? The union of two sets is the set that contains the elements from both. What is union over bags? If R1 and R2 are two bags, how would you describe the union? Yes? All the elements in both sets. All the elements in both. But we need to worry about the number of occurrences. If A occurs twice in the first bag and three times in the second bag, how many times should we put it in the result? Five times. It's the addition. What about the difference? Set difference. What is the set difference operation? Remember? Sorry? Elements that occur in both sets? Oh, no. If A is {a, b, c} and B is {b, c, d}, then what is the difference, {a, b, c} minus {b, c, d}? It's just {a}: the elements from the first set except those that occur in the second set. In SQL, the operation is called EXCEPT. Remember the Venn diagram? If this is A and this is B, then the union is everything. This is the union.
The difference is this part. The green part is the difference, and the blue part is A union B. Now, what happens if A and B are bags? What is the difference? Suppose an element occurs three times in A and twice in B. How many times do you put it in the result? Once, right? Three minus two: you subtract the multiplicities. But what happens in the contrary case, if it occurs three times in A and five times in B? Now you compute the difference A minus B, and it doesn't occur in A minus B at all; the count never goes below zero. OK, so we get the intuition of how we deal with bags. Good. So: union, difference, selection. Now, I describe here the derived operators. Intersection, interestingly, can be computed with two differences. Think about it in terms of Venn diagrams. We never use this in databases. This is not how we compute intersection. Instead, intersection is a join. Databases are obsessed with joins. Everything that can be expressed as a join is expressed as a join, and intersection is just a join. We'll discuss this later. So we had union and difference. The next one is selection. Selection is written sigma, and there is a subscript that gives the selection condition. For example, this says: from the table employee, return all those tuples where the salary is greater than $40,000. And the other example says: from the table employee, return only those tuples where name is equal to Smith. And you can put all sorts of conditions in the subscript, where you combine these predicates with AND and OR. But we need to be careful. These are only conditions on the attributes of the current relation. We cannot have a subquery in this condition. We cannot check for the existence of something in a different table. We can only look at the attributes of one tuple and check something about them. Over bags it is trivial. If you have a bag, you just check every tuple in turn. And if one copy satisfies the condition, then all its copies satisfy the condition. So here is an example. We have employee.
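The bag semantics of union and difference discussed above map directly onto multiset arithmetic; Python's `collections.Counter` happens to implement exactly these rules, so it makes a convenient illustration.

```python
from collections import Counter

# Bag (multiset) semantics for union and difference, matching SQL's
# UNION ALL and EXCEPT ALL: multiplicities add for union, and subtract
# (never dropping below zero) for difference.

A = Counter({"a": 3, "b": 1})
B = Counter({"a": 5, "c": 2})

union = A + B        # a occurs 3 + 5 = 8 times
difference = A - B   # a occurs max(3 - 5, 0) = 0 times; b survives

print(union["a"], difference["a"], difference["b"])   # 8 0 1
```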
And we want the selection with salary over $40,000. So the first one doesn't make it; only the next two. I don't know what I did here. My zeros are all over the place. Imagine it drops the last zero here, and then it makes sense. So that's what selection does: it removes some of the tuples. So we had union, difference, selection. Now projection, and the last one will be join. Projection means: retain only some of the attributes. For example, on employee, if we say project on SSN and name, it means that you get a result table which has only SSN and name. Here the semantics differs over sets and bags, and I will show this on the next slide. So here, if you project on name and salary, we get John with $20,000 twice, because these are actually two Johns. There is the first John and the last John, and they were different tuples. Under bag semantics, we get both of them. Under set semantics, we only get them once. Which one is easier to implement, bag semantics or set semantics? Which one? The bag semantics is easier to implement. You just look at each tuple and remove the attributes that you need to remove. What do you need to do for set semantics? Now you need to consider the entire result. It can be huge. It can have millions of tuples. And now we need to find duplicates. How can you remove the duplicates in a huge table of billions of tuples? What would you do? You can sort it. Yeah? That's exactly what you need to do. You sort it, using some of these heavy-duty sorting algorithms. Once it's sorted, the tuples that are identical will be next to each other, and then they're easy to identify and remove. Much more expensive than just doing a projection and keeping the duplicates around. OK. And the last operation is join. But I'm going to give you join in its purest mathematical flavor, which is just the Cartesian product. So what is the Cartesian product of two sets?
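The bag-versus-set distinction for projection can be shown in a couple of lines. The employee tuples below are invented for the illustration; the point is that bag projection just drops columns, while set projection additionally deduplicates (here via sorting, as in the answer above).

```python
# Projection over bags vs. sets: bag semantics just drops attributes;
# set semantics additionally removes duplicates, e.g. by sorting.

employees = [
    ("John", "123", 20000),
    ("Mary", "456", 40000),
    ("John", "789", 20000),   # a different John, same name and salary
]

bag_projection = [(name, salary) for name, ssn, salary in employees]
set_projection = sorted(set(bag_projection))   # duplicates collapse

print(len(bag_projection), len(set_projection))   # 3 2
```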
If the first set has m elements and the second set has n elements, how many elements does the Cartesian product have? How many? m times n. The Cartesian product has m times n elements, and it consists of all the pairs of an element from the first set and an element from the second set. Here is an example. Maybe you have employees and you have dependents, and when you compute the Cartesian product, you get all the pairs. It's rarely used like this in practice, although I will show you a very interesting example where the Cartesian product is used, probably next week when we discuss query optimization. The Cartesian product is used mostly as a primitive to express joins, though in database systems the joins themselves are actually considered primitive operations. Good. So we discussed the five main operations in the relational algebra: union, difference, selection, projection, join. And instead of join, I showed you just the Cartesian product. Now some extra operations. Renaming. Renaming is more for technical reasons. Since we start with tables that have attribute names, when we combine them, at some point we will have clashes between these names. So you need to rename the attributes in order to be able to compute some operations. The renaming operator is rho, and you just list the new names of the attributes that you would like to have in the output. Do I have an example? No, I don't have an example, except for this one. It says: take the employee table, which had name and social security number, and rename the attributes to N and S. At some level, in the representation of a query plan, you need to be able to rename the attributes, to be able to refer to them. We will often skip the renaming operator and assume that the names don't clash. Good. So now we start our discussion of joins. And there are many of them. I'm going to start with the most elegant form of join, which is called the natural join.
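The m times n claim is easy to check with a tiny sketch; the employee and dependent names below are made up for the illustration.

```python
from itertools import product

# Cartesian product: m tuples times n tuples gives m * n pairs.
employees = [("John",), ("Mary",)]
dependents = [("Anna",), ("Bob",), ("Carl",)]

pairs = [e + d for e, d in product(employees, dependents)]
print(len(pairs))   # 2 * 3 = 6
```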
What the natural join does is look for attributes with the same name in the two relations, R1 and R2. Mathematically, it can be defined as here: take the Cartesian product of R1 and R2, then select only those tuples where the common attributes are equal, and finally remove the duplicate attribute names, because once you make them equal you don't need two copies of them. That is the natural join. So let me show you an example here. So R is here. And we want to do the natural join between R and S. The common attribute is B. So we need to compute all pairs of tuples where the B's agree. That's the idea. So where do they agree? Well, G and Z, G and Z, V and V. Oh, and G and Z again. So we actually have many combinations here. Here it is. Do my lines make sense? We need to have five tuples, and they are listed right here in the result. As you can see, we don't have B twice. We only list B once. That is possible only for the natural join, because it's the same B. Technically, you can get here by computing the Cartesian product: you get a table with attributes A, B, B, C, you equate the two B's, and then you remove one of the B's. And you get exactly the same thing. So that's the natural join. Now some quick questions for you. If you take the natural join of a table with attributes A, B, C and another one with A, C, D, E, you get a table with what attributes? A, B, C, D, and E. Exactly. That's what you get. Now if you have a table with attributes A, B, C and another one with attributes D, E, and you take the natural join, what is the schema and what is the meaning of this natural join? What is the schema of the result? What attributes do we have? Empty? No, it's the union of all the attributes, right? Because that seems to be the rule: we include all the attributes. But what will we put in the result? What is the result of this natural join? It's the Cartesian product.
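The definition just given, Cartesian product, then select where the shared attributes agree, then drop the duplicate copies, can be sketched directly. This is a toy implementation with invented schemas and data, not the code of any real system:

```python
def natural_join(schema1, rows1, schema2, rows2):
    # Common attributes: those appearing in both schemas.
    common = [a for a in schema1 if a in schema2]
    # Output schema lists each shared attribute only once.
    out_schema = schema1 + [a for a in schema2 if a not in common]
    out = []
    for t1 in rows1:                      # Cartesian product...
        for t2 in rows2:
            d1 = dict(zip(schema1, t1))
            d2 = dict(zip(schema2, t2))
            if all(d1[a] == d2[a] for a in common):   # ...select where the B's agree
                merged = {**d1, **d2}                 # ...and keep one copy of B
                out.append(tuple(merged[a] for a in out_schema))
    return out_schema, out

# R(A, B) joined with S(B, C); the values are made up.
schema, rows = natural_join(["A", "B"], [(1, "g"), (2, "v")],
                            ["B", "C"], [("g", 10), ("g", 20), ("v", 30)])
```

Note that when `common` is empty, the `if` is vacuously true and the result degenerates to the Cartesian product, matching the observation in the lecture.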
This is the same as the Cartesian product, because there is no condition to check: there are no common attributes, so no equality condition to check after doing the Cartesian product. OK. But what about this one? So what is the result? Let me call it T. What is the schema? Which attributes do we have in the result now? A and B. A and B. What's the meaning? What will we store in T? What is the natural join in this case? Think about putting together the tuples from R and S. You only care about those pairs where the A, B tuple from R is equal to the A, B tuple from S, which is the same as doing what? It's the same as intersection. This is exactly the intersection operation. So this is R intersect S. Yes? Is this for bag semantics too, or just for set semantics? Ah, that's a good point, because under bag semantics you may have duplicates. Right. So under set semantics, this is what we need to keep in mind: the natural join of two tables with the same attributes is the same as the intersection, and the intersection is a join. But now let's discuss a little bit the distinction between set and bag semantics. What happens in this case, in the last example, if there are duplicates? The same A, B tuple occurs three times in S and four times in R. How many times should it be included in the result of the join? 12 times, their product. What about the intersection? If it occurs three times in one and four times in the other, how many times do you include it in the result? I'm at a loss. I don't know what to say. There doesn't seem to be a single nice choice here. It kind of depends on how you define your intersection operator in your particular bag algebra. And since intersection is not a primitive operation in database systems, there is no clear answer. I think you can play this game any way you want. Let me just point out that taking the product is a perfectly valid semantics. The intersection of three copies with four copies is going to be 12 copies. Why not?
And then the join is the same as the intersection even under bag semantics. But another choice is the minimum. If the intersection is the minimum, then, of course, the two semantics differ. So does this answer your question? Basically, the answer is that under bag semantics, the intersection is less well defined. If we define it as the minimum, then the two will differ. If we define it as the product, you might not buy this definition at first, but if you think a little bit about it, it makes some sense; then the two are still equal. So that's the natural join. Other joins: the theta join, not very frequently used, but it's one of the operators that database systems support. Theta is any condition between the attributes in R1 and R2; it can be an equality, an inequality like this, a LIKE predicate, any condition. It's used rarely, but sometimes we use it. The most frequent type of join is called the equijoin. And this is a join where the join condition is an equality between an attribute from R1 and an attribute from R2. So think about the natural join as a form of equijoin, but one which only looks for equality between attributes with the same name. The equijoin is by far the most widely used join. If we don't say which one it is, if we just write R join S, we mean some equijoin where we didn't bother to mention the equality predicate. So how many joins have we seen so far? We have seen the natural join, the theta join, the equijoin, and the Cartesian product. Let me show you something completely different, which is called the semi-join. It sounds like a limping join, a join that misses half of its power. But it's a wonderful concept; it was invented in the 70s. And it's extremely useful today as people start doing massive data analytics on parallel database systems. And I hope to have a discussion about this in two lectures from now, on how to use semi-join reductions for parallel queries.
So let's start simple with the definition. What is a semi-join? A semi-join is written like this, a symbol that is very hard to find in PowerPoint. And it means what the blue text reads: you retain from R only those tuples that have a joining tuple in S. You can write this formally: you do the join between R and S, making sure you find all the joining tuples, and you project back onto the attributes of R, ignoring the extra attributes that you got from S. But under bag semantics, the blue definition is the correct one. It says you keep exactly the duplicate counts from R, but you only keep those tuples that join with S. Think about this as a preparation step for the join. If you were to join R and S, you can do the semi-join first to discover which tuples from R make sense to keep around for the join. Sounds very bizarre. Why would anyone do this? Look at this. Let me give you one application, which is the original application of semi-joins when they were introduced in the 70s: joining two tables that sit on different servers. Both tables are big. It costs a lot to send one table from one server to the other. You would like to avoid sending too much data. But you need to join them. So the join operation is here, this one here. Let's read it. It says: join employees with dependents on this key/foreign-key attribute, but only care about those dependents whose age is greater than 71. So the problem that we have is that employees are on one server, a big table, and dependents are on another server, another big table. How can you join them? Well, at first sight it looks hopeless. You need to send one table to the other in order to do the join. But let me give you some hints. One hint is that very few employees and very few dependents have age greater than 71. So clearly, you don't want to send the entire dependents table.
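The blue definition, keep each R tuple with its exact duplicate count whenever it has some joining tuple in S, can be sketched in a few lines. This is a toy illustration with invented relations and key positions:

```python
def semi_join(r_rows, s_rows, r_key, s_key):
    # Keep from R exactly the tuples (with their duplicate counts)
    # that have at least one joining tuple in S.
    s_keys = {t[s_key] for t in s_rows}
    return [t for t in r_rows if t[r_key] in s_keys]

# Invented data: R joins to S on R's column 1 and S's column 0.
R = [("a", 1), ("b", 2), ("a", 1), ("c", 3)]
S = [(1, "x"), (3, "y")]
reduced = semi_join(R, S, 1, 0)   # both copies of ("a", 1) survive; ("b", 2) does not
```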
Also, very few employees have matching dependents, but you don't know which ones. Most employees don't have dependents, but you don't know which ones. And there is another issue, namely that the other attributes of dependent, all this extra stuff, are big. Think about this as being a very wide table. OK, so I threw all this at you. Tell me a reasonable way to compute this join. Sorry? A semi-join with dependents? Sure, we will use a semi-join. But let's use plain English first. So how would you do it? Clearly, we need to ship some data around; there is no way around it. But let's try to be reasonably smart about the shipping. There is no magic here, actually. But I'd like to hear suggestions from you. Scan through the dependents and select only those records which have age greater than 71? That's the perfect first idea. So scan over dependents and select only those records whose age is greater than 71. And then? Then send their SSNs. That's the second idea. Don't send all their stuff; send only their SSNs. So send the SSNs to the employee server. Now the employee server can see all the SSNs of dependents that have age greater than 71. Then we do what? Then do the join with employee, which is like selecting the subset of employees that will participate in the final join. That's the operation I'd like to give a name to. What's the name of this operation? That's a semi-join. And finally, send these employees with all their attributes back to the dependents server. Now you can do the final join and glue back the stuff that we dropped in the first place. So it doesn't look like a big deal, but do keep even this simple example in mind, because once we start applying semi-join reductions to more complicated queries, it will get much more confusing. So it's good to keep this example in mind. So let's write formally what we just discussed.
We are going to select from dependents those with age greater than 71, and retain just their social security number, the foreign key. This is a table T. Then we send T over the network to the other server. And there we join it with employee, which is the same as a semi-join of employee with the dependents. Then we send the answer, the semi-join result, back. And finally, our answer is what comes from the employee table joined with dependents, something we can do locally here. So that is the essence of using semi-joins, in this particular case to reduce the amount of communication. In other cases, it can be for all sorts of other optimizations, for example to avoid computing aggregation groups that you don't need. That's the essence. Good. Joins everywhere. Joins all the time. Joins are us. That's our mantra. In databases, joins are the key operations that we need to worry about. And I would like you to tell me why. Why are we obsessed with joins? Think about other programming languages that do even more complicated stuff. When you write a program in Java, what programming paradigms do you use? You use iteration; that's how you get your horsepower. You have nested for loops. Maybe you use recursion. You never hear of joins in Java. Why do we care about joins in databases? What is the root cause for why we are obsessed with taking these many, many tables and joining them? We imagine taking all these ten tables and joining them all the time. Normalization? Exactly, because of normalization. We had in mind a complicated data structure, but then we shredded it into these small tables because we wanted it to be normalized, to avoid the redundancies and the update anomalies. So now that our data is chopped into these small tables, every time we do something with it, we need to join. This is why we are obsessed with joins. Good.
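The three shipping steps just described can be sketched end to end, with the two servers played by plain functions and the network played by their arguments. The tables, attribute layouts, and the age threshold are all invented; the point is only what crosses the wire at each step:

```python
def dependents_phase1(dependents):
    # Step 1, on the dependents server: select age > 71, project to SSN only.
    # This small set T is all that gets shipped.
    return {ssn for (ssn, age, stuff) in dependents if age > 71}

def employees_phase2(employees, t):
    # Step 2, on the employee server: a semi-join of employee with T.
    # Only the few matching employees get shipped back.
    return [e for e in employees if e[0] in t]

def dependents_phase3(dependents, reduced_emps):
    # Step 3, back on the dependents server: the final join, done locally,
    # gluing back the wide "stuff" attributes we dropped in step 1.
    emps = {e[0]: e for e in reduced_emps}
    return [emps[ssn] + (stuff,)
            for (ssn, age, stuff) in dependents
            if age > 71 and ssn in emps]

deps = [(1, 80, "big-blob"), (2, 30, "blob"), (3, 75, "blob2")]
emps = [(1, "Alice"), (2, "Bob"), (4, "Carol")]
t = dependents_phase1(deps)             # ship only the SSNs {1, 3}
reduced = employees_phase2(emps, t)     # ship back only [(1, "Alice")]
answer = dependents_phase3(deps, reduced)
```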
So we discussed the five main operations in the relational algebra, plus a few extras. And now there are three more operations that only make sense on bags. Duplicate elimination, denoted delta: it just takes a bag and removes duplicates. Group-by, denoted gamma, and it has two indices: one is a set of attributes, the group-by attributes, and the second is the aggregate. And sometimes we use sorting. Actually, I'm not going to use sorting in my examples; it's rarely used as an explicit operator, but when it's used, it's denoted tau. Good. So these are the building blocks. Each operator, as we will discuss starting today, I hope, can be implemented in several clever ways. And now our goal is to combine these operators to evaluate a query. The result is called a query plan. It's a complex relational algebra expression, but the more popular term is query plan, where we simply combine these operators in a big tree. It's also called a query tree. And now this is like a program. It tells you in which order to do what operations. In this example, you read it from the bottom. It says: start with persons, and do a selection. Then what do we do? Then we project on social security number. And then what do we do? Then we join with purchase. This is our join here. And look at the notation that we use in order to disambiguate: the table from the left has a seller social security number, the table from the right has SSN. And I just give them tuple variable names here to be able to refer to them later. It's very intuitive, and I don't want to become too formal; it doesn't make sense. It's pretty clear what they do. From the right comes a selection on product, and then a projection on just the PIDs. We do the join. And finally, we join with persons. And at the very end, what does this do? It's a group-by on the name. Which name?
The name coming from where? The name coming from product; you can see the u. And it returns a count. That's a plan. And it's pretty clear in which sequence to execute this plan. So, two things. In query execution, we discuss how each individual node, each individual operator, is executed. In query optimization, we discuss how to transform this plan into one that might be more efficient, and how to go about this. Query plans in standard databases are always trees. But more novel query languages, like, for example, Pig Latin, which we are going to use in the last homework, allow these plans to be DAGs. Essentially, you are describing the plan in the query itself. There is no optimizer there; you have to write this by hand, and you describe a DAG. It turns out there are other contexts in which people have looked at extending trees to DAGs. But database systems don't do this, because there is no good place for common sub-expressions in SQL. They are very rare. Probably, if you use a view twice, that is a natural place to get a DAG. But it's very rare. OK, I have one slide to remind you that the language that we look at, the relational algebra or SQL, is a language that allows us to manipulate sets, actually tables, sets or bags, depending on the semantics. And it is a restricted language. It is not a Turing-complete language. In particular, you cannot express such a simple query as transitive closure. Remember the Bacon number? Suppose the question were: for every actor, find his Bacon number. It's easy to write in Java, if you had a data structure with all the movies and actors. But it's completely inexpressible in the relational algebra. There is a theorem that says you can't express that query in the relational algebra. So the relational algebra cannot express transitive closure. There are many other things you can't express.
You can't express parity, you can't express NP-complete problems. It has a very limited expressive power. And that's for a good reason, because we want all the queries to be computable efficiently. But we need to keep in mind that there are limitations to the expressive power. So that was the relational algebra. Any questions about these operators and the algebra? Then let's move to the most confusing part, where I need to expose you to various aspects of the query processor without getting too deep into any of them, before we start discussing the operators individually. So I'm going to start by showing you a very high-level architecture overview of a query processor. You see variations of this picture in various textbooks. This is what needs to happen to a SQL query from the time it is issued by the user or by the application until the time it is executed. First, we need to parse it. How many people know how to write a parser? Very few people. It's a dying trade, which is too bad. People need to know how to write parsers. And actually, there are tools that help; even in Java, you have JavaCC, a compiler-compiler, where you can write grammars. It's extremely easy to write a parser once you understand what a parser needs to do. I suffer in my research because I often want to do research with some students, sometimes with undergraduates, and the first thing I need to do is ask them to parse a query. That's what I need: parse a query. And they are at a loss. And I say, why? It's so simple. You just write the grammar, you send it to JavaCC, and bingo, it gets parsed. But these are concepts that need to be taught. And if you ever have time, grab a book on compilers, read it, and educate yourself on writing a parser. OK, so we are not going to discuss query parsing. But I will briefly remind you of query rewriting, which you already know. That's something we discussed many times.
And then what happens next is that the query processor needs to choose a logical plan, which is another name for a query plan or a query tree. That's what we already saw. The next step is that it needs to decide, for each operator in this plan, which algorithm to use for it. For a join, there are many choices: it can be a hash join, a merge join, a nested loop join. Which one is best for each of the joins in the query plan? The result is called the physical plan. And once those decisions are in place, this is when the query can be evaluated, and the actual query processing starts. OK, so I'm going to illustrate these steps at a very high level, using this example. Let's spend a couple of minutes looking at it. So we have: supplier and part are two entities, and supply is a many-to-many relationship between supplier number and part number. What should I do? Should I write this better? So we have supplier, and we have part, and then there is this many-to-many relationship called supply. It's not a big deal. And the example here is actually very simple. It only goes to supplier, and it creates a view of the suppliers that are in Seattle, state of Washington. So now comes the actual query. The query, of course, uses the view. And it returns only those nearby suppliers that supply part number two. SNO is the supplier number, and PNO is the part number. And for that condition, it needs a nested subquery: find all suppliers that supply part number two. Good. So what happens to this query? Well, we know what happens. It first needs to be accepted by the system. So there is some authentication going on. The system might have a limited number of queries that it accepts at any one time.
And then it might decide to put the query in a waiting queue until it has more resources to evaluate it. Once admission control is done, the query is parsed, and then query rewriting happens. I'm not going to discuss parsing, but the result of parsing is some kind of internal representation of the query. And once the internal representation is in place, the system can rewrite the query and do flattening and view inlining. So let's see this. Let's just remind ourselves: this was the original query, but this is a view. So after we rewrite it, we get something like this, and the nested query now becomes a join. That's query rewriting. Good. The next step is query optimization, which means: find a logical plan, and then improve that logical plan. The logical plan is something like what we saw, a plan consisting of the logical operators of the relational algebra. The physical plan adds information about how those operators are implemented. So the logical plan chooses from these operations that we know by now, and it looks something like this. Well, this should be a projection here. So it says: join, then do a selection, and then a projection. So this is exactly our query. We join the suppliers with supplies, we select those in Washington, Seattle, and those that supply part number two, and then we retain just the supplier name. That was our query. Now, these logical query plans are constructed once per query block. What is a query block? A query block is essentially a single select-from-where-group-by query. No subqueries. If you have subqueries, then we need to worry about them. They need to be unnested. But sometimes we can't unnest them, because they are correlated. So I'm going to show you an example. Let me skip over query blocks; I think you get the intuition. One query block goes into a fragment of this plan. I'm going to skip this. The interesting question is: how do we deal with subqueries?
So look at this query here. It selects all suppliers that are in Washington and that don't supply any product over $100. So this returns all suppliers in Washington that supply only products under $100. How can we write this as a query plan? Obviously, we don't have the right operations; there is no one-to-one mapping. We need to do a more radical transformation here. But I think we should be able to suggest a plan. Somebody suggest a plan, without looking at the slides, of course. It's not that hard. How would you compute this in the relational algebra? Join supplier and supply on SNO? Good. Now I realize how stupid the names are: supplier here and supply there. So join them on SNO. Good. Then select where the state is Washington and the price is what? Less than or equal to $100? No, greater than $100. Ah, you have a different plan in mind. So, greater than $100, and then project on SNO. So this returns something interesting. What does this return? All suppliers that have what property? That supply at least one product. This returns all those suppliers that supply at least one product that's too expensive, OK? These are exactly the suppliers we don't want. So now what do we do? Which operation do we use? Sorry? Minus. Exactly. So now we use minus, between this and what? What do I write on the other side? The set of all suppliers? Can I write supplier there? It's a good intention, but it's not quite right. It's actually a good point: we also need to select only those that are in the state of Washington, otherwise we get too many. That's correct. But even if I do that, let me be careful what I write. So suppose I do this: select state equal to Washington from supplier.
Can I simply subtract these two subexpressions? Sorry? What about a supplier that supplies nothing? No, that's fine: if a supplier supplies nothing, it still satisfies the query, and it's only on the left, so the difference keeps it. But why can't I just draw this line all the way up to here? What goes wrong? Formally speaking, we need a projection to make it work. Exactly. What comes from the left has all the attributes of supplier. What comes from the right has only SNO. So what should I do? Should I remove the projection from the right, or should I add a projection to the left? We need to add a projection operator to the left, because if we removed the projection on the right, we would still have a different schema: we would also have the supply attributes there. So we add a projection on the left, and now the two schemas are aligned, and we can subtract them. That's the query. So let's go quickly over the same steps. The first observation here is that the queries are correlated. A correlated subquery means that the inner query refers to a variable that was bound by the outer query. And here is that variable, this Q. That's the correlation. Very bad for query optimizers. The reason is that you cannot evaluate the inner subquery separately: the inner subquery only makes sense at each iteration of the outer query, something we cannot express in the relational algebra. So what database systems do is a process called decorrelation, for which you should already have a pretty good intuition. You can rewrite the same query without using the correlated variable. Like here: instead of using the correlated variable, we use NOT IN. Once we have decorrelated it, we can write it as a difference. And once we write it as a difference, we can finally write the algebra expression, which turns out to be actually simpler, because we don't need the join in the right subquery. Good. So these are the steps that the database optimizer needs to do to get from a SQL query to some plan, one plan, which might not be the best plan.
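The difference plan we just built can be sketched with two tiny invented tables, a supplier(SNO, state) and a supply(SNO, price). Both sides are projected to SNO before subtracting, matching the schema-alignment point above:

```python
# Invented toy instances.
supplier = [(1, "WA"), (2, "WA"), (3, "CA")]   # (sno, state)
supply = [(1, 150), (2, 40), (2, 90)]          # (sno, price)

# Left side: Washington suppliers, projected to SNO.
wa = {sno for (sno, state) in supplier if state == "WA"}

# Right side: Washington suppliers that supply some product over $100,
# i.e., exactly the suppliers we do NOT want, also projected to SNO.
too_expensive = {sno for (sno, price) in supply
                 if price > 100 and sno in wa}

# The difference over the aligned SNO schema gives the answer.
answer = wa - too_expensive
```

Supplier 2 supplies only cheap parts, so it survives; supplier 1 supplied a $150 part and is subtracted out.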
Then it needs to worry about transforming that plan into an optimal one, and that's something we will discuss next time. But once it has decided on a logical plan, which it is going to freeze, that's the plan it's going to execute, it needs to annotate every operator. It needs to choose a physical plan. And that means essentially three choices, not just one. The one I mention often is this: it needs to choose one particular implementation for every operator. But there are more choices. For the leaf nodes, it needs to choose the access path, which means: is it going to use an index or a sequential scan of the table? And then it needs to make scheduling decisions, which is something I'm going to discuss in two slides. So let's start with the physical operators. Here is how we should think about this. For example, this here would be the access path. It says: for supplier, do a file scan. The alternative could be an index lookup on SNO. For supply, also a file scan. For this join, we choose a nested loop, which is exactly what it says, a nested loop. We are going to discuss it briefly, but there is no intelligence to this operation. For this selection, do it on the fly, and here there are actually no choices; that's the only thing you can do. You will evaluate it on the fly, and similarly for the projection. And then there is also a duplicate elimination, and there you need to choose an operator. So that's how the physical plan looks. Now let's discuss the other thing, which is the scheduling decision. And here the idea is the following. Imagine that you have this tree of operators. My question to you is: how are you going to evaluate it? It seems it has to be bottom-up, right? You can't start from the root; you can't evaluate the last operation first. You need to go bottom-up. But even then, there are some interesting variations. One is, for example, to use one thread per operator.
So imagine creating a separate thread for every node in your query plan. What does this thread do? It reads tuples from its children, it does the right thing to these tuples, and whenever it has produced some tuples, it sends them to the parent node. But this creates a synchronization nightmare, because threads will have to wait until they get tuples from their children. So how do you synchronize them? You need lots of semaphores or locks; it's a synchronization nightmare. But apparently, this is how early database systems were implemented. It took quite a while, until the 80s, for people to come up with a very simple idea called the iterator interface, which I'm going to describe next. The idea is that every node is like an object that implements three methods. And here they are. There is an open method: this is when it creates its internal data structures and gets ready. There is getNext: getNext produces a single tuple, and when there are no more tuples, it signals that it's done. Then there is a close method, where it cleans up its data structures. So now, if you think about a tree like this one, here is how it goes. This node wants to do a nested loop. That means it iterates over its input, and for each tuple, it's going to ask its child for getNext. It will buffer these results, and when it is itself asked for getNext, it returns one tuple of this result. And similarly, this operator will be implemented in the same way. I will show you this in more detail, hopefully still today, because we are very close to that. Now, if you do it this way, what happens is that the tuples end up being pipelined. This is one option to implement them. Tuples now simply flow as fast as possible from the leaves all the way to the top. This is called pipelining, a pipelined execution of the operators.
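Here is a minimal sketch of the iterator interface, not the code of any real system: two illustrative operators, a scan and a selection, each implementing open, getNext, and close, with None standing in for the "done" signal. The parent pulls tuples one at a time, which is exactly what makes the pipeline work:

```python
class Scan:
    """Leaf operator: iterates over an in-memory table (invented data)."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0                 # get ready to return the first tuple
    def get_next(self):
        if self.pos >= len(self.rows):
            return None              # no more tuples: signal "done"
        t = self.rows[self.pos]
        self.pos += 1
        return t
    def close(self):
        self.rows = None             # clean up

class Select:
    """Selection operator: pulls from its child until a tuple qualifies."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def get_next(self):
        while (t := self.child.get_next()) is not None:
            if self.pred(t):
                return t
        return None
    def close(self):
        self.child.close()

# A tiny two-node plan: select values > 2 from a scan.
plan = Select(Scan([1, 5, 3, 9]), lambda x: x > 2)
plan.open()
out = []
while (t := plan.get_next()) is not None:
    out.append(t)
plan.close()
```

Notice there is no thread per operator and no semaphore anywhere: a single control flow walks the tree, and tuples flow upward one getNext call at a time.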
Pipelining is always great: the tuples don't have to wait. The alternative is called materialization, or blocking. And here the idea is that the operator blocks. You can actually mix and match: some operators will pipeline, others will block. Operators that block simply store all the tuples until they have produced the entire intermediate table, and only then do they return the answer to the parent. They are called blocking because nothing will happen downstream, nothing will happen at the parent and its ancestors, until this operator has finished processing all its tuples. So blocking is never a good idea, but it is a necessary evil, something you need to do in two cases. Either the semantics of the operator is such that you need to block. For example, set difference: you need to block the left operand until you've seen the entire right operand; you can't pipeline any of the tuples. The other reason for which we need to block is if we don't have enough main memory. If the query plan is too complex and the intermediate results are too big, then we can't pipeline all the tuples. This will become clearer; I have more slides that I'm going to show you next time about the difference. But do keep in mind this dual usage: we can block or we can pipeline. This is the end of my confusing part. I had to expose you to all these concepts before we start discussing the physical operators in detail. And that's my plan for the next 10 or 15 minutes, and then we'll end this lecture. I'm going to start the discussion of the physical operators and only discuss the main-memory operators, which shouldn't be so difficult. And I'm going to start with a simple question to you. Suppose we need to do just this: the join between supply and part. Supply and part join on part number. And now both supply and part are small enough to fit in main memory.
Think of them as being two arrays in Java. They are pretty big, but they fit in main memory. And my question is: how would you implement this join? I want to discuss three variants. They don't have to be super clever; I just want to see three different ways to implement this logical join on part number. So what should I write as number one? Sort them. So this is called sort join; it's actually called merge join. But how exactly does it work? OK, so we sort both. We start with supply, and we sort it so that PNO is in increasing order. And then we do what? In this case, PNO is a key of part, right? Yes, PNO is a key of part. Let me double underline that. Do you assume that you also sort part? Yes, part is also sorted on PNO. Good. So now we have supply and part, and we go in lock step, right? So here is my supply and here is my part. If they match, what do I do? I return the pair. And do I advance in both? We advance only in supply. This is where we take advantage of the fact that it's a foreign key: there might be another supply that refers to the same part, but there is no other part to which this supply refers. Good. This is called merge join. What is the running time? If supply has size m and part has size n, what is the asymptotic running time of merge join? What is the asymptotic running time to sort? m log m plus n log n. So now we have them sorted. Now we need to do the actual work, right? And how expensive is that? m plus n, because we essentially traverse each list once; we do this in lock step. So that's the running time. Good. Second algorithm. Yes? Hash the smaller one into a hash table. Excellent. This is called hash join. So we take the smaller one.
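The lock-step walk just described can be sketched directly. This is a toy version with invented data: sort both inputs on PNO, then merge, advancing only on the supply side after a match because PNO is a key of part:

```python
def merge_join(supply, part):
    # supply: (sno, pno); part: (pno, pname). PNO is a key of part.
    supply = sorted(supply, key=lambda t: t[1])   # O(m log m)
    part = sorted(part, key=lambda t: t[0])       # O(n log n)
    out, i, j = [], 0, 0
    while i < len(supply) and j < len(part):      # O(m + n) lock-step merge
        if supply[i][1] == part[j][0]:
            out.append(supply[i] + part[j])
            i += 1        # another supply may refer to the same part,
                          # but no other part matches this supply
        elif supply[i][1] < part[j][0]:
            i += 1
        else:
            j += 1
    return out

res = merge_join([(10, 2), (11, 1), (12, 2)],
                 [(1, "bolt"), (2, "nut"), (3, "screw")])
```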
Which one is the smaller one? Which one should we hash? Let's take supply, because it's the first one. So what exactly do we do? We create a hash table, and we insert each supply tuple, hashing it on its PNO. So now it sits in the hash table. Then we go over part. Do we have to sort part? No. We go over part, and for every part, what do we do? We hash on the PNO of that part, and we check whether the hash table has a corresponding supply. Probably there is one, because there is a foreign key. But even if there were none, we could still detect that and simply ignore that part. That's a hash join. What is the running time, assuming a wonderful hash function and hash table? Well, we need to build the hash table; build is actually the standard name for this phase. That costs how much? O(m), because we need to iterate over supply. And then we need to probe: we iterate over part and probe. How much does that cost? Plus n. And let me offer the last one, because nobody wants to utter the words nested loops. This seems much more dumb, but it's an option. We iterate over supply; for each supply, we do a full loop over part, and we find all the matching pairs. And the running time, of course, is O(mn). So the choice between these three algorithms is the choice of a physical operator. That's what we need to choose. So here are the names. Let me discuss each of them in a little bit of detail. Actually, not too much detail, because we already discussed them. But let me show you something quite surprising. Here is a nested loop join. Look how naive it looks. We just iterate over supply, and for each of them we iterate over part and find the matching pairs. By the way, supply here is called the outer relation, and part is called the inner relation. And sometimes the terminology is the opposite.
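Both remaining variants fit in a few lines of Python. Again a toy sketch with invented tuple shapes, showing the O(m) build, the O(n) probe, and the O(mn) nested loop:

```python
def hash_join(supply, part):
    # Build phase, O(m): insert every supply tuple, hashed on PNO.
    table = {}
    for pno, sno in supply:
        table.setdefault(pno, []).append(sno)
    # Probe phase, O(n): look up each part; unmatched parts are ignored.
    return [(pno, sno, pname)
            for pno, pname in part
            for sno in table.get(pno, [])]

def nested_loop_join(supply, part):
    # O(mn): supply is the outer relation, part the inner relation.
    return [(pno, sno, pname)
            for pno, sno in supply
            for p, pname in part
            if p == pno]
```

Both compute the same logical join; only the physical cost differs.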
So just in case you get confused: some people use the opposite terminology. But the problem is, we don't write the join this way. Remember the iterator model. We need to implement the join by implementing three methods, three functions: open, get next, and close. What should they be? What should open do, in this case? It says: get ready to return the first tuple. What do you need to do to return the first tuple? You read the first record from supply. So now we have S in memory. And maybe we also open part? Let's postpone part. And you say: I'm ready. Now get next. What does get next do? This is the nested loop turned completely upside down; you won't recognize it. Well, get next needs to read from part and find a match. If there is no match, it's going to read the next element from part until it finds a match. When it finds a match, it can return it. What happens if get next is looking for its match and it exhausts part? It's at the end of part, and there was no match. What does it do? The next element of supply, and the first element of part. Exactly. It reads the next element from supply, resets part, and continues. So that's how we need to implement it. And here is the code. You can see in the code that the nested loop join, which looked so easy, looks quite daunting on my slide here. But this is exactly what we discussed. Open just opens both the left and the right and then gets the first supply from the left. Get next reads the next part until either it's null or it finds a matching part. If it's null, if it's at the end of the file, then it needs to close part, get the next S, reopen part, and get the next part. It does this until it finds a match, and then it returns the matching pair. So that is the essence of the iterator model. The beauty of the iterator model is that you have to imagine these things interoperating in a big plan. And you start from the root.
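The upside-down version might look like the following sketch. The class and method names are my guesses at the slide's pseudocode (with get_next standing in for getNext), not the actual slide code:

```python
class Scan:
    """Leaf operator over an in-memory list, with the iterator interface."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.i = 0
    def get_next(self):
        if self.i >= len(self.rows):
            return None            # end of the relation
        t = self.rows[self.i]
        self.i += 1
        return t
    def close(self):
        pass

class NestedLoopJoin:
    def __init__(self, left, right, pred):
        self.left, self.right, self.pred = left, right, pred
    def open(self):
        # Get ready to return the first tuple: read the first outer tuple.
        self.left.open()
        self.right.open()
        self.s = self.left.get_next()
    def get_next(self):
        while self.s is not None:
            p = self.right.get_next()
            if p is None:
                # Inner relation exhausted with no match left: advance
                # the outer relation and reset the inner one.
                self.s = self.left.get_next()
                self.right.close()
                self.right.open()
                continue
            if self.pred(self.s, p):
                return (self.s, p)  # found a matching pair
        return None                 # outer relation exhausted
    def close(self):
        self.left.close()
        self.right.close()
```

The parent simply calls open once and then get_next repeatedly until it receives None.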
And you ask the root: please give me the next record. What the root does, depending on the particular algorithm like this one, is decide to call getNext on some of its children. They, in turn, call getNext on other children, until some records are being fetched from the database. And then everything gets propagated up the tree and you get your result, by just implementing these simple methods for each operator. OK. The hash-based join. We don't need to review hash tables; we just discussed them. But you have them here on the slide. The main-memory hash join algorithm, as we discussed, consists of two parts: there is a build phase and there is a probe phase. In the build phase, all the tuples from supply are inserted into the hash table. In the probe phase, every part is probed, and if a match is found, it is returned. And we have to keep in mind that this, too, has to be written in terms of the three methods: open, getNext, and close. Of these two relations, S and P, actually supply and part, which one can be pipelined and which one is blocking? One is pipelined and the other is blocking. Which is which? Pipelined means that whenever you read a tuple from that relation, you can immediately send it up to the parent. Blocking means: no, you have to read everything from that table; you can't return anything to the parent until you have read the entire table. Supply is blocking. That is a blocking table: you must read the entire table into memory, and you can't return anything before you have finished reading it. Part is pipelined. It's asymmetric. The join is asymmetric because of this property of the hash join: the build side on the left is blocking, and the probe side on the right is pipelined. There is a clever implementation, actually very simple, called double pipelining, where both tables are pipelined.
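The asymmetry is easy to see in a generator sketch (again a toy, not real engine code): nothing can be yielded until the entire build side has been consumed, but each probe tuple flows straight through to the parent.

```python
def hash_join_gen(build_side, probe_side):
    # Build phase: blocking. The whole left input is consumed here,
    # so no result can be produced before this loop finishes.
    table = {}
    for pno, sno in build_side:
        table.setdefault(pno, []).append(sno)
    # Probe phase: pipelined. Each matching probe tuple is yielded
    # immediately, without waiting for the rest of the probe input.
    for pno, pname in probe_side:
        for sno in table.get(pno, []):
            yield (pno, sno, pname)
```
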
Try to think how to implement it, and we'll discuss it next time. So the challenge is to find the double-pipelined hash join. People call it double-pipelined because both tables can pipeline. Try to think how you would do it. And the last one is the merge join, exactly what your colleague proposed. We sort both supply and part on part number, and then we do the merge, which is here. We already discussed this: in the case of equality, we only increment, we only go to the next supply; we do not advance in part. Two more quick operators to discuss. Group-by. How do we do group-by in memory? Using a hash table. We have products, and we want to group by department and sum the quantities for each department. How can we use a hash table for this? We take the first record, which is a name, a department, and a quantity. And now we look at the hash table, which initially is empty. In general, we look up the hash table based on what? What is the key for the hash table? It's the department, because that corresponds to the groups. So we apply the hash function to the department. It sends us somewhere in the hash table, and from there we either get nothing, meaning the group doesn't exist yet, and then we insert this record into the hash table; or we get the current sum. If we get something, we simply add our current quantity to that sum and store it back in the hash table. OK. And the last thing is delta, duplicate elimination, which I was not going to show you. Actually, I am going to show you this, because there is a question here. But I'm not going to discuss duplicate elimination. Why not? Because it is the same as group-by. Duplicate elimination is like group-by without any aggregation. Grouping essentially means you do duplicate elimination on the group-by attributes; in the case of a group-by, you also need to compute some aggregate for each group. But if you don't have any aggregates, this is exactly duplicate elimination.
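Both operators reduce to the same hash-table loop. A sketch with invented field names (name, dept, quantity):

```python
def group_by_sum(products):
    # Hash table keyed on the grouping attribute (the department).
    sums = {}
    for name, dept, quantity in products:
        # Either start a new group or add to its running sum.
        sums[dept] = sums.get(dept, 0) + quantity
    return sums

def distinct(rows):
    # Duplicate elimination: group-by on all attributes, no aggregate.
    seen, out = set(), []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out
```
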
And this is also done with hash tables or with sorting. Good. Selection and projection don't have anything interesting to discuss. And this is exactly the place where I wanted to get, and it's a good place to stop. I know you're tired, and we pretty much used all our time today. But the topic, I don't think, was so hard. Next time things are going to get a little more hairy, as we will discuss external-memory algorithms. They are not difficult algorithms, but they are hairy; I think that's a good term to describe them. And then we will also start discussing optimization, which again is a confusing part, because the real architecture of current database systems is not discussed well in any textbook. I'm going to discuss only parts of that architecture, and you will have to use your imagination to fill in the details. So enjoy the XQuery homework. Send me your questions if you have any. Otherwise, I'll see you next week to discuss query optimization and query execution.