Now we move on to query processing, that is, how a query is actually processed. This is the typical path for a query: first it is parsed and translated, typically into relational algebra or some variant of relational algebra. Somebody was asking me why students should learn about relational algebra; the reason is that it is the underpinning of any relational database system. You write your query in SQL, but it is translated into relational algebra or some variant of it, and then there is an optimizer.

Now, what is the role of the optimizer? Usually there are many different ways of executing a given SQL query, and the cost of the different ways can vary widely. A stupid plan can take days or months or years, whereas a good plan might take seconds or minutes; the difference can be mind-boggling. In the old days, before relational databases, humans had to decide exactly how to execute a query in the older database systems. That is a lot of work for the programmer, and humans may not fully understand what is going on inside the database and may choose a bad plan. One of the reasons relational databases took off is that they built an optimizer, which decides the best way of evaluating a given query. You specify the query declaratively, you do not give details of how to evaluate it, and the system figures out the best way of evaluating it. That is the job of the optimizer. In the main workshop I will talk a bit about optimization; in today's lecture we do not have time for it, so I am going to skip optimization entirely. Once optimization is done you have an execution plan, and then you actually evaluate it. This chapter, chapter 12, talks about that part: what execution plans are and how the operations are actually evaluated. Chapter 13 talks about the optimizer itself, so as I said, I am skipping chapter 13 here.

I should also mention that all these chapters, chapters 10, 11, 12, and 13, have a lot of detail; it is physically impossible for any one database course to start from the basics and cover all of it. So you should do one of two things. The way we do it here, we cover all the basics, and then in the last month or so of the course we give a quick overview of the internals: we cover data storage, we cover B+-tree indices, we cover a little bit about query processing, we briefly sketch query optimization, we briefly sketch what transactions are, we briefly sketch concurrency control, and we very briefly sketch recovery. So students have an idea of what all these things are. They do not understand them in detail, but when they go out into the world, even if they have not done the second course, they at least have an idea and can learn the rest on their own. If you have a two-course sequence in your university, you can split the material between the two courses. What happened here is that we started with two courses: the first was a core course, the second was an elective, and internals was completely in the elective; the core course did not even touch internals. What we realized after some time is that students do not take the elective, they only do the core course, and then they go out into the world not knowing anything about what goes on inside a database. We realized that is a bad idea. So if the second course is an elective, do make sure that the first course covers the basics. I think a lot of universities have adopted this; from what I see of our incoming M.Tech students, most, but certainly not all, universities have done this. So if your university does not follow this, please make sure that it is done, if you can influence your university.
Okay, so now what is an evaluation plan? First of all, let us take a query: find the salaries of all instructors who earn less than 75,000. It is a very trivial query; I just want to find the salaries which are less than 75,000. Here are two ways of doing it. I can first project instructor on salary, meaning remove all the other fields, and then apply the selection sigma salary less than 75,000; so first throw away the other fields and then find the values which are less than 75,000. Or I could do it the other way: first keep only the tuples whose salary is less than 75,000, throwing away the rest, and then from those records remove the other fields. This is a toy example, and a real implementation will do both of these together, so it does not really matter, but assuming they are done separately you have two alternatives.

An evaluation plan not only chooses among these alternatives, it also chooses how exactly to do the projection and how exactly to do the selection. In relational algebra, projection eliminates duplicates. How do you eliminate duplicates? We will see that; you can use sorting or hashing. But that choice has to be made, and the evaluation plan specifies it: it says, do the projection using sorting on these attributes to eliminate duplicates. That is part of a plan, and it is an actual physical operation, not a relational algebra operation.

Now, we are not covering query optimization, but just to tell you what it is: the goal is, among all equivalent evaluation plans which generate the same result, to pick one. But how do you pick one? You need a way to estimate how much time a plan will take, that is, the cost of the plan. All of this is based on a way of estimating the costs of the various alternatives. How you do cost estimation is covered in detail in chapter 13, which I am not going to cover, but it is the underpinning for the algorithms that choose between different plans. Cost estimation is based on statistical information stored in the database: how many tuples there are, how big the tuples are, how many distinct departments there are in the instructor relation, and so on and so forth.

And how do you estimate the cost? First of all, how do you define the cost? One way to define it is how long the query will run. It is very difficult to estimate exactly how long a query will run, but there are some good proxies. One is the amount of CPU time the plan takes, which can be estimated. Another is the number of disk I/Os the plan requires, which is actually hard to estimate because it depends on how much memory you have; but you can estimate how many disk I/Os the plan would require if you had zero or very little memory, and use that as a proxy. It is not exactly what will happen, but it is an estimate. And then there is network communication. Somehow you put all of these together, attach some weights, and come up with a single number that is the estimated cost of the plan.

I am going to skip those details, but I will just mention one thing: the disk cost depends on two parts, the number of blocks transferred and the number of seeks. It turns out that today disks are very fast at transferring data; they have probably become a factor of 20 faster at transferring data than disks 10 or 15 years ago. Today you can get transfer rates of up to 200 megabytes per second, whereas a while ago 10 megabytes per second was considered very high. So that has changed. But the seek time for hard disks has not changed much at all in the last 30 years; it has improved by maybe a factor of 2.
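To make the cost model concrete, here is a minimal Python sketch of the kind of formula an optimizer might use: essentially cost equals (blocks transferred) times t_T plus (seeks) times t_S. The constants below are only illustrative ballpark figures, not the calibrated values any particular system uses.

def estimated_io_cost(block_transfers, seeks,
                      t_transfer_ms=0.1,   # time to transfer one block (illustrative value)
                      t_seek_ms=4.0):      # average seek plus rotational delay (illustrative value)
    # Estimated disk cost in milliseconds: blocks * t_T + seeks * t_S.
    return block_transfers * t_transfer_ms + seeks * t_seek_ms

# Reading 1000 blocks sequentially (one seek) versus 1000 scattered blocks (1000 seeks):
print(estimated_io_cost(1000, 1))      # about 104 ms, dominated by transfer time
print(estimated_io_cost(1000, 1000))   # about 4100 ms, dominated by seeks

The second number is why seek-heavy access patterns hurt so much on a magnetic disk, which is exactly what changes when you move to flash, as we will see next.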
However, today something else is happening: seek times have dropped drastically. Why? No, not indexing; for hard disks the seek time is still very expensive. People are using something else, some other technology with very low seek times: flash memory, called solid-state disks, flash disks, whatever you want to call it. All of us use those pendrives all the time, right? That is flash memory, and on flash, unlike on a hard disk, getting to a particular piece of data and retrieving it is very fast. Now, why don't we just throw away hard disks and plug pendrives into our computers? It turns out those pendrives are actually very slow at data transfer; they are not optimized for it, so they are much slower than hard disks in terms of transfer speed. But now there are solid-state disks which are comparable to regular magnetic hard disks in data transfer rate, with a much lower seek time; in fact their transfer rates are now going ahead of magnetic disks. And they have become cheap. They existed earlier but were very expensive; today they are very cheap. In fact, for those of you who are using OSL, we bought those machines recently, and we had a choice of a 64 GB solid-state disk versus a hard disk whose minimum size was 500 GB or so, and it turned out that the solid-state disk was cheaper than the hard disk. Of course the hard disk is 500 GB and this is 64, but we don't need 500 GB on those machines; nobody is storing large amounts of data on them, the data comes off an NFS server. So we decided to buy the solid-state disks, and they are much faster; with a hard disk, when you run out of memory there is a lot of thrashing. It is actually cheaper today to buy a solid-state disk of somewhat lower capacity than to buy a hard disk, and this is having a very significant impact on database systems. What is this impact?
You don't do anything special: just replace the hard disk with a solid-state disk and your performance jumps sharply for many, many applications. So today, if you want to build a database system, you will probably go with flash; that is what has happened in real life. In some sense, all these cost estimates based on very high seek times are becoming irrelevant for a class of database applications. In fact, something funnier has happened: not only have people moved to flash, for many applications people are now saying that everything will fit in memory, because memory sizes have exploded. Now, it depends on the kind of data you have. If you have big data, it is not going to fit in memory and it is not going to fit on flash, forget it; for big data these considerations still rule. But what about the rest of the data, which is not big data? Let us say all the academic and accounting records of IIT, no pictures, no scans, just the relational data, for the last 20 years. How big is it? Make a guess. IIT is a reasonably sized academic institution, and all of our data will comfortably fit today in about 6, 8, or 16 gigabytes. That's all. All of our accounting, academic grades, courses, everything put together is something like 16 gigabytes. It fits in memory; forget flash, it fits in memory. So for this kind of data, cost estimates based on seeks are probably the wrong way of doing things. Anyway, these models evolve, and PostgreSQL, for example, will let you tune these relative costs: if you are running on flash, you can tell PostgreSQL that seeks are extremely cheap, and it will choose plans accordingly. So the database can be used as is, more or less; a little bit of tweaking will help it choose a good plan.

Now, a question: every query is associated with some cost when you execute it, but for some research purpose I want to estimate the cost without executing the query; I want to estimate it first, and based on the cost decide whether to execute it or not. I did not find any tool which gives me the cost before executing the query; Oracle and MySQL give a cost, but only after the query is executed. In fact, in the labs tomorrow you will be doing exactly this. All databases have a facility to explain a plan, and that facility will show you the estimated cost without executing the query. They all use it internally for query optimization, and once they have chosen a plan, you can see it without actually executing the query. So we will be doing that in the lab.

Let me take questions a little later; let me focus on the operators that we will be using. First of all, let us look at the select operation of relational algebra: how do you implement it? I want to find all records with salary less than 75,000. The simplest way is called a file scan, a linear search through the file. Don't read the contents of the slide, I am not going to go into the details; just read the stuff in blue or bold up here. The next one is an index scan. If I want to find the record belonging to some roll number, I will use an index; we just saw that is what indices are for. So for the query selecting roll number equal to 52 on student, instead of scanning, a much better alternative is to use an index. But sometimes it turns out that, depending on what the condition is, whether duplicates are present, and whether the duplicates are clustered together in the relation or scattered across it, it may still be cheaper, even though there is an index, to go back to a linear scan.
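Just as a preview of the lab, here is a minimal sketch of asking PostgreSQL for its chosen plan and estimated cost from Python, assuming a local server with the university schema and the psycopg2 driver; the connection string, the table, and the plan shown in the comments are only illustrative, and the planner may equally well pick an index scan or a bitmap scan depending on your data and statistics.

import psycopg2   # assumption: psycopg2 is installed and a PostgreSQL server is running

conn = psycopg2.connect("dbname=university")       # hypothetical connection string
cur = conn.cursor()
# EXPLAIN returns the optimizer's chosen plan and estimated cost WITHOUT executing the query.
cur.execute("EXPLAIN SELECT salary FROM instructor WHERE salary < 75000")
for (line,) in cur.fetchall():
    print(line)
# Illustrative output (the actual plan and numbers depend on the data and its statistics):
#   Seq Scan on instructor  (cost=0.00..1.62 rows=33 width=8)
#     Filter: (salary < '75000'::numeric)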
So there are all these cost estimates; forget the details, I am going to skip them. The point is that the cost estimate depends on whether the index is a primary, that is, clustering, index, meaning the relation itself is stored sorted on the key you are looking up (it is the relation that is sorted, not just the index), in which case there is one set of cost estimates which I won't go into, or a secondary index, where the relation is not sorted on that key, in which case the cost estimates are different. Any optimizer will use these kinds of cost estimates to choose whether to use an index or do a file scan.

In fact, PostgreSQL has a hybrid approach, and I think this is a good point to mention it: the bitmap index scan. Suppose you have a secondary index. Let this conceptually represent the B+-tree, not showing the internal structure, just the leaves. Now I have a range query, say, which accesses all keys in this range, and I want to fetch the records. Since this is a secondary index, the actual data is somewhere else, on disk, divided into blocks. Say the matches in this range are scattered: one pointer goes here, the next one goes there, the third one goes here, the fourth one there, the fifth one comes here, the sixth goes there, and so forth, scattered all over. Let us say that in this range there are some 1000 records scattered all over the file. Maybe with today's memory sizes 1000 is not that big, so think of it as some large number, but let it be 1000. What will happen is that I am going to do 1000 random accesses on this file to fetch the records that I want, and that can be pretty expensive. The alternative is to just do a file scan: read the whole file from the start, pretend the index does not exist, read all the records in the file, and find those with the key in the range we are looking for. Those are the two alternatives as per what we have seen so far, and depending on the number of matching records it may make sense to use the index or to do the file scan. But what if I am not sure how many records will match? I have a key range, I do not know exactly how many records will be fetched, and I am not sure what to do; the optimizer cannot make a good choice. At some point it will essentially toss a coin and choose between the two, not literally, but it just picks one with incomplete knowledge of how many matching keys there are.

So what Postgres has is a hybrid approach which gets the best of both worlds. What it does is this: first, it finds all these leaf entries and their pointers, but it does not fetch the records. Instead, it keeps a bitmap with one bit per data page. The pages in the file containing the relation are numbered logically 1, 2, 3, and so on, and the bitmap has one bit per page. It first goes through all the leaf entries of the B+-tree in the range; it finds each pointer, and if a pointer points to, say, the third page, it goes to the third bit and sets it to one. Initially all the bits are zero, and it takes the entries one after another and sets the corresponding bits, so at the end there are some zeros, some ones, and so forth. What have we achieved at the end of this? We have not touched the data at all; we have just scanned the leaves of the B+-tree and set the bits for the pages that are mentioned there. If a page is pointed to, its bit is set to one; all pages which are not pointed to remain at zero. So what is the point of this operation?
Now it goes back to the actual data. It scans it essentially linearly, but it can skip: it says, the first two pages are zero, I will skip them; the third page is one, I will fetch only the third page; the fourth page is also one, I will fetch it; maybe the fifth, sixth, and so on are zero, I will skip them; then, say, the hundredth page is one, I will fetch that. So what has it done? It has not done random I/O on this file; it has gone sequentially through the file, but it has skipped the pages which are not pointed to by anything in the index and fetched only the pages which are useful. So this has, in effect, three steps: first, scan the index; second, build the bitmap; third, scan the relation using the bitmap to skip pages which are not needed, and once a page is fetched, go through the records in that page to see which ones satisfy the condition. It is a hybrid algorithm, and it is called a bitmap index scan. It is not among the algorithms we have listed in the book; maybe in the next edition we will mention this hybrid algorithm, but it is implemented in PostgreSQL, and when we look at query plans in PostgreSQL you will see it many times.

So what is the benefit here? Suppose there are only five pointers in the range: you will fetch only five pages, because only those bits were set to one, and that is cheap. If there were a thousand records satisfying the range, you will fetch a thousand pages, but not randomly; you are fetching them sequentially, which is much cheaper. So it is a hybrid algorithm, in addition to whatever we have in the book.

The question was: will this be used only if the index is not a clustering index? Yes, that is correct; if the index is clustered, there is no need for the bitmap index scan. If the lookup is on a key, so there is only one record, there is no need for a bitmap index scan either. And if PostgreSQL is pretty sure that you will only fetch a few records, then it also won't do this; but if it thinks you may fetch somewhat more, it does not know how many, but somewhat more than two or three records, then it will do this.
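Here is a minimal sketch of the two phases just described, assuming some hypothetical helpers: index.leaf_entries(lo, hi) yields (key, page_number) pairs from the B+-tree leaves for keys in the range, read_page(p) fetches data page p, and pred(record) re-checks the selection condition.

def bitmap_index_scan(index, lo, hi, num_pages, read_page, pred):
    # Phase 1: scan only the index leaves and mark which data pages are referenced.
    bitmap = [False] * num_pages                     # one bit per data page, initially zero
    for key, page_no in index.leaf_entries(lo, hi):
        bitmap[page_no] = True
    # Phase 2: scan the relation in page order, skipping unmarked pages, so the
    # fetches are sequential rather than one random I/O per matching record.
    for page_no in range(num_pages):
        if not bitmap[page_no]:
            continue                                 # no matching record on this page
        for record in read_page(page_no):
            if pred(record):                         # re-check the condition within the page
                yield record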
A question: for spatial data we use k-d trees for indexing; there are three types of trees in use, k-d trees, quad trees, and R-trees; how do you actually do this with a k-d tree? Can I bump that question off? Even in the main course I won't really have time to get into all this. There is also not much description of it in the textbook, but there is a lot of material available on the web, so if you are interested you can look it up. To give you a one-minute answer: in a k-d tree you first divide up the space on one dimension, then at the next level you divide on the second dimension, at the third level on the third dimension, and so on, so at any level there is a linear ordering based on that dimension alone. If you don't already know about it, this will make no sense, so I don't want to get into the details; let me not take that question now. You can take it offline.

One last question: this is regarding multi-processing. When we talk about performance, if we have more than one process, for example on a single-processor system versus a multi-processor system, will it improve the performance? Okay, so this is actually a question which leads to another aspect of query processing, which is parallel query processing. There is a whole chapter on parallel processing which, unfortunately, since this is a basic course, we do not have time for, but yes, you can make use of it. Even if you do not do parallel query processing, if you have an online transaction processing system, all these database systems can handle multiple processes as it is; even on a single core they support multiple processes. If you have multiple cores, the processes can run in parallel, so you get parallelism for online transaction processing. But if you run one query, it will typically run on only one process; it will not benefit from multiple processors in traditional query processing. If you want one query to actually execute in parallel across multiple cores, then you need parallel query processing, and how to do that is described later. In some sense, when we do MapReduce in Hadoop we are doing parallel processing, but that is for a specific task. If you give an SQL query to a system which supports parallel query processing, the database will decide how to parallelize the whole thing and do it for you. You do not have to worry about it; you just specify what you want declaratively and the database takes care of it. And yes, the whole point of parallel processing is to reduce the query execution time for individual queries.

Okay, there is some other stuff here which again I am going to skip. Let me jump ahead to sorting. I think all of you probably know about external sorting, so I am going to skip this: external sort-merge, where you break the input up into sorted runs and then merge the runs. There are many optimizations, which are all in the book; I am skipping them.

So let us jump ahead to operations which benefit from sorting. The first one is the join. There are many ways of doing a join. The simplest way, which is actually the general fallback option, is what is called the nested loops join, and you all know what this is: just one simple loop inside another. For each tuple tr in R, for each tuple ts in S, test the pair (tr, ts) to see if it satisfies the join condition; if so, add the pair to the result. This works with any arbitrary join condition. If you write crazy join conditions, the database system will fall back to nested loops, which can be very, very expensive, as you can imagine: if you have a million tuples in each relation, you are dead; a million times a million is horrible. So you generally don't want to do this; you want a more efficient algorithm. By the way, some terminology: R here is called the outer relation, and S is the inner relation.

Now, when I talk of the nested loops join I normally mean the full nested loops join, but when you look at query plans from optimizers, many times they will say nested loops at the top level while inside it is not plain nested loops, it is an index nested loops join. What is the index nested loops join? The basic idea is that the outer level is the same as here, but at the inner level, instead of looking at every tuple ts in S, I already have one particular R tuple. Say the join is on a particular attribute, R.A = S.A; I have a particular R tuple and therefore a particular A value, and I use an index on S to go and fetch the matching S tuples. There may be more than one, but I can use the index to fetch them. So in the index nested loops join, the inner for loop over S is replaced by an index lookup. I do not have a figure for it, but the idea is very simple.
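Here is a minimal sketch of the two variants, assuming r and s are just lists of dictionary-like tuples and idx_s is a hypothetical in-memory index mapping a join-key value to the list of matching S tuples.

def nested_loops_join(r, s, theta):
    # Works for an arbitrary join condition theta(tr, ts), at a cost of |R| * |S| tests.
    for tr in r:                          # outer relation
        for ts in s:                      # inner relation
            if theta(tr, ts):
                yield (tr, ts)

def index_nested_loops_join(r, idx_s, key):
    # Equi-join on r.key = s.key: the inner loop is replaced by an index lookup on S.
    for tr in r:
        for ts in idx_s.get(tr[key], []):     # fetch only the S tuples that match tr
            yield (tr, ts)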
And when you look at query execution plans, they may say nested loop, but if you look at the inner relation it will say index lookup; that is the clue that this is an index nested loops join, not a plain nested loops join. Do not look at the plan and get scared, saying the optimizer is doing full nested loops and that is very expensive. No, the index nested loops join is usually much more efficient than plain nested loops.

Then the next algorithm is the merge join, intuitively a very simple algorithm. If I am joining R and S on an attribute A1, I first sort both of them on A1, and then I do a merge. I have a pointer into each relation. If the A1 values at the two pointers match, that pair is output. Next, what do I do? I move the second pointer forward; the next value there is B, which no longer matches. If there were two tuples with that value A there, both would have been output; in fact we will come to that case with D, and I will show you the point there. Now this pointer has moved to B while the first one is still at A, so the first one is lower; I move it to its next value, which is B. Now again they match and I output the pair. Again I move the second one; C is greater than B, so I move the first pointer to D. D and C do not match; which one is smaller? C, so I move that pointer, and it reaches D. Now they match and I output the pair. Then I move the second pointer forward to M; it does not match. But when I come back to the first relation, the next tuple is again D, the same value as before. So what do I do? I restore the second pointer to the D I had just passed, and this second D matches that D and outputs a tuple. After that, again I move the second pointer forward to M, and the first one moves forward to F, which does not match; which is smaller? F, so I move it forward. Now M and M match. Then I move one M forward and there is nothing more there; the other moves forward to Q, and we are done. So I have merged the two relations to join them. This is a very efficient way of joining relations: one scan of R and one scan of S, with maybe a little backtracking because of duplicates. Other than duplicates, it is a single straight scan of each relation, very efficient. That is the merge join. Any questions? It is widely used. The only cost is that I have to sort before I can do the merge join; the sort is expensive, the merge itself is cheap. Again, there are cost estimates for all of these, and again I am going to skip them.
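Here is a minimal sketch of a merge join on a single join attribute, assuming r and s are already sorted on that attribute; the small rewind at the end is the backtracking mentioned above, which handles duplicate join values in R. All names are hypothetical.

def merge_join(r, s, key):
    i, j = 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            start = j                     # remember where the matching run in S begins
            while j < len(s) and s[j][key] == r[i][key]:
                yield (r[i], s[j])        # output the current R tuple against the S run
                j += 1
            i += 1
            if i < len(r) and r[i][key] == r[i - 1][key]:
                j = start                 # duplicate value in R: re-scan the same S run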
And now let us come to the hash join. What is the goal of the hash join? Don't read the slide yet; let us first take a very simple case. Suppose I have one large relation and one small relation which easily fits in memory. What would be a good way of doing this join? I do not have indices, and I have one small relation which fits in memory.

Oh, by the way, I should mention that all of these techniques, other than nested loops, that is, the sort-based and hash-based methods, work only for equi-joins: the condition has to be something here equal to something there. It can be this equal to this and that equal to that, that is fine; but if you say this less than that, sorting can still be used but hashing cannot, and if you have other complex functions then neither can be used. For example, suppose I have a geographical coordinate in the first relation and a geographical coordinate in the second relation, and I say the distance between the two should be less than 5 kilometers. I can't use these; I can't sort the tuples in any meaningful way and I can't hash them. One fallback is to use the nested loops join. But because this kind of spatial join is very important, other techniques have been developed specifically for spatial joins, which use spatial indexing, R-trees and so on. We can then do an index nested loops join with an index which supports the function: find everything which is less than 5 kilometers from this point. If the index supports that, you can use the index nested loops join; when you don't have such an index, the fallback is nested loops. But now we are focusing on equi-joins, so let's stick with that. Any questions? Okay.

Now, coming back to the question: I have one relation which is large and one which is small. The most efficient way of doing this join is to load the small relation into memory and build an index on its join attribute, an in-memory index. What kind of in-memory index? Usually a hash index is very good for this kind of task; I do not need a range scan here. So I build an in-memory hash index on the small relation, then scan the large relation, and for each tuple in it I do an index lookup, find the matching tuples, and output them. What have I done? I have done an index nested loops join, but after building a temporary index on the smaller relation. So I can dynamically create an index and use it with the index nested loops join; that is possible.

The hash join generalizes this idea. What if the other relation is not so small and does not fit in memory? What can we do? The idea is as follows: I am going to break up both relations, R and S, in such a way that the partitions of the second relation S are small enough to fit in memory. That is the basic idea. So how do I partition? I compute a hash function on the join attribute, and I use the same hash function on the join attribute of R and on the join attribute of S. What does the partitioning function do? Given a tuple, the hash function gives a value between 0 and 4 in this example. I take the first tuple, compute the hash function on the join attributes only, and say it gives me 1, so I put the tuple in partition R1. The next tuple gives the value 0, so I put it in R0. So all the tuples whose hash value is 0 fall in partition R0, and similarly for R1, R2, R3, R4. I go over R, break it into pieces, and store them in separate partitions; that is the first step, partitioning R. The second step is to partition S using the same hash function; you cannot use a different hash function, it has to be the same one. So now we have partitioned S also.

Now, I am going to choose the number of partitions, and I have chosen 5 here, such that the partitions are likely to fit in memory. Why do I say likely? It is all hashing; there could be some worst-case scenario where everything maps to one value, and then it won't fit. I will deal with that later, but it is unlikely. In most cases, if the relation is, say, 4 times the size of memory, I break it into 5 pieces, and then with high probability each of the pieces fits in memory. If I break it into only 4 pieces, one of them is likely to overflow; with 5 pieces it is likely that all of them will fit. So the first step is partitioning. The second step is to load S0 into memory, build an index on it, and then take each tuple in R0 and probe. The important thing to note here is that a tuple in R1 cannot match any tuple in S0, S2, S3, or S4; it can only match tuples in S1. Why? The join attribute values have to be equal, and the hash function is computed on the join attribute, not on anything else; so if two tuples match on the join attributes, their hash values on both sides are the same.
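Here is a minimal sketch of the two phases of a hash join on r.key = s.key. For illustration the partitions are just in-memory lists; in a real system each partition would be written out to disk, and it is the build partitions that must fit in memory. All names are hypothetical.

def hash_join(r, s, key, n_partitions=5):
    # Phase 1: partition BOTH relations with the same hash function on the join attribute.
    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for tr in r:
        r_parts[hash(tr[key]) % n_partitions].append(tr)
    for ts in s:
        s_parts[hash(ts[key]) % n_partitions].append(ts)
    # Phase 2: for each partition pair, build an in-memory hash index on the build
    # partition S_i and probe it with every tuple of the matching probe partition R_i.
    for r_i, s_i in zip(r_parts, s_parts):
        index = {}
        for ts in s_i:                                    # build
            index.setdefault(ts[key], []).append(ts)
        for tr in r_i:                                    # probe
            for ts in index.get(tr[key], []):
                yield (tr, ts)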
So R1 only has to be matched with S1, R2 with S2, and so on; that is the key idea. Once I have done the partitioning, I can finish the join, partition by partition, using a dynamically constructed hash index and an index nested loops join on it. Some terminology: S here, the relation on whose partitions I build the index, is called the build relation. The other relation, R in our example, is called the probe relation, because I am not building an index on it; I am using its tuples to go into the index and look up matches, so I am probing the index. Build relation and probe relation. Forget all the other details on the slide; I think I have explained the idea. I also told you that if the relation is four times the size of memory, I use five partitions. Why five? It is a heuristic, a fudge factor: if the relation is n times the size of memory, I multiply by a fudge factor of, say, 1.2 to decide how many partitions to use.

So that is it for the join operations; it is actually fairly simple. It turns out there is sorting, hashing, and indexing, and then the nested loops join as the fallback. What about other operations? Join is the most important; select is also very important, and these are the two most important operations, but there are others. If I say select distinct, what do I want? No duplicates. How do I remove duplicates? A very easy way is to sort; then all the duplicates are adjacent, and I just compare each record with the next, throwing away duplicates until I get to a new record. Very easy.

One of the important things to note is that all the algorithms we have seen so far handle relations which can be much larger than memory. This is the key thing with databases: they do not insist that the data fit in memory; they can handle much larger data and still work efficiently. Sorting is done with external sorting, which is very efficient, and duplicate elimination using sorting is in turn efficient. But there is another way: I can use hashing, that is, hash partitioning. How does that work? I partition R into some number of pieces; supposing the relation is 5 times the size of memory, I partition it, again with some fudge factor, into enough pieces that each piece is likely to fit in memory. Partition based on what? There is no join here; I am removing duplicates. You could hash on any one attribute, but you might as well hash on all the attributes. Then duplicates are guaranteed to land in the same partition, the same bucket; that is the key. Then I load each partition into memory, and within it I can sort, or I can build a hash index and probe: if a tuple is a duplicate I throw it away, and if it is not, I add it to the index. So I can do duplicate elimination using hashing also. Projection is basically just removing the unwanted attributes followed by this duplicate elimination.
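Here is a minimal sketch of hash-based duplicate elimination: hash-partition on all the attributes so that duplicates land in the same partition, then eliminate duplicates within each partition in memory. Tuples are plain Python tuples here, and the partition count is just an illustrative choice.

def distinct(relation, n_partitions=5):
    # Phase 1: partition on all the attributes; identical tuples hash identically.
    parts = [[] for _ in range(n_partitions)]
    for t in relation:
        parts[hash(t) % n_partitions].append(t)
    # Phase 2: each partition is assumed to fit in memory; drop duplicates within it.
    for part in parts:
        seen = set()
        for t in part:
            if t not in seen:
                seen.add(t)
                yield t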
What about aggregation, that is, group by with sum, count, and so on? How do I do that? What is the goal of group by? I want to compute an aggregate over all the tuples in a particular group, so my goal is to bring the tuples of each group together. How do I do that? I can use hashing, but let us start with sorting. The simplest way is to sort. Sort on what? On the group-by attributes; I have to sort on all the group-by attributes. If I sort on all the group-by attributes, I am guaranteed that all members of a group will be contiguous; it will not happen that I have one member of a group here, then a member of some other group, and then another member of the same group. So if I sort on the group-by attributes, I can just scan once, straight down. I see one group; the next few tuples may be in the same group. If I am computing a sum, I keep adding; for a count, I keep adding to the count; for an average, I just maintain the sum and the count, and when I finish the group I divide the sum by the count to output the average. Similarly for other aggregates. When I come to the first tuple of a new group, the previous group is over, so I output its result, start the new group, and continue. Very simple: sort, then one scan, and the group-by aggregation is done. Of course, if you ask for more complex things like the median, it is a little trickier, because you may also have to sort on the attribute on which you compute the median; let us not get into that. But for the basic sum, min, max, count, and average, with standard deviation also in there, all of them can be done with a very simple single scan after sorting. That is one alternative.

Now, there is another alternative which works very well if the number of groups is small, and this happens very often: you have a very large relation, but the number of groups is small. What I do is build a hash index. A hash index on what? On the group-by values. I take the first tuple, hash it, and go into the hash index; what do I store there? Say I am computing a sum: this is a new group, so the running sum is just the attribute value of that first tuple. Now I take the next tuple and look up the hash index to see if its group-by value is already there. If it is there, what do I do? Add to the sum. If it is not there? Create a new entry. I repeat this for each tuple: look up the index; if the group is there, add to the sum, add to the count, whatever else; if it is not there, create a new group entry. How many entries are there in the hash index? One per group. So if the number of groups is small, this fits in memory and it is very efficient: one single pass over the data, no sorting, nothing; very, very efficient. However, if the number of groups is large, more than can fit in memory, there is a problem: the hash index will fill all of memory, and then what do you do? So what some databases do is try to estimate how many groups there are. If the estimated number of groups is small, they use this algorithm; if it is large, they first partition on the group-by attributes and then run this algorithm on each partition. Of course, the estimate may go wrong; you thought it would fit, but it did not, and then there are ways to handle that. I won't get into the details. Anyway, it turns out that both sorting and hashing can be used for aggregates, and depending on the cost estimates you would choose one of them; there are different situations where each of them makes sense.

There was a question about extendable hashing. No, extendable hashing is for hashing on disk. Here we are not hashing on disk; our goal is in-memory hashing, so we won't use extendable hashing. Extendable hashing is for disk-based hashing; in memory, extending the hash table is very cheap and is done routinely, so we do not have to worry about that.
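Here is a minimal sketch of hash-based aggregation, computing a sum per group, using an ordinary in-memory hash table (a Python dict) keyed on the group-by values; it holds one entry per group and, like the library hash maps mentioned next, it grows automatically as needed. The attribute names in the usage comment are hypothetical.

def hash_group_by_sum(relation, group_attrs, sum_attr):
    sums = {}                                            # group-by key -> running sum
    for t in relation:                                   # one single pass over the data
        g = tuple(t[a] for a in group_attrs)
        sums[g] = sums.get(g, 0) + t[sum_attr]           # new group, or add to the existing sum
    return sums

# e.g. hash_group_by_sum(instructor, ["dept_name"], "salary") would give the total
# salary per department, assuming dict-like tuples with those attribute names.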
If you use a Java HashMap or a C++ standard library hash map, internally they take care of all this: they initially have a small array, and if there are too many entries they split and extend the array. If it were on disk, extending the array would be much more difficult to deal with. But this whole business of hash file organization, I skipped it totally, right? I only covered the B+-tree and nothing else. The reason is that for hash indices, including extendable hashing and so on, people did not find any significant performance benefit; they give fewer features, and you still have to deal with recovery and concurrency control for them. So people decided it is not worth doing, and all the databases that I know of have given up on it. Today the B+-tree rules. Postgres used to support hash indices, but they are no longer well maintained; Oracle, I think, may still support them, but they also do not encourage their use.

Okay, set operations: union and intersection are actually very similar to join; you can do them by merging or by hashing, and I won't get into the details. Outer join: the outer join can be done by modifying the merge join. In the merge join, what do you do? You compare, and if the tuples match you output them. For an outer join: if a tuple matched at least one tuple on the other side, there is no change; if it did not match any, and it is on the preserved side, which for a left outer join is the left input, you output it padded with nulls. Similarly for the right outer join and the full outer join. So the merge join is very, very easy to extend to implement outer joins.

What about the hash join? For the probe relation it is actually very easy: if it is a left outer join and the probe relation is the left input, then when I probe and do not find a match, I immediately output the tuple padded with nulls. But what if I want a left outer join where the left input is the build input? You see the problem: I am probing using the other relation, and at the end I need to know whether anything matched a given build tuple or not. The tuple is sitting in the index, and the other relation is being used to probe it; if nothing matched it, I need to output it padded with nulls, and if something matched it, I should not output it again. How do I track that? Just keep one bit per tuple in the index. When you probe the index and find a match, set that bit; otherwise the bit stays zero. At the end, I go through the entire index once, through all the entries: if the bit is set, skip the entry; if the bit is not set, output the tuple padded with nulls. So it is actually a very neat trick.
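Here is a minimal sketch of that match-bit idea for a left outer join in which the left (preserved) input is the build relation; None stands in for the null padding, and all names are hypothetical.

def left_outer_hash_join_build_left(left, right, key):
    index = {}
    for tl in left:                                          # build phase on the left input
        index.setdefault(tl[key], []).append([tl, False])    # store the tuple plus a match bit
    for tr in right:                                         # probe phase with the right input
        for entry in index.get(tr[key], []):
            yield (entry[0], tr)
            entry[1] = True                                  # mark the build tuple as matched
    for entries in index.values():                           # final pass over the index
        for tl, matched in entries:
            if not matched:
                yield (tl, None)                             # never matched: pad with nulls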
So in one hour I have been able to explain to you all the basic algorithms at a high level, which is very nice. Of course, there are a lot of details. There is a survey on query evaluation by Goetz Graefe, who is considered the guru in this area, and that survey runs to something like 70 very dense pages, just for what I covered in one hour. And it is quite old, from 1993; since then more and more techniques have been developed, so if that survey were redone today it would probably be 300 pages for what I covered in one hour. So obviously there are a lot of details there, and they matter, because databases are very important, they underlie so many things, and people put a lot of effort into optimizing all of these. It is still an active area. For example, this bitmap index scan: the idea may have been around for a while, I don't know, but PostgreSQL implemented it only maybe three or four years ago; it did not have it before, and it is a very nice thing. I think even today some of the other databases do not implement anything like it. So practically speaking, databases are still improving; very useful new features are still being added at the query evaluation level.