It's going to be on Amazon Web Services. We will run our queries on multiple servers that we get from Amazon Web Services. It's cool. They are extremely simple queries, but the setup is cool. Okay. So hello everyone, it's time to start. Let me start with the announcements. I need to apologize for going back and forth so much about the final, but in the end it turns out, not unexpectedly, that there is no good compromise. There is no time when everybody can take the final and be happy taking it at that time. So what I will do is go for maximum flexibility. We are going to have the main take-home final the weekend before the last lecture. It's going to be funny, because I'm going to tell you something nice in the last lecture, but that will be after the final. So you will take your main final on Saturday and Sunday, December 4 and 5. We will post it at midnight and you have to turn it in by midnight. But if you cannot make it then, or if you don't want to take it on that date, please let me know. I need to know at least a couple of days before the final. I will schedule a second date, probably on Thursday and Friday like we were planning initially. It's not going to be over the weekend; it's going to be during the week, probably Thursday and Friday. I will make sure that all the finals are graded and the grades are done by Saturday. So I really hope that most people will be able to take the final on the 4th and 5th, and only a few of you will opt for the other option. I might reuse some of the questions; it's difficult to design two equal finals. So I really trust your maturity not to share the final. I'm not sure exactly what technology we will use; I'm currently considering Catalyst for administering the online final. But anyway, you should not share whatever you see with colleagues who have not taken it yet, because some of the problems might be reused. I trust that you won't do this. And vice versa, if you take it late, then don't ask: how was it? What kind of questions were there? Okay. I can't imagine this being a difficult final. You have two days to think about it, but I will require you to go through lots of material and make sure that you know it. Good. Any questions about the final? Then a short note. Last time we discussed XML, we discussed XPath, and we stopped right before discussing XQuery. That's what we will discuss today in class: XQuery. Now, in XQuery they replaced what used to be document — you'll see where this shows up. They replaced it with doc, and when you use the XQuery interpreter for homework — which one? homework 6 — you need to use doc, not document. document doesn't work, and you won't understand why it doesn't work unless you know to use doc. So let me now switch the slides to XQuery. This is what we're going to discuss in the next 20 minutes: the syntax of XQuery. Think of it as the SQL language for XML. Just to refresh your memory, XML is a tree data model. It can be nested arbitrarily deep, so it can represent data at least as well as relations can, and the data in XML can be non-normalized. We have seen a simple language called XPath that allows us to navigate the tree structure, much like the UNIX directory commands allow us to navigate the directory tree.
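The slides themselves are not reproduced in this transcript; as a refresher, the kind of XPath navigation being referred to looks roughly like this — a sketch, assuming the bibliography data lives in a file called bib.xml:

    doc("bib.xml")/bib/book/title          (: all titles of all books under bib :)
    doc("bib.xml")/bib/book[year > 1995]   (: only the books published after 1995 :)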
But XPath by itself doesn't allow us to do joins, and doesn't allow us to construct new XML documents to be returned; that is what XQuery does. I need to mention this because XML-QL was actually my research group's proposal. Early on, when XML first appeared as an offshoot of HTML, we looked at it — we were doing research on semi-structured data — and we said: we need a query language for this. That was the first query language, XML-QL. But we didn't pretend it was an industrial-strength language. After that, it took about seven years for the XQuery working group to define XQuery. So what is XQuery? Remember, in SQL we have select-from-where. In XQuery we have for, let, where, and return. These are the four main clauses, and these expressions are called FLWOR expressions. So let me show you a first simple XQuery. Here it is. What does it say? The variable $x ranges over bib/book. Do you remember XPath? What does bib/book do? What is the root element here? It's bib, and underneath it looks for books, and everything it finds, it binds to the variable $x. Once it finds a book, it binds it to $x; then from $x it navigates and checks that there is a year, and that the value of that year is greater than 1995. If so, it returns the title of $x. Very simple. So what you get are results like this: lots of titles. A very simple FLWOR expression. Please remember: instead of document, we need to use doc. Also, for a simple XPath expression, this is how you run it. Before, I started from nowhere, but here you need to start from the document: you say doc, and from there on you continue with the XPath expression. The same query as before — we can push more of the computation inside XPath, and this is considered cool XQuery style. So instead of a where clause, here is where you can put the condition: you push it inside the XPath. And you can actually go all the way, so you don't even need a return clause; you can do this directly. The last line is both a valid XQuery query and a valid XPath expression. XQuery, I should mention, uses XPath: everything you can write in XPath, you can write in XQuery too, but you can write more. Okay. So here is something more interesting that we can do in XQuery that we couldn't do in XPath, namely create interesting results. Here is an example — it's not that interesting, but it shows the concept. Again, we iterate over books, and for every book $x we return a particular pattern, a template — sometimes they call it a template. This pattern is answer: there is a title and a year inside the answer. And what do we place in the answer? We place the title and we place the year. So what we get as a result is a set of answer elements; inside each answer element there is a title and a year. Okay. So notice — let me erase this stuff — notice the interesting parentheses here: the curly braces. What is this? What do you think that is? Substitution. Yes. It's an indication that what comes after title is not a constant, but is part of the XQuery, to be evaluated. I actually have this on the next slide. If you drop it, what would this query actually return? Yes? That literal string. It will return exactly this, right? So here it is. You actually get one for every binding of $x. That's what you get. Okay.
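The queries on the slides are not shown in the transcript; reconstructed from the description, they look roughly like this — a sketch, again assuming a file named bib.xml:

    for $x in doc("bib.xml")/bib/book
    where $x/year > 1995
    return $x/title

    (: the "cool" style: push the condition into the XPath expression :)
    doc("bib.xml")/bib/book[year > 1995]/title

    (: constructing new elements; the curly braces mark embedded XQuery to evaluate :)
    for $x in doc("bib.xml")/bib/book
    return <answer> { $x/title } { $x/year } </answer>

Without the curly braces, the last query would return the literal text "$x/title" inside each answer, which is the point the lecturer makes about dropping them.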
More things you can do in XQuery that we couldn't do in XPath: we can construct intricately nested output XML documents. For example here, what does it say? Let's read this query together. Look: $b is bound to bib. Then $a is bound to the authors in bib. Then what do we return? We return a result. But in that result, what do we place? What comes first? Author. An author, right? After that, titles — one or multiple? Possibly multiple titles. So this is what we get: multiple results, one for every binding of $a. Inside each result, we get one single author and possibly multiple titles. But actually, what are these titles? Which titles do we get? We get an author like Smith, and we get several titles — but how are these titles related to the author Smith? Sorry? Right: they are the titles of Smith's books. And this is done right here. When we fetch the titles, we start again from $b. Look at $b: $b is bound to the root. Actually, this is my little trick so I don't have to start from the document again — I bind $b to the root. And then we return those books whose author is our $a, our current author, and we get all the titles written by the current author. Okay, so this gives us the power to restructure documents — to essentially reformat them, restructure them according to your desire. Why would you do this? Why would you start from one particular format and restructure the data? Physical data independence — concepts like views, right? Maintain some particular structure but change the underlying data. Maintain a particular structure — but okay, you're right, this kind of defines a view. This defines a new view over the data. But there is another reason. Remember DTDs? Why would we use DTDs? DTDs describe the structure of an XML document. Who cares about the structure of the XML document? Yes? Anything that consumes it? [Student:] So you can take your own personal XML data and run it through one of these to make it conform to the RSS DTD, and make an RSS feed from that data. Exactly. You want to give this document to some application that consumes it, to some consumer. And that consumer only accepts documents with a particular structure, with a particular DTD. And here XQuery allows you to construct that output structure, that DTD, right? It allows you to restructure your document to conform to the output DTD. In the homework — now I start getting confused — homework 6, which is XQuery, you are asked to answer 10 simple queries. And for each of them, Param gave you the DTD. He wants you to produce the answer exactly according to that DTD. By the way, I forgot to mention: the XQuery interpreter that we use is called Zorba. It's free, you can download it from the web, and it comes from Oracle. It's actually implemented by somebody I know. She worked with me on the XML-QL query language — Dana Florescu. She spent the last 10 years working on XQuery and implementations of XQuery. She had a company that was bought by BEA, and now she ended up at Oracle. So she's an expert in XQuery. And I found this to be a cute interpreter. Before that, we had Galax, which was also a free interpreter, but it's no longer supported, so we don't use Galax anymore. Zorba seems to be quite efficient. How many people have tried Zorba? OK, not too many. If you use Windows, it's trivial: you just get the binary and it runs. If you use a Mac like I do — I think I told you this several times —
it took me half a day to install it. So plan ahead. Good. So here is the result of the previous query. More about XQuery: we have aggregates, and they're actually used in a much simpler fashion than in SQL. An aggregate is really just a function that we apply to a list. For example, here $x is bound to a book, and that book might have several authors. The expression $x/author returns a list of all the authors of the current book. count is applied to this list and tells us how many authors there are, and the query returns books that have more than three authors. So we have count, avg, sum, distinct-values. distinct-values is interesting; let me show you distinct-values — but not this one, let me skip ahead. distinct-values also takes a list, and it does exactly what you expect: it eliminates duplicates. Now, that list must be a list of atomic values. It doesn't want to mess around with figuring out whether two XML subtrees are equal and have to be eliminated, so it really insists that you give it a list of atomic values, and then it eliminates the duplicates. So the interesting piece here is that this query finds all authors — but of course one author might occur multiple times. distinct-values eliminates the duplicates, and now you get a list where every author occurs only once. This is the typical usage for regrouping: you first construct a list of all the distinct values, and then you iterate over it. More about restructuring. There are actually two patterns of restructuring. One is flattening: you have a nested structure, and you flatten it. The other is the opposite: nesting. So let's see flattening first. Remember, our tree looks like this: we have bib, under bib we have books, and every book has one title and multiple authors. So it is nested — the authors are under the books. What this query does is flatten them: it lists them as a flat set of title-author pairs. Here is how we do it: $x iterates over the titles, $y iterates over the authors, and for each such pair we produce one title-author answer. So what happens if your database consists of a single book with three authors? How many answer elements do you get? You have one single book, which has a title and three authors. How many answer elements do you get? Three. What are these three answers? The first contains the title and one author; the second contains the same title and the second author. Good. What happens if you have one single book with three authors, but it has two titles? Two titles, three authors. What do you get? You get six results, because now $x iterates over a set of two titles and $y iterates over a set of three authors, and you get all possible combinations. What happens if you have three titles and no authors? What do we get? Then you get nothing — three times zero. So you get the idea: what the for clause gives you is all combinations of variable bindings, and you have to figure out how they interact. Any questions about this? The opposite is nesting, or regrouping. What we do here is reverse the hierarchy: instead of having book, title, author, author, author, we will now have an author, and underneath it all the books — actually, all the titles. So how can we do this? Well, first we need the list of all distinct authors, which we get from here. This is the first thing we do.
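Reconstructed from the descriptions, the three queries in this stretch — the count aggregate, the flattening query, and the regrouping query being assembled here — look roughly like this (a sketch; the element names follow the lecture's bib/book example):

    (: books with more than three authors :)
    for $x in doc("bib.xml")/bib/book
    where count($x/author) > 3
    return $x/title

    (: flattening: one <answer> per title-author combination :)
    for $b in doc("bib.xml")/bib/book,
        $x in $b/title,
        $y in $b/author
    return <answer> { $x } { $y } </answer>

    (: regrouping: one <answer> per distinct author, with all that author's titles :)
    let $a := distinct-values(doc("bib.xml")/bib/book/author)
    for $x in $a
    return <answer>
             <author> { $x } </author>
             { for $b in doc("bib.xml")/bib,
                   $y in $b/book[author = $x]/title
               return $y }
           </answer>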
We have a list of all the distinct authors — no duplicates; otherwise we would get the same author twice. Then for every author $x in this list, we create one answer where we plug in the name of the author. And now we need to find all of this author's books, because we want to list all the book titles next to the author name. So we start again from the root, from bib, and this is where we do the join: we match on the current author, and we return the title of the book. Where are the title tags coming from? Ah — let me erase this and ask you: what is this $y? What kind of node is it bound to? Every variable must be bound to either a list of nodes or a single node. Is it a title? It is bound to a title, exactly. So when we return it here, we actually get a title element. And that includes the opening and closing title tags around the title of the book? Yes. Maybe a better way to think about it is that it is bound to a node in the XML data model, and when you print that node, you print the begin tag, everything underneath, and then the end tag. So these titles here — each of these elements represents one binding of $y. OK, so what more is there about regrouping? What's the difference here? I can't figure out the difference. Oh, oh, oh — here the difference is that we didn't have the let. Yeah, it's a minor difference; I don't know why I kept two slides. Ignore it. Good, so now let's take a higher-level look. This is the second query language that we have studied. We studied SQL for three hours and XQuery for 15 minutes. So let's compare them. Here on the left, we have a simple SQL query: a simple selection from Product. Now remember, we discussed last time that there is a canonical representation of every relation in XML, where every row becomes a row element and its fields become sub-elements with the corresponding names. Under that representation, we can express the same query that we have in SQL in XQuery. So let's see — it's right here. We go to the database, we go to the product element. Under this product element we have many, many rows, and $x will be bound to every row. And then we need to return its name and price, because that is what we return in the SQL. Notice that we have an ORDER BY in SQL, and we can do exactly the same thing in XQuery: we can order by. OK, so that was a simple comparison. Here is a more complicated query that does a join. We want to return the names of all the products made by companies in Seattle. So let's see this in XQuery. There is a trivial, direct translation from SQL into XQuery: x and y in SQL become $x and $y in XQuery. So x was a product; therefore $x will iterate over the product rows, and $y will iterate over the company rows. And here we have a join — look, the join is right here: we make sure that the maker is equal to the CID, and the city is equal to Seattle. And then we return the name. It's a very simple analogy. Now, this is a lame XQuery. Let me show you a cool XQuery — it's right here. You push as much as possible of the joins into the paths if you want to write cool XQuery. You don't write where clauses; you push as much of the predicates as possible inside the XPath expressions. So what happens now is that when you iterate over the companies, you can already select only the companies that are in Seattle.
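Reconstructed from the description, the SQL query and its two XQuery translations look roughly like this — a sketch; the table, file, and field names (db.xml, product, company, maker, cid, city) are my guesses at the slide's schema:

    SELECT x.name
    FROM   Product x, Company y
    WHERE  x.maker = y.cid AND y.city = 'Seattle'

    (: the "lame" direct translation, with the join in a where clause :)
    for $x in doc("db.xml")/db/product/row,
        $y in doc("db.xml")/db/company/row
    where $x/maker = $y/cid and $y/city = "Seattle"
    return $x/name

    (: the "cool" style: the predicates are pushed into the paths :)
    for $y in doc("db.xml")/db/company/row[city = "Seattle"],
        $x in doc("db.xml")/db/product/row[maker = $y/cid]
    return $x/name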
And now, when you iterate over the products, you don't select all of them; you keep only the products that are made by the company $y. And that's a cool XQuery. And here is the result. More comparison: aggregates. One thing to remember about SQL is that it treats aggregates in a very unique, non-standard way; no other programming language today treats aggregates that way. The reason SQL insists on this is that aggregates are, first of all, very frequent, and written this way they can be recognized and optimized much better. In XQuery, we are back to the traditional way of doing aggregates, which means nested loops. I actually tried to convince the working group at some point to adopt a SQL-like group by, but they didn't take my proposal. So unless they add an extension, we are stuck with this more verbose way of dealing with aggregates. So let's look at this. What does it do? It lists, for every company, the number of products that that company makes. Look how easy this is in SQL: we just do the join, then we group by the company, and once we group, we can count. Now in XQuery — and in any normal programming language except SQL — you do this with a second iteration that does the count. And here it is. First, we iterate over all the companies. And for every company, we need to count. How do we count? Right here: you start again from the root, you look at all the products that match your company — and that satisfy whatever additional constraints they must satisfy — and you count them. The problem is not that it's longer. The problem with this query is that it's more difficult for the optimizer to recognize it as a group by. That's the issue. Once the optimizer recognizes that the query is a group by, it can optimize it better. It's actually difficult for me to make the point here, but in the first lecture, when we discussed SQL, I showed you a SQL query with group by and the corresponding one with nested queries. And that one has an implicit join: even if you start with a single table and do a group by, if you replace the group by with a subquery, then you have an implicit join. Depending on where you place the conditions, you get variations of that query, and only one of them is equivalent to the group by. So the optimizer has a hard time figuring out that the nested query is actually a group-by query. I hope we'll come back to this point when we discuss query processing. It's not an elegant construct, because it's limited — you can't do nested aggregates this way — but it's a major insight of the SQL framework that allows an important class of queries to be optimized very efficiently. Good. So let's continue the discussion of XQuery. Let me skip this slide; it makes the distinction between for and let, but I have a better slide for that — it comes actually right here. I have two more slides on XQuery, but there is an important point I want to make here. We have seen usages of for. We say for $x in something, and this means: iterate $x over every single element in the list — this has to be a list. What the for statement does is introduce a loop, and at every step of the loop it binds $x to one of the elements of the list. There is a second construct, called let, which looks like this: let $y := something. It's a completely different construct. This does not create an iteration.
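Two sketches reconstructed from this stretch of the lecture: the group-by comparison from a moment ago, and the pair of queries presumably on the "aha" slide about for versus let that is coming up. Schema and file names are my guesses:

    SELECT y.cname, count(*)
    FROM   Product x, Company y
    WHERE  x.maker = y.cid
    GROUP BY y.cname

    (: the XQuery way: a nested iteration does the counting :)
    for $y in doc("db.xml")/db/company/row
    return <result>
             { $y/cname }
             <cnt> { count(doc("db.xml")/db/product/row[maker = $y/cid]) } </cnt>
           </result>

And the for/let contrast:

    (: for: the return fires once per book -> one <result> per book :)
    for $x in doc("bib.xml")/bib/book
    return <result> { $x } </result>

    (: let: no iteration -> one single <result> containing all the books :)
    let $x := doc("bib.xml")/bib/book
    return <result> { $x } </result>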
The let simply takes the expression on the right and assigns it to $y. It's like a shorthand; you can always remove it. If you ever get confused about the distinction, come back to this slide — this is the aha slide. So here it is. We have a collection of books. The first query says: for $x in these books, return this result. The second says: let $x be this collection of books, and return this result. Here is the difference. In the first case, the return statement fires once for every book, so you get a separate result for every book. In the second, the return statement fires only once, because no iteration is introduced, and therefore we get one single result with all the books inside, because $x is now bound to the whole list of books. So that's all I have to tell you about XQuery. It's really a simple and cute language. Essentially, it's a functional programming language, but it serves well as a query language. It became a little richer than it should have, but it serves its purpose: it's the query language for XML. You just learn it, you do the 10 exercises for the homework, and then you know XQuery — it's not a big deal. I think it's briefly described in the book, but the slides, plus the book, plus whatever help you can get from the web, should be enough. OK, time to move to the next topic. Good. So today I'm going to start discussing data storage, indexes, and database tuning. Here is my plan for today. Storage and indexing are covered in chapters 8, 9, and 10, so please read about them there. I also want to talk about database tuning, which is covered in chapter 20. And I will not talk about security, but you do have some questions about database security on the homework, so please read either the security slides, which are in the slide deck, or the corresponding chapter in the textbook. When I wrote these slides several years ago, I wrote them using this textbook — the Ramakrishnan and Gehrke textbook — so you can use either the slides or the textbook; they are equivalent. Good. So regarding the first two topics: I'm going to start discussing storage and indexing, but then I'm going to skip some material and move to database tuning, because you have a question on database tuning in the homework, and it's due next week. I don't think I'm going to finish indexing today; we are going to postpone the detailed discussion of B+ trees and hash tables to next time. Good. So let's start, slowly, discussing the storage model for database systems. Of course, they need to store these tables somewhere. So where do the tables go? They go in files; they are stored on disk in files. But they are not your regular files. It's not the case that one table is one file. And this is because a database management system needs more control over how the data is stored on disk than you normally need. When you use Word or Emacs or Excel to write a little file, the file is what you see in the operating system: you can copy it, you can do whatever you want with it, and that's fine for that application. Not so for a database management system. It needs more control over how the data is organized. And you can think about this control in two dimensions. There is spatial control: it needs to know where the data is stored, and it needs control over how the data is laid out on disk, so it can improve performance. And then there is temporal control.
It needs to know exactly when pages are forced to disk, or when they are pinned in main memory, because of all the transaction issues that we discussed in so much detail. That is the temporal control. So what do database systems do? They need to use files, but they don't use files directly. There are two strategies. One is using raw disk. This means that when you install your database system, you say: this disk is yours, I'm not going to use it for anything else. The database system takes over that disk and does whatever it wants with it. The advantage is that it gets the highest performance: it can organize the entire disk exactly how it wants, and nobody else accesses that disk. The disadvantage is that if you have a laptop with only one disk, you can't do this — you don't want to give your entire disk to the database system, because you still need the operating system, you still need to log in, you still need to use Word. Moreover, it reduces portability. That's all I want to say about this. The other option is to use an operating system file. But even here, it's not the case that every table becomes one file. Instead, what database systems do is grab a huge file to start with. I just finished installing Postgres on my laptop this afternoon, and that's what it does. It asks you: where should I store my database? And you give it a directory. It's quite protected; it doesn't want you to look inside. Somewhere in there is a big file that you shouldn't touch, and this is where the database is. Since it allocates a huge chunk to begin with, it has more spatial control; it can do whatever it wants within that file. But there are some disadvantages: the operating system might limit the file size — I'm not sure about the second one — it might limit the number of open file descriptors. So there are some disadvantages to doing this. It's OK for your little toy application on your laptop, but it's not OK for an industrial-strength database system. You have a comment? [Student:] I think that MySQL actually writes two files per table, one for the indexes and one for the data. That's true for the MyISAM tables, but InnoDB has one giant mass of files. All right, cool. [Student:] Can you stick your indexes in arbitrary places? For instance, you have a bunch of data files on one spindle, and another spindle has the indexes for those files. OK, so if you didn't hear: the discussion was that MySQL has two options — InnoDB, and the other one is? MyISAM, that's what it's called; I never remember it. It's the default one. It doesn't support transactions; it's not used in industry for that reason. I see — and that's the one that uses individual files per table. OK. So with these two options: in one of them, every table is mapped to a file, and in the other, the entire database is mapped to a single file. And of course the second one is better — it supports transactions; it's the better option. So now — let me skip this — most commercial systems offer you both choices. So what happens inside the space that is allocated to the database system? At this point, the database system is free to call whatever it stores for a table a file, and wherever it stores an index is going to be a file too. But it's not necessarily an operating system file; it might be a file inside the space that the system controls.
This file, from the perspective of the database system, is organized in one of two ways: it can either be a heap file or a sequential file. Heap means just a collection of records, with no guarantee about the order in which they come. A sequential file means there is one attribute called the key, and the records are sorted according to this key. And be careful: we are going to use "key" many times in this lecture, and none of these uses means the primary key — they are not even equivalent to each other. This is the key of the sequential file; it doesn't mean the primary key. OK, so what happens on disk? The disk is partitioned into blocks, and the database system sees the disk as a collection of these blocks. They have fixed size, it knows where they are, and it can read one block at a time. But now it needs to decide how to map the tables and files to blocks. One thing to keep in mind when discussing this mapping is that sequential reads from disk are much more efficient than random accesses. If you can read many consecutive blocks at a time, it's much more efficient than reading the same number of blocks from different places on the disk. OK, so what I want to show you in the next few slides are simple ideas, but I think we need to go over them: how blocks are organized on disk, how records are organized inside blocks, and how fields, or attributes, are organized inside records. And just to keep some example in mind, I'm going to use this one — not in a critical way, but imagine a table with four attributes. That's what you should look for: where do the records consisting of these four attributes end up on disk? That's the question to ask. So there are three issues. One is managing blocks — I wrote "free" here, but it should be managing both the free blocks and the blocks that are not free. How do we manage them? Then packing records inside blocks, and representing attributes inside records. So let's see how we deal with blocks. The free blocks — the blocks we can grab when we need more space — can either be linked, with a pointer to the beginning of the list; you grab a block when you need it and update the pointer. Or we can have a bitmap. So the list of free blocks looks like this: here is one block, say 16K, and somewhere inside it there is a pointer to another block, and it says: this one is free. And there we have a pointer to yet another free block. The bitmap means the following. Here are all the blocks on your disk, from zero to some very large number, and somewhere in a special place on the disk you keep an array of bits, one bit for every block. Here you see ones and zeros: one means occupied, zero means free. This is not a rule — you can reverse it — but I think it's a reasonable convention. So now, if I need a free block — say I need to insert an element and there is no more room, so I need a new block for that element — where do I get it in the first case? From the head of the linked list. I just grab this block, this is going to be my block, and I update the head of the list to point to the next one. Very simple. And of course, when you free a block, you do the opposite: you put it at the beginning of the list. But if we have the bitmap representation and I need a free block, what do I do? Look for a zero, and this tells you where to go.
And that's the block you grab. And of course, you update the bit to one. How would you compare the two? What are the advantages and disadvantages? Yes? [Student:] The first way is much faster, because you don't have to scan through. The first one is much faster: you just get the free block. What's the advantage of the second one? Is there an advantage? [Student:] You can get contiguous blocks. You can get contiguous blocks, exactly. Maybe you need to create a table, and the user told you: I need 500 gigabytes. And you try to optimize this. Maybe you won't find 500 gigabytes' worth of consecutive blocks, but you can try. Yes? [Student:] Well, I was just thinking that in the case of needing 500 gigs, as long as you get chunks of a reasonable size, like 10 blocks at a time, you might get a 90% solution and not worry about it. Right. So you don't need the entire file to be contiguous, but you would like large chunks of the file to be contiguous. And this is where the bitmap helps you, because now you have a complete picture of the free space on your disk, and you have more control over which blocks you allocate. Yes? [Student:] When it is allocating contiguous blocks with the bitmap method, how does it know where the next block in the sequence is? Oh, yeah — so if I need 10 zeros, how do I find my 10 zeros in this bitmap? [Student:] I guess you just scan through it until you find 10 consecutive zeros. Or you might treat it as a byte array and just look for zero bytes — that gets you 8 at a time, and that's good enough. [Student:] But after 8, the 9th might not be in the next byte — does it use a pointer to the 9th one? So are you asking about the physical representation of the bit array? OK, so how is it represented? I don't think there is a rule, but we can discuss it a little. Obviously you pack 8 bits into a byte, and 32 bits into a word. And obviously this bitmap has to be on disk. So you set aside a number of blocks — you can do the math; you know exactly how many blocks you need — which are reserved for the bitmap for the entire disk. Think of them as sitting here at the end, because I don't have room at the beginning of my picture. So this is where your bitmap is. You read it into main memory — it should fit; it had better fit — and now you just do a sequential scan through it. You have to unpack the bits, which is something you can do easily in C or C#, less easily maybe in Java. Yes? [Student:] Are the blocks in the bitmap all the same size? All the blocks are the same size; that's an invariant. [Student:] Can that change over time? No, no. When you initially format the disk, you decide on the size of your block. [Student:] So if I use only part of a block, I cannot declare the rest of it a block? Exactly. If you use only part of a block, it's your job — I mean, it's the database system's job — to decide how to organize the inside of the block, which brings us to the next question. Actually, we're not there yet. [Student:] Is the linked list the more efficient one? No, that's orthogonal — the linked list question is orthogonal to how you organize the inside of a block. What I showed you on this slide is how to deal with free blocks: when you need a completely empty block, this is how you think about it.
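A minimal sketch of the bitmap idea, as my own illustration rather than anything from the slides: scan the bit array for a run of n free blocks, mark them occupied, and return the position of the first one. For clarity it uses one byte per block; a real system would pack 8 bits per byte, as just discussed.

    # Sketch of bitmap-based free-block allocation (illustration only).
    # Convention from the lecture: 1 = occupied, 0 = free.

    class BlockBitmap:
        def __init__(self, num_blocks):
            self.bits = bytearray(num_blocks)   # one byte per block, for clarity

        def allocate_run(self, n):
            """Find n contiguous free blocks; mark them occupied and return
            the index of the first one, or None if no such run exists."""
            run_start, run_len = 0, 0
            for i, b in enumerate(self.bits):
                if b == 0:
                    if run_len == 0:
                        run_start = i
                    run_len += 1
                    if run_len == n:
                        for j in range(run_start, run_start + n):
                            self.bits[j] = 1
                        return run_start
                else:
                    run_len = 0
            return None

        def free(self, i):
            self.bits[i] = 0                    # freeing is just clearing the bit

    bm = BlockBitmap(1000)
    first = bm.allocate_run(10)   # 0: ten contiguous blocks starting at block 0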
OK, so let's discuss how the blocks that belong to a particular table are organized. Well, same idea. You can link them: if this is a header page — whatever that means — then all the full pages can be linked here, and separately the pages, or blocks, that have some free space are linked there, and pages can move back and forth between the lists. But a better organization is when you have a complete picture: a header for the entire table that tells you where the blocks of this table are. Remember our little example where you need 500 gigabytes? Maybe you got this in chunks of, say, 8 contiguous blocks each. That is information you might store in this header: all the information about how the blocks are grouped on disk. So when you have to do a sequential read, you know exactly how to optimize it. Do you see the distinction? When you design data structures in a programming language, your mindset is: you store things in lists, because that's the cleanest design — easy to insert, easy to delete — or maybe you organize things in trees. But since this data sits on disk, it's actually more convenient for the database system to have a global picture of where the elements of the list reside. That is the header. This allows it, for example, to go directly to the middle of the file — which is difficult with a list. If you have a list, you might have a pointer to the beginning and one to the end; how do you get to the middle? With this organization, you have a complete picture of where all the blocks are, and you can get to any part of the file. Why would you go there? Maybe you need to insert, and the header tells you which blocks have how much empty space, and then you can optimize your insert a little: maybe you want to fill a block up, or maybe you want to find the emptiest block and put the record there. Good. So this is how we deal with blocks. I should have started, actually, with this slide: blocks, or pages, have a fixed size. I don't know where I got the 8 kilobyte figure. How big are they in Windows — is it 16? [Student:] You can set it to whatever you feel like, but out of the box on a Windows system those are 4K, and the database usually configures its own size explicitly. OK. Anybody know for sure? In Windows, blocks are 4 kilobytes. Good. I heard 16 for some database system, but I don't remember which one. OK, so now let's discuss records — how records are packed into blocks. Records can be of fixed length or of variable length, and we'll discuss the implications of this. One important concept here is that every record has a unique ID, called the record ID. Usually it's a physical ID: it consists of the page number — the block number — which lets you get immediately to the place on disk where the record lives, plus a pointer inside the block, the slot number, that tells you how to find the record inside the block. That is the record ID. Why do we need record IDs? We have primary keys — that's what we learned when we studied entity-relationship diagrams and functional dependencies: every table must have a primary key. Why now introduce record IDs? Who needs to fetch records from disk? There is a particular component that we studied that needs access to records: the log.
Especially in the ARIES undo part of the log — that's where it needs to refer to particular records, and it does so through record IDs. But record IDs can also be used as an optimization for foreign keys. When you have a foreign key, of course you can store the key value, which is a semantic value. But if you want to optimize, you might store the record ID instead: with the record ID you can get to the record immediately. Yes? [Student:] Is this record ID different from the page LSN that we studied? Page LSN, yes. The page LSN is a pointer from the page to a log entry. The record ID refers to a smaller entity: a record inside a page. You might have multiple records in a page, and their record IDs differ by the slot number. And it's a pointer that lets you refer to a record, as opposed to referring to a log entry. OK, so let's see how we organize records in a block. Very easy: we just pack records inside the block as densely as we can. We need to know how many there are, so somewhere in the block there is a number that tells us: we have so many records. And here are the records: number one, number two. Remember our little product table? Each row of that product table is one record here. So now, if you want to insert a new record, what happens? What do you do? [Student:] I scan all the way to the end. You scan all the way to the end. You need to know where the end is, somehow. If the records have fixed length, you can compute it from the count; otherwise you need this information explicitly. And then you just put the new record there. Now, if I want to delete record two, what do I do? Of course, it has to disappear — so how do I reclaim the space? You can shift everything over. Or we can be smarter, a little more efficient: take the last record and move it into the hole. That saves the shifting, if the records are fixed length. Now, in what we just said about insertion and deletion, there is a big problem. What's the big problem with this plan? Who sees it? Suppose the size is fixed — if the size is variable we have additional problems, I realize that — but suppose the size is fixed, and we just insert and delete as we did here. Yes? [Student:] The records are no longer in order? Yes, that's a problem — but it's not about the logical order. It's about this: look at record number three. Somebody points to record number three. Remember the record ID? It says: go to block B, record number three. So now, if you shift records because you have deleted one, what does this record become? It becomes number two. But all over the database we have pointers to it — and in the log there are pointers to it — that say number three. Now it became number two, and we are in trouble. We can't do this. So how would you organize it better? [Student:] You could have an indirection at the block level or somewhere around there. So the pointer says: go to block B, look up a slot, and you just update the pointer inside the block when you move things around. Exactly. Somebody said that every problem in computer science can be solved with another level of indirection. That's exactly the solution. We have another level of indirection, which is right here — something happened to my slide; can you see it? it was blue — so this is what we do: we introduce slots at the beginning of the block.
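A minimal sketch of the slot-directory idea about to be described — my own illustration, not a real DBMS layout. The point is that a record ID stays (page, slot) even when records move inside the page, because only the slot's pointer changes, and a deleted slot keeps a tombstone.

    # Sketch of a slotted page (illustration only).
    # A record ID is (page_no, slot_no); slots are the level of indirection:
    # records can move inside the page, only the slot's offset changes.

    TOMBSTONE = -1

    class SlottedPage:
        def __init__(self):
            self.records = []   # stand-in for the byte area holding the records
            self.slots = []     # slot directory: slot -> position in the byte area

        def insert(self, rec):
            self.records.append(rec)
            self.slots.append(len(self.records) - 1)
            return len(self.slots) - 1          # slot number: the RID's second half

        def delete(self, slot):
            self.slots[slot] = TOMBSTONE        # the slot itself must stay put

        def get(self, slot):
            i = self.slots[slot]
            return None if i == TOMBSTONE else self.records[i]

    p = SlottedPage()
    r0 = p.insert("record zero")
    r1 = p.insert("record one")
    p.delete(r0)                                # leaves a tombstone in slot 0
    assert p.get(r1) == "record one"            # r1 is still valid after the delete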
The slots are just pointers inside the block — small pointers; we only need, what, two bytes each? — offsets within the block. So now if you have to remove a record, you can shift things around, but the slots stay where they are. So wait a minute: what happens if I want to remove a record — which, tracing it back, means removing the slot? Haven't we just moved the problem around? Can I remove the slot? No. And that remains a problem. Instead of removing the slot, what database systems do is introduce a tombstone: they put a special marker there that says, this one is gone, don't follow it. And it stays there — it stays until the block is collected, reused, and reshuffled. So if you have a lot of insertions and deletions in the database, you might end up with many of these tombstones, and then some restructuring is in order. OK. Yes? [Student:] Does the restructuring that reclaims those tombstones happen only when the database is reinstalled? I think there are commands for cleaning up. [Student:] So even though the record is deleted, the data still lies in the database? Yeah. Now, whether you can do this while the database is in the middle of running transactions, I'm not sure — I think it's vendor specific. But there are commands for reorganizing the data; exactly what they do, I don't know offhand. OK. So now let's look inside the records. Records can be of two kinds. There are fixed-length records: when you're lucky and all the attributes in your record have fixed length, then you know exactly how long every record is. Or they can be variable length. If they are fixed length, it's trivial: we know the length of each field, and if you want to access, say, the description field, you just add up the lengths of the previous fields and you know exactly where to go — you can compute the offset of every field by summing up the lengths before it. Where do you get that information? Where is the information stored that tells you the layout of every physical record? It's written on the slide, right? It's in the system catalog. That is where you have information about each table — what the schema is — and in addition to the schema, there is physical information about how the data is organized on disk, including the offset of every field. So notice the importance of schema information — and remember the contrast with XML: in XML, all the schema information is interleaved with the data. A very interesting contrast. OK, now if the records are of variable length, then it's exactly what you expect. You can do one minor optimization — and database systems, as far as I know, do this — which is to move the fixed-length fields to the beginning, so those can be handled a little more efficiently. For the others — this is the variable-length case — you have another level of indirection: there are pointers in the record header that point to the beginning of each variable-length field. So that's the typical structure of a record: a header that contains the length and the pointers to the fields that come after the fixed-length fields. And then there is more information: a pointer to the schema entry in the catalog, so we can retrieve information from there. And there is a timestamp. What's the timestamp for? Why do we need a timestamp? We should know this.
Of course we need a timestamp for every record. Because? Because of concurrency control: the optimistic mechanisms need a timestamp for every single element — for every record or block, depending on whether they do logical or physical locking. That's why we need a timestamp. Remember snapshot isolation? When a transaction reads under snapshot isolation, you always have to return the version of the records whose timestamp corresponds to that transaction. So you need this information a lot. You might even have several copies of the same record, with different timestamps. Good. So much for the physical organization. Actually, let me go back — one thing I didn't emphasize is this. Where is it? Here. So this is a block, and inside this block we place many records. But what happens if a record is bigger than the block? Yes? [Student:] Two blocks? You could use two or three blocks, but actually database systems don't like this; they usually reject it. SQL Server has a limit, Oracle has a limit, and I believe DB2 has a limit too. And the limit is essentially dictated by the size of the block: a record can't exceed the size of the block, because otherwise the performance might degrade and you wouldn't understand why. I'm actually using Postgres now for a research project, and I had to create some fields that were 10,000 bytes long, because the data was extracted from the web, for some bizarre reason. And it took them. So I suppose Postgres allows records bigger than the block size — surprise. But I know for sure that commercial database systems don't allow you to do this, for performance reasons. If you need to do this, there is a particular type called a blob, which stands for binary large object. You can have an attribute in your record of type blob, and usually it is physically stored in a different place; there is a pointer in the record to the corresponding blob. And this can be as big as you want — arbitrarily big. There is a variation called a clob, which stands for character large object, and if you promise that it's going to be full of characters, then SQL is willing to do some searches inside it: you can use some limited predicates in SQL. If your field is a blob, you can't use any predicates on it in SQL at all. OK, yes? [Student:] The SQL data type called nvarchar(max) — does that actually use a character large object? Who knows what nvarchar(max) is? [Student:] The largest varchar you can have, probably. Is it a type or a constant? [Student:] It's a type — it's a Unicode varchar. Oh, it's Unicode, I see. And the max — there is actually a cap on how big it can get. [Student:] It looks like a configuration value. OK. So it's not really a character large object? [Student:] It's a SQL Server thing, I think, up to 4,000. So that fits with the 8K block size, right — 4,000 two-byte characters is 8K. OK, so this is a limitation on the varchar types that live inside the record; it is not a limitation on blobs.
Blobs can be arbitrarily large, unless something is wrong with the database. Think of blobs as movies or pictures or anything large. The limitation is in what you can do with them: all the other attributes you can compare, join on, filter on; with blobs there is nothing you can do — you just return them to the application. That's the difference. And for that reason, the database system stores the blobs in a different place: it only needs to access them when it knows they have to be returned to the application. Good. So let me move on. All this was very simple, actually, and I'm not planning to go any deeper into the physical organization of data, so let me just summarize. The files that we discussed can be of two types. Random order, or heap files: the records are packed into blocks with no guaranteed order within or across blocks. Or sorted, for sequential access: there is a particular attribute, and the records come ordered according to that attribute. And there is a third kind of file: indexes. These are special files with a special structure that allows us to do quick searches inside tables, and that's what I would like to discuss in more detail. But I see that I have a few more slides about the physical organization — let me go quickly; I have three more slides. You know what, let me skip this; it's not that interesting. We discussed tombstones — yes, we discussed tombstones — and you can look the rest up. I want to move on to indexes. So my plan for today is to discuss the general aspects of indexing and then database tuning — which is going to be a more superficial discussion, but we need it for homework 5. Next week we will discuss the details of B+ trees and hash tables. The hash tables especially may be new to you: you think you know hash tables until you see extensible and linear hash tables. So, an index. We know what an index is: it's a separate file organized in some smart way such that whenever you give it a particular search key, it returns the corresponding value. The value can be the record itself, or some pointer that allows you to retrieve the record. Notice that "key" here again means something other than the primary key — and if the file is sorted by some key, the index key is not necessarily the key by which the file is sorted. This is the index's search key. That's how you should think about an index. Now, the value is not a payload that you return to the application; the value is used by the database system to access something inside the table. And it can be one of two things. It can be a pointer to the record — remember the RID; it can be the RID, which tells you how to get to the record quickly. Or it can be the record itself. And that's an important distinction we will discuss in some detail. So there are two kinds of indexes, under two different pairs of names. An index can be clustered or unclustered — a very important concept. A clustered index is one where the order of the records on disk is dictated by the index: the index dictates how the records on disk are sorted. Unclustered means the order of the records is independent of the index. There is a second terminology, primary versus secondary, and here the terminology is not consistent; different people use it with different meanings.
The more standard meaning is actually the second one: just another name for clustered and unclustered. Primary means clustered; secondary means unclustered. Many people, when they say primary or secondary, mean clustered or unclustered. Some people, however, use it with a different meaning: primary means an index on the primary key; secondary means an index on anything else. So what I want to discuss next — postponing the B+ tree and hash table details to next time — is this important distinction between clustered and unclustered. Clustered means: think about a B+ tree; the keys in the B+ tree are ordered. If the records in the data file are ordered in the same way, then the index is called clustered, and as you will see, this is a huge performance improvement over unclustered. Otherwise, the index is unclustered. So here is how to think about it. On the left is the index. I didn't draw a tree — you have some idea about search trees; they are cleverly organized, ordered structures — so I dropped the clever part and just show you an ordered structure. This is what the index does: it tells me, for every key, where the record is. And in this case, the records themselves are also ordered according to the same key. That's the important thing. What I show you here is one block at a time: this is one block, and each block holds two records — this is one record, this is another. It's a simple picture, but it conveys a lot of information. Suppose you want to answer the query: find all records where the key value is between, say, 20 and 60. How do we answer this query? Let me clean up a little. Clearly we start from the index, right? And in the index, what is the first entry of interest to us? 30. We can search the index very fast — much faster than scanning the main table. So now we follow this pointer and we get this block. But how do we find the other keys, all the way up to 60? You just go down — and not down the index: that's the beauty, we don't need the index anymore. We go down directly in the data file. And it's not just that we don't need the index. The important thing is that whenever we fetch a block from disk, all the records in that block are useful to us — except maybe for the last block, where some of the records might not match the query. But otherwise, whenever we pay the cost of fetching a block from disk, all of its records are returned to the query. Compare this with an unclustered index, which looks like this. Let's go through the same exercise: find all entries whose key is between 20 and 60 — let me be faithful to the other one. How do we answer this query? We start in the index — what did I do here? No, the numbers are all screwed up; let me erase this. Let's go between 10 and 30. I know it sounds silly, but it's a useful exercise. What you need to do is fetch every single pointer. We need to traverse the index; there is nothing in the data file to help you figure out where the next record is. You go down through the index and follow every single pointer. So obviously there is a big disadvantage, right?
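A back-of-envelope calculation makes the disadvantage concrete; the numbers are my own illustration, not the lecture's. With a clustered index a range scan costs roughly one I/O per block of matching records; with an unclustered index it can cost up to one I/O per matching record.

    # Rough I/O cost of a range scan (illustrative model only).
    n_records = 1_000_000       # records in the table
    per_block = 100             # records per block -> 10,000 blocks total
    selectivity = 0.10          # the query matches 10% of the records

    matching = int(n_records * selectivity)
    clustered_ios = matching // per_block      # ~1,000 block reads, sequential
    unclustered_ios = matching                 # up to 100,000 random reads
    full_scan_ios = n_records // per_block     # 10,000 blocks, sequential

    print(clustered_ios, unclustered_ios, full_scan_ios)   # 1000 100000 10000

At 10% selectivity the unclustered index already loses to a plain sequential scan, which is exactly the point the lecturer makes next.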
The thing I wanted to tell you is that if the range is too big, it's actually more efficient to do a sequential scan of the entire data file. Instead of reading these records one by one from various places, it's better to ignore the index completely and read the data sequentially. That's an important thing to know about clustered versus unclustered indexes. The clustered index is OK even if you need to read, say, one third of the data, or even the entire file — you can still use it. If you read the entire file, it won't be a savings over the sequential scan, but if you read, say, one third of it, then yes, you might save a lot of time. With an unclustered index, you only win if you need to access very few records. If you have to access, say, 10% of your file, you might actually be better off doing a sequential scan than accessing 10% of the file in random order. [Student:] What about duplicate keys in the index? Oh yeah, that's an issue that I discuss somewhere, but I don't remember where. There are two approaches. You either duplicate the key in the index, and then each key's value is just a single pointer; or you keep unique keys, and then the value needs to be a set of pointers, because you might have multiple records with that key. And there are advantages to each — I think I have a slide after the B+ trees, so maybe we should discuss it next time. Okay, this is how clustered versus unclustered looks on a B+ tree. Here is a B+ tree, and underneath are the records. Look how nicely they point; if you need to do a range query, it's obvious what you do. If it's unclustered, the pointers go all over the place. Now, another important idea that is kind of hard to get your head around — let me backtrack. Every unclustered index is a completely separate file: you have the data file, and the unclustered index is a separate file. The clustered index is most often the same as the data file. It's not something else; it's inside the data file. Think of them as being together: the leaves of the B+ tree are the data records. Okay, same thing with hash tables. If instead of a B+ tree we have a hash table, then on the right is a primary index: the records themselves are the entries in the index. On the left we have a secondary, unclustered index; here we really do need the pointers, so we have pointers to the records. And actually — I have two more slides here, and then we can take a break — I do have that discussion. Yeah, this is where it comes; I thought it came later. There are three choices for the value. Remember, every entry in the index is a key-value pair. We know what the key is: the key is the value on which we want to search. But for the value there are three choices, and the book denotes an entry by k*. The first choice is the record itself. This is when the index is clustered — only when the index is clustered. So you don't get a pointer; you get the record. The second choice is that you have the key, and the value is a single record ID. Yes? [Student:] Couldn't you also do a sparse key thing? Like, you have a key, then another key, and the record IDs in between are bounded by the two keys.
Right, so what he mentioned here is a notion called a sparse index. If the index is clustered, so it's a primary index, then you have the option of skipping some keys. You can afford to keep in the index only the first key of every block, and the others you can retrieve because you need the block anyway. That's called a sparse index. But the distinction here is actually interesting; we should discuss it a little bit. The other option is to have a key followed by a list of RIDs. And I'd like you to tell me: what are the pros and cons of these two choices? I think I have a slide here. Here they are. So if you have duplicate values of a key, then you can either have the duplicates in the index, and each of them goes with a single RID, or you can consolidate them: a single entry for each key, which then needs to hold a list of pointers. So what are the pros and cons? They're not major advantages or disadvantages, but I think it's useful to have this discussion. With the first one, if I'm looking for key 10, I don't know how many entries there are. Right: with the first one, you don't know how many there are. And actually, when we start looking into B+ trees, they are quite subtle, so if in addition you have to worry about duplicate keys, managing the B+ tree becomes possible but very brittle; you have to be very careful how you program it. What is the disadvantage of the second approach? We would love to have unique keys in the index, but then what happens? The number of values varies, so you don't have a... Exactly: the entries in the index become variable-length, which creates another set of nightmares. I need to worry about some of them being too big and others too small. Good, any questions? Because now it's a good time to take a break. Let's take a five-minute break until eight o'clock, then we'll have some initial discussion of B+ trees, and then we'll do database tuning. Okay, five minutes break. Okay, so hello folks. B+ trees. I'm not going to go too much into the details; actually the details are not that hard, but I'm going to skip some today. I do have a few slides just to discuss the basics. So a B+ tree is a search tree. How many people know what a binary search tree is? Can I have a show of hands? Good, because it's actually a prerequisite for this class. But we do not use binary search trees to index tables in databases. Why not? What's wrong with binary search trees? Sorry? Too many blocks. Too many blocks, in what sense? You have to do a whole bunch of reads to get down to the bottom of the tree. Exactly. The depth of the tree is logarithmic, which is good, but it's not good enough for disk accesses. I like to think about it this way: when you traverse a binary tree, you read one node at a time as you go down. Reading one node means going to disk and fetching an entire block, just to access this little node. That's not a good use of the space on the block, because that block might contain nodes that are unrelated to my search. The idea in a B+ tree is to make the node as big as possible, to fill out the entire block. Whenever you read a block, you should get as much information from that block as possible that's useful for your current search. That's one way to think about B+ trees. So these are B+ trees: the node is now much bigger, therefore the fan-out will be higher. They don't have two children; they have a large number of children.
The plus in B+ trees makes them a little messier, but more useful for databases. It means that all the keys are stored in the leaves. There are keys in the internal nodes as well, but those keys are only used for navigation, for the search; they are not the actual payload. It also means that in a B+ tree you always have to read all the way down to the leaves. The reason is that often the leaves are actually the data records: in a clustered index, the leaves are the data records. So we separate the internal part of the tree, which helps us navigate, from the leaves, where the actual payload is, where the values or the data records are. Okay, so details. The tree is governed by a parameter which I call degree here; the book calls it the order of the B+ tree. The rule is that every node has a number of keys between d and 2d, and a number of children that is one larger than the number of keys. So here is a typical node. It may have three keys: 30, 120, 240. Remember, there are no values here; these are not the actual data entries of the index, they are only used for navigation. What this says is: if you're looking, for example, for a key that's bigger than 30 but less than 120, then you follow this pointer; if your key is bigger than 240, then you follow this pointer; and so on. That's how to think about a node: it has some number of keys, and a number of children that is one larger. So how big can the number of children be? Are you with me? If I can have between d and 2d keys, how many children do I have? Between d plus one and 2d plus one. Good. A leaf is exactly the same, except that now the pointers are either pointers to the actual records, or they can be the records themselves, which complicates the math with the degree a little bit. Okay, so here is a typical B+ tree. The degree here is two, which means every node has between two and four keys. There is an exception, though: the root is allowed to have fewer keys. The root is always allowed to have as few as one key, no matter how big d is. We'll see next time why, and you probably know why. So now let's see how the B+ tree works. I want to find 40. How do I proceed? What do I do to find the key 40? Of course, we start from the root, and we go which way, left or right? Left, we go left. Now we need to find 40; what do we do? We go to the middle, right? Because 40 is between 20 and 60, so it's right in the middle. And here is 40. If you look at this organization, it should be the pointer right before 40. Okay, so I have a little animation for this: we go down because it's less than 80, we go to the middle, and now we have found 40. Good. Now, the discussion I would like to have before we go to tuning is about using the B+ tree. Yes? Finding key 40: what if 40 is not there? Yeah, what happens if we look for 45, for example? Then we don't find it, and this is something that needs to be handled by the query processor. If you say select the records where K equals 45, then during query processing it will look up the index, the index will return nothing, and the query processor knows to return the empty set. And if you use the index to verify foreign key constraints, then the query processor needs to say: I can't insert this value. So it depends on how it's used. And talking about usage, let's see how we use it.
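A quick capacity estimate, using the definitions just given and assuming full nodes (and ignoring the root exception): a node of degree $d$ has up to $2d+1$ children, and a leaf holds up to $2d$ entries, so a tree of height $h$ (edges from root to leaf) holds

\[
\text{entries} \;\le\; 2d\,(2d+1)^{h}.
\]

With the toy degree $d = 2$ from the slide that is $4 \cdot 5^{h}$. With a degree in the hundreds, say $d = 100$ (my assumption for a node that fills a disk block), a tree of height 3 already holds up to $200 \cdot 201^{3} \approx 1.6$ billion entries, which is why a handful of block reads suffice.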
So imagine that we have an index on the table People and the attribute age, okay? And the query is this. Let's suppose this index is a secondary index, an unclustered index; I don't want you to get distracted thinking about the more subtle primary index. It's just a separate tree whose leaves point inside the table. How would you evaluate this query? If you are the query optimizer, how can you evaluate this query here? It's a simple question; it's actually written right here. You don't scan the main table; you start in the index. You start from the root, you search for 25, and this brings you down to a leaf. The leaf has a pointer, and the pointer points to the record. Is it a single pointer or multiple pointers? Multiple pointers. And why is that? Because age is not necessarily a key; age is probably not a key. Okay, what about a range query? Can we use the index for this range query? Sure, how would you use it? Yes? Is that really a range query? Oh, it says 20 less than age, and age less than 30. Oh yeah, right. Yes? You traverse down to 20, then scan through the index. Right: traverse down to 20, and this gives you the first pointer, or the first set of pointers, and then you continue scanning inside the index for every new key. This is why the leaves of the index, I forgot to emphasize this, are linked: we can scan them without going back up, which would be a detour. That is how we find all the ages between 20 and 30. Yes? The advice is that if you can't eliminate 90% of the keys, you should usually just do a scan. If you can't do what? If you can't eliminate 90% of your keys, you just do a sequential scan. So let's discuss this in a little more detail. Suppose you are Facebook, okay? Facebook users don't look up people between 20 and 30; there's no interface to do that. But a manager at Facebook would like to see the trend, would like to know how many people are between 20 and 30. And the optimizer will not choose the index. Why is that? Because it's half the people on Facebook, right? Almost everybody who is on Facebook is in this age bracket; you're probably a little older, and there might be exceptions. It doesn't pay off; it's actually more efficient to read the data sequentially. There is a rule of thumb that database systems people know: if you have to read more than, I think, 10% of the table, then it is more advantageous to do a sequential scan than to use an unclustered, or secondary, index. So secondary indexes are only good for point queries. Okay, good. But now let's get back to the main point of the discussion, which is this: we can create an index on a pair of attributes. Let me create an index on the pair name and zip code. What does that mean? What goes in the index? We no longer have ages; we have what? We have pairs of name and zip code. And how are they ordered? If I just had names, they would be ordered alphabetically. If I had zip codes, they would be ordered by zip code. But now I have pairs. How are they ordered? By name and by zip code: by name first, and then by zip code. Very important to keep this in mind. So now let's examine the queries, and tell me if the index can be used, and how. Let's take the first query.
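In SQL, the indexes from this discussion might look as follows; the table and column names are my guesses from the slides:

-- Unclustered (secondary) index on age:
CREATE INDEX people_age ON People(age);

-- Point query: one root-to-leaf search, then follow the pointers.
SELECT * FROM People WHERE age = 25;

-- Range query: find 20 in the index, then scan the linked leaves;
-- only worthwhile if it selects a small fraction of the table.
SELECT * FROM People WHERE 20 < age AND age < 30;

-- Multi-attribute index: entries ordered by name first, then zipcode.
CREATE INDEX people_name_zip ON People(name, zipcode);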
How would the index be used to answer this query? It's kind of straightforward. We take the pair, Smith and 12345, and concatenate them. This gives a single key for this name-zip combination, and now we search the index for the combination Smith, 12345. Second one: can you use the index to retrieve people who are named Smith? Yes. Forget about the performance issue; can you use the index? Yes. How? Are you still with me? You pretend that the zip code is very, very small. That's right. You start from the root of the tree, and it's like looking for a key: it's like looking for the pair Smith with minus infinity as the zip code, a zip code that's as small as possible. You don't actually have to store minus infinity anywhere; you just adjust the search a little bit. This gives you the first entry where the name is Smith. And from there, what do you do? It's like a range scan: from there you continue like a range, you read entries until the name is no longer Smith, and then you stop. Okay, how do you evaluate this one? Yes? You have to use the table; you have to do a sequential scan. The database optimizer cannot use this index to evaluate the query, because the query is on the second attribute, and the entries are ordered by name, not by zip code. Put it this way: if you have this alphabetical list of all people ordered by name, the zip code 12345 occurs all over the place. There is no contiguous region in the index that contains 12345. Very important concept. It's simple, but it's important to keep in mind. Okay, I'm now going to skip ahead, and we will come back to this discussion next time. But I want to get to database tuning, and let me actually start by briefly mentioning how indexes are created and used in Postgres. Here is an example; I'm going to use it as an example later on. We create a table V with three attributes, M, N, and P. Suppose you want to create an index on N. That's what you say in Postgres: CREATE INDEX, you give it a name, any name you want, and then you say ON V(N). V2 will be on the pair (P, M), and VVV is on the pair (M, N). You can create as many indexes as you want. It's interesting that in Postgres you don't specify that an index is clustered when you create it. You just create indexes, and at some point you say CLUSTER V USING V2. What would that mean? What do you think Postgres will do now, if we say CLUSTER V USING V2? Yes? It's going to rearrange the data in the table according to the fields used in index V2, so P first. Exactly. It will order; that's the important step. It's going to order the entire table V according to these two attributes, P and M. It takes a while. I told you I was just playing with Postgres, and I did a clustered index on 300,000 records; it took like half a minute to order them. If you have a huge table, I expect it to take a long time. Okay, so that's the syntax. Yes? If the table V also has a primary key, then wouldn't that primary key have created an index by itself? Usually database systems do create an index automatically on the primary key, and if you don't tell them anything, they will cluster that index: they will have a clustered index on the primary key. But you can override this.
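Spelled out, the Postgres commands just described look like this; the index names are arbitrary, and the column types are my guess:

CREATE TABLE V (M INT, N VARCHAR(20), P INT);

CREATE INDEX V1  ON V (N);
CREATE INDEX V2  ON V (P, M);
CREATE INDEX VVV ON V (M, N);

-- Physically reorder the table by (P, M); Postgres rewrites the whole
-- file, which is why it took about half a minute on 300,000 records.
CLUSTER V USING V2;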
You can create a secondary index on some other attribute, and you can say: please cluster on this one, I want to cluster on this. Yes? If M is the primary key in this case, can we have another index on the other attributes? Oh, sure. That's actually an important concept. So suppose M is the primary key; exactly the example on the slide applies here. What would it do? This creates an index on N, which is not the primary key; some people would call it a secondary index because it's not on the primary key. Here is an index on (P, M), which means an index on this pair. Does it make sense? And finally, when we cluster, we cluster on the combination P and M, where the primary key is the second attribute. So there is no prescribed connection between the primary key, which indexes we create, and which one we choose to be clustered. Let's discuss the latter in a little more detail. This is part of database tuning: what you need to do to make your database respond faster to the application. We will discuss three techniques for database tuning: index selection, data partitioning, and denormalization. Let me start with index selection. But actually, before that, I have a nice picture to give you a feeling of where database tuning sits. There are several schemas. In the original concept of relational databases, people considered from the very beginning three levels of schemas. There is the external schema: this is what you expose to the application; think of it as consisting of views. There is the conceptual schema: that's what we design, the Boyce-Codd normal form or third normal form design. On top of that you might offer the views, which are your external schema. And then there is the physical schema: this is how the database administrator decides to actually store the data, and it might not be the result of Boyce-Codd normalization. It also includes details about how the tables are stored, which indexes are created, and clustering. Tuning refers to this part: designing the physical schema and changing the physical schema. This is where we do tuning; it's this lever here. Where does the application talk? The application sees what? The external schema. Is it necessarily the external schema? Is it always an external schema? It can also be the conceptual schema directly. Now, what we do is change the physical schema. But if we change the physical schema here, does it mean we have to rewrite all the applications? No. And this actually has a name, the fact that we don't have to rewrite the applications: it's called data independence, physical data independence or logical data independence, depending on which direction you look from. We can change the physical schema, but the conceptual schema stays unchanged. The conceptual schema is about the logic, about what the applications need. The physical schema is where we mess around and create indexes and partition tables and denormalize, if we prefer to denormalize. And there is a separation between the two: the database system keeps track of the mapping between them and rewrites queries in terms of the physical schema. So here is the definition of the database tuning problem. When you tune a database, you don't look just at the data; you also look at the so-called workload. The workload is the set of queries that are being run continuously on the database.
Think of it this way: you take your Java applications or your C# applications, and you extract from them all the queries that these applications issue to the database. And you get more information: how often they are issued, their frequencies. So you know the list of queries and you know their frequencies. You may also need some additional information about the performance goals. Some applications are really critical, and you would really like to speed them up; others are not so critical, and you don't care so much even if you slow them down a little during your database tuning. The database tuning problem says: given this information, the queries, their frequencies, and your performance goals, decide on a physical database design. Decide which indexes to create; decide on the physical schema. Strictly speaking, tuning the physical schema means you don't modify the interface to the application: you modify only the green part here, this part here. But sometimes you have to go beyond that. You might have to tune the conceptual schema a little bit, and sometimes you have to go inside the applications and do some hacks there to speed them up. At the minimum, you should declare read-only transactions: declare them read-only, and this by itself might speed up some applications. So I want to discuss the various database tuning tasks in some detail, and I'm going to start with the index selection problem. It was, and continues to be, a favorite database research topic. The problem focuses on just one thing: which indexes to choose. You know the database schema, the tables and the attributes, and you know your workload, which means queries plus frequencies. The goal is to choose a set of indexes that optimizes this workload. So here is a trick; let me see how to ease you into this problem. Imagine that you have a database, and it has maybe six relations, like the IMDB movie database. You look at the workload and you say: wait a minute, why should I bother with the workload? Let me create an index on every single attribute of every single table. And you know what, just to be sure, let's also create an index on every pair of attributes of every single table, just in case. What's wrong with that picture? Will your application run faster once you create all these gazillions of indexes? As long as you don't insert anything, ever. That's where it hurts you. That's when you realize that indexes are great for answering queries, but they can actually hurt you. When do they hurt you? Whenever you update the database. When you insert a record, the index doesn't help you at all; on the contrary, it takes more time to insert the record, because now you have to insert the new values into the index. Even if you just update a record, say you update an account balance: if there is an index on the balances, that index needs to be updated, which means inserting and deleting in a B+ tree. So that's the trade-off. Whenever you create an index, you help all the queries that can use that index for searching, but you hurt all the applications that insert into or delete from that table. Good. So I have some cute problems for you here. Let me skip this; very simple. Problem number one: you have 100,000 queries like the one on the left and 100 queries like the one on the right, and our table is this V of M, N, P.
Which indexes would you create? Let me ask you: which indexes would you create for this workload? So we create one on N. Should we do one on P too? Why not? We don't seem to have any inserts. So yeah, clearly that one is going to be much less useful. And now, if you have to choose between clustered and unclustered, what would you choose? Clustered on N, of course. So let's see what the answers are right here. Ah, I forgot to ask you one question: should they be B+ trees or hash tables? We didn't discuss them in detail, but there is one thing that hash tables cannot do that B+ trees can do: range queries. A hash index is like a B+ tree index, but instead of being organized as a tree, it uses a hash function. We'll discuss it on the next slide, but the principle is very simple: it applies a hash function, and that tells it where to go and look up the record. But the hash function scrambles values; that's exactly, by definition, what it does. It moves them around in the domain. There is no order-preserving hash function here; the order is lost. So whenever you have to do a range query, a hash index will not be useful. Now, for this problem, either a hash table or a B+ tree would be just fine. Problem number two. Ah, so this gets more interesting, right? We have 100,000 queries like this, 100 queries like that, and now we have inserts. Now, of course, it's harder to decide, but give me your best guess. Create an index on N. What about P? Ignore it. If you create an index on P to help these 100 queries, you hurt all the 100,000 inserts. Okay. Now, the index on N: would you make it a B+ tree or a hash table? Of course, a B+ tree, because we need a range query here. Clustered or unclustered? Since it's a range query, the danger that it reads more than 10% of the records on disk is very high, so I would make it clustered, if we have that choice. Yes? If you have a clustered index, isn't that going to make the inserts cost more? It's a good question. There might be some difference: it might be slightly more expensive to insert into a clustered index. If it's unclustered, the insert just goes at the end of the data file. Right: inserting into the main table is cheaper with an unclustered index than with a clustered index. So yeah, there is an argument against clustering, because inserts might be slightly cheaper. You still have to insert into the index itself; if it's a B+ tree, well, it's complicated, you have to do the insertion. But it's still cheaper to insert with an unclustered index than with a clustered one. Yes, sir? You mentioned we decided not to index the middle one because there are only 100 queries on it and it would slow down 100,000 inserts, is that right? Yeah. But if there are only 100 of those queries, how much is that index really going to impact the inserts? Okay, so let's discuss this in detail; it's an interesting question. Suppose I create an index on V.P, and I have these 100 queries here, but there are just 100 queries. Yes? You'd have to update the index for every single insert. That's the problem. Whenever there is an insert, there is a value here for P, so 100,000 times this P value needs to get into the index. When we do the insertion, we don't know that the index is going to be used so rarely; we have to respect the index and insert the value there. If you took the P out of the three values you were inserting, though, and you still have a non-clustered index on P, would it still be just as expensive, because you'd have to do some shuffling of the record you're pointing to?
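To make the two problems concrete, here is a sketch of the choices in Postgres syntax, reading problem one's queries as equality lookups; the index names are made up, and CREATE INDEX builds a B+ tree by default while USING HASH asks for a hash index:

-- Problem 1: 100,000 point queries on N, 100 queries on P, no inserts.
-- Equality lookups only, so a hash index or a B+ tree both work:
CREATE INDEX v_n_hash ON V USING HASH (N);

-- Problem 2: range queries on N plus 100,000 inserts.
-- The range scan needs a B+ tree, and we skip the index on P entirely:
CREATE INDEX v_n ON V (N);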
So I want to have this discussion in detail. You're saying: remove this P from the insert. And now the question is, what happens to the index on P? Let's discuss what actually happens with the insert. I have a record with three fields, and I'm inserting only two fields. What happens to the P value? It will insert the default. It will insert the default, and what is that default, if you didn't declare one? It's null. So in that case we are lucky: indeed, dropping P from the insert will not affect the P index, because nulls are not inserted into the index. But if you have some other default value, then that value needs to be inserted into the index every time. So let me see, what did I write on the slide here? I said "unsure." Why was I unsure? It's not unsure; obviously you don't want to create the index on P. But one thing you could do here... no, let me take it back. This is what you need. Okay, problem three. Now we have three expensive queries. What can you do here? An index on a pair, P and N? Uh-huh, be smart, use an index on a pair. Well, make that N and P; sorry, P and N is completely wrong. An index on (N, P). Now you might argue that maybe the P component of the index is not that helpful here, because it's an open interval, and that's the main idea behind this example; maybe it's not the best example. But the point is that sometimes you get more value by creating an index over two attributes. Yes? Suppose we have two indexes on pairs, (A, B) and (A, C), and the query is just on A. Very good question. So suppose you have an index on (M, N) and an index on (M, P), and now your query is a selection on just M. Which index will it pick? There is no obvious choice. What database systems actually do is look at the cost of using each index. They might know, for example, that in the first index there is only one matching entry, while in the second index there might be many, and then the costs differ: it's much more expensive to access the second index. That's how they choose which index to use, and whether to use an index at all. Yes? At this point, you probably just want to look at what the actual statistics say, right? The actual what? Statistics. Right: the statistics might tell you that one predicate is far more selective than the other. Are you referring to her example? Oh. Yeah, so definitely, you want to look at statistics. And there's a great example that I forgot to set up. If you go to SQL Server, to the server with the IMDB database, and you run this query, which I need to... how can I get there? Let me see. Nope. Here it is. Run this query: select star from movie, it's one of my favorites, where year equals 1905. Now, movie, just to refresh your memory, looks like this: there is the key ID, there is a title, and there is year. And yes, we do have an index on year. So what will SQL Server do if this is the query you're issuing? The obvious thing: it will use the index, look up 1905, and give you the movies made in 1905, more than one; there are a few movies made in 1905. By the way, we'll do this demo live when we discuss query execution. But now here is the interesting thing. Replace it, and try this: replace 1905 with 1995. I hope we can do this in class, but you can try it on your own. What will happen now? No, it's not the same. You need to look at the query plan. Do you know how to see the query plan in SQL Server? You click on show...
One of the buttons at the top shows the plan. Yes? I think it won't use the index, because there are so many more movies. Exactly: it will not use the index. And the way it can tell the difference is that it has statistics for every value of year. It's exactly what your colleague said: it has statistics for every year value, how many movies there are. And it doesn't use a rule of thumb; it actually does a hard calculation. How much will it cost if I use the index for 1995? Well, there are like 30,000 movies; I'd have to read each of them through the index, too expensive. It's cheaper to scan the entire table. So that's the difference. And the other comment I had here, which gets a little bit esoteric, is that exactly the same issue, which index to use if you only need N equals 5, arises with statistics. Database systems that have multi-dimensional histograms have statistics on two attributes. SQL Server does not have multi-dimensional histograms, as far as I know, but DB2 has them. Once you have multi-dimensional histograms, you can estimate the size of the result in multiple ways and get different answers. It's a very interesting problem: how do you combine that information to get a consistent answer? This is one of the reasons that SQL Server doesn't want to support multi-dimensional histograms. Yes? When are the statistics built? Are they maintained as the data changes? No, statistics are always built offline. Will they be updated with every insert? No. It's very important to keep this in mind; we'll have this discussion towards the end of the quarter. They work with stale data. Statistics are histograms. How many people here know what histograms are? Okay, so what this means is that you fix an attribute, like for an index, and then the database system decides, or maybe the administrator gives a hint, that it should construct a histogram on it. For every value, or maybe for some ranges of values, it counts how many tuples there are with that particular value. These statistics are critical data structures; they are used intensively during query optimization, and they must be very small. But even though they are small, they are not like indexes: if you were to update the statistics transactionally, you would need to log the whole thing, the entire histogram. Database systems don't do this. They allow the statistics to get out of sync with the main database, and they refresh the statistics at certain time intervals. It's much more important not to hurt the transactions, by asking them to log updates to the statistics to keep them up to date, than it is to have the most accurate statistics. Okay, it's complicated; we'll get back to this when we discuss selectivity estimation. Right now, we select indexes. So, my last problem: which indexes would you select here? I think that's an easy one, right? I don't know why I have this one; it's obvious. What do we do? Create an index on P, a B+ tree, of course. And what do we do with N? No inserts, so create an index on N too, also a B+ tree. Which one should be clustered? The first one. Yeah, we cluster the first one. So what did I say here? Yeah, it's correct: P is clustered. Good. So this is the index selection problem. As you can see, it's kind of a black art; there is no simple algorithm for selecting indexes. Actually, database systems now do have algorithms.
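You can watch the same decision in Postgres too; a small sketch, assuming the Movie table from the demo:

-- Few matching movies: the planner should pick an index scan.
EXPLAIN SELECT * FROM Movie WHERE year = 1905;

-- Tens of thousands of matches: expect a sequential scan instead.
EXPLAIN SELECT * FROM Movie WHERE year = 1995;

-- Refresh the per-column statistics (histograms) offline, on demand:
ANALYZE Movie;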
This is one of the success stories of research in database systems. The first index selection system was developed at Microsoft Research by Surajit Chaudhuri in the late 90s, as a research project called AutoAdmin. They started by developing algorithms for automatically selecting an optimal set of indexes, and since then most vendors have adopted this. It became part of SQL Server; it's called, what was it called? The Index Tuning Wizard. And today all the vendors have some kind of tuning wizard. I don't remember where I got the information on this slide from, so I can't point you to a specific one. Don't use a tuning wizard for homework 5; do it manually. Your task in homework 5 is to speed up a query workload 20 times: make it 20 times faster by choosing indexes. Okay, so how do index selection tools actually work? Essentially, it's brute force. They take the query workload, they know the frequency of each query, and they try all the combinations of possible indexes on the tables that you can imagine. They try to be smart about it: if a combination of indexes turned out to be bad, they don't try to add more to it. So they try to restrict the search space, but otherwise it's a search problem. The really cool idea they had was how to answer the question: how do you know that a combination of indexes is a good choice? Suppose you create an index on the year attribute and another index on the age attribute, and you hope this will speed up this query, this query, and that query. How can you determine that? Well, there is a complicated theory of estimating sizes and optimizing. But if you use that, the danger is that you might be correct that the index is useful, yet the database optimizer won't pick it, because the database optimizer was written by someone else. So the solution they came up with was to modify the API of the optimizer so that this question is answered by the optimizer itself. The new API they implemented on top of the existing optimizer was: assume I have these indexes; here is a query; tell me how you are going to optimize it. And based on these answers, they were able to determine whether a certain combination of indexes is beneficial or not. That was the major achievement: not so much the actual search algorithm, which is essentially a general-purpose search algorithm, but this engineering solution to a key problem. So, more things about indexes. Of course, you should try multi-attribute keys; now you have a sense of when they pay off. They pay off when you have WHERE clauses in which both attributes are used. And there is something else interesting: some queries can be answered entirely using the index, and this is a little bit amazing. Let me write this query here: select distinct year from movie. There are at least two files: there is the index file with all the years, which is a B+ tree, and there is the main data file with all the movies. Advanced query optimizers will only use the index. You don't need the main file to answer this query; you only need the years. When this is possible, the index is called a covering index. Sometimes you can go even a step further: you can join two indexes and not touch the main table at all, and then they are also called covering indexes.
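A sketch of the covering-index case just described, with the same assumed names:

CREATE INDEX movie_year ON Movie (year);

-- Every value this query needs is already in the index, so an advanced
-- optimizer answers it from the B+ tree alone, never touching the data file:
SELECT DISTINCT year FROM Movie;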
If you really want to be aggressive about database tuning and index selection, you need to keep these possibilities in mind: sometimes you can create an index which is so powerful that an entire query can be answered using only the index. That's a covering index. To cluster or not? Well, you know how to think about this; there is no obvious answer. One thing about clustering: you may create ten indexes on a table, but how many clustered indexes can you create? Just one, right? You only have one shot at clustering. That's a limitation: if we could cluster on all the attributes, that would be ideal, but we can't. And here is a nice picture; I think I got it from the book. It shows the performance of a clustered index versus an unclustered index versus a sequential scan. I think it's worthwhile to spend some time on this. Look at this query here: select from R all the records whose key is in a certain range. Depending on the range, we might end up selecting 0% of the table or 100% of the table. So imagine this axis represents the fraction of the table that the query returns. Here is the performance of the clustered index: if the range is small, the clustered index costs almost nothing; then the cost increases linearly, because you need to fetch more and more of the table, and at the end you need to read the entire table. If you were to use a sequential scan, it doesn't matter how many tuples you return: you have to read the whole thing, so the cost is the same. But if your index is unclustered, then initially it's very cheap, but the price goes up dramatically, because the unclustered index forces you to read the records in random order. At some point, and again there is a rule of thumb, I think around 10% or so, it becomes better to do a sequential scan than to read through the index. Nice picture, and it's from the book. Good. Some thoughts on B+ trees versus hash tables, and I have a simple rule for you: always use a B+ tree. Why would one ever consider a hash table? There is a tiny performance benefit to a hash table; it can be marginally faster. So if you really have some absolutely critical query that does an exact selection on an attribute, you might consider a hash table on that attribute. The second bullet, which we won't read now, says the same thing: only if you have a really critical exact selection on an attribute should you use a hash table. That's something to keep in mind. Every index helps queries and hurts updates; that's the balance, and this is why index selection is so difficult. No magic bullet; this is why we have AutoAdmin and all these index selection wizards. Tools, again; I think we said this already. Good. So far we discussed only one aspect of database tuning, index selection, which is kind of the most obvious and most important one. I want to walk through two more: one is denormalization, and the other is horizontal and vertical partitioning. We already discussed horizontal and vertical partitioning; I only want to refresh your memory. But let me start with denormalization. It's a very simple idea. Here is a nice database schema, normalized, in Boyce-Codd normal form. Every product has a key and a foreign key to the company ID. And here is a typical query you should expect over such a normalized schema. What does it do?
It finds all products manufactured by a company in city blah: it takes a city as a parameter, and it also takes a price, and it finds all products below that price that are manufactured by a company in that city. It turns out this query is really critical for you. The application that uses this query is, like, the cash cow of your company; it has to be very, very fast. What will you do to it? How can you speed it up? Move the city into Product? Right, move the city into Product. Clearly, the indexes are a given. You can move the city into Product, and that will speed it up. But you can go all the way: why not precompute the join between Product and Company? That means the city also goes into that table, but you also keep the company name, in case you need it. This is called denormalization, and it goes exactly against what we learned when we normalized. If this is the table you start with, why is it not in normal form? How would you explain quickly that this table is not in normal form, that you should not design such a logical schema? There is a functional dependency that does not come from a key. The key is right here, the product ID, but there is a functional dependency which is not from the key. Actually, it's not very clear here. But suppose the company name is unique. 100%. Yes: then the company name determines the city. Why is it unclear? Because I eliminated CID. I should have included CID to make it clear that this is not in normal form. However, for the physical schema, this is okay. The logical schema will still see the same two tables, but the physical schema will have the denormalized table. Now, what I wrote here is actually not what I'm preaching. Here I said we should... no, let me see. I think I said it on the next slide. This is the wrong way to do it: you don't want to ask the application writer to go back and modify the query. You could do it, you could modify the query and then everything works fine, but you shouldn't. What could you do instead? What's the standard way to modify the physical schema without affecting the logical schema? Views. Create views. So which one will be the table that's actually stored, and which ones are the views? The denormalized table is what's stored, and the individual tables become the views. Exactly: this is what we store, and these become the views. Couldn't you also stick the denormalized table in a reporting database, and leave the transactional one alone? Ah, that's another possibility. If you have that option, then that's what you should do: make a copy of the denormalized table in a separate database, your reporting database, where you run these critical queries, if that's an option. The scenario I want to consider here, just to illustrate the view concept, is that this is our only database, and then you have to define views. So, some discussion. Yep, it's no longer in BCNF, because there is this functional dependency, although, unfortunately, I forgot to include CID to actually have the functional dependency. And all the bad things happen: all the update anomalies now happen. If you want to modify a company, you have to modify it in all the products where the company occurs. So yep, you can do this, but you need to compensate for it.
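A minimal sketch of this denormalization, with assumed column names; the next slide walks through the same steps in gory detail:

-- Precompute the join once and store it as the physical table:
CREATE TABLE ProductCompany AS
  SELECT p.pid, p.pname, p.price, c.cid, c.cname, c.city
  FROM   Product p JOIN Company c ON p.cid = c.cid;

DROP TABLE Product;
DROP TABLE Company;

-- The logical schema survives as views over the denormalized table:
CREATE VIEW Product AS
  SELECT pid, pname, price, cid FROM ProductCompany;
CREATE VIEW Company AS
  SELECT DISTINCT cid, cname, city FROM ProductCompany;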
But let's look a little at views, and discuss, again, query optimization. Okay, so here I prescribe in gory detail how to get there. You create this new table, you insert into it, then you drop the old tables, and you create views that substitute for the old tables. And now you run the query, and what happens? Let me see if this actually happens here. Yeah, so now I did include CID; I had forgotten about CID initially. So we run this query. What do we get? Will this query now run much faster? Tell me what happens behind the scenes. We will query the denormalized table. How exactly? Look at Product: what will we do with Product? We substitute it with, well, let's see what Product is. Product is this thing, so it's going to substitute Product with a select over ProductCompany. What will it do with Company? It's going to substitute it with another select over ProductCompany. So, we were hoping to avoid the join; the join is what we wanted to avoid. Did we avoid it? It's back in, because now we are joining the table ProductCompany with itself, right here. That's the question. If I think about it, how would the optimizer know? It sees a self-join, and nothing tells it that... well, actually it could know, since the join is on a unique key within the same table. I know for sure that Postgres doesn't optimize it, but now I'm confused about explaining why it should be optimizable. Clearly, we want CID to be a foreign key, right? But it can't be a foreign key, because it's not a key here. So I think we have a problem with this. I know that Postgres does not optimize it, and I said that SQL Server does a better job. This is something we need to check for next time. I did spend a lot of time on this query, but that was a couple of years ago, and now I can't explain how it actually can do the optimization; we'll have to revisit this. So let's do this for next time: can you try to see if this query gets optimized, and what you have to do to get the optimizer to optimize it? You have access to SQL Server. You're going to play with database tuning anyway, but you'll do that in Postgres, and it's much harder; the optimizer in Postgres is much weaker. You don't have to do this particular query: for the homework, the only tuning you need to do is with indexes. But it's fun to play with this query as well and see if the optimizer can handle it. Good. And finally, the last two physical tuning tasks that I wanted to mention, actually to review, because we already discussed them, are horizontal partitioning and vertical partitioning. What is horizontal partitioning? I have a table; what does it mean to partition this table horizontally? Rows 1 through n are in one table, and rows n+1 to the end are in another table. That's exactly what it means: it's like partitioning the table horizontally, by rows. Here is one example. Suppose we have a table Product. We can partition it by putting all the cheap products in one table and all the expensive products in a different table; that's one horizontal partitioning. So here they are: we insert the cheap products into CheapProduct and the expensive products into ExpensiveProduct, we drop Product, and then we recreate Product as the union of the cheap and the expensive products. And you remember the discussion we had about views. Now, if you query this, what happens?
If you look for a product that has price 2, what would a good optimizer do? Which tables will it touch? No, it will touch both of them, because if you look at this view definition, there is nothing in it that tells you that the cheap products are cheap and the expensive products are expensive, except the English description in the table name. So, big problem. When we do physical tuning, of course, we want the best performance, so we must convince the optimizer to do the right thing. And you can do this in two ways; I think we actually tried both out two years ago when we designed this. One is, when you create the view, you add this seemingly useless predicate. This is what I showed you during the lecture about views. It's redundant, but it is a hint to the optimizer that everything that comes from CheapProduct is indeed cheap, and everything that comes from ExpensiveProduct is indeed expensive. SQL Server then does the right optimization: it figures out that it only needs to touch one of the two tables. Postgres doesn't; I was never able to convince Postgres to optimize this query. It shows you the limits of today's optimizers. I'm pretty sure DB2 will also do a good job, and I doubt Oracle will do a good job. Yes? But doesn't Oracle let you do this by simply saying: here's a table, partition it by this predicate, and leave it as a base table, without messing around with views at all? Sure, and I should mention this: normally you would not do partitioning the way I'm describing it. I'm trying to show you the concepts behind partitioning, but commercial database systems have special commands for partitioning tables, and then they know how to handle the partitions much better. For our discussion, I assume you have a database system that only understands views, and you should be able to do your physical tuning just by exploiting views and a good optimizer; it's not always easy to get a good optimizer. Here is the second way you can do it: when you create the tables, you can add a CHECK constraint. You essentially communicate to the database system that all the prices in CheapProduct are cheap and all the prices in ExpensiveProduct are expensive. And it turns out that, with some setup that I don't remember, you can even get this to work in Postgres. Okay. So remember that updates through views are not always possible. For example, if you now want to insert into Product, the system wouldn't know whether to send the row to CheapProduct or to ExpensiveProduct, and to solve this you probably need to define a trigger. By the way, on homework 5 you also need to define some triggers in Postgres. For documentation, actually start from the book: chapter, I forget, I think it's chapter 5, describes triggers, and you can use those triggers in Postgres. Okay, so let me skip the rules in Postgres. And I have just a few slides on vertical partitioning. Vertical partitioning is like this: you partition based on attributes. Same thing: database systems have special commands for vertical partitioning, but the game we play here is to do it through views. And how do you reconstruct Product? Well, you have the two partitions and you join them. And now, if you have a query that only touches attributes from one of the two partitions, then normally an optimizer should be able to optimize it.
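Putting the partitioning ideas into SQL, with a made-up schema and a made-up price boundary of 100: the WHERE predicates in the view are the first trick (the redundant hint), and the CHECK constraints are the second. The vertical split at the end plays the same game by attributes:

-- Horizontal partitioning through views (assuming the original Product
-- table has been dropped first):
CREATE TABLE CheapProduct     (pid INT, pname TEXT, price INT CHECK (price <  100));
CREATE TABLE ExpensiveProduct (pid INT, pname TEXT, price INT CHECK (price >= 100));

CREATE VIEW Product AS
  SELECT * FROM CheapProduct     WHERE price <  100
  UNION ALL
  SELECT * FROM ExpensiveProduct WHERE price >= 100;

-- With the hints or constraints, a good optimizer touches only CheapProduct:
SELECT * FROM Product WHERE price = 2;

-- Vertical partitioning: split by attributes, reconstruct by joining on the key.
CREATE TABLE ProductName  (pid INT PRIMARY KEY, pname TEXT);
CREATE TABLE ProductPrice (pid INT PRIMARY KEY, price INT);

CREATE VIEW ProductV AS
  SELECT n.pid, n.pname, p.price
  FROM ProductName n JOIN ProductPrice p ON n.pid = p.pid;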
And here I have only one correction to make. Where is it? One of the slides refers to a projection, which doesn't apply here, so ignore it; I think the figure is off. So, SQL Server does the right thing: it will simplify the query by exploiting keys and foreign key dependencies. And I was never able to convince Postgres to do the same thing. I don't know what happened here; I think part of the slide might be missing. Okay, so you can ignore this. Other stuff you will find on the slides, which we won't discuss in class: I have about ten slides on security in SQL. One of the questions on the homework asks you to create permissions for a certain database. Feel free to use these slides, or the corresponding chapter in the book, or just read the documentation. And that's it; that's all I wanted to cover today. Next time I want to discuss B+ trees and hash tables, and then move on to query execution. Any questions or other comments? Good. Then have a good week. I'll see you next Wednesday.