Welcome to a deep dive into PostgreSQL indexing. I'm joined by Ibrar Ahmed, Senior Software Architect at Percona, who will discuss how to use pg_stat_statements to find opportunities for adding indexes to your database: when to add an index, when not to add an index, and strategy around PostgreSQL indexing. My name is Lindsay Hooper, I'm one of the Postgres conference organizers, and I'll be your moderator for this webinar. Let me tell you a little about our speaker before we get started. Prior to coming to open source development, Ibrar had vast experience in software design and development with a focus on system-level embedded development. After joining EnterpriseDB in 2006, he started his current open source work, specifically in Postgres, and since then he's contributed to the Postgres community as well as other open source communities. In the database field, he has experience with well-known databases such as MySQL, Oracle, and NoSQL systems. Ibrar's experience is not limited to core databases; he has also worked with tools related to databases, such as Hive, HBase, and Spark, and on integrating these tools with Postgres. One last thing: Ibrar has given more than 15 Postgres talks in the last year alone, and has also authored multiple books on Postgres. So welcome, Ibrar. Take it away. Hello, everybody. My name is Ibrar Ahmed, and I am working as a senior database architect at Percona, and I have 16 years of Postgres experience. Today we will study and discuss indexes, specifically PostgreSQL indexes. We will start by learning what indexes are and how to use them, and we will discuss how to create indexes in PostgreSQL and how to use those indexes. So let's get started. Before going to the indexes, we have to understand what an index is and what a heap is, because these are two terms we will use in the rest of this webinar, and these terms are also used in PostgreSQL itself. So, what you see on this slide is a book.
When we are thinking about a book, we can call it a heap, or a table. So imagine this book is a table, a heap, and you are reading it. When somebody asks you to find some word in the book, what will you do? You will start reading the book from the first page to the last page and note down where you find that specific word or topic. That is what a table, a heap, is like. And you know that at the end of each book there is an index, which contains a word and the page number where that word appears. So if somebody asks you to find a word in the book, and you search the heap, you start reading the book from the first page to the last page. But if you have an index at the end of the book, you go to the index, look up the word, learn its page number, go to that page, and you have found it. That's easy, and that's what an index is. In the index you store the search key and a pointer into the heap, the table. And the heap is the table, the actual book. So you have two parts: the book, which is the heap, and the index at the end of the book. Just keep in mind: the book contains the whole contents, while the index contains the information you need plus a pointer to where that information lives in the book. Keep these two things in mind, and I will explain how to relate them to PostgreSQL, okay? First, the heap: the book, the table. Always keep in mind that when I'm talking about the book, the table, or the heap, it's the same kind of thing, so picture the same concept.
Rows, or tuples: normally in the database field we call them rows, but in PostgreSQL, and in some other databases, we call them tuples. Rows are stored in a table, and in PostgreSQL specifically (maybe some other databases have this architecture too, but here we are studying PostgreSQL), when we create a table, PostgreSQL creates a separate file for each table. What is that file? PostgreSQL stores all the data, the user data, in that file. So how can we see which file sits underneath the table we have just created? Let's create a table. Here I have created a table called foo, which contains two columns, id and name: create table foo with id of integer type and name of text type. After that, we run a query against the catalog table pg_class. pg_class is a catalog table, or what some people call a system table, which contains our table information. We query that table to get the actual file name for the table foo. The query is select relfilenode from pg_class where relname like 'foo'; relfilenode is the name of that file. So we are selecting the file name from the system table where the table name is foo, and it gives us the name of the file. And you can verify this easily: the physical file on disk can be seen in the PostgreSQL data directory, PGDATA. PGDATA is the folder where the whole PostgreSQL cluster is saved. So just go to the PGDATA directory and run ls -lrt. You can see the file exists there, and you can see that the size of that file is zero.
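The steps just described can be sketched as follows; this is a minimal sketch run against a live PostgreSQL server, and the relfilenode value and data-directory path will of course differ on your cluster:

```sql
-- Create the example table from the talk.
CREATE TABLE foo (id integer, name text);

-- Look up the on-disk file name (relfilenode) in the pg_class catalog.
SELECT relfilenode FROM pg_class WHERE relname = 'foo';

-- A convenient alternative: pg_relation_filepath() returns the file
-- path relative to the data directory, e.g. 'base/13593/16425'.
SELECT pg_relation_filepath('foo');
```

From a shell you can then run something like `ls -l $PGDATA/base/<dboid>/<relfilenode>` and see a zero-byte file for the freshly created, still-empty table.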
The size of the file is zero bytes because we just created that table: it doesn't have any rows yet, so the file size is zero. That means when you create a table, PostgreSQL creates a zero-byte file for you, and when you insert something into the table, you can see the file grow. One more thing to keep in mind: tuples stored in a table do not have any order. What does that mean? When you insert rows into a table, don't expect them to come back in any particular order unless you specify ORDER BY. If you have not specified ORDER BY, you may get the rows back in different orders at different times. Why? Because the tuples are not stored in the table in an ordered way. I will explain later why that happens. Just one tip here: when you delete rows and insert other rows, the freed space is reused by the next insert, so the ordering you get back can look random. So don't expect any particular order from a table. Okay, some more about the heap. As I explained, whenever I say table or heap, you have to imagine the book. So when somebody asks you to find information in the book, in the table, what will you do? I told you: you will read the whole book, because you have no way to jump directly to something in the book. Similarly, you have no way to jump directly to a row in a plain table. So what will you do? You will scan the whole table: a sequential scan, right? You read the first row, second row, third row, fourth row, and so on until the last row, to find the specific information. For example, here I am selecting some data from a table: explain select name from bar. And you can see a sequential scan.
First, I believe most of you know what EXPLAIN is: EXPLAIN shows you the query plan. Here I'm just explaining the query select name from bar. That means I am saying: read the whole bar table and give me the name from every row. So it 100% makes sense that it uses a sequential scan, because I am saying read the whole book. In that case, reading the whole book makes sense. But look at the second query: explain select name from bar where id = 5432. This query asks for the name from the bar table where id is 5432. It's not asking to read the whole book; it is not selecting everything from the bar table. It asks only for the name where id = 5432, some specific information. And if you look again, we still have a sequential scan here. (Just ignore the parallel part of the plan; parallel scans are a newer feature, and you'll have to wait for another webinar where I can explain parallel queries.) But it is a sequential scan. Why? In the previous query, selecting all the names from bar, it makes sense to sequentially scan the whole table. But in the second query, where id = 5432, it doesn't make sense to fully scan the table, because we need one specific piece of information. There is no need to read the whole table. Here is a more graphical representation of what I'm saying, with an example, so maybe the problem will be clearer for you. I have created a table foo with id and name columns. I have inserted two rows: id 1 with name Alex, and id 2 with name Bob. And I'm selecting ctid, * from foo.
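The two plans being contrasted look roughly like this; a sketch only, since the exact plan text, costs, and presence of parallel workers depend on your PostgreSQL version and data:

```sql
-- Without any index, both queries sequentially scan the whole table.
EXPLAIN SELECT name FROM bar;
--   Seq Scan on bar ...

EXPLAIN SELECT name FROM bar WHERE id = 5432;
--   Seq Scan on bar ...
--     Filter: (id = 5432)
```

The second plan still reads every row and merely filters most of them out, which is exactly the waste an index removes.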
One pointer here: ctid is a hidden system column in PostgreSQL which contains the exact location of each row. It has two parts: the first part is the block (page) number and the second is the offset within that block. So (0,1) means block 0, offset 1, and (0,2) means block 0, offset 2. That means Alex is at (0,1) and Bob is at (0,2), and the ctid holds that information. How do we select data from the heap? We need to scan each and every page and look for the tuple within the page. On the right side you can see the pages: PostgreSQL stores data in the file we have already discussed, and that file is divided into 8 KB pages (the page size is configurable at compile time, but we are not discussing that here), and each page contains tuples. So on the right side you can see page 0, page 1, page 2, up to page n, and in each page, tuple 1 to tuple n. How many tuples a page can contain depends on the size of the tuples: it can be one, it can be ten, anything. So how do we select data from the heap? We have to scan the pages. If I run a query where id > 2, we have to go through each and every page, first tuple, second tuple, third tuple, checking whether id is greater than 2. Like here (the colors may be very small, but you can see it): (0,1) Alex is tuple one, and (0,2) Bob is tuple two. And what is the problem when you are selecting data this way? The cost: you are paying the cost of sequentially scanning the pages to look for a specific value, because the data is stored in an unordered way.
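The hidden ctid column from the slide can be inspected directly; a small sketch, noting that ctids are physical locations that can change after UPDATE or VACUUM FULL, so they should never be stored or relied on by applications:

```sql
-- ctid is a hidden system column: (block number, offset within block).
CREATE TABLE foo (id integer, name text);
INSERT INTO foo VALUES (1, 'Alex'), (2, 'Bob');

SELECT ctid, * FROM foo;
--  ctid  | id | name
-- -------+----+------
--  (0,1) |  1 | Alex
--  (0,2) |  2 | Bob
```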
The values are stored in an unordered way, not an ordered way, so it is costly to select values like that. So what's the solution? This is the solution: when data is stored in a heap, in a book, in a table, and you are selecting some specific data from it, the select is costly because you have to sequentially scan that book, that table, that heap. The solution is indexes. You create an index to avoid that extra cost of the sequential scan. Let's see how we can do that. Why an index? Indexes are entry points into the table; an index is used to locate a tuple in the table. If you remember the first slide, I showed you a book and discussed that each book has an index at the end, where a word is written and, after the word, there is a page number. The word is the actual value, the actual information you need, and the page number is the pointer: the page number points into the book. It is the same here: the index is used to locate the exact tuple in the page, in the table. It's an entry point. The sole reason, as I also discussed in previous slides, is that cost is the major factor when selecting from the heap, so the index exists for performance, mostly SELECT performance. The drawback of an index is storage. When you create a table, PostgreSQL creates a file underneath it; likewise, when you create an index, it also creates a file on disk for the index. That means more storage is required. So if you are inserting data into a table that has an index on it, you require more space, and if you have a lot of indexes on a table, you require a lot of space for them, even though the table itself stays the same size.
Here, before creating an index, I ran explain select name from bar where id = 5432. As on the previous slide, a sequential scan happens. That means we read the whole bar table, checking tuple by tuple: fetch the first tuple and check whether its id is 5432; if not, drop that tuple and fetch the next, and the next, and the next. Whenever we hit id = 5432 we keep that tuple, and we keep scanning the whole bar table until no rows are left; after that we project all the rows where id = 5432. In this query 38216 rows were involved, a sequential scan clearly happened, and you can see the cost of the query highlighted in red. Now I create an index on id: create index bar_idx on bar (id), where bar is the table, id is the column, and bar_idx is the index name. Now run the same query again: explain select name from bar where id = 5432. You can see a bitmap index scan is used, and bar_idx is used. Forget for now what a bitmap index scan is; just see that the index we created is being used. Previously there was no index; now there is. And look at the cost: 64313, where previously it was 159235. The cost dropped to about 40% of what it was. A single statement gave you that performance gain. Previously we were reading each and every row and filtering out every row whose id was not 5432.
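The before-and-after from the slide looks roughly like this; a sketch, with plan shape and costs depending on your data and PostgreSQL version:

```sql
-- Before: the filter query falls back to a sequential scan.
EXPLAIN SELECT name FROM bar WHERE id = 5432;
--   Seq Scan on bar ...  Filter: (id = 5432)

-- Create a (default B-tree) index on the filter column.
CREATE INDEX bar_idx ON bar (id);

-- After: the planner can use the index.
EXPLAIN SELECT name FROM bar WHERE id = 5432;
--   Bitmap Heap Scan on bar
--     -> Bitmap Index Scan on bar_idx
--          Index Cond: (id = 5432)
```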
Now we have created an index and seen the performance gain; we are back to the point that an index is a performance gainer. When you create an index, you gain performance; the minor problem is that it requires more space, because it stores some information of its own. So what is the standard way to create an index? Go to https://www.postgresql.org/docs/current/sql-createindex.html, where you will find the full syntax for creating an index. Here we will discuss some basics and some tips and tricks for working with indexes. I have created an index idx_btree on bar (id) and I am selecting relfilenode. You may remember that when we created a table, we ran this same query for the table. Now we have created an index, and we run the same query against the catalog table pg_class, with relname like 'idx_btree', and there is a physical file for it too. That proves that when you create a table, PostgreSQL creates a physical file which you can find via pg_class, and when you create an index, it also creates a file on disk which you can find with the same query. There is no difference between a table and an index when you look at the files created by the system. For proof, you can go to the PGDATA directory and look at the file 16425: you can see the file exists, and since I have put some data into that index, the file has some size. One minor note here, just for your information (it's not a quiz, this isn't a class): it is not strictly one file per table or one file per index. When you create a table, it is one file per table only until it hits a 1 GB limit.
That means when you create a table, PostgreSQL creates one file for it, and when you start inserting data and the file hits the 1 GB limit, it creates another file with a .1 suffix. If your file is named 16425, then when it reaches 1 GB the next data goes into 16425.1, then 16425.2, and so on: a series of 1 GB segment files. So just a pointer for you: don't assume there is only ever one file. Now, some more on index creation. I have created an index here, bar_idx on bar (id); you can see I have created an index on a single column. Previously I just created the index and showed you how it is used; now we will move on to more complex indexes. This is the simplest one, where you use a single column: bar is the table and id is the column, and you have created the index bar_idx on table bar, column id. You can see bar_idx being used. That's it. Now, many people have a concern: when they create an index, it locks the table, and you cannot insert into or update the table. That's a major problem, because if you have a table of 100 GB or 200 GB and you want to create an index on it, then depending on the system it may take one hour, two hours, maybe four hours to build the index. That means your system is effectively down for that long, because you cannot insert into or update that table while it is locked for the index build. That is a real problem for a lot of people; they don't want to lock their table. So PostgreSQL came up with a way to create an index concurrently. That means while you are creating the index, your table is not locked: you can update and insert into that table from other sessions while the index is being built. Here is the query.
First I ran a plain create index on bar (ignore the using btree part for now); the time was 12303 milliseconds, about 12 seconds. Then I ran create index concurrently for the same index on bar, using btree. The one catch (not really a problem) is that concurrently will not lock your table, but it takes more time to build the index. So: if you don't want to lock your table, use concurrently, but be aware the build takes more time; if you don't use concurrently, it locks your table but builds the index more quickly. It's up to you; you have both options. In this case, the concurrent build took more than double the time. So far I have discussed how to create an index on a single column, and how to do it concurrently; you can of course also create an index with two or three columns. Now let's move to a more complex type of index: expression indexes. Suppose that, analyzing your queries with pg_stat_statements and your own code, you see that you are continuously using the same expression. Like here: you are always running select * from bar where lower(name) = 'some text' (forget what the text is). That means your WHERE clause always contains lower(name). So why create an index on just name? If you create an index on name alone, the database still has to apply lower() each and every time. The better solution is to create an index on the expression itself: create index idx_expr on bar (lower(name)), the whole expression. It evaluates the lower() expression on name once per row and stores the result in the index. What happens then is that the cost is reduced.
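The two build modes and the expression index can be sketched like this; the index names are mine, not from the slides:

```sql
-- Plain build: faster, but blocks INSERT/UPDATE/DELETE on bar
-- for the duration of the build.
CREATE INDEX idx_plain ON bar USING btree (id);

-- Concurrent build: writers keep working, but the build takes
-- roughly twice as long (it scans the table twice).
CREATE INDEX CONCURRENTLY idx_conc ON bar USING btree (id);

-- Expression index: lower(name) is computed once per row at
-- index-build time instead of at every query.
CREATE INDEX idx_expr ON bar (lower(name));

-- This matches queries written with the same expression:
--   SELECT * FROM bar WHERE lower(name) = 'alex';
```

Note that if CREATE INDEX CONCURRENTLY fails partway through, it leaves behind an invalid index that must be dropped manually.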
Always think about this: when you use an expression in your WHERE clause, or wherever you are selecting data, and you keep using the same kind of expression, put that expression in your index and you will get the most benefit from it. And it's not only for lower() or upper() type expressions. Let's look at a more complex one: explain select * from bar where dt + interval '2 days' < now(). Whenever you select, your WHERE clause takes the timestamp column dt, adds a two-day interval to it, and you keep doing that over and over. So rather than creating an index on dt alone, create an expression index on bar over the whole expression. This is a somewhat more complex expression, but any valid PostgreSQL expression can be used in an index. PostgreSQL then does not evaluate that expression on every row while filtering; it evaluates it once per row at indexing time and stores the result in the index, so it's fast. Now, another kind of problem. Let's go back to point one, where we discussed that indexes are used for performance but have the drawback of requiring space. Sometimes that doesn't matter, because space is cheaper than performance, but it starts to hurt when you have multi-gigabyte or multi-terabyte tables and the indexes you create on them are themselves many gigabytes. So, for example, you have a very big table, and after running pg_stat_statements you find that most of your queries, or even 100% of them, hit only the first 1000 ids. Like here, I have a bar table which contains ids and names.
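The interval case can be indexed the same way; a sketch, noting that an expression in the column list needs an extra set of parentheses:

```sql
-- The query always applies the same arithmetic to dt:
--   SELECT * FROM bar WHERE dt + interval '2 days' < now();

-- Index the whole expression (double parentheses are required
-- around an expression in the index column list):
CREATE INDEX idx_dt_expr ON bar ((dt + interval '2 days'));
```

The planner can only use this index when the query's WHERE clause contains the identical expression, so keep the query text and the index definition in sync.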
So most of your queries, you find, hit ids less than 1000, and you need performance there, so you want to create an index. That's the solution we discussed before: whenever you need performance, you create an index. What is the first solution? Just read the left side: create index idx_full on bar (id). Run the query, and it gives you a bitmap index scan, and the performance is really good. But let's get the size of this index: 214 MB. Yet in pg_stat_statements you have seen that you never query anything greater than 1000; you always query below 1000. So there is a better solution for that: a partial index. Create index idx_part on bar (id) where id < 1000; you can add a WHERE clause to the index. Now run the same query. The performance is the same, no problem at all; you get the same performance. But when you check the size of this index, it's just 240 KB. No performance difference, but the size is 240 KB instead of 214 MB. So here you are getting the same performance with many times less space. That's the answer to the size problem. Now, a question: what happens with a query where id is greater than 1000? This index will not be used, because this index covers only ids less than 1000, not greater. For that query it will not be used; a sequential scan or some other index will be used instead. But no problem: you have seen that 99.9% of your queries use ids less than 1000, so why waste a lot of space just for one query? Just let that one query run longer. It's your choice, and the partial index is the answer. So, we have now discussed expression indexes and a few other kinds. Next, the index types themselves: the first index type is the B-tree.
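The full-versus-partial comparison can be sketched as follows; the sizes are the ones quoted in the talk and will vary with your data:

```sql
-- Full index over every id: works, but large (~214 MB in the talk).
CREATE INDEX idx_full ON bar (id);

-- Partial index: only rows matching the predicate are indexed,
-- so it is tiny (~240 KB in the talk).
CREATE INDEX idx_part ON bar (id) WHERE id < 1000;

-- Can use idx_part:
--   SELECT * FROM bar WHERE id = 500;
-- Cannot use idx_part (predicate not implied by the query):
--   SELECT * FROM bar WHERE id = 5000;
```

The planner only considers a partial index when it can prove the query's WHERE clause implies the index predicate.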
It is sometimes written "B-tree". What is a B-tree index? Go to Wikipedia and you can read what a B-tree is: it's a data structure. I think most people know what a B-tree is and how it works, so I will discuss it here with respect to PostgreSQL, with a bit of internal detail; for the full details you can go to Wikipedia and learn how a B-tree works internally. Now, whenever we talk about an index type, we have to learn which operators it supports. The B-tree supports the less than (<), less than or equal (<=), equal (=), greater than or equal (>=), and greater than (>) operators: almost all of the operators, not all, but almost. The syntax is create index idx_btree on foo (name). That's enough for a B-tree, because B-tree is the default. You can write using btree, but that clause is optional; the using clause matters for the other index types, but the default is B-tree, so if you skip it, you get the same thing. Now the explain: select * from foo where name like 'text%', that is, names starting with 'text'. You can see an index scan, and idx_btree is used. It's simple: I created a B-tree index, and when querying table foo where name matches some text, it uses idx_btree. It's the same thing as when we created an index before; now we are just explicitly saying that it's a B-tree, and previously it was a B-tree too. Now, the same example as earlier: create table foo (id integer, name text), with rows (1, 'Alex') and (2, 'Bob') at ctids (0,1) and (0,2), but now imagine the table has many more columns: the Alex row has many other values, the Bob row likewise, hundreds of columns.
So you have a heap here, containing for each row the ctid, the id, the name, and all the other columns. When I create the index, you can see Alex, and Bob on the right side: this is the heap and the B-tree index side by side. The B-tree index stores information like this: I created the index on table foo, on the column name, so the index key is the name, and the ctid is also stored. In the B-tree we store the indexed column, which here is name, so we store Alex and Bob, plus the position of each in the heap: (0,1) and so on. Remember that the table contains the ctid and all the columns of the row, all the columns and all the information, but the index contains only the indexed column and the pointer. And this information is stored as a balanced tree, which is very easy to traverse. You don't need to traverse it sequentially: traversing a balanced tree, with a few hops you reach the desired value. In the heap, by contrast, the values are stored unordered and sequentially, which makes searching hard. It's like this: if you only know that a value is somewhere in a book, you just scroll through the pages, first page, second page, third page. But if you can open the book at the center and immediately know the value is in the second half, like a binary search, you jump straight toward it. Similarly, here Alex, Bob, and the ctids are stored in a balanced tree, so a value can be found very easily. The second index type is the hash index. A hash index can only be used for the equality operator.
The B-tree index works with almost all operators, but the hash index supports only the equality operator. Here: create index idx_hash on bar using hash (name). The difference you can see is using hash instead of the B-tree default. So: whenever you only use the equality operator, you can use a hash index; for the other operators, use the B-tree. The third index type is BRIN, the Block Range Index. Sometimes you know that your data is aligned with the physical storage order of the table. Say your data is inserted in order: January, February, March, April, and so on through December, and it never happens that you insert January data, then February, and then January again. If your data is continuous like that and never goes back, you have one more indexing option. It's like page numbers: you have pages one, two, three, four, and it never happens that after page six comes page four; it's page seven, page eight, page nine, never seven again. So the values are aligned with the physical storage: dates, sequential numbers, serial numbers, anything produced by a sequence that never goes back. In that case, the index needs to store only three values per block range. In the B-tree, what did we store? The indexed column and the position of each row. In the hash index, the hash value of the column. But in BRIN we store, per range of pages, the page range, the minimum value, and the maximum value. What's the benefit? The size and the storage. Here is an example: with a sequential scan the query takes 7397 milliseconds, while with a BRIN index it takes 4.233 milliseconds.
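The hash and BRIN variants from these slides can be sketched as follows; the column choices mirror the talk (name for equality lookups, a timestamp dt for physically ordered data):

```sql
-- Hash index: usable for equality comparisons only.
CREATE INDEX idx_hash ON bar USING hash (name);

-- BRIN index: for columns whose values follow the physical
-- insertion order (timestamps, serial ids, append-only logs).
-- It stores only min/max per block range, so it is tiny.
CREATE INDEX idx_brin ON bar USING brin (dt);

-- Compare the on-disk sizes:
SELECT pg_size_pretty(pg_relation_size('idx_hash')) AS hash_size,
       pg_size_pretty(pg_relation_size('idx_brin')) AS brin_size;
```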
That's not half; it's many times less. Why? Because we stored sequential datetime values, something like nine, ten, eleven months of days inserted in order, and created a BRIN index on that. So the index knows which pages contain the information: it knows the page range and it knows the starting and ending date within that range, so the lookup is very precise, and it can go directly there and fetch the information. But if you are deleting rows and inserting new rows into such a table, forget about the BRIN index: you destroy the physical ordering and it will not be useful. What is the other benefit? Look at the sizes. I created a B-tree index, a hash index, and a BRIN index on the date column: the B-tree is 21 MB, the hash is 48 MB, and the BRIN is just 48 KB. The size is very low, because it stores very little information. The next one is GIN, the Generalized Inverted Index. It's a specialized kind of index in PostgreSQL, used when we have to index composite values. If we create a normal index, the database would have to scan the whole document each and every time, which is not good; instead we scan the document up front and build an index over the values inside it, and then the GIN index can be used. Maybe that's not very clear, so here is an actual example. I created a table bar: id integer, name jsonb, address jsonb, dt timestamp. I'm selecting name, dt from bar limit 5, and you can see that name is a document and address is also a document, with Alex, foo, and everything inside. So what index can I use here? An index on the name column as a whole? No: I want to index the actual values inside the JSONB document, not the whole JSONB. Indexing the whole JSONB document as one value doesn't make sense for me.
I want to index the values inside the composite value, inside the JSONB document. So that is what I did here: I created a GIN index on the name column, USING gin. And when I ran EXPLAIN ANALYZE SELECT dt FROM bar, looking for documents where name is Alex, without the index the plan was a sequential scan on bar, with a planning time of 0.107 milliseconds and an execution time of 1079 milliseconds. On the right side, you can see the index is used and the execution time is much lower. Notice the difference: normally, with a B-tree index, we write something like name = 'Alex' directly against the column. But here we are querying inside the JSONB document for the key name equal to Alex. Sorry about the confusion: the column is called name and the key inside the document is also called name. On the next slide we will change that; some other column name would have been better. But the point is that we are querying inside the JSONB document. JSONB is a composite value, we created a GIN index over those composite values, and we are selecting values stored inside the document while still getting the benefit of an index, because of this GIN index. Next is GiST, the Generalized Search Tree. It is a tree-structured access method, and it is actually a framework for complex data types: finding a point within a box, full-text search, things like that. I will discuss where GiST is used and where GIN is used on the next slide. So, which index to use where? B-tree: use it for most queries and most data types; normally, try the B-tree first. Hash: use it for equality operators only.
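The JSONB scenario described above can be sketched as follows. The exact query from the slides is not visible in the transcript, so this sketch uses the containment operator `@>`, one of the operators a GIN index on jsonb supports; the column is renamed to `data` to avoid the name/name confusion mentioned above:

```sql
-- Hypothetical recreation of the slide example.
CREATE TABLE bar (id int, data jsonb, dt timestamp);

-- GIN indexes the keys and values inside each document.
CREATE INDEX idx_gin_data ON bar USING gin (data);

-- @> ("contains") is GIN-indexable: find rows whose document
-- contains the key/value pair name = 'Alex'.
EXPLAIN ANALYZE
SELECT dt FROM bar WHERE data @> '{"name": "Alex"}';
```

A plain B-tree on the whole jsonb column would only help when comparing entire documents, which is rarely what you want; GIN is what lets a lookup inside the document use an index.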
You cannot use a hash index for anything other than equality. And if your values are very small, short strings for example, I would not recommend hash; just use a B-tree, it will perform better. For BRIN, there is no point in using it if your values are not sequentially aligned with the physical storage, that is, if the data is not physically ordered on disk. In that case, forget about BRIN. But for sequentially inserted data sets, dates, serial numbers, page numbers, you can use it. Use GIN for documents and arrays, and GiST for things like text search. One more concept, which is called the index-only scan. The index is stored separately; remember the book, where the index sits at the end and takes 10, 20, 40 pages. Suppose I give you this query: count how many times the word "Postgres" appears in the book. You go to the index and see that "Postgres" is on pages 10, 11, 13, 14, and 16: five pages. So you can answer "five" without reading the book at all; you just counted the entries in the index. That means if your query fits 100% within the index, Postgres will not touch the heap; it will give you the data directly from the index. That is called an index-only scan. Here is the example. We have a B-tree index on bar over id and name. In the first query, we select id, name, and dt from bar where id is greater than 10,000 and less than some value; the exact predicate is not the point. It uses a plain index scan. Why? Because you are asking Postgres for two columns that are in the index and one column, dt, that lives only in the heap, so it will fetch all three columns from the heap.
It gets the row pointers from the index scan and then reads id, name, and dt from the heap. But in the next query, EXPLAIN SELECT id, name FROM bar WHERE id ..., all the selected columns are within the index. So it uses an index-only scan: it reads id and name from the index and hands them straight to you. It does not go to the heap to extract anything, because it already has all the information. The previous query needed dt, which is not in the index, so it could not be answered from the index alone; this one can. That is the index-only scan. So whenever you plan your indexes, check whether your queries can be satisfied by an index-only scan, because it is faster than a regular index scan. Try to take advantage of it, but it depends on your workload. Now, some useful queries. I think we are running out of time, but I wrote some basic queries for you, for example how to detect duplicate indexes. Say you created an index on a table on some column, and then the same kind of index on the same column again. Why keep two? You are wasting space. Here is one of the queries: when you run it, it will report, for example, that the second column of the bar table has more than one B-tree index on it, so just remove one. Run it against your database and it will list all the duplicate indexes you can drop. And here is another simple query that shows which kind of index supports which operators. For example, run it for GIN and it will show that GIN works with array operators, tsvector, jsonb, and so on. I ran it only for GIN and GiST, because if you run it for B-tree, the list is very long.
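The exact duplicate-index query from the slides is not in the transcript, but one common variant (this is a sketch, not the speaker's query) groups indexes by their definition in pg_index and reports anything defined twice:

```sql
-- Indexes with identical table, operator classes, key columns,
-- expressions, and predicates are duplicates of each other.
SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) AS wasted,
       array_agg(idx) AS duplicate_indexes
FROM (
    SELECT indexrelid::regclass AS idx,
           (indrelid::text || ' ' || indclass::text || ' ' ||
            indkey::text  || ' ' ||
            coalesce(indexprs::text, '') || ' ' ||
            coalesce(indpred::text,  '')) AS signature
    FROM pg_index
) sub
GROUP BY signature
HAVING count(*) > 1;
```

Each reported group is a set of interchangeable indexes; you can usually drop all but one and reclaim the wasted space.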
That query reads the operator families, so if you run it you can get that information for B-tree, hash, BRIN, all of them. There are also other ways to look at your index statistics, like pg_stat_user_indexes and pg_stat_statements. Here are some queries for unused indexes. Sometimes you review your database periodically, quarterly or monthly, and you find that some index has not been used at all in the last three or four months. In my example, idx_foo_date was never used, because it was never scanned; the next B-tree index was never used; idx_btree_id was never used; and idx_btree_name was used six times. So if the collected stats show that an index has not been used, ask why it is still there, because an unused index still has a cost: it adds maintenance overhead and can even increase planning time. You really have to think about that. So I think it is time for some questions, because we have already spent an hour, but I can take a few now. If you have more questions, you can send them by email and I will answer them in detail. Fantastic. So, it's up to you: feel free to stay on for the questions, but we understand if you need to hop off. Yep. We have about five questions in the hopper. The first is: what is the meaning of the first figure of a cost? When it says 939.93, where does this come from? I believe this is from the "why index" slide. This one, 159.235? 939.93? Okay, I do not know exactly which number that is, but I will explain what this cost actually is, and that will answer the question. These are arbitrary values. They are not milliseconds, not seconds, just arbitrary units: a bigger value means more time, a smaller value means less time.
So again, it is not milliseconds or seconds, just arbitrary values used by Postgres, where a higher value means more time and a lower value means less time. That is it; a simple answer. Fantastic. Next question: why would you use a GIN index instead of a B-tree index on name? Okay, let me go directly to that slide. Why did I not use a B-tree? As I said, if you create a B-tree index directly on the name column, you cannot use it to extract a value from inside the column. I also mentioned the confusion there: the column is called name, and the key inside the document is also called name. The column holds a JSONB composite-value document, and I created the GIN index on that column. In the query, I am asking for documents that contain the key name equal to Alex; I am extracting that information from inside the JSON document. Normally, when we use a B-tree, we compare our search value directly with the column, the value we want to find against the column itself. But here we are comparing against a value extracted from inside the JSONB document, not comparing name with the whole document. We are not matching against the entire document, the name, the phone number, and everything; we are getting at the values within the JSONB document. That is why we use GIN and not the B-tree. As an exercise, try to create this table, insert some values, and try both a B-tree and a GIN index; you will see the performance difference yourself. That is it. Thank you. And I am getting notes that you have already answered some of the questions while speaking. So the last question I have here is: I have seen UUIDs being used as primary keys in some tables. Which index type would you recommend for that? Normally, for UUIDs, I think the B-tree index is the better choice.
I am not 100% sure which kind of UUID is being used in that case; some variants are generated sequentially and others are not. But I think a B-tree is better for that. Okay, fantastic. So with that, Ibar, I want to thank you for taking the time to speak with us this morning, afternoon, or evening, depending on where you are. I want to thank everyone for hopping on and spending a little bit of their day with us, and I hope to see you all at the next Postgres Conference webinar toward the end of the month. So cheers and have a great day. Yep, thank you, everybody.