Welcome to today's Postgres Conference webinar, Long Queries and the Art of Full Scan. We're joined by Henrietta Dombrovskaya, Director of Analytics at BrokerX, who will discuss which queries are considered long, how to optimize full table scans, how the order of joins affects query performance, how to optimize grouping, how to avoid multiple table scans, and additional techniques for long query optimization. My name is Lindsay Hooper. I'm one of the Postgres Conference organizers, and I'll be your moderator for this webinar. A little bit about your speaker: Henrietta is a database researcher and developer with over 35 years of academic and industrial experience. She holds a PhD in computer science from the University of St. Petersburg. At present she is, as I said, the Director of Data Analytics at BrokerX, the local organizer of the Chicago PostgreSQL User Group, an active community member, a frequent speaker at PostgreSQL conferences, a researcher focused on developing efficient interactions between applications and databases, and the author of the book PostgreSQL Query Optimization. So with that, I'm gonna hand it off. Take it away, Hettie.

Thank you, Lindsay — thank you for your kind introduction, thank you for inviting me, and thank you for organizing this webinar. It's always great to be here. So, today's presentation. This logo is for me and my co-author Boris Novikov; conveniently, our initials make a DB in the middle: Henrietta Dombrovskaya, Boris Novikov. We did not quite figure out what to do with the H and N, so if somebody has ideas, please share. But so far we have focused on the DB in the middle. The name of the talk is Long Queries and the Art of Full Scan. Originally it was just The Art of Full Scan, but yes, it has something to do with long queries. At the beginning of this webinar I posted a question — which queries are considered long? — and normally I ask people and listen to the audience's responses. So if somebody wants to share their thoughts about which queries are considered long, please do, because we will be talking about this in a minute.

I also wanted to let you know that this presentation is based on chapters from the book PostgreSQL Query Optimization, which was published by Apress in May 2021. The book is available on Amazon, and an electronic version is available as well. For this book we created an open source database called postgres_air. I want to emphasize that although we created it in connection with the book, it is not a part of the book: you do not need to purchase the book to use it. It is open source; you can go to its GitHub repository, download it along with all the instructions and some source code, and play and experiment. To the best of our knowledge, it's the largest publicly available PostgreSQL sample database, so please use it. The examples in this talk are based on the tables that are part of the postgres_air database. If you did not get a chance to look into it, that's fine, because I think all the joins are pretty much understandable, but maybe for future reference you'll want to take a look.

So now let's proceed with the topic. Full scans and long queries are closely related. Just for a start, let's think about which queries are considered long — I will use the replies I receive most often. First of all, nobody actually wants to talk about full scans; nobody wants to talk about long queries.
It's not a cool topic. It's super cool to talk about indexes: you go to any conference and there are tons of talks about new kinds of indexes, how cool they are, how they make your queries faster. Long queries and full scans — nobody talks about them, because the subject seems hopeless and uninteresting. So during this talk I want to prove that there are actually lots of interesting things you can do with long queries and lots of things you can do with full scans.

So which queries are considered long? First of all, it depends neither on the size of the query text itself nor on the size of the output. For example, this query is a long query with long output. What does it do? It takes the Cartesian product of two copies of the airport table. The query makes sense: it lists all possible airline routes, from any airport to any airport. That is a long query, because there are lots of airports, it's a Cartesian product, and the output will be huge. On the other hand, look at this query — a query with very short output, because how big is the output here? Just one line. What are we calculating? The average flight length and the average number of passengers across the whole database. That is a lot of data to go through: a super long query, but the output is small. So the size of the output does not really tell us whether a query is long or not.

So what is a long query? Let's go to the formal definition — again, it's not a formal definition from a textbook, it's something we came up with. A query is long when the query selectivity is high, which means that almost all rows contribute to the output, even if the output size is small. In other words, to get the result of this query, you need to touch most of the rows in the tables. And "most" does not necessarily mean 80% — even 30% of the rows can be a very significant part of each of the tables, or of most of the tables. The other thing worth mentioning is that the notion of what is long changes, because with each Postgres version we get new access methods, new indexes, new ways of storing data, new everything. If you look at examples from 10 or 15 years ago: a 5 million row table — oh my gosh, how do you deal with a 5 million row table? A sample size of 100,000 — oh, that is a very good sample. And recently I was checking the size of a table: oh, it's just 15 million rows — it's nothing, nothing to optimize. So this notion changes; we just have to keep in mind that all of this is relative, while the techniques stay the same.

So can a long query be optimized? That's the question I started from, because the sad thing is not only that people do not want to optimize long queries, but that there is a widespread presumption that they cannot be optimized. Okay, that's a long query — just ask the marketing analysts in your company; they always want something super long. They always need everything, right? They want all the data from the inception date, for the past 10 years, and they need to load it into their models and do something with it. So most of the time people just say: no, we cannot optimize these. Why? Here's the thing: first, yes, they can be optimized.
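[For readers following along at home, the two example queries mentioned above look roughly like this; table and column names follow the postgres_air schema, and the exact column lists are an approximation of what was on the slides:]

    -- Long query, long output: Cartesian product of two copies of the
    -- airport table, listing every possible route between two airports.
    SELECT a1.airport_code AS departure_airport,
           a2.airport_code AS arrival_airport
    FROM airport a1,
         airport a2;

    -- Long query, one-row output: averages computed over the whole database.
    SELECT avg(flight_length) AS avg_flight_length,
           avg(passengers)    AS avg_passengers
    FROM (SELECT f.flight_id,
                 f.scheduled_arrival - f.scheduled_departure AS flight_length,
                 count(p.passenger_id) AS passengers
          FROM flight f
          JOIN booking_leg bl ON bl.flight_id = f.flight_id
          JOIN passenger p   ON p.booking_id = bl.booking_id
          GROUP BY 1, 2) t;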
How can they be optimized? There are two main techniques which I will be covering: one is avoiding multiple table scans, and the other is reducing the size of the result at the earliest possible stage. And one thing which is not on the slide — why do we need to optimize them at all? Because you might have heard people tell you: you know what, this query does not need to be optimized, it just runs once a day, once a week, once a month. Do you know what "once a week" means, by the way? I can tell you. Once a week means nine a.m. Monday, always. And you know how many queries run once a week, Monday, nine a.m. — and your database is crashing because of all these queries which "do not need to be optimized." The other thing: if you just let them go, it will be five minutes today, ten minutes next week, an hour three months later, and then all of a sudden the query cannot finish before the business actually needs the results, and then we have a bigger problem. Long queries will never run in milliseconds — they might never even run in seconds — but we still want to optimize them.

What is important about long queries? We do not need indexes. That's why we're talking about full scans. Why do we not need indexes? This graph — you may have seen it in many different presentations — represents the cost of operations depending on the query selectivity. The higher the selectivity, the higher the cost, but the cost changes differently depending on the access method. You can see that for the full scan, the green line here, the cost is always the same, no matter whether selectivity is high or low: you always need to go through the whole table. For indexes, though, the story is different. When you need to select something with a very low selectivity level — just a couple of rows from a multi-million row table — yes, an index will give you results faster. But if you need, say, 20% of the table, then using an index will actually cost more, because you need to go to the index and then to the table, and the blocks in the table are not necessarily located one after another, so the cost rises. For the full scan it doesn't matter. So for queries with high selectivity, the full scan is what we're going to use, because in the long run it will simply perform better.

The next thing we need to know: the first rule of optimizing long queries is that the most restrictive semi-join should be executed first. Some of you might have heard about semi-joins, some of you might not. What is a semi-join? You may not know the term — it's not in the usual list of relational operations — but you've done it, I assure you. A semi-join is a join between two tables: if you have used an EXISTS subquery, or the IN operator with a sub-select, that's when you have used semi-joins. We return the rows from the table on the left side, without duplicates, when those rows satisfy the predicate on the right side. I know — all these words, "predicate" and so on — so let's look at examples, and then you'll see that you have actually used semi-joins. Here are examples of semi-joins on these tables: select from flights where the flight ID is in (select flight_id from booking_leg).
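[A sketch of the two semi-join forms — the IN form just mentioned and the EXISTS version discussed next — against the postgres_air flight and booking_leg tables:]

    -- Semi-join, IN form: flights that appear in at least one booking leg.
    SELECT *
    FROM flight f
    WHERE f.flight_id IN (SELECT flight_id FROM booking_leg);

    -- Semi-join, EXISTS form: semantically identical, same execution plan.
    SELECT *
    FROM flight f
    WHERE EXISTS (SELECT 1
                  FROM booking_leg bl
                  WHERE bl.flight_id = f.flight_id);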
What that means, by the way: it means that somebody booked a trip with this flight, because a booking leg is a part of a reservation. So this flight was booked. Or the other version: select from flights where exists (select flight_id from booking_leg for this flight). Both of these examples are semantically identical: both select flights which are part of some booking, and for both of them we'll have the same execution plan. In this execution plan you will see the semi-join. What that means, again, is that we are joining but also eliminating duplicates: we select the flights that satisfy this criterion. For both versions it will be the same plan.

Now, okay — semi-joins restrict the data set. But how do I know, if I have several semi-joins, which one is the better one? Let's take a look. This is a long query, because it selects all the reservations for flights departing from the United States — for this database, that's a long query. What do we have here? We have one join, booking legs with flights, and then we have a semi-join with the airport table, where the airports are from the United States. You can see it's a hash join: we are doing a sequential scan on airport with country 'US', but it is executed before the next join, because this semi-join reduces the size of the next result set. If we went consecutively — joining all booking legs with all flights, and only after that reducing to the flights from the US — that would be a very long query. Postgres is smart enough to figure it out: first we select the airport codes and reduce the size, and then we join with booking legs. By the way, here we do not have an index; in this execution plan we purposely did not create one, because this is actually not a good column to index. But you might wonder what would happen if we did. I can tell you: if you create an index, Postgres technically should not use it, but it might, accidentally — and then this query will execute slower than with a full table scan.

Now, what if we have two semi-joins? And I apologize, this is a little bit of a crowded slide — I'm just trying to fit everything on the slide, which is not always easy. Here there are two semi-joins: again we select airport codes with country 'US', and also the bookings which were updated since July 1st, 2020. To give you some perspective, the postgres_air database went live in August 2020, so in this database "today" is August 18, 2020; the date condition gives us roughly the last month and a half of all the reservations in this database. You look at this, and you cannot even tell which of the two is more restrictive, right? I'm not polling a hundred people on what they think is more restrictive, but at least it's not obvious. So let's see what Postgres thinks about it. This is the execution plan, and you can see that the scan on airport is executed prior to the sequential scan on booking: first airports, then bookings, which means Postgres thinks the selection by US is more restrictive — which is actually right, because there are lots of bookings in those seven or eight weeks of data. But if you modify it — change the update condition to select dates starting not from July 1st but from August 1st — then it's a different story.
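[Roughly, the two-semi-join query under discussion; the iso_country column name is an assumption about the postgres_air airport table:]

    SELECT departure_airport, booking_id, is_returning
    FROM booking_leg bl
    JOIN flight f USING (flight_id)
    WHERE departure_airport IN (SELECT airport_code
                                FROM airport
                                WHERE iso_country = 'US')         -- semi-join 1
      AND bl.booking_id IN (SELECT booking_id
                            FROM booking
                            WHERE update_ts >= '2020-07-01');     -- semi-join 2
    -- With '2020-08-01' as the cutoff instead, the plan flips and the
    -- date filter runs first.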
Here Postgres can actually figure out not only that this is now the more restrictive semi-join, but also that, by the way, it can use an index here. So this one will be executed first. The execution plan changed based on the statistics — based on which data we pass to this query as parameters. And does the index make it better? Here we saw that an index was used; does it help? This is another version of the query where we applied a transformation that blocks the usage of the index. Semantically it is the same, because the update timestamp is never null, but with this transformation we block Postgres from utilizing the index, so let's see what happens. With the index blocked, we can experiment and see when it is faster and when it is slower. If you run the same experiments on the postgres_air database, you will see that the index indeed wins when the timestamp condition selects smaller intervals. For this particular case — again, "today" is August 18th — if the cutoff timestamp is earlier than August 1st, the full scan works better; if it's August 1st or later, index access works better. So that's about semi-joins.

The second technique: the anti-join. Again, even if you did not know what it is called, you have definitely used it: it's either NOT EXISTS or NOT IN. Technically speaking, both notations are semantically correct; however, in Postgres only the first form guarantees an anti-join in the execution plan. If we look at the execution plans of these two, the first one shows an anti-join, while the second one, where we get a SubPlan, does not — but again, semantically they are identical, and execution is pretty much the same for both versions.

Now, a question people ask: why all these EXISTS, NOT EXISTS, anti-joins and semi-joins — why not just use a join? What will happen? Something will happen. First of all, if we use a plain join instead of a semi-join, we need to make sure we're not selecting duplicates: if we used the first of these SQL statements instead of a semi-join, we would receive duplicates, and to avoid them we would need to first select distinct flight IDs from booking_leg and then join that with flights. And if you look at the execution plan — in case you are wondering whether it's faster or slower — it is actually twice as fast. So using a semi-join is not always desirable; but I cannot tell you that it will always be like this, so you need to experiment and see which version works better.

Can we use an outer join instead of an anti-join? Actually yes, because with an anti-join there is no question of removing or not removing duplicates. You can use an outer join with the condition "where flight_id is null"; it is, again, semantically the same, and you can see the anti-join in the execution plan. This works the same, and sometimes even better. And by the way, if you just need flight_id from the flights table — if you do not need the rest of the columns — then the execution plan will be different again, and even faster, because then Postgres can use index-only scans on both tables, and it's a whole different story.
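[Sketches of the anti-join variants discussed above, against the same flight and booking_leg tables:]

    -- Anti-join, NOT EXISTS form: flights that were never booked.
    -- In Postgres, only this form guarantees an anti-join in the plan.
    SELECT *
    FROM flight f
    WHERE NOT EXISTS (SELECT 1
                      FROM booking_leg bl
                      WHERE bl.flight_id = f.flight_id);

    -- NOT IN form: same result here (flight_id is not null in booking_leg),
    -- but the plan shows a SubPlan rather than an anti-join node.
    SELECT *
    FROM flight f
    WHERE f.flight_id NOT IN (SELECT flight_id FROM booking_leg);

    -- Outer-join rewrite: keep only flights with no matching booking leg;
    -- the plan still shows an anti-join, and it is sometimes even faster.
    SELECT f.*
    FROM flight f
    LEFT JOIN booking_leg bl ON bl.flight_id = f.flight_id
    WHERE bl.flight_id IS NULL;

    -- If only flight_id is needed, index-only scans become possible
    -- (given indexes on flight_id in both tables).
    SELECT f.flight_id
    FROM flight f
    LEFT JOIN booking_leg bl ON bl.flight_id = f.flight_id
    WHERE bl.flight_id IS NULL;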
So again, it's technically a long query, but when we can use an index-only scan, we might not even need a full scan, and this might be faster.

All right. We've talked about the order of joins, how it matters, and how selecting the right order of joins helps improve query performance. But then the question is: okay, how can I tell Postgres that I want this order of joins? Those of you who have been with Postgres for a while, or for whom Postgres is your first database, might not even know that other databases allow you to dictate the order of joins. But for people like me, who came from a very long-term relationship with Oracle, it's like: what do you mean, you cannot give optimizer hints, you cannot tell it which join order to use? Nope, you cannot — Postgres decides. So far, from what we saw, Postgres chooses the right order of joins. But what can we do to help it? Most of the time, nothing — except choosing the right way of writing queries. In the first half of the presentation I showed different ways to write semantically identical queries which may give you different execution plans, and usually there's nothing you can do with tuning parameters. But there is one parameter which you can tune locally, and that's something I always suggest people experiment with: join_collapse_limit. You can set it within your session, and it controls how many tables the optimizer may rearrange when it searches for the best join order. By default it is set to eight, which means that if you have up to eight tables joined in one query, the optimizer will compare different join orders. I'm saying eight factorial — in reality it's not eight factorial, it is less, because the optimizer pre-selects the potentially good execution plans. But anyway, if the number of tables is within the limit, it will try to find the optimal order of joins; if there are more tables than join_collapse_limit, the planner essentially gives up on reordering and largely keeps the order in which the joins are written. Sometimes people want to increase this parameter to something greater than eight. We normally do not recommend increasing it beyond 12, because of the number of plans the optimizer has to analyze: if n is eight, it's about 40,000 plans; if n is 10, it's over three million plans, and just finding the best plan takes forever.

I have one horror story from my past. Once I talked to a data scientist who asked me what was wrong with his query. His query had 30 tables to join — which is very much expected from a data scientist — and in order to get the best execution plan, he set join_collapse_limit to 30. You know what happened? Not only could the query not run — even the EXPLAIN command never finished, because there were too many plans. You can try to estimate how many plans Postgres had to analyze. In OLAP, in data warehousing, going the other way might actually be a good idea if you know exactly what you need: often you have a couple of fact tables and everything else is dimensions, and you know that your dimensions have to go last, right? So sometimes you can say: okay, Postgres, I know better — set join_collapse_limit to one, and then explicitly write the order of joins which you know will be the best one.
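[A minimal sketch of this session-local experiment; the query itself is illustrative:]

    SHOW join_collapse_limit;      -- 8 by default

    -- Session-local: tell the planner to keep the joins in the order written.
    SET join_collapse_limit = 1;

    SELECT f.flight_no, a.city, count(*) AS passengers
    FROM booking_leg bl                                      -- biggest table first
    JOIN flight  f ON f.flight_id = bl.flight_id
    JOIN airport a ON a.airport_code = f.departure_airport   -- dimension last
    GROUP BY 1, 2;

    RESET join_collapse_limit;     -- back to the default for this session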
And again, what I suggest: this is a local parameter, it can be set in your session, so just change it for a local experiment. You can see how long it takes to produce the best execution plan, and whether it makes any material difference.

Okay, the next topic: grouping — because when we have these long queries that touch most of the data in the tables, we usually have grouping. Here is a long query with grouping. What do we have? We take all the bookings in our database, and for each flight we select the average ticket price and the number of passengers on the flight. A good long query with grouping; nothing wrong with it so far, and not much you can do. And you know what happens next — I hate views, by the way. Why do I hate views? Because the next thing, somebody figures out how to calculate this, and they save the query as a view in the database. And then somebody wants to see information about one particular flight: select from whatever we just defined, where flight_id equals some specific flight ID — an actual value from the postgres_air database, so everything can be replicated. What do you think will happen? What will you see? Again, I can't ask you to raise your hands, and you may not know what will happen — but I can tell you that it depends on which Postgres version you are on. It did not work well in older versions of Postgres, because here the grouping is done before the selection: in earlier versions, everything would be calculated — the whole GROUP BY — and only after that would the specific flight be selected. That would be super inefficient. Now — I think starting from at least Postgres 11, if not earlier — such conditions are pushed inside the grouping by the Postgres optimizer without any effort on your part. The right way to write this query used to be like this: filter on the flight_id first, and group after you have already preselected the flight. Now you do not need to do it; Postgres does it automatically. No matter whether you write it as on the first slide — first group, then select — or select first, the execution plan will be the same, and you can see in this execution plan that it first selects the flight and only then does all the grouping.

So if we have a condition imposed on fields involved in the GROUP BY, a current version of Postgres will push it inside — and again, "current" by now means several versions back. Here, for example, we put a condition: the departure airport is O'Hare in Chicago. If you look at this execution plan, it is the right execution plan: first Postgres preselects all the flights which depart from O'Hare, and then does everything else — and by the way, everything else is hash joins, right? It's still a long query — O'Hare is one of the busiest airports in the world — but because we preselect all the flights from O'Hare first, life is easier; it makes this query execute as fast as possible. But it's not always so. Look at this condition: the same sub-select, but now we select the flights where the scheduled departure is between two dates — around the 4th of July. The number of flights we want is really small, but the scheduled departure is in no way part of the inner grouping. So Postgres does not push this condition.
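[A reconstruction of the view and the filters being discussed; the view name flight_stats and its exact definition are illustrative, not the literal slide:]

    CREATE VIEW flight_stats AS
    SELECT bl.flight_id,
           f.departure_airport,
           avg(b.price)          AS avg_price,
           count(p.passenger_id) AS num_passengers
    FROM booking b
    JOIN booking_leg bl USING (booking_id)
    JOIN flight f USING (flight_id)
    JOIN passenger p USING (booking_id)
    GROUP BY 1, 2;

    -- Pushed down by modern Postgres: both columns are grouping columns.
    SELECT * FROM flight_stats
    WHERE flight_id = 22183;            -- any existing flight_id; the slide used a specific one
    SELECT * FROM flight_stats
    WHERE departure_airport = 'ORD';    -- O'Hare

    -- NOT pushed down: scheduled_departure is not part of the inner grouping,
    -- so the whole view is grouped before this filter applies.
    SELECT fs.*
    FROM flight_stats fs
    JOIN flight f USING (flight_id)
    WHERE f.scheduled_departure BETWEEN '2020-07-03' AND '2020-07-05';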
If you look at the execution plan of this query, this is how it looks. You can see that there are tons of hash joins, all the grouping is done first, and only after the grouping is done do we select the flights which satisfy the condition, and then the results are joined. That is not just non-optimized — it is what's called pessimization, and I did not invent this word, people before me did: a practice which guarantees slowing down the execution of a query. Why does it happen? Because there is no way for Postgres to push this condition inside — and that is something we can explicitly avoid. How can we write the query better, so this situation doesn't happen? This is the correct way to write the SQL: we include the criterion inside, and we filter before grouping. This version executes much faster and much better; you can see that it starts by preselecting flights using a bitmap index scan, and life is much better this way.

All right, the opposite situation. I just told you: first filter, then group. Sometimes, though, you need to do the opposite, and it's important to distinguish between the two cases. Here is another query. What are we doing? We are selecting the number of passengers departing each month from each city, pretty much. Again, a long query, and we write it as I described: we join airports, flights, booking legs, boarding passes — and voila, we have the number of people. This query, on a postgres_air database installed locally, legitimately runs for seven minutes. And the execution plan is what you'd expect — all hash joins, nothing interesting. Can we make it faster? We can, and here is how. First of all, the city has nothing to do with the other calculations: the city is an attribute of the airport, and the airport is an attribute of the flight. So we can do all the calculations using only booking_leg and boarding_pass, and after we're done, join with flights and then with airports. Here we do the grouping first, and then all the rest. In this case, that's the execution plan — it's still hash joins, but it limits the size of the intermediate result set. This time it executes in two and a half minutes. It's still a long query, a super long query, but it's almost three times faster than the original.

Another technique: using set operations. We can often use EXCEPT instead of NOT EXISTS or NOT IN, INTERSECT instead of EXISTS or IN, and UNION instead of complex selection criteria with OR. I will show just one example — the same case of flights which were never booked. Instead of doing NOT IN or NOT EXISTS, we write: select flight_id from flights, EXCEPT, select flight_id from booking_leg. Execution time: twice as fast as the anti-join. Again, this is all for long queries; it might not be true for queries which can utilize indexes. And there's the execution plan — yet another kind of execution plan, so always check what works best. One more example, where we do the opposite: we select flights which do have some bookings on them. Here it's not EXCEPT but INTERSECT, and it's even faster — and again, a different execution plan from what we saw before.
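[Sketches of the three rewrites from this part of the talk; join columns are assumptions based on the postgres_air schema:]

    -- Filter first, group after: the fix for the pessimized query above.
    SELECT bl.flight_id,
           avg(b.price)          AS avg_price,
           count(p.passenger_id) AS num_passengers
    FROM booking b
    JOIN booking_leg bl USING (booking_id)
    JOIN flight f USING (flight_id)
    JOIN passenger p USING (booking_id)
    WHERE f.scheduled_departure BETWEEN '2020-07-03' AND '2020-07-05'
    GROUP BY 1;

    -- Group first, join after: passengers per city per month
    -- (roughly 7 minutes -> 2.5 minutes on a local postgres_air install).
    SELECT a.city,
           date_trunc('month', f.scheduled_departure) AS month,
           sum(passengers) AS passengers
    FROM (SELECT bl.flight_id, count(*) AS passengers
          FROM booking_leg bl
          JOIN boarding_pass bp ON bp.booking_leg_id = bl.booking_leg_id
          GROUP BY 1) t
    JOIN flight  f ON f.flight_id = t.flight_id
    JOIN airport a ON a.airport_code = f.departure_airport
    GROUP BY 1, 2;

    -- Set operations instead of anti-join / semi-join:
    SELECT flight_id FROM flight
    EXCEPT
    SELECT flight_id FROM booking_leg;      -- flights never booked

    SELECT flight_id FROM flight
    INTERSECT
    SELECT flight_id FROM booking_leg;      -- flights with at least one booking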
The last technique I will cover — because I actually promised more than I can deliver; when I looked at how much time I have versus what I promised, it didn't quite match, so I won't be able to cover views, materialized views and such — is how to avoid multiple table scans.

When do multiple table scans happen? If you develop a system for a while, you come to the point where you need to store data that is new, something you did not plan for, and you need to store it somewhere. So what do people do? They create special types — custom fields or whatever. Almost all systems I know have these custom fields. Here is what we need: regulations changed, and we have to store passengers' passport information in the system — passport number, passport expiration date, passport country — and we do not know how the regulations will change next week. We all lived through the pandemic, you know how regulations change: next we need to store COVID test results or whatever. So we want to be proactive: we create a custom_field table, and for each passenger we store whatever we need — we have several custom field types, and each type stores one piece of information.

So now we need to select, for all passengers, their passport number, passport expiration date and country. Why is it limited to five million here? When we ran these examples for the book on the postgres_air database, we wanted to make sure we did not spill to disk: we needed our data sets to sit entirely in main memory so that all the timings would be realistic. That's the only reason you see the limitation by ID here. Anyway, the typical thing people do is join this custom_field table three times: first you select the passport number, next the expiration date, and next the country that issued the passport. What happens in the execution plan? Scan, scan, scan — how many times? Three scans of a huge table. And this is not the best way to do it, because you can do everything with one scan. Here is how: we scan the table once, and for each record, if it's a passport number — here it is; if it's a passport expiration date — here it is; if it's a passport country — here it is; everything else we skip, everything else is null. And then we group. Okay, problem solved? Actually, the problem is not solved.

Horror story time — many of you have probably used this technique, it's widely known. Okay, the problem is "solved," and you know what happened? This runs slower than the previous version. Why do you think it is slower? Again, I can't really ask for a show of hands, but I can tell you — and actually it's worse than just slower. Here is our execution plan. Why is it so bad? Because it's not only bad, it is incorrect: look at how we were grouping — we were grouping people by first name and last name. And how many people have the same first and last name? Plenty. It does not matter that we only want to print first and last names; we need all distinct people. So this was incorrect. What we do now is a little more sophisticated, but it's actually the right version: we include the passenger ID, although we do not need the passenger ID in the output, and then we group by passenger ID, first name and last name — and this actually executes much better and much faster.
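[A sketch of the corrected single-scan version, plus the further rewrite discussed in a moment; the custom_field column names — custom_field_name, custom_field_value — are illustrative guesses:]

    -- One scan of custom_field instead of three, pivoting with FILTER
    -- (a CASE expression inside max() works the same way). Grouping by
    -- passenger_id keeps two different people named John Smith apart.
    SELECT p.passenger_id,
           p.first_name,
           p.last_name,
           max(cf.custom_field_value) FILTER (WHERE cf.custom_field_name = 'passport_num')      AS passport_num,
           max(cf.custom_field_value) FILTER (WHERE cf.custom_field_name = 'passport_exp_date') AS passport_exp_date,
           max(cf.custom_field_value) FILTER (WHERE cf.custom_field_name = 'passport_country')  AS passport_country
    FROM passenger p
    JOIN custom_field cf ON cf.passenger_id = p.passenger_id
    WHERE p.passenger_id < 5000000        -- the in-memory limit mentioned above
    GROUP BY p.passenger_id, p.first_name, p.last_name;

    -- The further refinement, discussed next: group the big table alone,
    -- then join the small aggregate with passenger — possibly via a merge
    -- join over an index on custom_field.passenger_id.
    SELECT p.passenger_id, p.first_name, p.last_name,
           cf.passport_num, cf.passport_exp_date, cf.passport_country
    FROM (SELECT passenger_id,
                 max(custom_field_value) FILTER (WHERE custom_field_name = 'passport_num')      AS passport_num,
                 max(custom_field_value) FILTER (WHERE custom_field_name = 'passport_exp_date') AS passport_exp_date,
                 max(custom_field_value) FILTER (WHERE custom_field_name = 'passport_country')  AS passport_country
          FROM custom_field
          WHERE passenger_id < 5000000
          GROUP BY passenger_id) cf
    JOIN passenger p ON p.passenger_id = cf.passenger_id;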
So that's our new execution plan, and it does what we want. Now, one more optimization. We can reduce the size of the intermediate results here, the same as we just did with airports and cities, because we do not need the first and last names until we are done grouping and selecting. So what do we do first? I don't know how well it's visible, but we have here a sub-select which groups all the custom fields for each passenger ID. When we are done with it, we have, for each passenger ID, the passport number, the passport expiration date and the passport country — and only then do we join with passengers. This is actually the best execution plan of all, although it is, again, still a super long query. And by the way, it even helps that here we can use an index on passenger_id and do a merge join, which executes much faster. So once more: we cannot tell Postgres precisely what we need, but we can write the query in a way that makes it easier for Postgres to do the right thing.

Okay, so to summarize what we learned. The optimization goal for long queries is to reduce the size of the intermediate data sets — all the techniques I showed today had this goal. We need to make sure we do not start from the largest tables and do not scan things we do not need to scan. To achieve this goal we have several techniques: control the join order; use semi-joins and anti-joins; filter before grouping, or group before filtering, whichever makes sense; use set operations instead of the above where they fit; and avoid multiple table scans. Again, you will need to try and see what works. As I keep telling you, you cannot just learn these techniques and apply them mindlessly: you need to look at your system, look at your statistics, and see what works for you. It may easily be that for each of these long queries you'll need to try several approaches and see which produces the best results. And Lindsay, I think I actually did well — we have 10 minutes for questions, right?

We sure do. That was great. We have three or four questions here; if our attendees have any others, go ahead and get them in. The first one is a little bit long, but I think it's a good question. I've been told that if a query has 12 tables involved, but join_collapse_limit is eight, for instance, then it will perform very similarly to join_collapse_limit equals one. Does the planner actually look at the first eight before just deciding to add those other tables in at the end, in the order declared?

Yeah, okay. So if the number of tables involved is greater than join_collapse_limit, then yes, the behavior is pretty much as described: the optimizer will literally use one, two, three, four — whatever join order is defined in the query. What I meant by setting join_collapse_limit to one is that sometimes, even if you have eight tables involved, you know what to do — you might know better than the Postgres optimizer — and then you just want to say: okay, forget about it, I know what to do.

Beautiful. There's been some discussion in the chat, and the question asker has asked me to ask you: what are the units of cost?

Okay, units of cost. The units you see in the execution plan are an abstract cost, and you can read in the documentation how they are set up.
Basically, when you set up a Postgres instance, a cost is assigned by default to each kind of operation, and you can modify it. The idea is that the cost is proportional to the number of I/O operations and also to the CPU usage. How exactly it is calculated is a separate big topic, but yes, those are the two main factors it tries to balance. Most of the time, when we look at the cost, a higher cost means that more read operations are involved. That's at a very, very high level.

Great, thank you. Do you recommend using global temporary tables to break up many joins into a series of intermediate steps?

No — and that is something I actually did not cover, although I originally planned to talk about how to use temporary tables and how not to use them. In general, in 95% of cases, using temporary tables is not justified. Maybe I need to give another talk on this, because I still have material on using CTEs, temporary tables, views and materialized views — maybe that can be a separate talk. But in general, I caution people against using global and non-global temporary tables.

Well, you know, Hettie, we have availability in November if you want to present again. — Yeah, I already presented in October, you remember, right? — Yeah, well, third time's the charm, right? Okay, so we have two more questions. Do you recommend using the extension pg_hint_plan, so that we can use hints and influence join orders?

I personally have not worked with it, so I cannot give any expert advice; I probably should look at it and see what happens. But from my long experience — and this is my general attitude towards hints — I know the Postgres core contributors are against hints, and in my personal opinion, the absence of hints in Postgres is what made it the best optimizer I know; with my 38 years of experience in the industry, I've seen a lot. I think one of the driving factors behind the Postgres optimizer being so outstanding, unmatched by any commercial or non-commercial system, is that it is built to function correctly without hints. Hints can feel great, and sometimes you miss them — I know why the optimizer does what it does, and I know how I would do it right. But when you start using hints, it's difficult to stop: you get hooked on them, and you constantly have to adjust them and make sure they still work as the data changes. The great part about avoiding hints is that your data can grow, and different tables can grow at different speeds: the comparative volumes of data in different tables may change, and if you do not use hints, there is a better chance that your queries stay performant — although nobody can guarantee it. But I have not used this particular extension, and I will take a look at it.

Great, and our last question. What do you do about queries with moderate selectivity — say, approximately 10% of the table — when index-only scans aren't possible? Say you have a GIN index.

Honestly: experiment, because moderate selectivity is a very tricky situation. First of all, you don't know in advance how the planner will decide to handle it; and second, with moderate selectivity you may have an index on the column, but the values of the column might not be distributed uniformly. In the book we have a follow-up to this example with a condition on the country.
If the country is one with fewer airports — in general, an index on country gives you moderate selectivity, but it might still work well, for example, for countries that have a very small number of airports. And again, this is just one small piece of the whole query-optimization picture. Sometimes it's still worth creating an index; sometimes what helps is a conditional, partial index — an index with a WHERE clause. Most of the time that is what works: if you want to select something very specific, an index with a WHERE clause might do it. But again, that's part of the bigger optimization picture.

Okay, wonderful. Hettie, thank you so much. I always know that you're going to bring an awesome presentation, and I always know you're going to get solid questions from the audience — it's just that engaging. So thank you. Thank you to all of our attendees for joining us, thank you to all of our question askers for keeping it lively, and I hope to see you at future Postgres Conference webinars. Cheers.