 Welcome to today's Postgres Conference webinar, The Evolution of Partitioning Features in PostgresQL, a supercharged elephant. We're joined by Jobin Agustin, Senior Support Engineer at Precona, who will discuss the importance of upgrading to the latest versions of Postgres and the maturation of partitioning features in Postgres. My name is Lindsay Hooper. I'm one of the Postgres Conference organizers, and I'll be your moderator for this webinar. A little bit about your speaker. Jobin is a PostgresQL expert and open source advocate who has more than 18 years of working experience as consultant, architect, administrator, writer, and trainer in Postgres, Oracle, and other database technologies. He's always been an active participant in the open source community, and his main focus area is database performance and optimization. Welcome, Jobin. A few quick notes for our attendees, and then we'll get started. So the first is that I put all attendees on mute to avoid background noise and preserve audio quality. And the second is yes, we are recording this webinar and it will be accessible on the Postgres Conference site by early next week. So with that, I'm gonna hand it off. Take it away, Jobin. Okay, thank you, Lindsay. Welcome to Postgres Conference webinar. And today's our topic is evolution of partitioning features in Postgres, PostgresQL. And it is maturing over a period of time that's the concept of this talkies. So let us get started. Welcome to the presentation. And myself, Jobin Agastin, I work for Perkwana and I have public profiles, LinkedIn and Github. I have a few interesting scripts in the Github. Please visit. Yeah, and I used to blog, please visit my blogs as well. So first of all, why this talkies? Okay, the Postgres partitioning feature is maturing over a period of time. It is one of the powerful feature now. And when we consider the partitions in Postgres, it is basically tables. This architecture is quite different from other database systems. And this is very simple, elegant, and a portable architecture. And because when we this community design this architecture, we considered almost all other partitioning features available in other database systems. So a partition table is a collection of tables. Each partition is a basically an individual table by its own collection of tables. And it is referred as partitions. Partitions belongs to a partition table. And the parent table does not store the data. This is a basic concept behind the partitioning. In Postgres. Okay. And when we talk about evolution, it all started long back. So there was a prehistoric age for the Postgres partitions also. And there were many attempts to solve this puzzle of partitioning in Postgres. And it started by partitioning the table inheritance possible in Postgres. And there was a parallel attempt to create a block range index, because that is virtually partitioning a table. So these were the attempts before, the real native partitioning possible in Postgres. And we need to know a little bit about table inheritance, the prehistoric part of the new native partitioning, because still it is relevant and there are a lot of installations still using this method of partitioning. And this is possible because Postgres Equal is object relational database, object relational database management system. It's called ORDBMS. And it supports table inheritance, unlike many other database systems. So the table inheritance can be used to for creating sub-tables or child tables. So this is a syntax. What we are seeing in the screen is the syntax. Create table, a table name, even somewhere here, like the parent table, including all inherits the table. So all the inherits means it inherits the defaults, constraints, indexes, storage and comments, et cetera. All these things will be copied to the child table. It inherits everything. So the child table can have additional set constraints later. So instead of talking a lot, seeing what it is will be interesting. So a quick demo. Let me pull up the demo screen. Yeah, hope you are seeing my screen. And I'm going to create a table. So what you're seeing in my screen we have Postgres 10 in this window and in this box and Postgres 11 here, 12 here and 13 here. So I'm going to fire the same command, same set of commands in all the windows. So here we go. So I'm going to create a table. So create table, some columns, and yeah, table is created in all Postgres instances. And now I'm going to create an index on that. So if you describe, it's a regular table. And now we can create a child table by inheriting all these things. So create table events. So as the message says, it inherits a lot of things. And it creates a child table. So if you describe the child table, we can see that the primary key is inherited and the index is inherited. So we can see that the primary key is inherited and the index is inherited. So basically what all things are there in the parent, all those things are inherited. So apart from that, I just briefly mentioned about the additional check constraint. So I'm going to drop this partition quickly. So, oops, sorry. Okay, and then create child tables with those additional check constraints. Okay, as we can see, create table with the check constraints, the date between these dates. So only those values which satisfy this check constraint can get into this child table. And it works across all the Postgres versions. So even though we are talking about an old inheritance-based partition, we should remember that it is not something discontinued. It is still relevant, and we have a lot of users still using that. And we can have multiple supports or subpartitions or partitions. So another range and another range. So this is the way we used to create the child partitions or child inherited child partitions. Okay, going back to the presentation. So we saw how the check constraints are added to the inherited partitions. Due to these check constraints, each child table can hold only validated data. So we are clear that the data is really partitioned into child tables. And it works on the concept of constraint exclusion. The constraint exclusion will skip over the tables whose check constraints guarantee that there is no way to satisfy the condition for the query. So if we have a partition for values with the check constraint, values in between 1 to 10, we are sure that there won't be a value of 20. That will be in another partition. So the postcards can use the check constraint as a constraint exclusion method to eliminate or prune the partitions and arrive at the right partition. This is the underlying concept behind the inheritance-based partitioning. However, we initially need something to redirect the rows to these partitions. So that is done by triggers. So triggers are added to the parent table to redirect the data to individual partitions. So what we are discussing is about the pre-native partitioning available in Postgres. And it worked and still working, but there are a lot of limitations. One is the trigger which we talked about. The trigger has a lot of overhead on transactions. And the constraint exclusion method is not so intelligent. It just checks the constraint values and selects the partitions. So it's not intelligent enough. And because of that, only equality and range comparisons can work with constraint exclusion, which is a big limitation. And we keep seeing a lot of Postgres user complaints that this partitioning is not really working fine. The partition pruning is not happening properly. Yes, because all these are limitations of the really dumb way of partitioning the table based on inheritance. And additionally, it is a maintenance overhead for DBAs. Partishing maintenance requires a lot, a lot more complex procedures, like creating the check, maintaining the check constraints and triggers. But luckily we had many extensions. Postgres is known for extensions. And pd part minus one of the extension for automating this maintenance process. So even though it is automated by external methods, there is always, there is a maintenance overhead. We need to install this extension and maintain it and make sure that everything is working fine. Okay, so over a period of time, community saw that there is a scope for a native partitioning. And it was in 2017. The Postgres 10 is released with the native declarative partitioning feature. It was not the evolutionary nature, it was a big mutation. That's all to a completely different syntax. And the new syntax came, create table partition by, what type of partitioning it is. So range partitions and delist partitions were there in Postgres 10. And many of the maintenance overhead is removed. Automatically it creates the partition constraints internally and it takes care of temporal routing. We don't need the triggers anymore. And moreover, the Postgres internally checks the partition bounds and ensures that there is no overlap. And additionally, new features like attach partition and detach partitions also included. So there is a SQL interface for that. So because we discussed about the partitions are basically tables. We can attach a partition, attach a table to a partition table and it becomes a partition of that table or detach a partition and it becomes an independent table. So this is a really nice feature, especially for archiving and purging requirements. This is really wonderful. And Postgres 10 supports multi-level partitioning. We can have partition and there can be a subpartition and the subpartition can be of another type. Say we have a range partition, there can be a list partition within that. So all these things were supported. And so quickly back to the demo to understand the concept clearly. So I'm just going to zoom this area. The Postgres 11 part, sorry, the 10 part because that's something new in Postgres 10. So we can see the title here. This is PG 10. So I'm going to zoom to that area. Okay. And this is how it works. I am going to create the table. Let me drop the previous table. Okay. So that's done and create a table. So table is created. Sorry for... Okay. Let me go back to the previous screen size. It will be good for me too. Okay. So table is created. Sorry for... Let me do that for all the versions. Re-executing the things. Okay. So table... Sorry for the problem. Let me drop it again and create again. So from Postgres 10 onwards, we have this partitioned by range. And then we can create partitions, say for events on quarter one and quarter two. And when we create the partition for the quarter two, it is not a partition, but it's a partitioned... It itself is a partition table. So it can have subpartitions. So because we are expecting a lot of activity on the quarter two. So we can go and create more subpartitions there. And so you get the concept, right? So we can have partitions and subpartitions for each of this range. So based on the requirement. So there is no limitation how you are going to partition the table. Okay. So this is a new concept introduced in Postgres 10. Okay. And let me drop the table. Fine. Going back to the presentation. So we saw the multi-level partitions within the partitions. Okay. And this is a really wonderful feature. Something which was not possible earlier and never thought about. Okay. Next is another wonderful feature is the partition itself can be a foreign table. So there is a syntax for that. A foreign table name partition of another table. So the partition need not be sitting in a local database or it can be on a remote Postgres instance or host machine itself here. Each partition can be on a remote Postgres SQL and it uses the Postgres SQL FDW. So this kind of feature is really wonderful. So this is an added feature which we never expected before Postgres 10. Okay. And everything looks really nice. Wonderful set of features. But there were a lot of problems. Okay. There were no unique keys possible, no primary keys, no foreign keys. So this created a practical limitation for Postgres 10. So the adoption rates were not so good in when the Postgres 10 came up with the native partitioning because of all these limitations. These are very basic things which we expect from a table. And that's the time Postgres 11 came in 2018 fixing all practical problems. That's we are going to see what we know that post system came with the post the native range partition and list partition. And then now we have even harsh partition also majority of the partitions in the real world will be still range and list. But Postgres 11 onwards we have a hash partitioning feature also which is requested by a lot of users of other database systems because they will be migrating and they want similar things. And so some there are still some of use cases like data distribution. So the hash partition advantage is the data will be almost equally distributed to all the partitions irrespective of the partition. And it helps in sharding and the parallel processing. So there are specific use cases for hash partitions also. So that's a new completely new feature in Postgres 11. And we talked about all the practical problems are solved. Yes, we have unique keys, primary keys and foreign keys in Postgres 11. So coming back to the demo. So we are going to create the same table on different Postgres versions. And we can see that in Postgres 10 it failed. Primary constraint is not supported on partition tables. But the same thing works in Postgres 11 onwards. 11, 12 and 13 we have the primary keys. Okay. That's that's that's solved some of the practical problems. So let me create another table. Create a table called partitioned partitioned by range. And had a unique constraint in Postgres 10 it fails. But in Postgres 11 onwards we can see it is successful. And adding a primary key that also fails in Postgres 10. But from 11 onwards it works. And let me let us create another table say blah. Okay. And add a foreign key constraint. It fails in Postgres 10 but 11 onwards it works. So these are the practical problems which Postgres 11 solved. Okay. Coming back to the presentation. And another beautiful thing in Postgres 11 the migrations. Okay. What is migration? The row can migrate from migrate to write partitions on update modification to values of partitions partition keys allowed. Okay. And coming back to the demonstration. So let us create two more partitions. Okay. For the previous table. Okay. So if you describe. We can see that all the partition constraints are already placed properly. This part. Okay. And let us drop the foreign key. Because in Postgres 10 there is no foreign key possible. So it's not existing anyway. Okay. And let us move to the demonstration of row migration. So let us insert some data into this table. Okay. This table insert a couple of rows. Sorry. One row to the table. And if you look at where the row went, we can see that it went to partition partition for in the range of one to five. Okay. And if you look at the other partition, it is empty. Which holds the value from five to 10. And what happens if you update this update this. We have only single row in the table. So we just update postgres 10 complains that. While is the partition constraint, but it is successful on post 11 onwards. And what happens behind. So if you look at the previous partitions, we can see the partition for range one to five is empty now. And the row is migrated to the next partition, the appropriate partition. This was really required in real world because there will be updates happening in the table. And we expect the partition table to work and handle the situation instead of throwing some error that it violates the partition constraint. Yeah. So that was a really practical problem solved. And catching the following. So we. The problem of what if what happens if if there is a value which is out of the range, which we specified. So post 11 introduces default partition. So if there is no partition available, which can satisfy the condition, the row can go to the default partition. And this is important, very important for a real production deployments because we will be partitioning an existing table and we won't be respecting the partition table to function as it was without throwing any errors. Yeah. So now from posters 11 onwards, the default partition is supported. That's really, really, really nice feature to add. And this is another thing. Still, I'm discussing with some of the Postgres users about this feature because this is so beautiful. We can define indexes on the on the partitioned table. And whenever a child table is created index will be created on that automatically. So an additional indexes can be created on each individual partitions based on the requirement. And say, for example, if there is a live partition, which holds the latest information. So we may need additional indexes on that particular partition alone, that's possible. And this is a performance improvement in Postgres 11. The partition wise aggregate. So what happens is if the partition key is part of the group clause, say we partition the table based on monthly and our query requires a group by month. Okay. So now the Postgres knows that the data for each aggregation can come from individual partitions. So partition wise aggregation is possible from Postgres 11 onwards. And if the group by clause does not contain any of the partition keys, we can still perform partition aggregation for each partition and then finalize the aggregation after the appending partition process. So it is go beyond that capability also. And we have few cases where we saw up to seven times performance improvement because of this partition aggregation happening over foreign servers. So the aggregations can happen on foreign servers and the result can be sent over to the local machine. So it gives a lot of performance benefit. But still not perfect. Postgres 11 came up with so many good features and fixes. So what was the problem? The performance. The numbers you are seeing in the screen is from a real user. The planning time was several times than the actual execution. These are real world numbers. So we had many user complaining that they partitioned the table. Everything works wonderful. But they are not seeing any benefit out of it. Any performance benefit out of that. Yeah, that's a serious complaint. Planning time is very high in many cases. Partition pruning is not intelligent enough. Many times it's resulting in poor plan. No performance gain after partitioning and sometimes even slower performance than the unpartition table. This is the most difficult situation. We guide people to go for partition. And finally they see a negative performance result. Yes. And another serious problem. Performance degrades as the number of partition increases. So those were the serious problems till Postgres 11. And that's the time Postgres 12 came in 2019. It's faster. It solves all performance-related pains. And here is an example. So I'm not going to do the live demo, but just keeping the screenshots. As we can see in the screen, the Postgres 11 was taking a lot of planning time. And the same query takes considerably less planning time in Postgres 12. And not just select statements. Here is an update statement. Here also we can see that it takes very less compared to the previous version. So almost all queries which is using partitions were started running much faster in Postgres 12 without any code modification, from application perspective, without any change. Just out of walks. Just after update, they started seeing performance benefits. And not just statements. Say there is a huge performance gain when we do bulk data loading. So even when we do just PG dump and PG restore, there is a performance gain, noticeable performance gain. And in typical PG dump and PG restore, I see up to 21 percentage of gain. But a lot of reports that there are seeing up to 40% performance gain. So other things, insert performance improved. Better scaling up as the partition count decreases. This was one of the pain point in Postgres 11. Now it scales well. Even if we have more partition, the performance won't degraded as Postgres 11. And primary key lookup becomes several hundreds of times performance gain in at least in some cases. So this were the real world use cases where people started seeing the performance gain. So in the underlying thing, we saw that the planning time decreased because in Postgres 12, the metadata for the plan is loading, the metadata loading is performed after the partition pruning. The partition pruning took the precedence than the rest of the things. That was a big change in design, but it resulted in a lot of performance gain in multiple areas. And another planner-related modifications, improvements, say we can see that the merge up and node is gone in Postgres 12. So unnecessary merge up and node, merge up and or just up and nodes are gone in many cases, resulting in a better performance. And it is not just these things. So I had a different talk on the only Postgres 12 that at least 15 slides were about the Postgres 12 performance improvements. So this is the summary of some of other changes, other improvements in Postgres 12. So as I already mentioned, the partition pruning took the precedence. So post-form, rest of the things to happen after that. So it need to be performed only on few of the pruned partitions. And improve the performance of partition pruning, redesign of the initialization of partition routing. So the DMLs started performing much better. Delay the lock decisions, so the concurrency improved and speed up the planner. And multi-inserts on partitioned table. That's also working fine in Postgres 12. I was sorting when partitions are already being scanned in the necessary order. So the planner and executor, every part of Postgres 12 is fine-tuned for the partition table. So the partitioned table is a prime citizen from Postgres 12 onwards. The planner is fully aware about it and it does the right thing to do. And yeah, we discussed about a lot of speed up, improvement in speed. Is there anything other than speed? So the obvious question is that Postgres 12 is not just about partitioning speed. There are a lot more partitioning-related features. Here is another big benefit. The attach is non-blocking. So there can be better concurrency because attaching a partition is part of the regular DBA job. When a new window of the partition starts, the DBA or the script or the automation need to add a new partition. So attaching a partition is a regular part of DBA job. So now it can be performed without in a non-blocking way. It's a big game. And the foreign key references to partition table is possible from Postgres 12. It's more closer to the non-partition table experience, which is what generally people expect when they partition a table. So as we can see, Postgres 11 will throw an error, but Postgres 12 can accommodate the foreign key reference to the partition table. And when we talk about the partitioning of a table, the DBAs may not want always to come up with exciting values of partition boundary. Instead of that, they may want to have an expression. And it is supported in Postgres 12. So here is an example. Partitioning a table without actually specifying the boundary values. Now, from now onwards, till now plus five days. So this will be evaluated while creating the partition. So anytime if the DBA or the script want to create a new partition, they can execute the same statement. It will be evaluated and new partition will be created, which was not the case in previous Postgres versions. And table space specification at partitioned table level. So this was a limitation as we couldn't give top level, grade and so on, were the partitions to be placed. All new partitions created will be using the table space specification at the partition table level. So we know that the parent table doesn't hold any data, but it can have the information about the table space to which each partition should go. And we can alter that so that all of subsequent day the partitions will go there. And table space can be specified at the partitioned table using create table or alter the table later so that we can segregate the partitions. This was an important feature from the practical perspective when we deal with multiple table spaces and disk systems. And we may want to have archiving partitions into some other table space. So all this partition, the table space manipulations were really great. And these are some of the improvements in terms of DBA experience. So we can look at the partition hierarchy using a function pgpartition root. It's a user level interface. So we can say select start from partition tree. It will list out the partitions in hierarchy order. And we have a root and ancestors. There are many functions added for DBA's convenience. And another thing is the system catalog view started giving a lot more information about the partition indexes. pgindex will give information about the partition index as we can see in the screen. Post-us 12 will have the information from, but post-us 11 won't give that information. And not just server side, we have a lot more things in the client side as well. PSQL metadata commands, say we can say slash dp from post-us 12 onwards, it will give all the list of partition tables and indexes. And even if we are just describing a table, it will be clearly mentioned it is a partition table, not a regular table. That's an improvement. And same is the case with, even if you are checking a privilege on a table, it will be clearly mentioned that it is a partition table. So this is a client side improvement, but really helpful for those who are working with post-us. And then comes the post-us 13. So now many of us used to think, say post-us 12 fixes all performance improvements and comes up with all sort of new features, all fancy features. So what is there in post-us 13? It is released in 2020 and the evolution is not stopping. The big addition is we can have triggers before each row triggers are possible from post-us 13 onwards. So here, everywhere we are seeing that the partition table is becoming more and more closer to the real, the single table experience. So what was missing is added later and there is no more penalties for partitioning a table. So the triggers were not possible in post-us 12, but it is possible in 13. And there is a lot more cases for better partition pruning. And one beautiful thing about this is this actually simplified the logic. The logic behind the partition pruning become much simpler while attaining more capabilities. The logic become generic, apply constraints more generally in partitioning. A constraint exclusion logic itself won't be invoked in avoidable cases. So as we can see in the screen, instead of going through multiple partitions, in many places it will result in a single step. So a lot of optimizations in terms of planner. And another thing in terms of planner is advanced partition matching and the partition wise joining. Previously, the partition wise joining technique only allowed when the input partition tables had exactly matching the same partition bounds. So if the boundaries of the one table is a certain range, the next table, the partition also should be of same range. But the new algorithm overcomes all those limitations. And as we can see in the screen, this was the previous condition. And now it can happen, the partition wise joins. So you can see that hash join happens on the partition wise. Instead of preparing a complete hash and then not showing a duet thing. So this is much more efficient. And this is another practical useful feature in partitioning. Now partition table can participate in logical replication. This feature allows partition table to be logically replicated via publication. So previously partitions had to be each partition, each partition had to be replicated individually. Now a partition table can be published at the top level, can be published, causing all the partition to be published automatically. So we don't have to specify each of those partitions. And addition or removal of partition to an existing partition table causes additional or removal from the publication also. So a lot of things, a lot of problems are gone, headaches are gone. So new additional option to publish via root. So there is a syntax as well, how to be published, published via partition root is equal to true. So the publication happens on the root of the partition tree. That was not the case in process 12. We couldn't do that. It used to throw errors. And the advantage is not just a publisher part, even in the subscriber part as well. Now subscriber can replicate to partition tables, applying replicated changes to the target table, which is a partition table and to its relevant partition. So the subscriber part also is much simpler now. Previously subscriber could only receive roles into non-partition table or the individual partitions. And process 13 has had a lot more features. I won't be covering all of them. And this year we have process 14. So evolution is not stopping and much smarter. And now DMLs like updates and deletes are handling much cleaner. So again, the underlying code becomes more generic and simpler. So this is one thing I like about Postgres. So to add a feature, the code is not getting complicated, but the previous unnecessary hacks are removed from, and it becomes simpler and straightforward. So the work is done by Tom Lane. This particular thing is reducing the planning course and memory consumption for DML operations. It generates a single sub-plan for the target relations of a partition table. We got to end it instead of having all sub-plan for all the target relations. Change greatly reduces the planners overhead. So now DMLs are even fast with even simplification, code simplification. And when we talk about a big feature, so the previous one was kind of underlying change. Now one may not notice that, but here is a big feature. Now the detached partition can be online. So we can detach a partition without blocking. And it is implement as a little hack. It's not so straightforward, so there is separate syntax. So alter table, detached partition, concurrently we have to specify. For attaching the partition, we don't have to specify this. Automatically, it will be non-blocking way. And this is one of the most frequently requested feature, allowing partition to be detached from the partition in the table without blocking concurrent queries and new syntax. And there is an additional thing, because as I mentioned, this is a little hack by introducing two transactions at a time. So there could be chances that things can go wrong. So we have an option to finalize some of the detached operation. If the session gets disconnected or about it in between the operation, don't worry, just to execute the finalize option, it will complete the work. This is like when we do index rebuild concurrently, the index can go invalidated or non-useful condition. This is the same situation. But we have an option to come out of that cleanly. So there is no problem. But there is some limitations. We can't do this in a transaction block because the entire thing happens in two transactions. So we cannot put that into another transaction. So that's a one limitation. And we have a dedicated syntax. We need to specify concurrently explicitly. And this won't work for default partition. And there are other improvements as well. The collision is implicitly caused to do the collision. So it won't throw error anymore. It will be forced to. The addition of attaching the partition is simpler and straightforward now. Re-index process to all, say if you execute re-index, it will process all child tables and all indexes of all the partitions. Internally, each partition gets processed in a separate transaction. This is one point we should remember. So when we execute a re-index on a partition, the table, each of the partitions will be getting processed by individual transactions. So another feature is Postgres FEW. Now we can say import foreign schema limited to so that we can even import partitions using the syntax. So there is a syntax improvement. That's very useful. And so we talked about up to Postgres 14. But yeah, it is not about just major version changes. Even there are improvements happening even with the minor versions, which is silent inside. Some of the examples are there. So keep upgrading your Postgres. Keep up to date. You are going to get all the benefits of new features and improvements happening in the system. And some of the good talks and the references and things like that. And thank you for attending the conference. And if you have any question, please post it to us and we'll be replying to that. Thank you very much.