So, welcome back again. Our next speaker is Magnus Hagander. He's a core team member, a Postgres committer, a Postgres major contributor — he knows most things about Postgres. And now he's going to talk about Postgres 11 features. Thank you.

Now we're on — yes, technology. I'm going to say good morning even though it's, you know, noon, because we consider it morning. I just want to make one more note, for the record and for Devin, before I get started, which is: no. And then we're good to go.

So, I'm here today to talk about Postgres 11. Who's already using Postgres 11? Good. Who's already using Postgres 12? You guys are boring.

As Devin said, my name is Magnus Hagander. I work for a company called Redpill Linpro; we're an open source services and consultancy business in the Scandinavian area, and I'm out of our Stockholm office, where I work primarily with databases — unsurprisingly, in our case, that means Postgres. That's kind of why I'm here. Within the Postgres project I'm one of the core team members, I'm one of the committers on the database backend, and I do a lot of work for PostgreSQL Europe, which is the non-profit that coordinates a lot of these activities across Europe — for one thing, our devroom here today, and the PGDay we had on Friday downtown.

But let's talk about Postgres 11, and the new things in Postgres 11. Some of you have seen these things before — I do this sort of "new version of Postgres" talk every now and then. The format's the same; luckily the content is different. There are actually new things in Postgres 11.

As with all Postgres releases, we go for approximately one release per year, targeted at a release in September. Postgres 11 actually started in August of 2017, when we branched off Postgres version 10. At that point we opened up development on our master branch for what would become Postgres 11. Now, in Postgres everybody has their own names for everything, and we work with something we call commitfests, which is basically our way of doing iterative development. The idea — I should say "the idea", because we all know how well plans work — is to do one month's worth of building features, and then one month's worth of reviewing and committing those features. That's the commitfest, and we rinse and repeat that four times. Which, if you do the math, is eight months, not twelve — but then it takes a while to stabilize everything and get it out as an actual release.

So we started in August of 2017 and we finished in almost-September 2018. Some people say it was October, but we can pretend it was almost September; being less than a month late is not all that bad. The release has been out since October of 2018, which is why I'm actually surprised that so many of you put your hands up for already running Postgres 11. That's great news — well, I hope it's great news; I hope it actually works for you. But it's unusually fast: this far into a release I would normally see fewer people on this version and more people on the previous one.

Anyway, let's get into the features.
I've tried to separate them into four sections: DBA and administration, SQL and developer, backup and replication, and then performance — because everybody loves performance, and there are always new and interesting performance things coming out.

So let's start in the area of DBA and administration. First of all, what's the actual difference between a DBA and a developer today? DBAs do development and developers do whatever DBAing is. But I decided to split it by saying: if it's used through SQL, it's developer, and if it's config file or command line, it's DBA.

Let's start with a small feature — or potentially a big one, depending on whether you need it. Those of you who've worked with Postgres for a while have probably run into the fact that our transaction log, our WAL, is made up of 16 megabyte files. They're always 16 megabytes, and if you happen to generate a large amount of transactions, your log archiving for backups sends the files off one by one; if you generate many of these files every second — a few hundred megabytes in a second — that's a lot of overhead just in transmitting files. You've actually been able to change the size of these files, I think, since the day the transaction log was added to Postgres; you just had to recompile from source and ended up with an incompatible data directory format, which you probably didn't want. 11 lets you configure the WAL segment size by giving a parameter when you initialize the database. You still won't be able to change it once you've initialized your cluster, but at initdb time you can set it. I would expect the normal use case to be a very high transaction rate system that generates lots of transaction log, where you want this to be a higher number. You can also set it lower if you're really resource constrained and something like 32 megabytes of disk space is way too much — but in most cases today, when we're dealing with database servers, 30-odd megabytes really isn't that much. Still, there could be a use case for it.

The next thing I want to mention is something that can potentially break your things. Who's using the pg_stat_statements extension today? Okay, the rest of you should really look into pg_stat_statements. It's awesome; it gives you a lot of insight into what your system is actually doing, in a very useful format. pg_stat_statements tracks your most frequently used SQL queries in the system across time, hashes them, and gives you something called a query ID, which uniquely identifies the query. Prior to version 11 this query ID was 32 bits; in 11 it's 64 bits. Normally you don't really have to care, but if you have a system that, for example, snapshots this data into your monitoring to generate cross-time graphs or something, you need to update that system as well to support the 64-bit version. It really is that simple, but there are systems out there that this breaks. It also affects the more common tools around this — the web visualization tools for pg_stat_statements — where you just need to upgrade.
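If you snapshot this data into your own monitoring tables, the thing to check is that whatever column stores the query ID can hold a 64-bit integer now. The kind of query such tooling typically runs is roughly this — the column list is trimmed down for the example:

    SELECT queryid, calls, total_time, rows, query
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;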
As far as I know, the ready-made tools have all released updated versions that support it, but you need to actually install that version; otherwise it doesn't help.

We've added the ability to collect statistics on expression indexes. If you've done Postgres query tuning, you know the command ALTER TABLE ... SET STATISTICS: to get a better query plan, you tell Postgres to look at a bigger part of the data and build more detailed statistics. Previously you could only do that at the column level — "for this column, collect more data" — making planning more expensive as a trade-off for better plans. The new thing in 11 is that you can now do this for expression indexes. An expression index is one you create on, well, something that's an expression, so not actually a column. In the example here I created an index on x, y, and z plus t — that last one is not a column, it's an expression; it could be a function call or all sorts of things — and previously there was no way to tell Postgres to look more closely at that. The syntax is ALTER INDEX, then ALTER COLUMN, and you give it the column's ordinal position in the index — "the third column in the index" — because columns in indexes don't have names. They just don't have names. On a table you'd say ALTER TABLE blah ALTER COLUMN mycolumn, but since an index column doesn't have a name, you give it the number. Other than that, SET STATISTICS works the same: the default is normally 100, you increase it, Postgres collects more data, ANALYZE takes longer, query planning takes longer, and hopefully you get a better query plan.

The other big thing we've done to indexes is something called INCLUDE indexes. People used to other databases will also know the term "covering index". The idea is that we can add columns to our index, making it less efficient — and of course everybody wants less efficient indexes, right? We do this in order to be able to use an index-only scan. Look at the example here: I have a unique index on my table, say my primary key. Obviously that index can be used to look up rows by that column.
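The slide example is presumably along these lines — a unique index on id that additionally carries second_field along for index-only scans; the names here are made up:

    CREATE UNIQUE INDEX mytable_id_idx
        ON mytable USING btree (id) INCLUDE (second_field);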
If I want to be able to look up one more column using an index-only scan, I can add that column to the index. But that becomes a problem, because now the index is no longer unique on what I thought was my primary key — it's unique across both columns. The difference with INCLUDE indexes is that when I say something like that sketch — CREATE UNIQUE INDEX USING btree on id — it's going to be a regular unique index on id, and then INCLUDE second_field adds second_field to the index for every row, but not to the key. So it's still unique only across id, but any query referencing the id and second_field columns — and no other columns — can now be satisfied by an index-only scan, and you never have to look at the actual table. As a regular index, though, this index just became less efficient, because there's more data in it and less of it fits in cache. So it's a useful trade-off, but something to consider: the actual index does become less efficient. The point is to trigger index-only scans.

Anyone using the pg_prewarm extension? A couple of people, not too many. It comes in Postgres contrib and has been around for a while. The idea behind pg_prewarm is that if you need to restart your server, you can run pg_prewarm to snapshot the information about what's in your cache — because normally, when you restart, your cache goes away. So you tell pg_prewarm to snapshot it, you restart, and then you load it back. The thing we can do now in Postgres 11, by setting it to automatic, is have it snapshot every five minutes. If the server crashes, or you get a replication failover or something like that, it will automatically load the cache as it looked up to five minutes ago, which is normally good enough. What it actually stores is not a dump of your entire cache: if you have 50 gigs of cache, it doesn't write 50 gigs of data, it just writes a list of which disk blocks were in the cache — not their contents, just a list. So particularly in cases like recovering from a replication failover, this can be a good way of avoiding that terrible performance blip you usually get when no data is in the cache anymore.

We've added a bunch of new default roles. This is part of the steps towards getting rid of superuser — well, not really getting rid of superuser, but getting rid of the need to use superuser. We have three new roles: pg_read_server_files, pg_write_server_files, and pg_execute_server_program. They kind of tell you what they do, right? If you grant pg_read_server_files to a user, they can read any file on the server, as long as the postgres OS user has permissions on it, and pg_write_server_files is the same thing for writing.
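Granting them is an ordinary role grant, and the read and write ones map straight onto things like server-side COPY. A minimal sketch — the role and table names here are made up:

    GRANT pg_read_server_files TO reporting_loader;
    -- members of that role can now do server-side COPY from a file path,
    -- which previously required superuser:
    COPY staging_events FROM '/data/exports/events.csv' WITH (FORMAT csv);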
And if you grant pg_execute_server_program — well, they can execute programs on your database server, which may not be the greatest idea. But it depends on who you're granting it to, and it's still better than superuser, because superuser can do all of these things and more. So it's about making these things more granular, and making it possible to build a more secure installation.

Then, one of the most requested features in Postgres over many, many years: ALTER TABLE ADD COLUMN. Who's ever accidentally done that in production — added a column with NOT NULL and a default, and then watched the downtime counter as the entire table was rewritten? Well, you can now actually add a NOT NULL column with a default, and it's more or less instant; it no longer needs to rewrite the table. The default does have to be non-volatile — you can't use something like current_timestamp for this, because that would change over time, and that would be very strange — but you can give it an actual default value, and Postgres simply remembers it. Then, as rows get rewritten — say you update something else in a row, which creates a new copy of that row — the default value gets materialized into the row. So it slowly updates across the table, and eventually things are exactly as they were before. But it does all of this without holding an exclusive lock, rewriting half your database, and causing downtime and other evil things. I'm sure everybody has made that mistake at some point; if you haven't, well, you will. And luckily it will have much less effect now, as long as you set it up right.

So, let's take a look at the SQL side of things. Who's using Postgres full-text search? It's been enhanced further — in particular, not the search itself, but the query parsing. We've previously had to_tsquery, which takes a Postgres-specific syntax; plainto_tsquery, which takes a plain text phrase; and phraseto_tsquery, which takes a phrase — all turning these into the internal query syntax. Now we have websearch_to_tsquery, which takes what you'd expect and turns it into a search query. Basically, it takes the syntax we're all used to from search engines: you can type "foo and bar", you can use parentheses to group things, you can do NOT, all in free-text mode, and it will parse that, turn it into a reasonable search query, and run it. So I would say that in almost any case where you're taking user input and doing a search based on it — not exactly an uncommon way of using full-text search — you probably want to be using websearch_to_tsquery, and it's a drop-in replacement. Of course the syntax changes a bit for the user, but functionally it's just: pass in the search term, run the search. That's what you want in almost every case. The previous functions are of course still there if you need more control and need to be more specific about what you're doing.

Postgres domains have been enhanced to do things they couldn't do before — little corner cases. You can now create an array of a domain, and you can create a domain over a composite type. So you could do a domain over an array over a row over an array over a domain... yeah, okay, don't do that. But basically it's been one of those cases where there was a little checkbox that wasn't ticked.
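A rough sketch of both new corner cases — the type, domain and table names are made up:

    CREATE TYPE dimensions AS (width int, height int);

    CREATE DOMAIN positive_dimensions AS dimensions          -- domain over a composite type
        CHECK ((VALUE).width > 0 AND (VALUE).height > 0);

    CREATE DOMAIN positive_int AS int CHECK (VALUE > 0);

    CREATE TABLE boxes (
        size     positive_dimensions,
        weights  positive_int[]                               -- array of a domain
    );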
So now you can do those things with domains, which is always good.

On the SQL side, the really biggest thing, I think, is full support for the SQL:2011 window frame clauses — all of them. Our resident SQL standard expert has been telling me that at this point we're actually the only database that does that, or at least we were about half a year ago when I last saw him give that presentation. I can't verify that myself, but we did add a lot of them. In particular, the newly added feature is RANGE BETWEEN: previously we could only do ROWS BETWEEN in the window clause, so now you can frame by values and not just row counts. And we now have exclusion clauses, so we can exclude the current row, exclude ties, and things like that. Of course, everybody knows exactly what that means, right? Okay, let's try an example.

Look at the first one — this is the classic way we used to do it, which of course we still do. We do a SELECT i — in this case from a silly table containing 1, 3, 5, 7, 9 and so on — and then sum(i) OVER (ORDER BY i ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING). That's what we've been able to do before: for each row we declare a window of the two rows prior to this row, this row, and the two rows following it, and we do a sum over those, so we get a running sum across our data. That's the column giving us 9, 16, 25, and so on.

The different thing we can do now is also say sum(i) OVER (ORDER BY i RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING). Instead of looking at two rows, it looks at two values. Take the row with 5, for example: the 25 in the ROWS column is 1 + 3 + 5 + 7 + 9, but with RANGE it starts at the current row and goes backwards — "is this one within 2 or less? Okay, include it; this one isn't, so stop" — and then the same the other way. So in this case it's 3 + 5 + 7, which gives 15. Now, if my data is this uniform — literally every second value — I could just say ROWS one before and one after, and that would be smarter. But RANGE lets you do this over arbitrary data.

The exclusion we can add is to say: when you do this, also exclude the current row — then you get the two before and the two after, but not the current one. Or you can say EXCLUDE TIES, which skips rows that have the same ordering value as the current row (but keeps the current row itself). Things like that.

One of the more interesting things — I think this is the more interesting example — is that you can use any data type in Postgres that's sortable, basically anything that has the required B-tree operators. In this case we're saying SELECT t, and then count(*) OVER (ORDER BY t RANGE BETWEEN '15 minutes' PRECEDING AND '15 minutes' FOLLOWING) — t here is presumably a timestamp. So we're saying: create a window of 15 minutes before us and 15 minutes after us, run it across our entire data set, and return the count. We could return other aggregates too — sums, averages, anything like that. And we get data back like this: for every row, at this timestamp — say 00:00:13 — looking 15 minutes back and 15 minutes forward.
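The query on that slide is presumably something along these lines — the table and column names are assumptions on my part:

    SELECT t,
           count(*) OVER (ORDER BY t
                          RANGE BETWEEN INTERVAL '15 minutes' PRECEDING
                                    AND INTERVAL '15 minutes' FOLLOWING) AS rows_nearby
    FROM events;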
For that timestamp, there were 12 rows in our data within the window. And it's with use cases like this that the feature really starts to pay off: you can do really interesting analytics that previously required you to either write a stored procedure, or copy all your data up into the application and loop over it. Obviously this is going to be faster than either of those. So basically you have the count and the OVER; you always have to have an ORDER BY, otherwise it would be "yes, please take my random data and give me a sum", which doesn't really help. The RANGE '15 minutes' PRECEDING AND '15 minutes' FOLLOWING creates the window, and Postgres just runs that window across the data.

The other big thing at the SQL level is stored procedures. Some of you have been around for a while and are thinking: hey, didn't people used to say that the reason we should use Postgres instead of MySQL was that Postgres had stored procedures and MySQL didn't — back in the days when they didn't? Yeah, people kind of said that, but Postgres never actually had stored procedures. Postgres had stored functions — very similar, slightly different. One important thing is that procedures use the SQL standard syntax, so you now say CALL instead of SELECT. Really important, right? The really interesting difference, though, is that stored functions run inside your transaction, while stored procedures can control the transaction. You can't do ROLLBACK or COMMIT inside a stored function; you can do ROLLBACK and COMMIT inside a stored procedure. The syntax for creating one is very similar to creating a function, except instead of CREATE FUNCTION you say CREATE PROCEDURE, and you don't have a return value — well, you don't return results in the same way. So if you look at this example: we say BEGIN, we INSERT the value 1, COMMIT that; we INSERT the value 2, ROLLBACK that. If you tried this in a stored function you would just get an error saying sorry, you can't do this inside a function. Now, what you'd expect when you read this is: if I call this stored procedure, the table should end up with just the 1, right? And the 2 would be rolled back and gone. Well, that's exactly what happens. It works the way you would expect, and it gives you the ability to do full transaction control in stored procedures. In functions you could use savepoints, but you couldn't control the outer transaction that runs everything.

That's a big thing in particular for migrations — I do a lot of work with people migrating from the big, expensive proprietary databases into Postgres, and there are a lot of patterns there, particularly things like loading jobs and ETL jobs, that use this heavily. You've usually been able to accomplish the same thing in Postgres before, but this makes it a lot easier, both on the migration side and the actual implementation side.
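To make that slide concrete, it's roughly along these lines — a minimal sketch with a made-up table name, invoked with CALL outside an explicit transaction block:

    CREATE TABLE t (a int);

    CREATE PROCEDURE transaction_test()
    LANGUAGE plpgsql
    AS $$
    BEGIN
        INSERT INTO t VALUES (1);
        COMMIT;                      -- the first row is now permanently committed
        INSERT INTO t VALUES (2);
        ROLLBACK;                    -- the second row is thrown away again
    END;
    $$;

    CALL transaction_test();
    -- t now contains only the row with 1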
So, let's take a look at backup and replication. You all do backups, right? No, you don't. I mean, come on, really.

There's been no super-major thing here, but a number of incremental improvements. Postgres 11 adds the ability to advance replication slots. Primarily this is not something you'll use directly; it's something your replication management tools will use — tools like Patroni or repmgr and all of these things around Postgres. The idea is that we have these replication slots that keep track of where different nodes are in the replication stream. But say you have a primary and a standby, and you also do logical replication to a different machine, and then you get a failover between your primary and your standby: by default you lose the logical replication, because the replication slot is gone. What you could do previously was create the replication slot on the standby as well, and the logical replica would work just fine — except now your standby would instead run out of disk space, because you have a replication slot there that blocks WAL removal, and previously there was no way to move that slot forward without actually replicating the data. So there were workarounds like "let's create a replica that pipes all this data to /dev/null", which is not really productive. The new idea is very simple: you can now, at regular intervals, just move your replication slots across a cluster of many machines to keep them in sync. They never have to be exactly in sync — close enough, within a couple of megabytes, so you don't run out of disk space. This makes mixed clusters, mixing physical and logical replication, a lot more manageable.

Speaking of logical replication: who's using logical replication in Postgres 10 or 11 today? That's surprisingly few — that was one of our headline features, come on, guys. You've been able to use it since 10, and it took a lot of work to build. One of the things we couldn't do in Postgres 10 is that logical replication did not replicate TRUNCATE. It would replicate your inserts, updates and deletes, but if you did a TRUNCATE on the primary, the table would go empty on the primary while keeping the data on the subscriber, because the TRUNCATE wasn't replicated — which is usually not what you want. In 11, it does. You can turn this on and off per publication: you can say for this table I want truncates to replicate, and for this one I don't. Logical replication has a lot of non-standard use cases where not replicating them actually makes perfect sense, but since in most cases it doesn't, it's on by default. So if you just create a publication, or if you upgrade from Postgres 10, your logical replication will just start replicating TRUNCATE.
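If you don't want that for a particular publication, it's a single statement to exclude it — the publication name here is made up:

    ALTER PUBLICATION my_publication SET (publish = 'insert, update, delete');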
So it just starts working by default; turning it off is an explicit, per-publication choice.

Another thing, more on the physical replication side and in particular on the backup side: base backups taken through the Postgres internal protocol no longer include unlogged tables. Which makes a lot of sense — unlogged tables are tables that don't go into the transaction log, so by definition they're not crash-safe. But they're useful for things like large data loading jobs: you might load data into an unlogged table, do some processing, and then merge it into a regular table, for example. Unlogged tables work like this: when Postgres crashes or restarts, it just wipes the contents, and the table comes up empty — it's not crash-safe, we can't trust the data, so we delete it. The table is still there, but it's empty. And what we did for backups was: we included the table in the backup, we restored it on your restore, and then, as you started Postgres, we deleted it again. Particularly given that these are often large loading tables, that's kind of a waste of space in your backups, if the only thing we guarantee is that we're going to delete them as soon as we start up. So we no longer do that. A similar thing applies to temp tables, where it's even more apparent: with CREATE TEMPORARY TABLE, if you disconnect, the table gets deleted — that's how it's supposed to work — so by definition, if Postgres restarts it gets deleted, and if you restore from a backup it gets deleted. But it used to be included in the actual backup and deleted on restore instead. I've seen cases where this saves a lot of disk space in backups — space that was just completely useless.

The other thing backups do now is validate page-level checksums as the backup runs, if you've enabled page-level checksums on your cluster. Unfortunately this is not enabled by default, and you can currently only enable it at initialization of the database cluster — when you run initdb you have to decide to use them; you can't turn them on after the fact. You probably should turn them on in almost every single case, because you want to know if your data is corrupt. The way this worked previously is that whenever Postgres reads a disk page — you run a SELECT query or something, and it reads the page from disk — it validates the checksum. But if you have tables, or portions of tables, that are very infrequently read, you can get bit rot in those files and you'd never know, because Postgres never validated the checksum. Now, when we run backups, we read all your data anyway, right?
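(As an aside: if you don't remember whether a cluster was initialized with checksums, you can check from SQL — it's exposed as a read-only setting.)

    SHOW data_checksums;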
You have to read it all anyway, and the expensive part of validating a checksum is reading the data; actually calculating a checksum across an 8K block is very, very cheap today. So what we do now is: whenever we read a block for a backup, we validate its checksum. And because backups include the infrequently used portions of your database, this way you will actually notice that your data is corrupt. It can't fix it for you, unfortunately, but it can tell you "hey, you have corrupt data in this table". The idea being: hopefully you still have your previous backup, and you can restore from that and use your log archive to basically roll past the corruption and get rid of it. If you only find out six months later, when somebody queries the infrequently queried table, you may not have a six-month-old backup — and you may not want to restore a six-month-old backup. So it's all about getting the information sooner, so that you can handle it properly while it doesn't cost you so much to handle it. But again, checksums are not enabled by default, and I do strongly encourage you to consider enabling them. In the upcoming version of Postgres there will most likely — it's not been committed yet, but most likely — be tooling available to enable and disable checksums. Unfortunately, enabling checksums on a large database is very expensive, because it basically has to rewrite the whole database, but disabling them is very cheap, and there are actually tools you can get today — they're just not part of Postgres — that will safely turn off checksums on an existing cluster. You just stop Postgres, run a tool that takes less than a second, and start it again. Which I find a good argument for just turning them on by default: it's easy to turn them off, it is not easy to turn them back on. They do create a bit of overhead — in my experience nowhere near enough to be a problem, compared with the problem of realizing too late that you have corruption. And if you trust your hard drives, or your SAN, or your cloud provider, not to generate corruption: good luck with that. Far too many people do, and it can be very expensive.

Okay, let's talk about a few things around performance. Everybody wants things to run faster — well, almost everybody. Postgres 9.6 added parallelism: the ability to run a parallel query, using multiple CPUs for the same query. Up until 9.6 you had a one-to-one match of one connection, one CPU, which becomes a problem on modern machines with hundreds of CPU cores — yes, you can use one; sometimes two, one for a background job and one for your query. As of 9.6 we could use multiple CPUs, but in practice it didn't really help that much, because it limited the types of joins and the types of scans that could be parallelized. For example, in 9.6
you couldn't parallelize a query that involved an index scan. Postgres 10 made it really, really useful, because it removed most of those restrictions. I can actually say that I think I'm still at the point where none of our consulting customers have had any use for parallelism in 9.6, and probably 80% of them have had use for parallelism in 10 — that's when it became useful. Now 11 adds a few more things that make it even better. There are a bunch of general enhancements that silently make things better without you having to care. We've added parallel append plan nodes — you don't really need to know what that is, things just run faster. And we have parallel-aware hash joins: previously, a hash join supported parallelism, but — simplified — if you parallelized it across four processes, it ran four parallel hash joins that each built their own hash table, completely independently, just on different parts of the table. With parallel-aware hash joins they all share one hash table, so you get a much bigger hash table and much more efficiency while still using multiple processes. These are all things you don't have to do anything about — literally run 11 and your queries will be faster. That's the best kind of enhancement, right?

We now also have the ability to do a parallel CREATE INDEX. CREATE INDEX is a DDL command that usually takes a long time if you have a lot of data, and it's very often CPU bound, unlike many other DDL commands, which are limited by the amount of I/O you can do. CREATE INDEX, in its simple form, loads your whole table, sorts it, and writes it out as an index. Obviously, for a large table it's not quite that simple, but that's the basic principle — and sorting takes time, sorting is expensive, so we can parallelize that. We have a new parameter called max_parallel_maintenance_workers — say that fast four times. The default value is two, which means your CREATE INDEX will use two CPUs. There are definite cases, particularly if you're loading from a dump or doing big load jobs where you don't care about the effect on the rest of the system, where you'll want to increase this quite a lot — like eight, scaling out across eight CPUs. The problem, of course, is that if you run your CREATE INDEX across eight CPUs you'll probably kill everybody else on the system, so you might not want to change it globally, but it's worth considering. Now, unfortunately, parallel CREATE INDEX only works for B-tree indexes. The most expensive indexes we normally create today in Postgres are, for example, PostGIS geographical indexes, and they can't be parallelized — well, not yet; not in 11. Hopefully we'll get that sorted, but it's not there yet.

So, parallel query was the big headline feature of Postgres 9.6, and it was made really useful in 10. The big headline feature in Postgres 10 was partitioning, and in my view, just as parallel query was not really that useful in 9.6, partitioning is not really that useful in 10 — but it built a lot of important infrastructure, and 11 for partitioning is like 10 for parallelism.
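To have something concrete to point at for the next few items, here is a minimal sketch of a declaratively partitioned table; all the names are made up, and the last statement is the new default partition I'm about to mention:

    CREATE TABLE measurements (
        logged_at timestamptz NOT NULL,
        value     numeric
    ) PARTITION BY RANGE (logged_at);

    CREATE TABLE measurements_2019_01 PARTITION OF measurements
        FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');

    CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;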
It starts delivering — now is when you get all the benefits. Declarative partitioning, with the syntax and the basic functionality, was in 10, so if you were using the old manual partitioning on 10, it's definitely worth migrating. But it fundamentally works exactly the same as before, just with prettier syntax, and it has almost all the same drawbacks; it fixed maybe one thing. 11 fixes a bunch of things.

Number one, it adds support for default partitions. In Postgres 10, if you had partitions and you tried to insert a row that didn't match any of them, you would just get an error. In 11 you can add a partition as PARTITION OF ... DEFAULT — like the last statement in that sketch — and every row that doesn't match anything else goes there.

One of the absolute biggest things: in previous versions, if you did an update on a row such that the partitioning key changed and the row would end up in a different partition, it would just fail. Once you'd put a row into a partition, it had to stay there. In 11 you can just update it; it gets moved to the other partition, and it just works. What you had to do previously was manually delete it and re-insert it, which works, but it's more work, it's tricky, and the whole point of partitioning is that it's supposed to be transparent. It's supposed to look like one table to your application while actually being a hundred tables underneath — and having to avoid certain updates is not really transparent. It's still not perfectly transparent, but it's much closer. You can still get some concurrency issues, because in practice what it does is a delete and then an insert — so in theory you could do the delete while somebody else does the same thing at the same time, things like that — but it's not as bad as it was before.

Another limitation was that once you'd partitioned your table, you had no way to create indexes across the whole thing; you had to create indexes individually on each partition. 11 adds what we call local partitioned indexes: you can create an index on the parent table, which then automatically gets created across all of your partitions — and most importantly, if you create a new partition, it gets the index too, so it keeps going. You can still add individual per-partition indexes as well, because maybe, if you have time-based data, your old partitions can afford a few more indexes since they're mostly read-only. But you can create these indexes that spread out to all partitions, including future ones. And you can now create a cross-partition unique constraint, as long as all the partition key columns are part of the constraint — because then Postgres knows it's unique across all of the partitions by virtue of it being unique on each individual one. This is the foundation for being able to have a foreign key pointing to a partitioned table. Unfortunately we can't do that yet. We're hoping to have it — it's a patch being reviewed for Postgres 12, and just a couple of days ago I think I heard a report saying it's most likely going to make it — but it's not there yet, so no promises.
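Sticking with that made-up measurements table, the index side of this looks roughly like the following:

    CREATE INDEX ON measurements (value);
    -- also created on measurements_2019_01, measurements_default,
    -- and on every partition added later

    ALTER TABLE measurements ADD UNIQUE (logged_at);
    -- allowed in 11 because the partition key is part of the constraint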
Other things around partitioning: INSERT ON CONFLICT — the upsert method, INSERT ... ON CONFLICT DO UPDATE or ON CONFLICT DO NOTHING — didn't work on partitioned tables; now it does. Okay, that's good. That's also one of those things: it's not transparent if some things don't work.

Another really big thing is that we have a new and better way of doing partition pruning. Partition pruning is when the query planner and executor figure out "oh, I only need to look at one of your hundred partitions", and thus things run faster. That's one of the main reasons we partition, right? The way this used to work is that Postgres did it in the planner: when you submit your query, it looks at the query and its parameters, and based on that it figures out which partitions to scan. There were a lot of cases where that could fail. For example, if you had a query with WHERE x = or WHERE x IN and put a subquery there, the planner has no way of knowing what the subquery is going to return, so it would scan all partitions — and that defeats a lot of the point. The way 11 does it is: it does the old pruning first, looking at the query and removing all the partitions it can, but then it runs a second pass of pruning once it's in the executor and has run the subquery — "oh, I got these values back" — and removes another bunch of partitions. So a lot of query types that would previously give you a scan across every partition will now just scan the partitions you need. Which, again, is one of the main reasons we use partitions: to be able to scan them selectively, and to be able to delete them.

We've also added hash partitioning. Previously we had list partitioning and range partitioning; now you can say hash partitioning. The idea is that you pick a column, you say "I want this in four partitions", and the system distributes the rows evenly between those partitions — as evenly as the hash value allows, but with a good hash function it will be pretty good. So you say PARTITION BY HASH (i) instead of PARTITION BY LIST or RANGE, and you still have to create the partitions: here I create one partition FOR VALUES WITH (MODULUS 4, REMAINDER 0), then MODULUS 4 with REMAINDER 1, 2 and 3, and there are your four partitions. You can sub-partition as well, as long as you keep the ratios workable. This is for when you just want to distribute your data — maybe across multiple storage devices or something like that; you can spread your partitions across them.

We've added partition-wise joins, which is basically: if you have two partitioned tables that are partitioned on the same key, and you join on that key, then you can execute that query by joining the matching partitions individually to each other, instead of joining one partition against all the others, and then the next partition against all the others. It has to be a join on the complete partition key, but if it is, the planner can figure out that this is the special case where it can run many simple operations instead of one more complicated operation. This is turned off by default — there's a configuration setting, enable_partitionwise_join, which is false by default — and the reason is that it makes planning queries on partitioned tables more expensive; there's a cost involved in figuring this out.
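A sketch of the case where it pays off — two tables hash-partitioned the same way, joined on the partition key; all names made up:

    CREATE TABLE orders   (customer_id int, total  numeric) PARTITION BY HASH (customer_id);
    CREATE TABLE payments (customer_id int, amount numeric) PARTITION BY HASH (customer_id);
    CREATE TABLE orders_p0   PARTITION OF orders   FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE orders_p1   PARTITION OF orders   FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    CREATE TABLE payments_p0 PARTITION OF payments FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE payments_p1 PARTITION OF payments FOR VALUES WITH (MODULUS 2, REMAINDER 1);

    SET enable_partitionwise_join = on;   -- off by default because of the planning cost
    EXPLAIN SELECT o.customer_id, p.amount
    FROM orders o JOIN payments p USING (customer_id);
    -- with the setting on, the plan becomes an append of per-partition joins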
But if you have a workload where you know you're actually doing this — joining tables that are partitioned on the same keys — then you should definitely consider turning it on for those particular sessions, by just issuing SET enable_partitionwise_join = on, or true, or 1, whichever you prefer. So that's one of the few of these things you actually have to turn on; most of the other things are just there.

We've added something called partition-wise aggregates. Again, if you do a GROUP BY and the partition key is part of your GROUP BY, then Postgres can run the grouping on the individual partitions, bring up the summary values, and then run the aggregation again on those, making the scan much more scalable. The sum of a bunch of rows is the same as the sum of the sums, and it builds on that — as long as you group by the partition key.

There's also some general performance work. The particularly big thing you'll see is that Postgres now has JIT compilation of expressions, and JIT compilation of what we call tuple deforming and forming — which of course everybody knows exactly what that means, right? It uses LLVM, so the availability of this actually depends on how your Postgres is packaged. Typically you may need to install an extra package; for example, with the yum packages on Red Hat or CentOS, there's a separate package you install on top of your Postgres that enables this. What it does is, for example, expression inlining: if you run big data warehouse queries, the way Postgres normally works is that expressions are evaluated generically, in software, and it's not super optimized for the CPU — it can't be, because we have all sorts of interesting data types. When the estimated cost of a query goes above a certain threshold, it will have LLVM compile this particular expression — you know, take this column plus this column divided by this column — into machine code, and then run your query with that. If you're processing a few hundred million rows, that can be massively faster. I've seen benchmarks with ridiculous numbers like 90% — that's obviously a constructed benchmark. There are also cases where it will be slower, because it takes longer to do the compilation than to run the query itself. The system tries to only enable it when it's beneficial; it will not always succeed, and there's a tuning parameter for how high the cost should be before JIT kicks in. But particularly if you're running data warehousing, or anything that processes large amounts of data, go look at installing those packages; it will definitely make things faster. If what you're doing is simple OLTP, never looking at more than a couple of rows at a time, it's probably not going to make a fundamental difference. It's for the cases where you run a lot of expressions, fast.

So, that's a lot of features, and we've probably covered, I don't know, 5% — but hopefully we covered all the big ones. There are always a lot more things: a lot of smaller fixes, a lot of performance improvements we can't go through. As I usually say: if there are any Postgres developers in the room and I didn't mention your feature, I'm sorry — I'm sure it's important, because all the features are important.
They're important to somebody, they're important somewhere.

So, please help — we do still need your help. Postgres 11 has been released, but we still need you out there to download it, test it and run it — and some of you already are, which is awesome. Now you need to go test the new features that you maybe didn't know existed. Download your packages, run your application, and let us know. We're still very much interested in feedback, in particular things like "in this workload, JIT made things slower" or "the parallelism doesn't actually work here" and things like that. We believe we're in pretty good shape as far as still delivering the correct responses, but there can definitely be performance regressions under certain workloads, where we may need to tune default values of parameters and things like that — and if we get that feedback in early, it makes it easier. You don't have a version of Postgres 12 to test on yet, but feedback you give us on Postgres 11 now is going to make it into Postgres 12, which you're going to see in about six months, most likely. Development is ongoing; we hope to have it ready in, let's call it September, and we'll see what happens — we hope to have it ready this year. We all know how time plans go.

So, that's all I wanted to mention. I want to thank you very much for showing up and staying awake on the second day of FOSDEM — that's pretty impressive. If you have any further questions, I think we have about three minutes before we have to get out, and after that, feel free to just grab me outside. Anyone?

Audience: Yes — is it on? Yeah. I believe you said, for the partitioning, that there is now a feature where it will run a subquery first and replan based on the results?

That is not entirely correct: it will not replan based on the result, it will re-run the partition pruning. The only part that runs again is the "oh, I know I don't need to scan these partitions over here, because based on the values that came back, I know all my data is over here". It doesn't actually change the query plan.

Audience: Well, that's a shame — I see a lot of cases where that might actually help.

That would definitely be an interesting addition — sort of generic replanning, where you start running a query and realize "oh, this is not good" — but it's not something we have today. No, sorry. Anyone else, any questions?

Audience: Good, one in here. How should we take care of indexing when we use the window RANGE operations? Do we preferably have an index on the field that is used for the ordering, I guess — but are there other things to take care of?

You mean for the window query I presented earlier? It's really the same: what it's running underneath is a SELECT with an ORDER BY, so you need the same kind of indexing considerations as you would if you just did that SELECT with an ORDER BY. You may also have an outer WHERE clause and then combine that with the ORDER BY for an index. But at the bottom, it's the same kind of indexes you would have had before.

Audience: So existing indexes would be used efficiently for this kind of operation — we don't need anything on top?
Yes, exactly — existing indexes will absolutely be used. Of course, if you run it across your entire table, it's probably not going to use the index, but it wouldn't have before either, because that becomes too expensive. It's the same considerations as you had for sorting that kind of data before.

Audience: Okay, thank you.

Yeah, go ahead.

Audience: Okay. Thank you for transactions in procedures — now we can build ETL processes without any third-party system. And I would like to ask about a small thing: calculated fields. Can we have a calculated field in a table, like in some other databases, and have indexes on that field?

Yes — I think the standard term for that is a generated column, where you define that this column is going to be x plus y. We don't have that. What you can do in Postgres is create an index on the expression, which will be used if you're querying it, and then create a view on top, so that you read it through the view. But there's no way to materialize it into the table, unless you want to write a trigger.

Audience: Thank you.

Yeah, thank you. I think it's time to kick everybody out and change to the next talk. Thank you very much.