Hi everyone, I think we're ready to get started. I want to welcome you to Postgres 12 and PostGIS 3.0. This session will highlight a number of reasons to upgrade to the latest and greatest versions, including parallel query improvements, generated columns, and covering GiST indexes. My name is Lindsay Hooper, and I'm one of the conference organizers for Postgres Conference, and I'm also going to be your moderator for today. I'm here with Ryan Lambert, owner of Rustproof Labs. Ryan's been working with GIS since 2011. He got his start in PostGIS when, on a quest to update a road map, he started using PostGIS, Postgres, and OpenStreetMap, and he has been a contributor to the OpenStreetMap project since 2015. He's given many talks on PostGIS and has run it at scales large and small. He's currently working on a book on how to use Postgres, sorry, PostGIS and OpenStreetMap together. With that, I'm going to hand it off to Ryan, and Ryan, you can take it away.

And thank you to Postgres.com for hosting this webinar series, and of course, thank you to everyone who has joined us live for today's session. It's great to be here continuing this webinar series talking about Postgres, PostGIS, and OpenStreetMap. This is number five of a six-part series; the prior four sessions have all been recorded and are posted online. The page from that first session has links to all the other recordings as they're posted, and today's recording will be the same. It'll take me a couple of days to get it up, and I'll also include all of the links that I show during the slides today. So today, we are talking all about the new versions of Postgres and PostGIS, and looking at some of the features that I think are really cool and have really neat ways to impact our PostGIS databases.
After we talk a little bit about some of those new features, we'll spend just a little bit of time going over the install process versus the upgrade process and some details you may want to be aware of in those scenarios. So jumping right into Postgres 12: it's been out now for a few months, and 12.2 is the latest minor release. With Postgres 12, we have a number of really cool improvements. The first one I noticed right away was that parallel query finally works with PostGIS data. Parallel query was first introduced to Postgres back in version 9.6, and in versions 10 and 11 the feature received significant improvements, but up through 11 you still weren't able to take advantage of parallel query for our GIS data. Spatial joins, aggregations, and the like just weren't able to take advantage of parallel query. With 12, that has all changed, and parallel query will finally kick in for our spatial data querying purposes. The first place I noticed this was in the osm2pgsql process. This is the data load process that gets the OpenStreetMap data loaded into PostGIS from its source PBF or other formats, and I talked about this process in depth in the second session of this series. When I ran this process with Postgres 12, right away it was obvious that parallel query was really rocking and rolling. The bars on this chart show time in minutes to process the data. This particular example was from a server with eight cores and 32 gigabytes of RAM, and I was loading the US West region from OpenStreetMap. That's a little under a two gigabyte source file in its highly compressed PBF format, and when it's extracted and loaded into PostGIS, it's about 10 gigabytes within the database. So to take five minutes out of a 20 minute process, reducing the time to load this data by 25%, was quite significant to me.
Anytime I can get a 25% benefit from simply upgrading, that's a really neat enhancement for me. The ETL process, though, that loading of the data, is only one part of what we do with PostGIS data. More often, we're doing queries like this. This particular query is an example I used in the prior session on performance tuning. What I was doing here was looking for waterways that have a road nearby, and so this is a query I had identified as a good candidate for parallel query. I wasn't disappointed: this particular query runs more than 70% faster with parallel query enabled versus disabled. And while a 25% benefit on an ETL process is great, a 70-plus percent benefit on normal queries is really, really cool. So I'm going to take a jump into this EXPLAIN output. This is the same format I used in the prior session on performance tuning. What we can see in this EXPLAIN output, as I scroll down, is this Gather step here. The Gather step tells us that the planner planned three workers and launched three workers, indicating that it's using parallel query. Underneath this initial node that launched the workers, we get a few more nodes of detail showing what the workers were doing, how many rows there were, et cetera. So you can see that we had a nested loop here that used parallel query, we had a parallel sequential scan also, and then at the bottom, which I'm having trouble scrolling down to with the zoom, the index scan was also using parallel query. So this shows up in our EXPLAIN output. Another really neat feature with Postgres 12 is that when you're running these EXPLAIN ANALYZE queries to look at what the planner is executing, there's a new option, SETTINGS. What this does is Postgres will now tell us which settings have been changed from their default values that it thinks would impact this particular query.
So we can see here that jit was set to off, max_parallel_workers_per_gather was set to three (its default value is two), and I also adjusted the random_page_cost. This is a really beneficial addition to the EXPLAIN ANALYZE functionality in Postgres 12: it lets you see what you've done that might impact the planning of the query, settings you might otherwise forget you had changed. So parallel query is working, and it can help your queries run faster. This particular query took advantage of it in the index scan, the sequential scan, as well as that nested loop. In the prior session, I pointed out a couple of times that a lot of our spatial queries will have a significant percentage of their execution time stuck in that index scan segment. So anytime you see a high percentage of time in an index scan in a non-parallel execution, it's likely that enabling parallel query will help that particular step go significantly faster. But I recommend reading up on the docs to understand when parallel query will and won't kick in, depending on what your query is actually doing. And you do want to be a little bit cautious with parallel query, because it can cause some negative side effects. If your instances have a very large number of concurrent queries running at the same time relative to the number of CPU cores you have available, parallel query is likely going to cause contention on those CPU resources. So if you already have a large concurrent load and a limited number of CPUs, you may want to do some testing before you just blindly enable it. Also, it's generally not productive to set max_parallel_workers_per_gather higher than 50% of your cores; it just doesn't seem to benefit. Maybe on a small instance or a low-usage instance with a small number of users you might see some edge case gains, but in general, keeping this parameter under half of your cores is probably the best bet.
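To sketch what was described above (the table and column names here are hypothetical stand-ins for the waterway-near-road query from the talk), the settings and the new SETTINGS option might look something like this:

```sql
-- Cap parallelism at roughly half the cores on an 8-core server
SET max_parallel_workers_per_gather = 3;  -- default is 2
SET jit = off;

-- The SETTINGS option adds any non-default settings that could
-- affect planning to the bottom of the EXPLAIN output
EXPLAIN (ANALYZE, SETTINGS)
    SELECT w.osm_id
        FROM waterway w
        INNER JOIN road r
            ON ST_DWithin(w.way, r.way, 100);
```

With parallel query kicking in, the plan should show a Gather node with "Workers Planned: 3 / Workers Launched: 3" above the parallel nested loop and scans.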
Another side effect with spatial data and spatial querying is that the row estimates the planner receives are often bad; they're just not accurate. What has happened in some use cases is that when that row estimate information is particularly unhelpful, the planner may decide to use parallel query when in reality it wasn't going to be beneficial, but it didn't know that, because the estimates weren't there for it to make that decision. As with all new things, you just have to test with your data, test with your hardware, and test with your usage patterns. Now, parallel query is one of those performance benefits that I like to be able to take advantage of. It's a great benefit, but it isn't a reason why I normally upgrade. Generated columns, though, are a reason why I upgraded. This was actually one of the core reasons I was excited for Postgres 12. If you watched the prior session on performance tuning, you may have noticed that I didn't spend very much time talking about how to configure Postgres. I spent a lot more time showing how to write your queries, restructure your data, restructure your queries, and materialize expensive steps, offloading those to the beginning so that later on, when you're doing repeated queries against those results, you don't have to recalculate the expensive steps. With generated columns, we have a brand new way to do exactly that. The first example I'll show isn't necessarily a spatially specific query, but it was the first thing I put into production with generated columns. I added a column to this table, colorado_businesses. The first two lines here look pretty much like any other ALTER TABLE command to add a column: you tell it what you want to name the column, and you give it a data type. The difference is we now have this new syntax, GENERATED ALWAYS AS, followed by a set of parentheses with a calculation inside.
In this case, I'm concatenating a few different strings together and putting commas and spaces where they need to be. The ultimate purpose for this particular calculation is geocoding with the Tiger geocoder. The geocoder works best when it receives the address as a string formatted in this way. Before 12, what I had was this kind of code peppered through my ad hoc queries; I might have had a view here or there that wrapped it up. But now, with generated columns, I can put this down at the table level. Every time I want to do an ad hoc query and pull this full address, I don't have to think about what kind of concatenation to do and where exactly those commas need to go. This is a good way to move more complex logic out of the way, so the human doesn't need to see it on a regular basis. So that's a great example, but it's not spatial. A more spatial example uses the same functionality: we're now adding a new geometry column, which I'm calling way_thematic, and what I'm calculating is a simplified geometry for these polygons. This is another performance helper for the end user who's visualizing this data, especially if they're visualizing a large area. That ST_Simplify removes a whole bunch of points from the polygon data that really won't be rendered at any sort of scale short of looking down at the street level. This allows me to do that expensive simplification step first, but the real bonus here is being able to create a GiST index on that column after it's created. That way, as I'm running my analysis in QGIS, when you zoom in and QGIS queries the database, it sends off the bounding box of the geometries it needs, and this index can be used to really quickly retrieve that data. With a generated column, you can query it from the base table just like you would any other column.
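As a sketch of the two generated-column examples above (exact table names, source columns, SRID, and simplification tolerance are assumptions, not the speaker's actual slides), the DDL might look like this:

```sql
-- Non-spatial example: a concatenated address string for the
-- Tiger geocoder, computed once at the table level
ALTER TABLE colorado_businesses
    ADD COLUMN address_full TEXT
    GENERATED ALWAYS AS (
        address || ', ' || city || ', ' || state || ' ' || zip
    ) STORED;

-- Spatial example: a pre-simplified geometry for thematic maps,
-- plus a GiST index on the generated column
ALTER TABLE boundary_polygon
    ADD COLUMN way_thematic GEOMETRY(MULTIPOLYGON, 3857)
    GENERATED ALWAYS AS (ST_Simplify(way, 50)) STORED;

CREATE INDEX gix_boundary_polygon_way_thematic
    ON boundary_polygon USING GIST (way_thematic);
```

Note that Postgres 12 only supports STORED generated columns, which is exactly what you want here: the expensive ST_Simplify runs once at write time, not on every read.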
So the first query queries the original full-resolution way column, and the second query queries the new generated column. From this perspective, there's no difference. I'm going to jump into the EXPLAIN ANALYZE for the original query real fast. This is a fairly simple query plan; it doesn't get much simpler than selecting a few columns from a small table. It's going to do a sequential scan no matter what in this case, because it's pulling all the data. We're pulling out just under 500 rows, and this width parameter here is what we're interested in. We'll come back to this in a minute, but the width for this particular query is almost 6,000 bytes. Again, when accessing that data from QGIS as an analyst, when I want to bring in the simplified column that's in this generated column, I can simply bring in the way_thematic data; it shows up right there in the layer listing that QGIS or other graphical software will show you. All right, so coming back to that width parameter. The width here is the query planner's estimate of how big it thinks each row is going to be. One of the most common issues I see when folks have performance problems with PostGIS data is that they don't realize how much data they're actually pulling back, so this is a good parameter to look at to see how big each row is going to be according to the query planner. We can see here that by simplifying the data, we go from almost six kilobytes per row to 50 bytes per row. This particular table only had just under 500 rows total, so on this table it's not going to be a huge difference, but if you have 50,000 rows instead of 500, this difference in the size of the data really is going to matter to the end user using QGIS or other software. All that data has to go over the network, and the software has to render it, simplify it, and do all that with it. So it's just going to be easier to process all the way through this way.
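The width comparison above can be seen directly in plain EXPLAIN output; a sketch (table and column names are hypothetical, and the exact cost numbers will vary) might look like:

```sql
-- Full-resolution geometry: planner estimates roughly 6 kB per row
EXPLAIN SELECT osm_id, name, way FROM boundary_polygon;
--   Seq Scan on boundary_polygon  (... rows=496 width=5972)

-- Simplified generated column: roughly 50 bytes per row
EXPLAIN SELECT osm_id, name, way_thematic FROM boundary_polygon;
--   Seq Scan on boundary_polygon  (... rows=496 width=50)
```

Multiplying rows by width gives a quick back-of-the-envelope figure for how much data a query will push over the network to the client.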
Another improvement in Postgres 12 is that covering indexes can now be created on GiST indexes. Covering indexes were first introduced in Postgres 11, the prior version before 12, but you could not create them on GiST indexes. This index creation looks like any other index except for this third line, INCLUDE (name). What Postgres will do is create the index on the geometry, but also include that name column in the index's details. The idea is that if you're using that spatial index and also just need that name column out, Postgres can get all that data right from the index without having to go back to the base table. So this is one of those really neat things. You can't just plaster it all over your database and expect it to be a benefit; you're going to have to find the right kind of query for this to work, but it's a really neat new option. Another really neat new option is the partial decompression of our TOAST data. TOAST stands for The Oversized-Attribute Storage Technique, and it's Postgres's method of handling a row's data when the row size exceeds the page size, which is generally eight kilobytes. As a single row exceeds 8 kB, it gets put into this TOAST storage, and by default it also gets compressed at the same time. That's really great: if you have a lot of really large data, having that compression is great for storage size. The downside has always been that when you have to retrieve that data, it had to decompress the whole blob before you could use it. In the case of spatial data, the bounding box of the geometry is stored really early on in that TOAST data. So before 12, what had to happen was it had to decompress that whole TOAST object in order to get the bounding box out. Now what it can do is start decompressing that blob as a stream.
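A covering GiST index of the kind described above might be sketched like this (the table and index names are hypothetical):

```sql
-- The geometry is what's indexed; name is carried along in the
-- index leaf pages so qualifying queries can skip the heap fetch
CREATE INDEX gix_building_way_incl_name
    ON building USING GIST (way)
    INCLUDE (name);
```

The payoff only shows up for queries whose SELECT list needs nothing beyond the indexed geometry and the included columns, which is why it pays to find the right query rather than adding INCLUDE everywhere.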
Once it gets the information it needs for the query at hand, it can stop the decompression. This will only impact you if you have really large spatial data. I did some quick testing on our data and didn't really find much that allowed me to take advantage of this, but I'm looking forward to finding some good examples where the benefits really kick in for those large data sizes. Another change with Postgres 12 is that just-in-time (JIT) compilation is now enabled by default. What I've found so far is that JIT will kick in if it's enabled, but I haven't been able to find any queries in my workload that benefit, or suffer, from having it on. They all run in about the same time whether it's enabled or not; if it's enabled, it does kick in, and the EXPLAIN output will show all the different metrics about what JIT was doing, but I haven't seen any personal performance benefit from having JIT on. One caveat is the osm2pgsql process. When running updates, if you're refreshing your data with an update instead of reloading from scratch like I showed, some users have reported significant penalties for leaving JIT on. So again, it's one of those things you have to test with your data, your process, your workload, and your hardware, and see what it's going to do for you. I'm really excited about this, though, because I think over the next couple versions of Postgres we'll start seeing more significant improvements from having JIT enabled. So that wraps up the main changes in Postgres 12. Now on to PostGIS 3. One of the big changes is that the raster functions are now in their own separate extension. When I first heard this news, I got excited, because I don't personally do much with raster data. But then I started thinking about it, and from an administrative perspective, I don't really care: there's a bunch of functions included in the raster set that I don't use, but they don't really add that much overhead.
They're not a drag on my servers or anything like that. But I think the biggest benefit, from my perspective, of having that raster portion pulled out into a separate extension is when onboarding new users to PostGIS. Especially if you're only using the vector functions, getting more than 500 functions out of the way means that new users don't have to filter through those and try to determine, do I care about this or not? It's just one more way we can simplify the initial tool set they have available without overwhelming them. It's like a brand new carpenter walking into a shop that has every possible cutting device for every possible material in the world: if they're brand new to carpentry, they're not sure which tools are appropriate for the job. It's the same way here; with so many options at your fingertips, sometimes, especially for new users, it's hard to focus in on the important functions they really need to know. And if you do need the raster functions, it's pretty simple to install them, just like the base PostGIS. I did get curious whether you could install just the raster functions without the vector functions, and the answer is no. So if I were installing PostGIS and I needed the raster functions, I would use this command with CASCADE, because it will automatically install the underlying postgis extension. If you leave off the CASCADE in this case, you'll get an error instead of it creating the dependencies. Now, on to ST_TileEnvelope. Generated columns were the reason I wanted to upgrade to Postgres 12; ST_TileEnvelope is the reason I wanted to upgrade to PostGIS 3. If you've been watching through this series, I've professed a number of times that I do not run a tile server. I haven't in the past, and I currently don't have a tile server in production, but I do now have a project in development thanks to this function.
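The install commands referenced above are, run per database against PostGIS 3:

```sql
-- Vector-only install: no raster functions cluttering the catalog
CREATE EXTENSION postgis;

-- Raster functions live in their own extension in PostGIS 3;
-- CASCADE pulls in the base postgis extension automatically if
-- it isn't installed yet (without CASCADE you get a dependency error)
CREATE EXTENSION postgis_raster CASCADE;
```

There is no way to get postgis_raster without the base postgis extension; the raster extension depends on it.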
What this function does is take the zoom, X, and Y components that are standard for tile servers to know where on the map you're supposed to be looking, and convert them to a PostGIS bounding box. Before this function, you had to do that work directly in the front-end code, so in my case, that would be either Python or JavaScript, and I don't like writing that kind of code. Now that all of that logic can be wrapped up inside the database, I got much more excited about trying to run a tile server for some specific projects. The way this function works is you pass in a zoom, X, and Y. In this case, I've used a zoom of 12, an X of 850, and a Y of 1,554, and I've wrapped it all in ST_Transform to make the output a little more readable, the longitude and latitude that we see. The result is that you get this bounding box. With this function, I can now pass those zoom, X, and Y components from my front-end code right into a SQL query that uses this function in a join to filter my other layers down to the area I'm in. This allows getting that logic implemented without requiring a bunch of extra front-end code on my end of things. ST_TileEnvelope is really useful if you're producing MVT data output. The ST_AsMVT function has been around since PostGIS 2.4, and in that initial release, it was found to be 10% faster to render inside the database than to pull the data out and render it in Mapnik. I think that's pretty awesome when the initial release is already faster than the de facto standard; that's a pretty good sign. Well, with PostGIS 3 it's gotten even faster. There are a number of steps under the hood that the ST_AsMVT function has to do, and a number of those steps have seen performance improvements. This post from Paul Ramsey goes into detail about those changes and how to help performance.
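A sketch of the ST_TileEnvelope call above, plus one plausible way to combine it with ST_AsMVT (the road table, its columns, and its 3857 geometry are assumptions for illustration):

```sql
-- Bounding box for tile z=12, x=850, y=1554, transformed from the
-- default Web Mercator (3857) to lon/lat for readability
SELECT ST_AsText(ST_Transform(ST_TileEnvelope(12, 850, 1554), 4326));

-- Using the envelope to build a Mapbox Vector Tile for that tile:
-- the && filter uses the spatial index, ST_AsMVTGeom clips and
-- rescales geometries into tile coordinates
SELECT ST_AsMVT(q, 'roads')
    FROM (
        SELECT name,
               ST_AsMVTGeom(way, ST_TileEnvelope(12, 850, 1554)) AS geom
            FROM road
            WHERE way && ST_TileEnvelope(12, 850, 1554)
    ) q;
```

The front end only has to supply the three integers z, x, and y; everything else stays inside the database.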
ST_AsGeoJSON is another function that's been around for a number of versions; I forget exactly when it was introduced. But before PostGIS 3, this function would only return the geometry portion of a GeoJSON object. So for a couple of processes I run intermittently throughout the year, I had a bunch of extra code that wrapped that geometry object inside some more code to make it a proper GeoJSON object for use in other systems. With the newest version, you can now pass in a full record, so basically your rows go in one row at a time, and the returned object is a true GeoJSON object. By going this route, I know I'm going to have to rewrite some code, but I'm going to make that code more succinct and easier to maintain, so I'm really happy about this change. Also, if you're using a projection or SRID other than the commonly used 4326, it will now tell you what projection your data is in. I have received, multiple times, for example, shapefiles that don't have the projection information included, and if you've ever had to go hunting and pecking to try to figure out what projection the data came in, just so you can integrate it with the rest of your data, you know that's a huge pain. The transparency this adds, where it's either 4326 if it doesn't say, or it will tell you what projection it's actually in, is a super handy feature, especially in the data sharing realm. Okay, so now we're going to get into some slightly more administrative things. We'll quickly go over the installation, and then what's a little bit different about the upgrade. Installation is fairly straightforward: you'll need to install the Postgres 12 and PostGIS 3 packages, and then the scripts. You might be able to do without the scripts, but it's best to just install them, and your package manager should take care of all the dependencies.
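The before-and-after for ST_AsGeoJSON might be sketched like this (table and column names hypothetical):

```sql
-- Pre-3.0 behavior: only the geometry portion comes back, and you
-- wrap it into a Feature yourself in application code
SELECT ST_AsGeoJSON(way) FROM boundary_polygon LIMIT 1;

-- PostGIS 3: pass the whole row and get back a complete GeoJSON
-- Feature, with the non-geometry columns as its properties
SELECT ST_AsGeoJSON(bp.*) FROM boundary_polygon bp LIMIT 1;
```

That row-at-a-time Feature output is what lets the extra wrapping code be deleted.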
We can see that in this case it's installing 75 total things and handling a lot of the dependencies behind the scenes. After you have Postgres installed, it's always good to check your version, make sure you have the version you expected and that it's running. In this case, we can see that we're running 12.2. Another little piece that's good to note is that I installed this from the PGDG repository. PGDG stands for the PostgreSQL Global Development Group, and they maintain a repository that has all of the components in their latest forms, all packaged together nicely, so it's good to use this repository for installation. Once you have Postgres up and running, you can install PostGIS. This is a per-database step, so you need to create your database, and once you have a database created, you can connect to it and create the postgis extension. Don't forget that the raster functions are separated: if you need those raster functions, I would instead use CREATE EXTENSION postgis_raster CASCADE. Now, with PostGIS installed, you can connect to the same database and run the PostGIS_Full_Version() function. This gives us a good amount of output. The first part is the version of PostGIS we're running; in this case, I had 3.0.0. I'm going to make a little note here that 3.0.1 was recently released, three or four days ago, I believe, and I have not had a chance to test the newest version yet; that's related to a caveat coming up very soon. Another good detail to pay attention to is the version of Postgres it's installed with; we see here that we have 12. It also tells us about our GEOS and Proj libraries, which are a couple of the core dependencies where you need to make sure you have new enough versions installed. As I mentioned, PostGIS does have a number of prerequisites; three of them are listed here, and these are three of the bigger components, in my opinion.
The wiki link at the bottom of this page has a couple of grids on versions and compatibility, showing which versions of the various dependencies you need in order to run different versions of PostGIS. That's a helpful cheat sheet, especially when you're getting ready to upgrade. On the note of upgrading, that's where we are, and what I have found so far is that a hard upgrade is required. There is a page for this in the PostGIS installation notes; they have a header for it. And I found there is a caveat, which is why there's the asterisk there; it's related to the latest version of PostGIS. The upgrade process itself, though, is really pretty straightforward. You need to install your new dependencies, your new versions of the binaries, so you need the new versions of Postgres and PostGIS. You'll also need the PostGIS 3 package that matches your older version of Postgres: if you're running Postgres 11, you'd need to install the PostGIS 3 package built for Postgres 11. Once you have all those dependencies installed, you can run the update process for PostGIS, and then you can do the pg_upgrade process. Really, this process runs quite smoothly if you have good documentation, your procedures are in order, and you follow your steps. If you do things out of step, that's where things can get a little wonky on you. After you upgrade, this is a good time to check your version again with PostGIS_Full_Version(); you'll see that right away it says PostGIS 3. When I first did this test, I didn't really look a whole lot closer; I just saw that and went on with my testing, running queries, checking various workloads, et cetera. I didn't notice that it still said the Postgres version it's installed for is 11, and it had this nice little warning that the procedures are not upgraded for Postgres 12 yet.
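The in-database side of the update described above might be sketched as follows (PostGIS_Extensions_Upgrade() is the helper added in PostGIS 3; run this per database after the new binaries are installed):

```sql
-- Update the postgis extension itself to the installed version
ALTER EXTENSION postgis UPDATE;

-- PostGIS 3 helper that upgrades postgis plus any related
-- extensions (raster, topology, tiger geocoder) in one call
SELECT PostGIS_Extensions_Upgrade();

-- Verify: the output should report PostGIS 3 and the Postgres
-- major version you expect, with no "procs need upgrade" warning
SELECT PostGIS_Full_Version();
```

Checking PostGIS_Full_Version() after the upgrade is what catches the "procs need upgrade for 12" warning discussed next.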
After a few hours of testing, I didn't find anything broken, but I also was not trying to use any of the new PostGIS 3 or Postgres 12 enhancements; I was really just making sure nothing broke, and I didn't find anything that did break. That's good news, but it wasn't fully upgraded for the latest version. So when I saw that note that the procs need an upgrade for 12, I went ahead and found this upgrade function and ran it, and I got the error that's listed here: it's giving me a problem about changing parameters in one of the extensions. Now, I saw this, and having been through a number of PostGIS major version upgrades, I immediately went to the hard upgrade. I didn't look to see if tickets had been submitted; I just assumed that hard upgrades were required. I had procedures on how to go through this, and it's really straightforward. You create a new database with a clean installation of the latest version of PostGIS, and then you dump and restore your old database into the new one with the clean install, and everything is back, happy and ready. The downside is that it's per database, and in my sphere, the databases that are PostGIS enabled are without a doubt the largest of all of my databases; they just tend to be bigger than the other, more relational databases. So there's a downside: the downtime to dump and reload databases is unfortunate. But it's a good time to get familiar with pg_dump and pg_restore. There's a really neat option, the directory format: you can do both the dump and the restore in parallel if you have multiple cores available, and you can also do a partial restore. So if you decide that, you know what, you can leave some of that database out, you can bring in only parts of your database if you choose to. This is a good process to become familiar with if you have to do the hard upgrade.
Another thing you might want to do at this point is rename your database, either before or after you do the dump and reload. You likely have a bunch of clients, front ends, and code that assume the name has not changed, so you can rename the restored database back to the original name to preserve what you already have. On the other hand, if you're like me and you realize that some of your databases weren't necessarily named the greatest, now is a good time to go ahead and rename them. For example, one of my databases that got renamed in this process is one of the first databases I ever created in Postgres. I was migrating away from MySQL, and it got named simply Rustproof Labs. While it was kind of applicable at the time, years and years later the name had held, but that database name really was not appropriate for that database anymore. So I used the opportunity to rename that database, and I also decided in one case to consolidate a couple of separate databases into a single instance. So I used the opportunity to do some infrastructure restructuring, so to speak. All right, and so now back to that caveat. PostGIS 3.0.1 was released a few days ago. I hadn't really been paying attention to what was going on there in the last couple of months, but I saw the release, and when I looked through the release notes, this particular line item caught my eye. If you were paying really close attention to that error message I got a while ago, that ST_LineCrossingDirection function is the one that was causing me issues. And that ticket, 4590, that fixed this issue was reported before I ran into it. I honestly didn't even go looking for it to be a bug, because hard upgrades are one of those things I've dealt with before, and so I just jumped into that mode. But this fix seems to have resolved the issue of requiring a hard upgrade.
I have not had time to test and verify that it fixes the whole issue, but I am reasonably confident it will. So again, using the latest versions is always a good idea. That concludes today's session on what's new and improved in the newest versions. Thank you again to everyone who joined us today, and I hope you have a fantastic rest of your Wednesday.