So, I'm going to talk a little bit today about Postgres, in particular what's coming up in v12, which is the next major release of Postgres, coming this fall. We're actually getting pretty close to feature freeze for it, so we've got a pretty good list of features that are already committed, or very likely to be included, in that release. A little bit about me: as mentioned, I'm a major contributor to Postgres, and I'm also on the committers team. I run the Postgres infrastructure, so if you see an issue with postgresql.org, you can call me. I also worked on row-level security in 9.5 and column-level privileges in 8.4, and my really big claim to fame, I guess, is that I implemented the role system in 8.3, which was pretty nice. I've hacked on other things too: PL/pgSQL, PostGIS, sendmail, the Linux kernel. So, good stuff. All right, what features are we going to talk about? These are the topics we'll cover. Pluggable storage is a really big one; I'm going to talk a fair bit about that one because it's really important. I'll talk a little bit about our partitioning improvements, performance improvements, some new SQL features we've got, and some nice things that have changed around system administration. Somewhat nice, somewhat not nice, depending on your point of view, I suppose. All right, so let's talk a little bit about pluggable storage. What's going into v12 right now is primarily back-end internal changes: re-architecting and refactoring of the way Postgres accesses its storage. It's going to provide a basis for what's called zheap, which is not going to be included in v12 but might be included in a future version. This is an alternative storage architecture to Postgres's traditional heap-based format.
Traditionally in Postgres you have MVCC, and then we have only what typical database people would consider a redo log: we have a write-ahead log, and in the event of a crash, we can go to the last checkpoint in the write-ahead log and play forward. Some other database systems have things like an undo log, which allows you to go backwards. That's not something Postgres has supported, because of the way we do our MVCC: every update is actually an insert and a delete. Part of the re-architecture of the storage layer in Postgres is to allow for an alternative heap format which would have both a redo log and an undo log, because in some cases that's nicer for certain query patterns and certain update patterns than the insert-and-delete MVCC model we have today. So that's a pretty massive change. The other really big change around pluggable storage that we've been talking to a lot of people about is that we want to be able to support columnar storage. If you were here for the last talk, you heard that people go out of their way to get columnar-like storage on Postgres using different techniques; a foreign data wrapper to some kind of external file is one way of doing it. There are other techniques, like using arrays in Postgres, which allow you to pack data much more tightly, because you can get away from the per-tuple overhead. That overhead ends up being the big problem, the driving issue, when it comes to columnar-style tables in Postgres, where the tables would be very narrow: one- or two-column tables. So these are pretty massive changes to Postgres, and a lot of this beginning re-architecture work has made it into v12. I think there's probably going to be more of it going into v12.
I know there are still patches outstanding in the commitfest right now from Andres Freund in particular, who's been the driving force behind pluggable storage. But we're not going to have anything really user-facing in v12 here; this is all part of a multi-year plan to get Postgres to a point where we can do pluggable storage as well as support zheap. We've also been making a lot of improvements to our partitioning. Postgres has declarative partitioning today, and over a number of releases we've been improving on it. One of the big things going into v12 is much faster planning when there are a lot of partitions involved. This is a big, big change from the inheritance-based planning system we'd been using for a long, long time. We've moved away from inheritance-based partitioning, because it is inherently expensive to plan with, over to a declarative style, but the initial versions of declarative partitioning in Postgres still used the inheritance type of planning, because it was easier to get things working that way. It meant we were still limited: you couldn't have more than a few hundred partitions before query planning started to really suffer. A lot of work has gone into v12 to fix that, to make it so you can have thousands of partitions, and possibly even more, and things will still be performant. That gives you much more flexibility in how you do your data management with Postgres, because data management is where partitioning is really key. Another thing that's relevant for partitioning in particular is the multi-insert optimization inside the back end, where basically you can insert multiple rows at a time.
That was disabled for partitioning initially, simply because it was complicated to make sure we got it right in the event that a tuple needed to go to a different partition than the one we're currently working with. That's been fixed, and the change will go into v12. What this means is that if you're doing a COPY into a partitioned table, Postgres is going to be much more performant at ingesting those rows in v12 than it was in earlier versions, which is really, really fantastic. There's also a new function called pg_partition_tree(), which is really handy for displaying an entire partition tree: you pass in a parent partition and it gives you the whole tree down from there, because of course Postgres has multi-level partitioning. You can have multiple levels of partitioning, even off of different columns at different levels. So that's something we now have which is really, really nice. We also have much faster float handling. This is an interesting one: a new algorithm (Ryu) was adopted that does floating point much faster. This is converting from a floating-point value into text, and in fact it's even more accurate than we were previously. This is going to be a really significant speedup for a lot of workloads, particularly analytical workloads, anywhere you're exporting a lot of data with lots and lots of floating-point numbers. That's going to become much faster in Postgres v12. It does also mean it's possible you might see differences in output. If you're doing equality comparisons with floating point, you're probably doing it wrong, so just be aware of that. Hopefully most of the people here know that, but it is a really impressive improvement in performance and throughput when exporting data out of Postgres that has a lot of floating point in it.
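As a quick sketch of pg_partition_tree(): the table and partition names here are made up for illustration, but the function and its output columns are as they appear in v12.

```sql
-- Hypothetical two-level partitioned table
CREATE TABLE measurements (
    city_id int NOT NULL,
    logdate date NOT NULL
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2019 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01')
    PARTITION BY LIST (city_id);

CREATE TABLE measurements_2019_city1 PARTITION OF measurements_2019
    FOR VALUES IN (1);

-- Walk the whole tree from the root; each row reports a relation,
-- its parent, whether it's a leaf, and its depth in the tree
SELECT relid, parentrelid, isleaf, level
  FROM pg_partition_tree('measurements');
```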
So this is one that I'm hoping makes it in. I'm not sure if it's actually been committed yet, but we've had a lot of discussion about it: the ability to use serializable with parallel query. It was previously unsupported. Serializable is the highest isolation level Postgres supports, and it's also what we call truly serializable, which tends to be a step above what a lot of other database systems provide, in that we actually track reads versus writes and look for what are called dangerous edges, or dangerous transitions, inside the database system. Doing all of that correctly when parallel query is involved is a little bit complicated, but that's being worked on. The really nice thing about using these higher isolation levels is that you have to worry less about concurrency: application developers don't have to write as much code to deal with potential conflicts between different transactions running against the database. So this is something to be on the lookout for: if you've been using serializable as your isolation level and you haven't been getting parallel query, hopefully we've now solved that in v12. Another thing that I didn't put on here that's related to serializable, and important, is the ability to have serializability on replicas. That's not something we were supporting; I know it's something Thomas Munro was working on. I don't know yet whether it's going to make it into v12; I kind of hope it does, because that would be really awesome. But if you're already using serializable as your isolation level, you'll be looking at v12 to improve things in a number of ways for those kinds of workloads. We also have this cool thing called COPY FROM with a WHERE clause.
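It looks like this; the file path, table, and column names are hypothetical, but the syntax is the v12 form of COPY:

```sql
-- Filter rows as they are loaded; previously you'd stage the file
-- into a temp table or use file_fdw to get the same effect
COPY readings (city_id, logdate, reading)
  FROM '/tmp/readings.csv'
  WITH (FORMAT csv)
  WHERE logdate >= DATE '2019-01-01';
```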
It used to be that if you wanted to import a whole bunch of data into Postgres and you wanted to filter it, so you didn't get all of that data coming in but only a subset, you could do something like use file_fdw, or you could import it into a temp table and then filter it, things like that. Now you can pass a WHERE clause to a COPY statement and filter the data on the way in. It's just a much simpler alternative to things like file_fdw, which is still really handy and really nice if you want to do joins and other things against that file; but if you're just doing filtering, you can use COPY FROM with WHERE instead now. Next, a really big change: Postgres has supported common table expressions (CTEs) for ten-plus years, but they were always done as a materialization step. We would essentially run the query, materialize the results, and then use those results for whatever subsequent parts of the query you're doing. What this meant was that if, in the main part of your query, you had some kind of filtering expression, and that filtering expression threw away, say, 90% of the rows from the CTE, we would still generate all of those rows, and we wouldn't actually filter them out until we got to the main part of the query. In some cases that's bad. I've given many a talk where I've told people: don't return lots of rows out of your CTEs, because Postgres will materialize them, even the ones you're going to throw away; instead, push that filtering down into the CTE, or use subqueries or something else, so the filtering happens early on. V12 is going to give you the option to do it either way, and the default is changing. Starting in v12 we will not always materialize; there may be cases where we still materialize the CTE for various reasons, basically because we have to.
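A minimal sketch of the two behaviors (the table name is hypothetical):

```sql
-- v12 default: the CTE may be inlined, so the outer filter can be
-- pushed down and only the matching rows are ever produced
WITH recent AS (
    SELECT * FROM events
)
SELECT * FROM recent WHERE event_id = 42;

-- Explicitly request the v11-and-earlier behavior: compute and
-- materialize the full CTE result first, then filter it
WITH recent AS MATERIALIZED (
    SELECT * FROM events
)
SELECT * FROM recent WHERE event_id = 42;
```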
But if you run a CTE on v11 and then run it on v12, you may see a difference in the query plan, because we're going to try to inline that CTE unless you explicitly say AS MATERIALIZED. That's the way to get the old behavior back: say AS MATERIALIZED and it will be exactly like it was in v11 and earlier. One of the nice things about this is that people have done analyses of the Postgres query planner versus the planners in other database systems, and the fact that we didn't inline CTEs when it made sense to has been pointed out for many years as a weakness of the Postgres optimizer. That's getting fixed, which is fantastic. So this is a pretty big step in terms of Postgres's comparison to things like Oracle and SQL Server and other relational databases. It's not so much a comparison against other databases that only just got CTEs, and don't support hash joins, but that's a whole different discussion. Another really big thing that has happened is that we have gotten rid of the recovery.conf file. If you've ever set up any kind of high-availability solution with Postgres using the Postgres replication protocol, you've played with a recovery.conf file, or some tool you're using underneath has had to create that file for you. It's now gone away; we have gotten rid of it. What we've done is basically merge everything that was in recovery.conf into the postgresql.conf file, which means you can use things like ALTER SYSTEM to make changes to the recovery parameters. It increases the flexibility of Postgres a lot, but it's going to be a pretty disruptive change for anyone who's doing high availability or regular recovery scenarios, right?
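Concretely, the old recovery.conf settings become ordinary postgresql.conf parameters in v12; the connection string and archive path below are made up, and standby mode is now signalled by an empty standby.signal file in the data directory rather than by the presence of recovery.conf:

```
# postgresql.conf on a replica (these lived in recovery.conf before v12)
primary_conninfo = 'host=primary1 port=5432 user=replicator'
restore_command = 'cp /archive/%f %p'
```

Because these are now regular parameters, they can be managed with ALTER SYSTEM like any other setting instead of by editing a separate file.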
So if you back up a server and then recover it onto another server, that process is likely to have to change, unless you're using a tool like pgBackRest or something that hides that stuff under the covers for you. I do think this is going to reduce the overall fragility of HA solutions moving forward, and it also allowed things like a SQL-level function called pg_promote() that you can call on a replica to have Postgres promote it. The other thing this allows, which I probably should have put on this slide, is that you can now change, on the fly, which primary a replica is pointing to for its replication stream, which is a pretty massive change. Previously you had to actually restart Postgres on the replica to get it to start following a new primary. That's gone away now: with the recovery.conf file gone, that change can be made on the fly, which is fantastic in a lot of ways. It's particularly useful because you don't have to have an outage of your replicas in order to re-point them at a newly elected primary. So I'm sure you'll see things like Patroni and the other HA solutions that exist for Postgres picking up on this quite quickly, because it's a really important and valuable change. Maybe not of interest to everybody, but certainly of interest to some people who really care about security: you're now able to control the SSL protocol versions that Postgres will accept as a server. Previously we basically didn't have a way of doing this, and in a lot of environments you're required to disable the older SSL protocols because they're unsafe. So this is an important advancement in Postgres.
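The new control is just a server setting; which minimum you pick depends on your environment, but a sketch looks like:

```
# postgresql.conf: refuse clients using older, unsafe protocol versions
ssl = on
ssl_min_protocol_version = 'TLSv1.2'
```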
Previously you could enforce this with FIPS-mode OpenSSL, which is basically what FIPS mode for OpenSSL did, but that required actually dealing with FIPS mode at the OpenSSL level. Now we can do it in Postgres, so you don't have to worry about monkeying with the system-wide OpenSSL setup. We'll be updating the CIS benchmark and the STIG to accommodate this change, of course. Next, there's a new output mode, a CSV output mode, which basically means you can do a \pset inside of psql and say: give me CSV. So if you're in the middle of writing a query, you can just \pset to CSV, rerun that query, and suddenly you get CSV output, which is nice. You could previously use COPY with CSV, but that was awkward and you had to actually modify your query; this does away with all of that, and you can do it directly from inside psql, which is awfully nice. We also added an option that allows you to sample the queries running in Postgres. Previously, Postgres had what's called log_min_duration_statement, which is basically a way of saying: Postgres, log all the queries that take longer than this amount of time. On a very high-rate system, if you set that to zero, Postgres slows down, because everything bottlenecks on trying to write those queries out to the log. Now you can actually get away with that if you also configure log_statement_sample_rate, because you can tell Postgres to only sample queries at this rate. What will happen is that Postgres gives you a sampling of queries, and those go into your query log for later analysis. This is particularly helpful when you have systems with lots and lots of very short queries; that's the big place where this is going to be really advantageous, because previously, if they were really short, really fast queries, Postgres wouldn't log them, and if you enabled logging of them, you would log all of them and that would slow everything else down. So now we have
this ability to do sampling, and because we can do sampling, we can then extract those samples and do further analysis on them, and you can see, on an individual-query basis, how fast those queries actually run. Is there a question on that? [Audience] Yeah, is it like 100 queries per... how do you configure it? [Speaker] I believe it's on a percentage basis, some kind of percentage basis that you can tweak and adjust based on what you want to do. That's my recollection; I have to admit it's brand new in v12 and I haven't played with it very much myself, but I'm pretty sure it's percentage-based. So that's kind of wrapping it up in terms of what I had. Yeah, five minutes, so I did want to make a few other comments. Those were the features for v12; looking forward to the future, I talked a little bit about zheap, I talked a little bit about columnar storage, and I do think those are big things that are going to be happening in Postgres moving forward past v12. I also think there's going to be more work going into just-in-time compilation in Postgres. For those of you who are not aware, Postgres already has the ability to do some just-in-time compilation of parts of a query: actually taking parts of a query, compiling them down into machine code, and then running that. That work has kind of gotten backlogged behind the pluggable storage work, but once we have pluggable storage in v12, I'm really hoping that v13 starts to move things forward more on just-in-time compilation and the modifications to our executor that it entails, because that's going to be a massive win for analytical queries and things that need to process lots of data. Eventually we want to get to a point where you can potentially have an entire query compiled down to machine code, optimized, and then run against the system, which can really be a massive improvement in terms of overall performance. So with that, I've got five minutes left, so why don't I open it up to questions? If anybody's got some, I'm happy
to entertain pretty much anything. Thank you. [Audience] I would like to ask about the ecosystem around Postgres: what companies develop it? [Speaker] I would say the ecosystem around Postgres is really quite positive. We have a lot of different companies; there are, I would say, three or four major companies contributing code to Postgres, in the form of having one or more full-time staff who spend time hacking on Postgres, which I think is a really good thing for our community. It's kind of interesting, because we're starting to see people buy up Postgres companies; there have been a couple of acquisitions recently. The most recent one that I think people might be aware of is CitusDB being bought by Microsoft. I don't know that that's going to change anything right now; my friends over there have said that things are changing, but they're still very much focused on Postgres, and I'm hopeful that continues. That said, CitusDB was an extension; they did their own thing. They contributed some stuff back, but I don't think the acquisition is going to have a major impact in terms of the overall velocity of the project. Microsoft has more than once made inroads and expressed interest in actually contributing code back to Postgres. They haven't been terribly successful yet, but I'm kind of hopeful that they will be, because they certainly have a lot of really smart engineers, and if they can get some people who have time to work with the community and help move Postgres forward, that's great. [Audience] What about Amazon? With RDS, or Aurora, they're trying to get off of Oracle? [Speaker] Amazon continues to spend effort moving off of Oracle, and I think you're going to continue to see that from a lot of big companies. They're not the only ones, right?
Salesforce has continued to make massive inroads in terms of moving off of Oracle and over to Postgres, and I think you're just going to continue to see that happening. From a Postgres perspective, that works for me: the more people running Postgres, the better. I mean, I work with a lot of very large multinational companies, helping them with their Postgres strategies and getting Postgres deployed into their environments, oftentimes with an end goal of moving off of Oracle. Other questions? Chris has a question. [Audience] I just discovered there's a Postgres conference here in Singapore. [Speaker] Yes, there is, and I'll be speaking at it, so just to let everybody know, there's still a chance to register for it. There are still seats available, I think, both for my training session as well as for the conference itself. The conference is called PGConf APAC and starts, I believe, on Tuesday. If you're interested, definitely go. [Audience] One thing I saw on the commitfest was a patch that was listed as ready-for-committer, involving SSL and allowing the option, with say SCRAM, to require that the CN matches the username. That would allow two-factor authentication with SSL certs and passwords. Do you know if that's likely to get committed for 12, or get pushed? And I'm not asking for a commitment here. [Speaker] I mean, if it's marked ready-for-committer... that patch, as I recall, was pretty straightforward and relatively small, so I think it's probably going to boil down to whether there's a committer who has interest. I can name a few who I think might, one of them unfortunately being myself, so maybe we should chat about that, if it's pretty reasonable and it's been through enough reviews. I haven't looked at it myself, but that is interesting; I hadn't actually followed that one, so I'm curious about it. Any other questions? All right, well, you get a couple minutes' break from me, and then I'm back up here talking about more stuff.