Anyway, welcome to this session on Postgres 9.6. My name is Magnus Hagander. I work for a company called Redpill Linpro. We do open source services in the Scandinavian region primarily. I'm out of Stockholm in Sweden myself, where I run our database offerings. In Postgres, I'm a member of the core team. I'm also one of the committers, and I do a lot of work running events like this one over in Europe for PostgreSQL Europe. Anyway, there's a lot to cover, so let's start. Postgres 9.6. Who's already using Postgres 9.6? In production? No one? Come on, this is... Yeah, no. Come on. You should be on that. Feature freeze was days ago. You should all be running it. I'm running it in production. Maybe not mission-critical production, but, you know, still. So Postgres 9.6 actually started in June last year, when we branched off 9.5. The way we work in Postgres is that on June 30th we created a branch for 9.5, and from then on any new features could only go into master. 9.5 stabilized over... well, we intended to do that from June to September. It may have been from June to January, but, you know, we did get 9.5 out eventually. And then we started the process that we call the CommitFest in Postgres, because, you know, everyone has to have their own name for how they do iterative development. In Postgres, the general idea, which doesn't always work out, is we do one month of development work, and then we do one month of review and commit work on that work. That's what we call the CommitFest. Then we do another month of development, another month of review and commit. That's how it's supposed to work. It hasn't really worked that way. We've more or less been in what we call continuous CommitFest for two years. It's been getting slightly better, but there's definitely more to do. But that's the general idea. We ran five of these for 9.6: July, September, November, January and March. And then on April 8th, we had our feature freeze.
And we're going to try really hard this time not to make this take eight months to get to the release. The target is to get back to releasing somewhere in September, October-ish. To do that, we need your help. We're in feature freeze. That means we take no new features. It means now is the time for testing and for fixing the things that we have there. We need you to help us test. We have our tests, but you have real tests. They're called your applications. If you have tests for your application, please run them against 9.6. You can do it now, or you might wait for the first beta version, but the earlier we can get that feedback in, the better. Both to get feedback on the new things that we have and to get feedback on what we broke. Because it's probably something. That's usually how it goes. There are no new features coming. We still reserve the right to remove features that we put in there, if they turn out not to be as awesome as we think. That means that anything that I present here might go away. I think nothing has gone away yet. That has happened during these presentations in some cases, but I don't think we're there. I think we haven't had the time to remove anything since April 8th. But again, we might still do that. When I give these talks, and I tend to give them for every version, I usually try to divide up the different classes of features. It's really hard because many of them go into multiple categories, but I'm going to try to split them up into DBA and administration, developer and SQL, replication and backup, and then a whole special section for performance. It's not entirely even, and it's actually interesting. I started working on this talk a couple of weeks before the feature freeze, and I got really worried because I didn't get very many slides. There wasn't that much.
A lot of infrastructure work went in during the earlier CommitFests, but there were actually very few things that were visible to you. But then I had to start typing slides very quickly as we drew towards the feature freeze, and more and more things got in. So right now, we actually have a lot of things. There are very many awesome features, and I'm going to highlight some of them. Of course, 45 minutes is not enough to talk about every single thing. So, starting under DBA and administration, in the topic of "we like configuration parameters that you have to type a lot to access": we finally have, who has ever needed this, an idle-in-transaction timeout, and it's actually called idle_in_transaction_session_timeout. This is when you really like tab completion. But it works, and the basic idea is, I'm sure you've all had issues, if you've been working with Postgres for a while, with sessions that were idle in transaction. They were sitting around for a long time, and they caused bloat in your tables. You can now put a timeout on that. What will happen? We already had statement_timeout, which you set for a single query. If statement_timeout fired, the statement would be canceled, and the transaction would be in a failed state. You could do a rollback and go back. When this new timeout fires, you'll actually be disconnected. Because we don't have a way to roll back just your transaction. We'd lose sync with the client if we just rolled your transaction back. So we actually kill your client. But that means the transaction gets rolled back at that point. Fairly small thing. If you've been there, it's a really nice thing. Another thing that has been enhanced in a very useful way is pg_stat_activity. I'm sure you've all used it. It's how we see what's going on in the system right now. In particular, it has much better wait information. Previously, you had this little Boolean column that said, wait, or sorry, waiting.
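A minimal sketch of enabling the timeout described above; the five-minute value is just an example, and it can equally go in postgresql.conf or be set per database or per role:

```sql
-- Disconnect any session that sits "idle in transaction" for more than 5 minutes.
-- The default of 0 disables the timeout.
SET idle_in_transaction_session_timeout = '5min';
```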
And it just said, okay, this guy is trying to do something and it's not getting ahead because it's waiting for somebody doing something else. And that's all the information you'd get. This Boolean has now been split up into two columns with much, much, much more information about what you're actually waiting for. So the classic waiting equals true would probably look something like this today. wait_event_type is the new column. It says, what are we waiting for? In this case, we're waiting for a lock. That's the thing that would show up as waiting before. This can also be things like lightweight locks and buffer pins, but this is when we're waiting for an actual lock. And then wait_event says, okay, we're waiting for the lock held by a transaction. There are, in the documentation, two pages of the different combinations of these two fields that will tell you exactly what this session is waiting for. The column called waiting is gone. That means, if you have monitoring scripts that are using it, you're going to have to update those. The difference is basically that waiting equals false, as it was before, is now wait_event_type equals null. Is null. Sorry. Yeah, I know, it's painful, so I pretend it doesn't exist. So this will let you dig in much closer to what you're waiting for. And you might be waiting for something that's not actually another session. You might be waiting for a WAL lock, or the transaction lock, or a relation extension lock when someone is trying to make your table bigger. This will all show up here. You'll get a much better chance to dig in. Another thing that's been surprisingly hard to get before, and that's been made much easier now, is we now have a function called pg_blocking_pids, which will basically tell you: you give it the process ID of a process that's in a waiting state, and it'll tell you who it's waiting for.
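As a sketch, the new wait columns can be queried like this:

```sql
-- Show only the sessions that are currently waiting on something;
-- wait_event_type IS NULL replaces the old waiting = false.
SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type IS NOT NULL;
```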
Not just that it's a transaction, it'll point you to the exact backend. We used to do this in weird ways with little left joins back through pg_locks, and sometimes it didn't work. This one actually looks at the lock structure, so it really should always work. It will return an array of integers, which are the PIDs of the backends that are blocking the session. And then you can figure out who that is and blame them instead of me. Another advancement on the monitoring side is we now have what we call utility command progress reporting. We only have an implementation for vacuum so far. The idea is to have implementations for more, so there's a set of infrastructure that's in, there is a specific implementation for vacuum, and hopefully more to come in future versions. But basically it means whenever you have a vacuum process running, or one or more, you'll get a row in pg_stat_progress_vacuum. That'll tell you what this process is doing. So in this case you can see it's running with this PID on this database, on this particular relation. Right now it's in the phase of scanning the heap. The table has this many blocks, it has scanned this many blocks, it has vacuumed this many blocks. And of course if you want to see how it progresses, well, you select again, and you see how these numbers get bigger. So it gives you an insight into exactly what your different vacuums are doing. This includes both your manual vacuums and autovacuums that are running. It'll tell you how far along they are. Some more simple things on the whole system information side. There is now a view called pg_config. That'll just get you the same data that you would get from the pg_config command, but you can access it remotely. So it's the compile-time information about the Postgres instance that you're running, if you need to inspect it to see what's going on. We also have a number of functions called pg_control_something.
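A sketch of using pg_blocking_pids to find the blockers of a stuck session and see what they are running, by joining back to pg_stat_activity:

```sql
-- For each waiting session, list the PIDs blocking it and their current queries.
SELECT waiting.pid   AS waiting_pid,
       blocker.pid   AS blocking_pid,
       blocker.query AS blocking_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocker
  ON blocker.pid = ANY (pg_blocking_pids(waiting.pid))
WHERE waiting.wait_event_type = 'Lock';
```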
I think it's five functions or so in total for the different groups, and they return the same kind of information that you would find using pg_controldata. So things like XIDs, when your last checkpoint happened, these kinds of things. Just to make it easier to access this data without having shell access on the machine. So small things, yes, but very useful. And leaving the whole monitoring side, there's one really, really big feature if you run into this. Many of you may never have seen this and you'll go, I don't care. But if you run into it, it can actually be a really big problem. This is what is known as the freeze map. It's an extension of the visibility map where we track, for each page of your relations, whether everything on it is in what we call the frozen state. Freezing is what we do to avoid transaction ID wraparound. So, who's ever seen their autovacuum fire up an anti-wraparound vacuum? Okay, at an inconvenient time. Yeah. I mean, we still have to do this. But the idea is, if your tables are mostly read-only: when we run this anti-wraparound vacuum, we end up marking a page as "everything here is frozen". That means we don't need to vacuum it again until someone modifies something on that page. And if it's a mostly read-only table, which many of our really large tables may be, then the next time we do the vacuum, we don't actually have to touch that block. It's tracked at the block level. We still have to do it once, because we still have to mark the pages as frozen once. The only real exception is if you loaded your table with COPY FREEZE. In that case, we don't have to vacuum it at all. But in any other case, we'll have to do it once. But only once. And again, if you have large, mostly-read tables, it'll make these wraparound vacuums much, much cheaper. And they can be a real problem.
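The pg_control_* functions mentioned above can be called from any connection; for example (a sketch, using two of the function names shipped in 9.6):

```sql
-- Same information as the pg_controldata binary, without shell access:
SELECT * FROM pg_control_checkpoint();  -- last checkpoint, redo location, etc.
SELECT * FROM pg_control_system();      -- system identifier, catalog version, etc.
```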
They can be a real problem if, when they fire, they don't finish quickly enough before they have to fire again. And this will make them much faster, so it'll take that problem away. If you're one of those who goes, well, two billion transactions, that's going to take me 20 years, then this is probably not going to help you. It's not going to hurt you either. Okay, so let's leave the DBA and administration side for a while and talk a little bit about the developer and SQL features. This is actually the section where I found the fewest really big things in this release. But at least we have something new. We have phrase searching for our full text search. In existing versions of Postgres, when you do full text search, it looks at the individual words, but it doesn't care about the order of the words. So the first example here is the traditional way that we've done it. In this case, we're using plainto_tsquery to take the text we're searching for and turn it into an internal structure called a tsquery, which we then use with the @@ operator to do a full text search. In this case, I'm doing the search against the result of the function to_tsvector. If we were doing this on a table, where the example has to_tsvector there would be a column in the table that's a tsvector. But in this case, we're saying, okay, search for "quick fox" in the text "the quick brown fox jumps", and we get true. It's considered present because we're only looking at the individual words. With phrase search in 9.6, we can instead use phraseto_tsquery instead of plainto_tsquery. And then we're looking specifically for these two words in order, you know, "quick" coming directly before "fox". And that's not present in our tsvector. So it's a way to get a more precise full text search. Obviously, these use the same index. You don't need a separate index, and it will be fully indexable like all the full text searches.
Now, this is the simple way of using it. We also have a more generic way. Actually, the second example here, phraseto_tsquery, returns a tsquery that looks like what you can see at the top there: quick, then <->, then fox. That says that these words have to be next to each other. That's if you're using the actual tsquery syntax, which most of you probably aren't doing. But what you can do is use the generic version of it, as you can see in the second example, where we're saying quick <2> fox. The 2 is the distance between the words that's allowed. So <-> is the same as <1>: they have to come directly next to each other. With a <2>, we say there can be one word in between these two. If you made it a <3>, you'd say there can be two words in between; the distance can be 3. So you can do more advanced phrase searching that way. It goes up to 9; if you need more than 9, it doesn't work anymore. And again, of course: same index, same syntax, same tsvectors. And that's actually the only thing that I really found to put under SQL-level functionality. There are others, but this release, I'd say, really is a release for the DBA, for the administration side, for the performance side. It's not like some other releases, like when we added JSON support or lateral queries, which were really about updating your applications to take advantage of them. The difference here, of course, is you don't have to update your applications to take advantage of most of these things, which can be useful. Okay, so replication and backup. These are very similar things in how Postgres has implemented them. You probably know if you set up a replica, you use pg_basebackup or the backup functionality of a toolkit around it. So they're all very similar. The first thing is: we're breaking all your config files.
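The phrase-search examples discussed above look roughly like this:

```sql
-- Word search: order doesn't matter, so this matches.
SELECT to_tsvector('the quick brown fox jumps') @@ plainto_tsquery('quick fox');

-- Phrase search: 'quick' must come directly before 'fox', so this does not match
-- ('brown' is in between).
SELECT to_tsvector('the quick brown fox jumps') @@ phraseto_tsquery('quick fox');

-- Generic distance syntax: <2> allows one word in between, so this matches again.
SELECT to_tsvector('the quick brown fox jumps') @@ to_tsquery('quick <2> fox');
```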
Actually, we're not. But we could. There is a new wal_level. You remember, we used to have wal_level minimal, archive, hot_standby and logical. Archive and hot_standby are gone. They're replaced with a level called replica, which is the same as the old hot_standby. If you actually pick archive or hot_standby, we will just translate it to replica. The functionality difference between them doesn't exist anymore. The difference between archive and hot_standby was so small, and the benefits were much bigger than the difference, so we just removed it. Minimal is still there. Logical is still there. But the two in the middle merged together into this replica level. Just to make things a little bit easier. We have a new view for statistics about replication that's visible on the standby node only. It is called pg_stat_wal_receiver. Today, you have a view called pg_stat_replication. You can query it on the master and see a list of all your standbys, with the different transaction log positions for different things, like how far they have written, how far they have applied, and all of those things. This is sort of the other side of that. This is the receiving end, and what the receiving end believes: the last thing it's seen, the last thing it's applied. So you can compare the two views, or you can just connect to your receiver. If replication isn't up and working, there will be no rows. That would be a good first check for your replication: make sure you're actually connected, that you get a row. You will never have more than one row, because a Postgres replication standby only has one master. You can't connect to more than one master. It only covers the standard Postgres streaming replication. We have a few fairly small changes that can make your life easier in a few cases with replication slots.
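A sketch of the first-line replication check described above, run on the standby (column names as of 9.6):

```sql
-- On the standby: one row means the WAL receiver is connected and streaming;
-- zero rows means replication is not running.
SELECT status, received_lsn, latest_end_lsn, slot_name
FROM pg_stat_wal_receiver;
```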
First of all, pg_basebackup can now use a replication slot as you run it, to make sure the transaction log is retained until you connect the replica, and to remove one step there. Unfortunately, we can still only use it in the plain replication scenario. We'd like to also use it when you're using pg_basebackup with streaming transaction log at the same time. That feature is not in 9.6. Hopefully, it'll be in the next version. And you can now also, if you're manually calling pg_create_physical_replication_slot, we're also aiming for nice long function names, not just parameters, tell it to reserve the transaction log immediately. That means that as soon as you've done that, Postgres will stop rotating the transaction log. So if you just run that and forget to set up your replica, you will run out of disk space. But again, it closes these windows if you have a really high velocity system. It closes some windows where you could fail to set up your replica because your system was generating changes too fast. So those are the fairly small things, I'd say, on backup and replication. We have a couple of really big things changing on the replication side. First of all, we now have support for multiple synchronous standbys. Since Postgres 9.1, I think, 9.1 is when we added synchronous replication, you've been able to pick one. You could have 10 standbys, but one of them at any given time would be the synchronous standby. You'd set synchronous_standby_names to the name of one of them, or more than one, but Postgres would pick one of them. And as soon as that standby had received the transaction, we would release the transaction and keep on running. With multiple synchronous standbys, you can say, I want my transaction to be on three nodes before I allow the system to continue. Or, you know, five nodes or ten nodes. Don't set it to five nodes if you only have three nodes.
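Creating a slot with immediate WAL reservation, as described above, looks like this (the slot name is an example):

```sql
-- The second argument (immediately_reserve) makes Postgres start retaining WAL
-- right away, instead of waiting for a standby to connect.
SELECT pg_create_physical_replication_slot('standby1', true);

-- If the replica never gets set up, drop the slot or WAL will pile up:
-- SELECT pg_drop_replication_slot('standby1');
```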
And generally, you probably don't want to require three nodes if you only have three nodes. The idea is you can require enough of them, but maybe not all of them. The syntax goes into the same parameter. So previously, we'd say synchronous_standby_names equals the name of one or more nodes. We can still use that syntax; then we're back to saying one of them is enough. Or we use the new syntax: a number, then a parenthesis, and then a list of nodes. In this example, we're saying I need three nodes to be caught up before I release the transactions. It can be any three of node1, node2, node3, or node4. So you can list, you know, ten nodes and say I need five of them. Or, and maybe someone can confirm this for me, those of you who were working on it: can I say three nodes and then a star, any three nodes? I think that works. Does anyone know? Okay. For the audience to test: go home and play with it. I'm pretty sure that it works, but in general, I wouldn't recommend using a star there, because you really want control of where your replicas are going. In particular, when we're talking about multiple synchronous standbys, this is about getting really high availability. You don't want random nodes to accidentally become part of your high availability solution. Keep them under control; use a strict list of actual nodes. The next big thing that we have on the replication side is yet another value for synchronous_commit. As Robert said in his talk, the increasingly misnamed, or overloaded, parameter synchronous_commit can now be set to remote_apply. In synchronous replication previously, even at its highest level, on, you were still not guaranteed: you did a commit on the master, and if you immediately queried the standby, your row might not be there. Actually, we guaranteed that the row would be on the standby, but it wouldn't be visible in the database on the standby.
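The two syntaxes for synchronous_standby_names might look like this in postgresql.conf (node names are examples):

```
# Old syntax: a priority list; one standby at a time is synchronous.
synchronous_standby_names = 'node1, node2'

# New 9.6 syntax: any three of these four must confirm before commit returns.
synchronous_standby_names = '3 (node1, node2, node3, node4)'
```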
So from an availability perspective, as in, oh, I crashed, where's my data, you were fine. But if you actually wanted to, for example, load balance a read operation over there, that might not work. If you change synchronous_commit to remote_apply, we now guarantee that it's been sent to the standby, written to disk on the standby, synchronized to disk on the standby, and replayed into the database, so that it is now visible in the database. Yes. This does not do anything with two-phase commit, because you still can't perform operations on the standby. We do have two-phase commit, but that's entirely unrelated to this, because the standby is still read-only. But what it gives you is a guarantee that if you make the commit, and immediately after that you query the standby, the data will be visible. Immediately after the commit returns, yes. You have to wait for the commit to return. How much overhead does it have? I would say: surprisingly little, but there will be an overhead. It will obviously depend on your workload. But in a lot of tests, I've heard others do them, and I've done some tests myself, it's been surprisingly little overhead, but it's certainly not zero. It doesn't do any more work. It adds a little bit more traffic, because you get a message back from the standby when it's replayed, which we didn't bother sending before. But you're adding latency at every step, yes, exactly. If you're comparing it to asynchronous replication, the difference is going to be bigger. But if you're comparing it to the previous synchronous mode, it's not going to be huge. If you're unlucky, you might hit a workload where it is. It's just one of those where we need more input for different workloads, but it's surprisingly low overhead at this point. What? I don't think so. No, it shouldn't. Streaming delays should not interact with that. No. Because you're opening a separate transaction. We'll still kill you.
Are you disagreeing with me, Stephen? Oh, no, you're right. I was thinking... I haven't actually tested that, but yeah, that seems like it might be a problem. You would probably want to run it with hot_standby_feedback. Yes, you can still do this per transaction. You can run some transactions asynchronous, some normal synchronous, and some of them remote-apply synchronous. It's still tunable per individual transaction. And you can combine it with multiple synchronous standbys. If you say remote_apply and you have this thing set to three nodes, it's going to be visible on three nodes before we release the commit. Which obviously, the more nodes you add to this, the slower it gets. And then you add this, and it gets even slower. Again, surprisingly little slower. There is a second part of this that's going to take care of that even a little bit more. It's called causal reads, I believe. The entire feature didn't make it in, unfortunately. We got this part. So there will be some further enhancements coming up. So now that we've got the synchronous replication features that will make things slower, we can also look at some performance features that will make things faster. You can combine them and everything will be just like it was before. Or actually, hopefully, things will still be significantly faster than they were before. I'll start with something that surprised me, and I think it surprised a bunch of other people when they realized it: we can now send date and time values much faster if you do, for example, a COPY. And the ridiculous benchmark: if you do a COPY from the database to a file of a single table with a single timestamp column, it's twice as fast. What that really says is that it was really, really slow before. And you probably don't have a table with just a timestamp. You have other things in there, so your application is not likely going to be twice as fast. But it's one of those: the patch goes in and we're like, what? Okay, it's faster, but how did it get that much faster?
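The per-transaction tuning mentioned above is just a SET LOCAL away (a sketch):

```sql
-- This transaction doesn't wait for any standby at all.
BEGIN;
SET LOCAL synchronous_commit = off;
-- ... low-value writes ...
COMMIT;

-- This one waits until the standby has replayed the commit,
-- so the data is immediately visible to reads on the standby.
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
-- ... writes that will be read from a standby right away ...
COMMIT;
```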
And it's basically that the output functions for timestamps, for dates, and for times were rewritten from scratch in a significantly more efficient way. So yeah, the up-to-two-times is single table, single column, that's the best case scenario obviously, but if you're sending lots of dates and times out, things are faster. You don't have to do anything. That's one of the best types of performance features, I think, the ones where you upgrade and it's faster. You don't have to change your application. There have also been, once again, and this has been a theme of a number of releases, changes to how Postgres deals with locking and low-level concurrency. There is better tracing of locks. We have this new pg_stat_activity that tells you more about these things. But in general, Postgres 9.6 is going to do better if you have a machine with many CPUs or many CPU cores and many concurrent sessions that are actually trying to use them. Hopefully there aren't going to be a lot of bugs in that; those can be really, really hard to track down. But it's again one of those: if you have highly concurrent loads, they'll just run faster. You don't have to do anything. A specific thing around this that we've finally gotten to, and I know some people, some who are in this audience, have been complaining about this for at least 10 years, is that Postgres has actually been really slow if you have multiple sessions loading data into the same table at the same time. Because the extension of a table, when we add a block at the end of the table, simply performed really badly under concurrent workloads. That has been changed now by extending by multiple blocks at the same time. So instead of, every time someone needed a little bit of space past the end of the file, increasing the file by 8 kilobytes, which we then filled immediately and started fighting over the lock again, the idea now is that Postgres will look at how many backends are waiting for this table to become bigger.
It'll take that times 20, which is a magic number, and extend by that much. The idea is we can't just extend by 2 gigabytes every time, because then you'll run out of disk space. But if you actually have a highly concurrent workload that's pushing data into the same table, 20 turned out to be okay. It's hard to know whether... it's probably not going to be perfect in every situation. It might not be perfect in any situation, but it's a good number, and you really don't, as a DBA, want to end up tuning this individually per table or anything like that. We could make it tunable, but you don't want that. Even if you think you want that, you don't want that. I certainly don't want my customers to have that, because I'd have to go over it all and remove it. But 20 is one of those numbers. It might be that we're going to change the magic number to a slightly more magic number in a future version. But it seems to be doing fine. We then have two tightly coupled features that change how our checkpointing works. The first one is that I/O that happens at checkpoints is now sorted. It used to be that when your checkpoint ran, we'd generate, well, more or less completely random I/O all over your disk and just dump the data out. What 9.6 will do is sort the writes first by which tablespace they're on, then by which relation they're in, then by which relation fork, and eventually by block. So the idea is, it's not going to be entirely sequential I/O, but it's sequential-ish. It's not super random, anyway. It gets much closer to being sequential. And we push that data out. The other thing that we have related to this: we've had issues with the I/O storms generated by checkpoints probably since we implemented checkpoints. And it's a problem. The whole idea behind checkpoints is we buffer data and then we write it all out at once.
And then we have this checkpoint spreading, the checkpoint_completion_target parameter, that says let's try to write the data out over a longer time so we don't overload your I/O system. And what happened for a lot of people is it was very well spread out into the OS kernel buffers, and then the kernel would be like, oh, let's write everything at once. And we died anyway. There were ways to deal with this. Many of you have probably configured vm.dirty_background_ratio and such parameters on Linux. Postgres will now have the ability to tune a bunch of these things from inside of Postgres instead of having to do it globally for your entire server. We'll also be able to differentiate different kinds of writes, which is something the OS kernel can never do. So basically, this idea of "let's send everything to the OS kernel buffer" is what we need to change, and instead tell the kernel to start writing things before we're done. And we have three new parameters to do this that we can configure in postgresql.conf. What you want to do is going to be platform dependent, and in fact it's only enabled by default on Linux. But in fairness, that covers most of the really big Postgres installations today. That's not to say we don't want implementations for other operating systems. The infrastructure is there to build them. We hopefully will. We just don't have them yet. The idea is that most of the time it's better to get these things written early and not do massive write caching in the operating system buffer cache. Because if we do, yeah, the operating system decides to write the data out and we lose. This way we can control the writing much more. There are exceptions. There's one particular exception that's even listed in the documentation, which is if you have a workload that's bigger than shared buffers but smaller than the operating system cache. That means you're basically never doing I/O, because it's all in the operating system cache. Then the OS can do whatever it wants.
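The flushing behavior described above is controlled by three new parameters; in postgresql.conf they might look like this (example values, not tuning advice):

```
# Ask the kernel to start writeback after this much dirty data
# from each write source; 0 disables the hint.
checkpoint_flush_after = 256kB   # writes done by the checkpointer
bgwriter_flush_after   = 512kB   # writes done by the background writer
backend_flush_after    = 0       # writes done directly by backends
```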
We don't really need to care. The parameters that we can set are checkpoint_flush_after, bgwriter_flush_after and backend_flush_after. They control three different kinds of writes. You can probably guess which ones. checkpoint_flush_after controls the writes that are happening during a checkpoint. bgwriter_flush_after controls the ones that are happening in the background writer. And backend_flush_after is when you have a single backend that's writing its own data out, either because the buffer cache is completely full or because it's restricted itself to a smaller portion of the buffer cache for data loading. Typically for the checkpointer you want to buffer fairly little. I don't think we know if these defaults are going to be perfect. It's going to be workload dependent. We don't have any tuning guide yet, because we've only had this feature for a couple of weeks. But in general, writing things out from the checkpointer fairly quickly is probably a good idea, because we already have the write spreading going on there. These are the default values. This feature, if you are on Linux, is enabled by default. If you are on a non-Linux platform, these are no-ops. They don't do anything. But if you are on a different platform and you are actually experiencing this problem, please help us write the implementation for your platform, so we'll get it going. postgres_fdw. Who's using postgres_fdw today for something? Good for you. There has been a whole set of great enhancements to make postgres_fdw faster. For one thing, you can now control the fetch size, which is, when you read from a table, how many rows we're going to grab at a time. Previously it was 100 and you couldn't change it. You can now set it on the individual table, or you can set it on a server; it'll then apply to all the foreign tables on that server. That's a fairly small thing, though. The bigger thing is that we can push down joins.
So if you are joining two foreign tables that are on the same remote server, we can push the entire join operation to the other side, do the join over there and get the result back to the local server, whereas previously we'd pull both tables up from the other server and do the join locally. We can push down normal joins. We can't push down things like anti and semi joins today, so WHERE EXISTS kinds of queries will not be pushed down yet, but the normal ones will be. We can also push down ordering. If the local query plan says "I could run much faster if this data came back sorted", we can generate a query on the other side that contains an ORDER BY, and we'll get the data back sorted. There's some costing involved, but if, for example, the remote table has an index, then reading the data sorted over there is essentially free; you just need to read the index. For updates and deletes over the foreign data wrapper, we used to always do a SELECT FOR UPDATE and then an UPDATE. We can now do those updates directly, making things faster. You can also use extensions much more easily between the two nodes. What you can basically do, using the server option called extensions, is tell the system that certain extensions, say pgcrypto in this example, exist on the other side. Then when you run, say, an md5 function or a SHA hash from pgcrypto, where previously we'd copy the data from the other side and run the hash locally, we now know that the pgcrypto functions exist on the other side, so we'll just run the hash over there and get the hash back. It's up to you to actually install pgcrypto on the other node. When you set this, you are promising the system that it's there; if it isn't, you're going to get an error. So it's a list of what's over there. Does it have to be installed on the original one as well? Yes, it does; otherwise it wouldn't know what the function looks like.
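Concretely, setting the fetch size and the extensions option might look like this; the server and table names here are made up for illustration:

```sql
-- Raise the per-fetch row count (previously hardwired to 100)
-- for every foreign table on a server:
ALTER SERVER remote_srv OPTIONS (ADD fetch_size '1000');

-- ...or override it for one specific foreign table:
ALTER FOREIGN TABLE remote_orders OPTIONS (ADD fetch_size '5000');

-- Promise that pgcrypto is installed on the remote side, so calls to
-- its functions can be shipped there instead of being run locally.
-- If it is not actually installed there, queries will error out.
ALTER SERVER remote_srv OPTIONS (ADD extensions 'pgcrypto');
```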
And it's only available for functions that are actually wrapped in an extension. You can't point it at an individual function; it has to be an extension. Does it have to be the same schema? That's a good question. I think so. I haven't actually tried it, but I'm fairly certain it would need to be.

So those are awesome performance things. I'm just waiting for Robert looking angrier and angrier over there. We have this other tiny little thing about performance; you may have heard of it: parallel support. Some of you probably went to Robert's talk here earlier about parallel query. It's been a much requested feature for many, many years. If it were easy, we would have had it many years ago. Turns out it's not. But we do now have query parallelism for CPU intensive workloads. So the old equation in Postgres, one query equals one CPU core, does not apply anymore. It still applies to a lot of queries, but it no longer applies to all of them. There are a lot of parts to parallelism, and many are still remaining, but we definitely have a very useful subset in there today. At the core of it we have the ability to do parallel sequential scans, which can take a single table, split it up across multiple workers and run them on separate CPUs. So with four workers, each worker scans a quarter of the table. And each worker can then run filter and target list functions: if you put a WHERE clause on the query, each of those four workers will apply the WHERE clause to its own rows. The results bubble upwards towards the leader, I believe we call it. The idea is that if we can push the WHERE clauses down, less data has to get up to the leader. If you want this to work with your own functions, they need to be marked as parallel safe. User functions that you create are by default not marked as parallel safe, so if you're using your own functions, you need to mark them as that.
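A quick way to see this on a 9.6 build might look like the following. The table and filter are made up for illustration, and whether the planner actually chooses a parallel plan depends on table size and costing:

```sql
-- Hypothetical table, large enough to make parallelism attractive:
CREATE TABLE big_table (id int, payload text);

-- Allow up to 4 parallel workers for queries in this session:
SET max_parallel_degree = 4;

-- With enough data, the plan can come out roughly as:
--   Gather
--     ->  Parallel Seq Scan on big_table
--           Filter: ((id % 10) = 0)
-- i.e. each worker scans its share of the table and applies the
-- WHERE clause before sending rows up to the leader.
EXPLAIN SELECT * FROM big_table WHERE id % 10 = 0;
```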
Assuming they actually are safe to run over there; they can't depend on global state or anything like that. And the parallel sequential scan is sort of the foundation for the rest of the parallelism; that's where we start. On top of it you can run parallel aggregates, for example. Aggregates tend to be CPU bound. The idea here is that we can do partial aggregation in each worker and then combine the partial results in the leader. For example, if you're doing a sum or a count, each of the four workers can count the number of rows it saw, or sum its part of column b, and send just that one value up to the leader, who then only has to sum four values; well, I guess it sums four values for a count as well. This requires aggregate-specific support; all aggregates will not magically work. Most built-in ones do. The big exceptions are string, JSON, XML and array aggregates; none of those work. But all the ones that work on numbers, sums, averages, standard deviations, those work. Ordered-set aggregates and window aggregates don't work either. Because if all of that worked from the beginning, we'd have nothing on this list for the next version; we can't give you everything at once. We can also do parallel joins. Again it's based on the parallel sequential scans, where each chunk of the scan can be joined individually, each in a separate worker. We can only do nested loop and hash joins; we can't do merge joins yet, because again we need something for the next version. You can control this in a couple of ways. There's a global parameter called max_worker_processes that says how many of these background workers can be in use in total. If your query wants four workers and they're all in use, it's not going to be parallelized. You can also, for an individual query or session, set the parameter max_parallel_degree, which says how many workers an individual query can use.
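The partial-aggregation idea shows up directly in the plan. This is a sketch from memory of the 9.6 development builds, reusing the hypothetical big_table from before; the exact node names and shape may differ:

```sql
-- Each worker computes a partial count/sum over its chunk of the
-- table; the leader then combines the four partial results:
--   Finalize Aggregate
--     ->  Gather
--           ->  Partial Aggregate
--                 ->  Parallel Seq Scan on big_table
EXPLAIN SELECT count(*), sum(b) FROM big_table;
```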
But it'll still be limited by max_worker_processes; you can't get above that. As with most things, or really all things, in the Postgres optimizer, whether Postgres decides to try to parallelize is cost based. There are two main parameters that control this: parallel_setup_cost, which is how expensive it is to set up the worker/leader communication in the beginning, and parallel_tuple_cost, which is how expensive it is to send a row between the workers and the leader. And for those of us testing this, it's really useful to also have force_parallel_mode, so you don't need a large cost to make it use a parallel plan. That's mainly a testing parameter, but people are still going to use it for other things. Yes, it will work. It will not be good. If you want to test these things, turn force_parallel_mode on; if you want to turn force_parallel_mode on in your application, go talk to Robert and he will make sure you never do that. You can also alter a table and set the parallel degree for that table. By default Postgres does an estimate based on the size of the table to figure out how much to parallelize. You can override this, because again, much of this cost model is new; it's not correct, but it's an estimate. You can change your functions with ALTER FUNCTION ... PARALLEL SAFE if you want them to be usable in parallel plans. You've also, for a long time, been able to set the cost of functions. If you run with the default cost, your queries are very unlikely to be parallelized, because the functions look too cheap. We've seen a lot of cases of this; for example, the PostGIS functions are really expensive, and they were all running with the default cost. If you raise the cost of functions that are actually expensive, you'll get better query plans and parallelize the right things. I'm running out of time. Luckily I only have one more thing. As you're supposed to do, right? Who in here has used Oracle? Who in here likes "snapshot too old"?
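Those last two overrides might look like this; the table and function names are hypothetical, and the cost value is just an example:

```sql
-- Let the planner use more workers for this table than its
-- size-based estimate would otherwise pick:
ALTER TABLE big_table SET (parallel_degree = 4);

-- Mark a user-defined function as safe to run inside a parallel
-- worker, and raise its cost so the planner treats it as expensive
-- enough to be worth parallelizing:
ALTER FUNCTION expensive_fn(int) COST 10000 PARALLEL SAFE;
```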
You don't? Well, we have it now, so you can start liking it. Postgres now has "snapshot too old". We like to make the Oracle people feel at home. But it actually has a lot of use cases. It's configured by time, so you can basically say how old a snapshot is allowed to be. If it's older than that and someone needs it to go away, like if it's blocking a vacuum or blocking a HOT update, we will kill it. The idea is that this will prevent large bloat from building up, in some cases. We only kill the snapshot when it tries to access data that has been cleaned up; we will start removing the data, but if you never touch that data again, you're fine. You configure this in minutes. The default is off. Don't turn it on without thinking, because a lot of people don't like this error message in Oracle, so you don't need to bring it in unless you have to. But there are workloads where this can be really, really useful.

I'm running out of time. As always, there's always a lot more. There are developers in here; I'm sorry if I didn't mention your feature. If I didn't mention all of your features, they're all good. Probably. Mostly. Most of them are good. There's a limit to what you can cover. There are a lot of smaller fixes, and a lot of performance improvements; your systems will likely just run faster.

So just before we go, let me do a quick show of hands. What would you consider, for your workloads, for the things that you are doing, the biggest feature of Postgres 9.6? Assuming we don't revert any of this. So hands up for parallelism. The vacuum freeze map. Snapshot too old? Clearly we don't have that many Oracle users here. Multiple synchronous standbys. The postgres_fdw improvements, I'll group all of those together. The wait and lock monitoring functions. Anything else? Well, you have to tell us what it is. No. The Bloom filter index. That's a contrib module that made it in, I believe. I'm not entirely sure how useful it is. Yeah. But sort improvements, that was all over the last release.
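The time limit itself is a single postgresql.conf setting; the parameter is old_snapshot_threshold, and a value of -1 (the default) keeps the feature disabled:

```ini
# postgresql.conf -- allow "snapshot too old" errors for snapshots
# older than one hour, so long-idle transactions stop holding back
# vacuum and HOT cleanup. Default is -1, which disables the feature.
old_snapshot_threshold = 60min
```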
We can't do those again. Okay. Well then, thank you very much for showing up. The next session, I believe, is over in the keynote room: lightning talks. Thank you.