Making progress. Little check, mic check, one, two, one, two, watermelon, watermelon. Great. Well, thanks to all of you for coming to learn about the history of PostgreSQL. PostgreSQL, as many people know, is a project that's been around for a very long time. In fact, it dates all the way back to a decade before Linux was ever released; PostgreSQL was already an active project. In this talk we're going to talk about the history of the PostgreSQL project, how it came to be, where it is, and how it's evolved over time, and look at some of the characters and some of the key moments from various periods throughout that history.

So, a high-level overview: in 1985, so-called University Postgres began as a project at the University of California, Berkeley. Then we'll jump forward about 10 years to when PostgreSQL became a community open-source project. We'll talk about commercialization and how PostgreSQL grew a business culture around it, which continues to sustain development today. And then we'll wind forward through how PostgreSQL's license has shaped the culture of databases and what things look like today.

So I'm going to go back all the way to 1976, to Berkeley. This is a view of the San Francisco Bay Area, looking from the famous Berkeley tower out over the bay. You can see, I think, San Francisco across the bay there, Angel Island, a beautiful campus. And in 1976 a young professor named Michael Stonebraker was realizing that if he didn't get his act together, he was not going to get tenure. The project he'd attached himself to first was not really making a lot of headway, and he was casting around for something else to do. He came upon the idea of databases as an area that could have some legs, and he began to do research into that.

So his first project was Ingres, the INteractive Graphics REtrieval System, a particularly tortured acronym. He worked with a couple of colleagues there, and he based the implementation on the relational papers by Codd, which later went on to be quite influential. They had a query language there called QUEL. And the Ingres code and people went on to seed one of the other branches of the SQL world: Sybase, later becoming SQL Server, was heavily influenced by Ingres and Ingres developers. That research ran from about 1973 through to 1985, and the first Ingres release was in 1976. It later went on to be commercialized as well. And Ingres, of course, in addition to being a tortured acronym, was a French painter. Here's one of his more famous paintings: The Death of Leonardo da Vinci. So Stonebraker had a good sense of humor from the beginning.

In 1985 Stonebraker had managed to secure tenure, and the Berkeley lab that had built Ingres was winding down. And he said, well, what am I going to do with myself now? I've done Ingres; it's gone off, it's doing things. I'm going to start something new. We're going to figure out what comes post-Ingres, and thus Postgres was born. He and a man called Lawrence Rowe wrote a paper in '85 which set out the goals of the Postgres project, which are roughly there on the slide. It's hard for me to read them at this angle here, but I've got them larger there.
Essentially, they wanted to take Ingres, which was a pretty straightforward relational database by the standards of the day, and extend that concept to fold in a number of use cases that were not at the time considered database workloads. In those days a database was something that an analyst would use interactively through a terminal to analyze business data or to look into business use cases. What they wanted to do was add the notion of an application persistence layer and bring in a rules engine, which we would think of today in terms more like constraints, and fold all of that under one roof. They also wanted to make the database extensible, because Ingres had been a closed system: you couldn't define your own data types or operators. They wanted to create something that anyone could extend and take into different areas.

So over the next 10 years they work on Postgres at Berkeley. In '87 they have what they call demo-ware working internally. In '89, Postgres 1.0 goes out to the world and the first users start playing with it. In 1990 they realize that the rules engine they built is a complete disaster; it doesn't work at all. It was a neat idea that didn't fly, and they completely rewrote it. In 1991 they had the same problem again: the second rules engine didn't fly either. This is a great reference point for Postgres community members, who have been talking about rewriting the rules engine for years, even in the current release. This vestigial feature, which came out of the header of that 1985 paper, is still something being actively argued about on the mailing lists today, about 30 years later, which should make you feel better about whatever bikeshed email thread you're on this week.

So basically Postgres starts to crawl out of the muck and become a database in the mid-to-late '80s. Then in 1992 Stonebraker kicks off Illustra to try to commercialize Postgres as a new project, and over the next few years the Postgres lab inside Berkeley ramps down and Illustra ramps up. Illustra essentially goes into the buzz saw of Oracle's go-to-market machine and is destroyed. You can still buy Illustra from IBM, sort of; I'm sure there are some installations out there, in the same way that you can still write code in COBOL. So it's 1994, 1995, and the Illustra company has gone off and taken a bunch of the people and the code with it, but the lab at Berkeley is coming to an end. Throughout this period Stonebraker's strategy, as he put it in the oral-history interviews after he got the ACM Turing Award, was: well, we would just find really smart students, we would give them a nice piece of hardware, and then they would just die writing code for us. This is, I think, a classic grad-supervisor move, and it has produced, I think, most of the software that grad students have written over the years.

So in 1994 two Berkeley grad students, Andrew Yu and Jolly Chen, have watched the lab come to an end, and they're moving on to new projects. They're looking at this code that they've got, which is open source, and going: well, now what, right? We've built this really cool thing, but no one's really using it, it's not really going anywhere. So they just decide to add SQL support to it.
I actually looked up Jolly Chen and gave him a call and said: what kind of crazy person do you have to be to think, oh, I'm bored, me and my friend are just going to go add a new query language to a database? How hard could that be? And his response was, basically, that it didn't seem very hard. He said, we figured it would be about the size of a graduate class project, and Berkeley is famous for having extremely difficult graduate class projects. And he said, well, you know, it was just a little bit of yacc and bison, right? We just sort of rewrote the query parsing and that was fine. I thought this was really remarkable, in the sense that it's the kind of project you'd have to be crazy to take on, or ignorant, because it's obviously such an insanely big project you would never finish, and indeed Postgres continues to add SQL features with every release today. But I understood it much better when I realized that they'd essentially been hacking on the internals of this thing for three or four years, and they didn't have to do a good job, because no one was using the thing; they could just get it working well enough that people would think it would be fun to play with.

I thought this was a nice summation. At the time Postgres was still on the PostQUEL query language, and he said, you know, it had become clear that SQL was the future, and we felt like we'd built this really cool engine, but instead of having a steering wheel, which would have been SQL support, we had all these knobs and levers. So we thought, okay, we just need to build a new front end for it. So this is what QUEL looks like, and I have the same query here in SQL. You can see that basically you define a range, you retrieve into a value, and then you put a filter on it. So it's relatively close in form; obviously they're quite different and the QUEL looks a little alien, but it's not unreasonable to imagine that by rewriting some front-end bits you could essentially parse SQL queries into the same internal form instead of QUEL queries.

So they hack on this for a summer or two, in their spare time; they have other jobs and so on. Eventually they take this and release it as Postgres95, which they thought was really funny, because Windows 95 had just come out and they wanted to take this Postgres thing that had kind of died out and bring it back. Nothing says modernity like adding 95 to a title, because at the time that's how it was. So Postgres95 came out, and from '95 through '96 and into '97, Postgres95 was this weird, quirky little open source database. I met some of the early contributors from that period and I said, why would you pick this up? I mean, it was buggy, it crashed all the time, it had very limited SQL support. How could you possibly have used this thing? And all of them had basically the same story, which was that they needed a SQL database, they wanted to run it on a machine they had at home, and they didn't have the money to buy a license for an expensive product. And since they could compile the code themselves, when they hit a crash they could hack around it and get the thing working again. And so for a little while this was kind of how it was.
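As a rough sketch of the kind of side-by-side comparison he's describing (these are not the queries from the slide; the table and column names are invented for illustration), a textbook QUEL query and its SQL equivalent look something like this:

    range of e is emp
    retrieve (e.name, e.salary)
    where e.salary > 5000

    SELECT name, salary
    FROM emp
    WHERE salary > 5000;

Structurally they line up: the range clause plays the role of FROM, retrieve plays the role of SELECT, and the where clause filters in both, which is why swapping out the parser in front of the existing engine was at least imaginable as a class-project-sized job.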
Andrew and Jolly had a mailing list, and people would try out Postgres: they'd download a tarball, build it on their local machine, and then submit patches to the mailing list saying, oh hey, it didn't build on my... gosh, I don't even remember what people would have been running, certainly BSD in those days. It didn't build on my hardware, I added an #ifdef to handle this case where my system has a different byte format, or whatever. And I will note that even in '95, before pretty much anything from the modern era had happened, there was already a lot of really cool stuff that came out of that original vision of the extensible database, including LISTEN/NOTIFY, which is the ability to basically subscribe to messages in the system and then get a ping on your database connection when an event happens.

The problem, of course, was that there was this mailing list and people were submitting all these patches, but Andrew and Jolly had volunteered their time and energy to get this thing off the ground. They had jobs, and they had graduate degrees to finish, and other work that people were paying them to do, and they just didn't really have the time or energy to follow through on it. And so here in '96 Jolly Chen says, well, you know, what we need is not a whole bunch of people doing a little bit of work each; what we want is a few contributors who will really take ownership over this, really learn the code base, and go deep with it. And you have to remember that at this point there were basically only a few people on the planet who knew how Postgres worked internally, and a fair number of them had gone to Illustra, with the exception of Andrew and Jolly. I mean, there were a few others, but they were the ones who were still active.

And so, I believe this is Scrappy, Marc Fournier, who basically one day says: okay, I'll tell you what, we've got all these patches people have posted on the mailing list, and Andrew and Jolly don't have time to apply them. I'm just going to stand up a CVS server on a machine I have somewhere, I'll download all the patches and apply them, and then we'll go from there; anyone who wants to play with it can pull that code. So you can see he says, I'm going to set up a mailing list, and you send me patches and I'll commit them. To some extent this is essentially the Postgres process today, and we'll come back to look at how the Postgres development cycle works now, but it's really founded in this moment. So Bruce Momjian says, cool, well, if you're going to do that, I'll make a list of all the bugs. And today Bruce still maintains this wiki page with his list of known bugs and tries to coordinate these efforts. To this day there's not really a bug tracker for Postgres, aside from Bruce making sure that mailing list posts get answers, and that's about it. I mean, other people help, but the buck stops with Bruce in that regard, and this has now been the case since, what was it, '95? So that's about 22 years now he's been doing this job. Occasionally there is talk of implementing a bug tracker, but Bruce's answer is always, well, what's wrong with what we're doing right now? Hopefully he will keep working forever.

So in '96 the Postgres project was already pretty global, and they met in Berkeley with Andrew and Jolly to talk about the work they were doing and kind of look at things.
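To give a feel for the LISTEN/NOTIFY feature mentioned a moment ago, here is a minimal sketch (the channel name and payload are invented; payloads require 9.0 or later):

    -- session 1: subscribe to a channel
    LISTEN orders_changed;

    -- session 2: fire an event with an optional payload
    NOTIFY orders_changed, 'order 42 updated';

    -- session 1 then receives an asynchronous notification on its existing
    -- connection; psql, for example, prints it the next time that session
    -- talks to the server

Nothing polls a table here; the notification rides along on the ordinary client connection, which is the "ping on your database connection" idea.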
This is your classic '90s hacker portrait; no one's wearing rollerblades, but I guess that's because it's database hackers. And you have Canadians, Russians, Americans, Germans already represented. I know Tatsuo Ishii was active at this point too; he wasn't there for this photo. But already, from the very beginning, because there had been this gap in the market for a hackable database, people around the world had started picking it up. A big part of Postgres' global influence has come from the fact that, from the very beginning, while there was no commercial incentive for a company like Oracle to add support for, say, Russian characters to their database, the Russians who wanted a database could add the support themselves, post a patch to the mailing list, and get some feedback from the community. And in fact the Russians and the Japanese both contributed through this process.

So by 1997 the name Postgres95 was starting to feel a little embarrassing, and people started saying, well, what should we call this thing? We can't really call it Postgres, because that was the Berkeley project and we're something different. How about Postgres v6? And someone obviously went, well, what are we going to do next release? And of course someone says, no, no, please not PostgreSQL, it doesn't slide off the tongue very nicely. And the next mailing list post says, okay, well, we'll just go with PostgreSQL, after dropping the extra "s" and the slash. And that was kind of the end of it. So Bruce goes and just goes through the code base and renames Postgres95 to PostgreSQL, and also adds a comment about select null, and thus the perfect name was born. The Postgres community has always been happy with its name, except for the members and the users. Tom Lane in 2006 described this as arguably the single worst mistake the project ever made. How many people feel comfortable saying PostgreSQL out loud? Not necessarily everybody. I have heard literally every possible variation of this over the years; "Postgres-S-Q-L" is my personal least favorite. It's also very common to see people capitalize the G, for reasons that I have never been able to adequately explain. I just call the project Postgres; it saves you a few syllables and it rolls off the tongue nicely. At any rate, for the last 20 years people have regretted that change, and it seems like we will continue to for many years to come.

The kind of transition that was happening in this period was from one where you had the Berkeley project, which was a funded research project, and then it kind of went into the woods, and Illustra was off doing illustrious things and eventually getting acquired; but community Postgres started to form a community around that orphaned project. And this is, I think, a really key moment in the history of Postgres, but more broadly it's a good example of how communities can gel, right? We have this orphaned code base, and we have a community of people who need a database, or want a database, but don't want to buy a commercial one, and those are the ground conditions necessary to create the thriving open source community we have today. It really came out of a very simple place, which was that people were using this janky, crashy, limited database and it was sort of orphanware. And that's a particularly unique thing, I think: there was both a demand for an open source project and not really anyone who owned it. To some extent that again sets the tone for the rest of the project, which is that Bruce and Vadim and
these early hackers who were contributing didn't have any kind of moral ownership over the project. So essentially the mailing list is created as a place to coordinate consensus around what should be added to the project, rather than having your benevolent-dictator-for-life model. People were just writing things for themselves, and they wanted to bring that together so the work wouldn't get lost and they wouldn't have to constantly reapply patches as other people worked.

Another interesting consequence of this sort of abandonware origin is that there has never been a company that owns PostgreSQL. If you adopt, say, MySQL, then along comes Oracle and buys it, and suddenly you have this unpleasant feeling of using Oracle software. If you're using Mongo... how do I say this delicately? It's not a decision that I would probably make lately. In addition to a variety of technical challenges, you have a very serious threat to your application, in the sense that Mongo could change their business model, change their business strategy, get acquired, get shut down, and leave you without a community. The PostgreSQL community has always been resilient because of its decentralization, and its decentralization is a consequence of the fact that it adopted an abandoned code base.

For those of you who have used PostgreSQL a long time, you might notice that there's a gap in the version numbers. University Postgres, as it's known today, ended with Postgres 4.2 around '94; then Postgres95 came out; and then PostgreSQL jumps to version 6. I like to think that the 5 is in the 95 there. When the community renamed PostgreSQL, they basically said, okay, we had Postgres 4.2 from Berkeley, we'll reserve a version number for the work that was done as Postgres95, and we'll make the first community release PostgreSQL 6.

To give a flavor of just how early PostgreSQL development was in those days, the things that people added during the PostgreSQL 6 release cycles were mainly corruption fixes, crash bugs, and the beginnings of support for basic SQL things like unique indexes or BETWEEN and IN qualifiers. Even so, as early as '96 or '97 Postgres added the GiST index type: the Russians, who have gone on to add all kinds of really cool indexes to Postgres, came along and said, hey, here's a generalized search tree, a generalized index you can use with any data type, and that became a foundation for a lot of the cool geospatial and other data type work that came in the years that followed. It was not until later that things like NOT NULL were added, right? So if you were using Postgres at this point, you really had to be desperate, because there wasn't a lot of capability in there. But at the same time the code base was small enough, and the community standards were low enough, that it was not that hard to get patches accepted, and indeed, as I said earlier, we added Japanese and Russian support quite early. This was also when the MVCC model landed in Postgres, which is essentially... well, actually, the write-ahead log came later too, now that I think about it; but MVCC began to give different visibility to different transactions, so that you could be updating a table and reading from it at the same time and get consistent results.

So I'm not going to make you sit through every release note from the last 20 years; I just wanted to give a sense of just how scrappy things were in the early days. So let's jump forward a
few years. I want to talk about today: most people who contribute to the Postgres code base are employed full time, working at some company that gets value from Postgres. Not universally so, but this is a big shift from the early days, when people used Postgres because they wanted a database at home; today most of us use Postgres for work in one capacity or another.

So the first company to offer any kind of Postgres commercialization was PostgreSQL Inc, started by Marc Fournier, also known as Scrappy; he was the guy who set up the CVS server in the beginning. They did a little bit of consulting and lightweight work, but it didn't go very far. The next one is, I think, a wonderful story, though kind of a tragic one. The company was Great Bridge, and the year was 2000; for those of you who have been in tech for a while, that's a year that sets off some alarm bells. In 2000 Red Hat had just IPO'd and made a whole bunch of money for a whole bunch of nerds, and somebody said, you know what we need? A Red Hat for databases. That would be awesome. And so they raised a bunch of money, and it's very 2000, early-Wired kind of culture: they've got a guy named Ned Lilly whose literal title on his business card is VP of Hacker Relations, and a bunch of the community members who had been with the project since '95, Bruce, Tom and Jan Wieck, joined. Their original goal is essentially to do professional support for Postgres, and they're going to make their own Great Bridge DB, and it's going to be kind of like the relationship between Red Hat and Linux. Unfortunately, they managed to burn through all their money, or at least realize that there wasn't a future for the company, within 16 months. And I don't think that's a great indictment of the people involved, or even of the business model, so much as possibly just a consequence of the great extinction event that was going on around them. The flip side I heard, though, was that Red Hat had kind of decided they were going to do Postgres too, and suddenly Great Bridge was in a tough spot: even though they'd taken investment from Red Hat people, now Red Hat, which already had a brand and a stock ticker, was competing with them pretty directly, and that made things pretty hard. So Bruce goes and joins a Japanese company, Tom goes to Red Hat to work on Red Hat DB, Red Hat eventually decides they should just ship Postgres like everybody else, and that's sort of the end of Great Bridge.

A few years later EnterpriseDB starts with the same game plan. Timing is everything in business, and who knows what it took for them to succeed where Great Bridge failed, but they've been around, I guess, about 15 years now, and they're still out there supporting Postgres and contributing to it. I think they have indeed been the largest single commercial contributor to the Postgres project over the years, and they continue to employ quite a few full-time people working on Postgres and supporting it.

So at this point in our telling of the Postgres history, we've taken this Berkeley code base, we've taken a project that was basically dead and breathed life into it through the bizarre sacrifice of a couple of grad students, and then a bunch of hackers from around the world who had nothing better to do with their free time have put about a decade of work into it, and this has become a very serious endeavor. There are now a couple of businesses out there supporting Postgres and running Postgres, in the way that open source projects sometimes accumulate. And
something new happened starting in 2005: people began to take the Postgres code base and run with it. How many people here have looked inside the Postgres code base? It's one of the most well organized, consistent and lucid C code bases on the planet. There are very few forks of MySQL, and I think there are licensing reasons why that is, but there are also really good technical reasons: the Postgres code base is great and very easy to work in, and the community has always been welcoming, or at least tolerant, of people forking the code base. We'll talk about the license a little more in a minute. But from the beginning of companies springing up, even the very first one, PostgreSQL Inc, they all mostly had some kind of what I would call a closed fork. Basically, some customer of theirs wanted a feature that the community didn't want, or that this group was going to hold back for themselves in order to be the thing that they sold. So, for example, we have PowerGres: PowerGres is Postgres with a different kind of crash recovery that one of their customers needed, and they saw that as their advantage. This kind of thing still goes on today, though it seems to be less popular, but for a long time that was what the forking culture looked like.

And this was made possible by the Postgres license, which is free as in beer. It's an extremely permissive license; to summarize it, it basically just says please don't sue us, much like the MIT license. This license was incredibly valuable in the early transition from the university project to a community project, because the community had the license, sorry for the pun, to do as they pleased. But it also became possible for other companies to take the Postgres code and just do whatever the heck they wanted with it. Those of us who have been doing open source for a long time know about all the fear, uncertainty and doubt that can grow up around the GPL; Postgres didn't really have that problem, because the BSD license has basically always just been, yeah, have fun. Kind of a funny story: I spoke with some of the people who were involved in the project in the early days, and they referred specifically to the BSD license as having been both influential and important to the success of the project. Ironically, at some point around 2008, Red Hat had a person on their licensing team just going through all the RPMs, taking a look at each license and making notes about what they were, because it was kind of the wild west in the early days. And they come across the Postgres license and they send a note to the Postgres community saying, oh, just so you know, by the way, we are updating our records: Postgres is on the MIT license, not the BSD license, because the wording of the license happens to basically be the MIT text. This was really kind of a surprise to the Postgres community, and there's always been a historical rivalry between Berkeley and MIT in the technology world, so the community was kind of put off by it. Stonebraker himself, who started the project and is now associated with MIT, has famously said that MIT does a lot of good research, but they don't really have the lab model, the interdisciplinary collaboration model, that Berkeley thrives on, and so a lot of the MIT software doesn't have the impact that the projects coming out of Berkeley have; and if you go and look at the AMPLab or the RAD Lab, that stuff is still continuing today to be really quite significant. So the Postgres community, specifically
Dave Page, said, okay, fine, we'll just get approval for our variant of the MIT license, we'll run it through the process, and then we can officially call it the PostgreSQL License. So when you hear "the PostgreSQL License", in your head you can just treat it like the MIT license, in terms of how you would explain it to the lawyers at your company. And indeed the Fedora project people, who have to deal with all the licenses, somewhat snarkily note on their wiki about licenses that this is clearly just an MIT variant. But the license, as far as we can tell, goes back to at least the early '90s, maybe all the way back into the '80s, so it's a little bit unfair to call it an MIT variant when it probably predates most of them. But anyway, let me know what you think.

I want to talk a little bit more about how the BSD culture, if not literally the BSD license, enabled what I call more distant forks. Today the Postgres project both draws community contributors and inspiration from, and sends influence out into, a lot of other databases. There are a lot of large-scale MPP databases; in fact, I think the significant MPP databases in the market today are based on Postgres. Netezza, Greenplum, Redshift, Aster Data, Citus Data: all of these databases are based on various versions of Postgres, to varying extents. And indeed Redshift, Amazon's data warehousing product, is an upgraded version of ParAccel; they basically bought the technology from ParAccel, and now, because it comes from the same tree, it's a branch of the same tree, they're basically porting all of the work from the last 10 years since the fork happened into Redshift and bringing it up to par. I think a lot of people in the open source world are uncomfortable with these closed-source forks, but I think for Postgres it's been a real asset, because it's allowed all these people to build businesses and test hypotheses that the community might not have had the resources, the wherewithal or the interest to try, and all of this work is then more compatible with the core project. It creates a large community of developers, outside and beyond the core open source community, who are familiar with the code base, who can contribute to the code base, and people often move back and forth between the more closed and more open parts of the community throughout their careers. So I think that's cool, and it's a thing people don't normally think about when they talk about the evil corporations taking the code, running with it and never giving anything back; there's actually a lot of ancillary contribution that comes back out of this, and a lot of influence that the Postgres community gets into other communities because of it. So, for example, you can log into Redshift with psql, even though Redshift has been off being its own thing for 10 years now.

Cool. So I want to talk now a little bit more about the present, because I think that summarizes, in broad strokes, how Postgres got to be where it is and the influence it has today. It would be worth noting that, of course, aside from all the hard work the Postgres community has done over the years to make the database better, to some extent Postgres' success is also a consequence of MySQL's failure. Maybe that's a little impolitic to say when they have a track somewhere in this building as well, but, to put it a little more delicately, the challenging situations that MySQL has found itself in over the last, say, five years
have presented a serious opportunity for Postgres, and I think the hard work and great features that Postgres has built in the last five years have really excited a lot of people and helped them join the community.

How does Postgres get made? Well, it's pretty simple. There's a mailing list, pgsql-hackers. People argue on it a lot, and at some point the argument results in a patch. Then people argue about the patch. There's a little app called the CommitFest where people can post their patch, and anyone can grab it, review it and look at it, and eventually, when enough people have reached consensus on the mailing list that the patch seems reasonable and that the idea is not a bad one, someone who's a committer on the project will commit it to the core trunk. Roughly once a year, usually in September, give or take, a major release will happen, something like 9.6; 10.0 is coming up, and the last CommitFest for 10.0 is running right now. A CommitFest basically works like this: there are all these patches floating around on the mailing list, and every once in a while the community says, okay, let's switch to review mode. Everybody pulls down patches, looks at them, applies things, tries to get their own patches landed, deals with the feedback from the reviews. And then, as the year tapers off, the last CommitFest, which is now, this is March, and then April through August will be hardening, and slipping in things that people still really wanted, and then eventually a release will get cut and a new cycle will begin. It's pretty straightforward, but this process really reflects the mailing list culture from the '90s, which I thought was interesting.

So I want to wrap up by talking a bit about themes in Postgres development. You can look at the release notes for any given Postgres release, and in fact I'd encourage you to go to Thomas's talk later today, where he talks about what's new in 9.6. 9.6 came out in October or September this year... September, so 9.6 arrived on time, and there's a lot of cool new stuff in it. But sometimes, if you just look at the release notes, it can be a little hard to unpack why a thing is important or significant, and I think that, better than looking at a particular set of release notes, the way to understand what's happening in Postgres is to look across releases at a theme.

So let's take an example of something that's quite material. This is from a talk given by Greg Stark; the link is at the bottom, and there are a lot of cool graphs in it. Sort performance. You might see a patch note that says, oh, sort performance something something something, and you might go, well, okay, so what? Sort performance seems fine for me, for what I do. Well, this is the performance of a large table in Postgres, running the same query on the same data over all kinds of different releases, and you can see here that basically, wherever the chart starts there, 2003, this query at the top would have taken you 150 seconds; today that same query runs in about 40 seconds. So it's roughly four times faster, and you haven't had to do anything except keep your Postgres up to date. That represents a lot of work, but in any given release it's actually pretty uncommon for a huge change to noticeably impact any given workload. If you go back and look at the history, you see that it sits in one place for a while, and then someone will notice something and make a big fix, and that will get everybody paying attention, and then one or two more fixes will usually come
in, as people work that vein of improvement, and I think you'll continue to see that over time. So with things like sorting, features that sound less exciting when you read them in the release notes, you can get a lot of mileage just by sitting and waiting for the next release to come out.

Or look at durability and availability. If you go all the way back to 7.1, the write-ahead log lands, and this changes the game, because essentially Postgres is now a journaled file system, in some regards. And then basically every release or two, every few releases, some major new capability drops into the system that takes advantage of that and adds new power. So 9.4 added logical decoding, the foundation of logical replication, which means you can take that commit log and send it as parseable logical changes over the network. That means that if you're willing to do a bit of work, you can write a receiver for that, in any language, for any database, that takes those change sets and applies them to that database. So you could replicate to Mongo, or to MySQL, or to a different version of Postgres, or you could replicate part of your data set, or you could take that data set and sanitize it for privacy reasons. To some extent that really interesting and cool capability was born out of the 7.1 release so many years ago, and each incremental step along the way has added more and more capability on top of it. Postgres synchronous replication is another really awesome capability most people don't know about: you can replicate to another database, and then, when you say commit, you can choose on a per-transaction basis whether to commit locally or to wait to hear from another server that it has committed before you move on. And rather than having a complicated mechanism like Oracle has for this, at significant licensing cost, what this does is look at the pointer in the write-ahead log stream on the remote database, so you can actually synchronously commit at an incredibly high rate of throughput, because it has inherited the work from all these previous steps along the way and has continued to get better. So I think when you read "logical replication", yes, it opens up new capabilities by itself, but what comes next is that you'll start to see more and more features and capabilities built on top of it, unlocked by it. There's always this kind of motion.

Remote data access, again: the very earliest versions of Postgres had something called dblink, which was basically a function you could call that would return data from another database, a remote query. 8.4 added infrastructure that sounds boring and uncool; you can't do anything with it by itself. 9.1 added something called foreign data wrappers. We've been marching towards this new capability, and Postgres now supports remote joins, updates and deletes, and you can infer a schema from a remote database. So you can almost treat a Postgres database as a single node in a larger graph of databases and query and join across them. And indeed one of those MPP databases I talked about, Citus Data, and I think someone from Citus Data is speaking at the end of the day, will coordinate a whole bunch of Postgres servers into one sort of virtual super-Postgres. So this kind of work keeps building over time.
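As a hedged sketch of what that remote-data-access machinery looks like in practice on a recent release (the server name, credentials, schema and tables here are invented for illustration; postgres_fdw ships with Postgres as an extension):

    -- on the local database
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER remote_sales
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'sales-db.example.com', dbname 'sales');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER remote_sales
        OPTIONS (user 'app', password 'secret');

    -- infer the remote schema (9.5 and later), then join local and remote tables
    CREATE SCHEMA sales;
    IMPORT FOREIGN SCHEMA public FROM SERVER remote_sales INTO sales;

    SELECT c.name, o.total
      FROM customers c
      JOIN sales.orders o ON o.customer_id = c.id;

The remote tables then behave like local ones in queries, which is the "single node in a larger graph of databases" idea.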
Postgres has all of the awesome indexes. How many people are familiar with Postgres indexing? Hardly anyone. Okay, I'll speak to this extremely briefly. Most databases have very straightforward indexes, and most people think about indexing in the binary-search-tree sense. Postgres has support for a lot of different kinds of indexes. So, for example, the GIN index is an inverted index: if you want to find all the documents that have a particular word in them, it's really efficient for that. A B-tree is a really bad way to look that up; a B-tree lets you dig in and find individual entries quickly, and an inverted index works kind of the other way around, wide at the top, so you can pull out whole segments at the bottom. There are spatial indexes, which, if you have a background in computer graphics, you'd be familiar with: if you want to find all the points in a particular region of space, you can't really do that with a binary tree, because it's one-dimensional, but a space-partitioning index lets you index geometric types and answer that kind of question efficiently. This stuff continues to be added release by release; the Russians like to name their index types after alcohol, so they built GIN and they're working on VODKA and RUM today, so keep an eye out for those. All of this adds new capabilities, and every release continues to add new data types and new capabilities. Stella's talk later today, on becoming a SQL guru, I think will cover some of that; it's a good talk, I've seen it before, and she'll talk about how some of the data types in Postgres can give you all kinds of cool capabilities. JSONB, for example, can take advantage of all of those index types and allow you to really quickly pull sub-documents out of a very large collection in an efficient way, because all of these pieces of the system work together.
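As a small, hedged illustration of that JSONB-plus-index combination (the table and document shape are invented), a GIN index over a jsonb column makes containment queries over big document collections cheap:

    CREATE TABLE docs (id serial PRIMARY KEY, body jsonb);

    -- GIN here is an inverted index over the keys and values inside each document
    CREATE INDEX docs_body_idx ON docs USING gin (body);

    -- find every document containing a particular sub-document; the @> containment
    -- operator can use the GIN index instead of scanning the whole table
    SELECT id FROM docs WHERE body @> '{"tags": ["postgres"]}';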
So here we are today: we have an incredibly vibrant global community of people all over the world who are contributing to the database, both professionally and at an amateur level. You can become a contributor, and I encourage you to give it a try. I can say from personal experience that the community is great; it can be difficult to break in, because the standards for getting work accepted are very high, but it is a community that is always looking for new contributors and is really grateful for people who are willing to do the work to participate. I have seen people go from being relatively new to C and new to the community to becoming really respected contributors over the course of a few years, and it's absolutely possible, if that's something you're interested in. My advice would be to start small. Robert Haas, who is a major contributor, crunched the data, and usually people who become significant contributors don't set out to rewrite the query engine; they come in with a small patch that fixes something that annoys them and work from there. If you want to get more involved, try reviewing some patches in the CommitFest. Even just giving feedback to the community helps: you see a patch that sounds kind of cool, you download it, apply it, and post back on the mailing list about whether it works. That kind of thing really helps. And if English is your second language, that's really not a problem; the mailing lists are in English, but a significant fraction of the community speaks English as a second language too. It really is a great community, and a really great opportunity to participate in one.

So at a high level I just want to come back to the beginning. The Postgres project, as it was set out, has been a success. The Postgres project's goal was to create a database that was based on SQL and the relational model, that was extensible, with support for new data types and new indexes, and that brought together business-analytics-style querying with application persistence. Lots of people use Postgres for all of those things today. I'm going to give them maybe a C-minus on the rules engine part, but the rest was pretty good, and the fact that it's still here, vibrant and influential... you know, Linux didn't get started until '91. Postgres has been around for a really damn long time, and for something so old to be so vibrant and cool today is pretty uncommon.

I think a lot of the strength of Postgres comes from the resilient, distributed team. No one can buy Postgres, no one can shut it down, and it doesn't rely on any one individual. I mean, there are a lot of people who have a lot of knowledge and experience, but it is a resilient team, and part of the reason it has survived 30 years is because it's a distributed team. The permissive culture has been instrumental to the influence of Postgres over time: by having a permissive license and allowing anyone to do anything they want with the code, many people have gotten rich on the backs of Postgres developers, but those people have, in the process, actually helped the community and contributed back to it in surprising ways. And if you want to understand progress in Postgres, don't look at the individual release notes; find the topic in a release note and look back at the changes over time, and it will give you a picture not just of what a feature might mean and how you might be able to use it, but also of where it might be going and where the opportunities are for the future. But if you have an idea, that's the easy part; then you've got to go spend 10 years making it happen. So I encourage all of you to get out there and make it happen. Thanks for using Postgres, and thanks for hearing the history of the project. I will post these slides; there are lots of really fun oral histories from people and interesting material, if you like this history stuff. Thanks for your time.

Do we have a moment? Do I have a second left? I have two minutes. I wanted to comment on the logo, because I gave this talk in Japan, and, you know, as far as logos go, the Postgres logo is a weird choice; elephants are not really known as nimble, fast creatures, but elephants never forget. Weirdly, in Japan the Postgres logo is a turtle. Postgres has had a weird reputation for performance in the past; Postgres performance is excellent today, but I don't think the elephant or the turtle were necessarily great influences there. And the reason that Postgres has a turtle logo in Japan is because there was a cult in Japan that attacked the Japanese subways with sarin gas, and they would wear blue elephant masks in their videos to the news, and that happened right around the time that the Postgres logo was being standardized. So the Japanese community said, thanks, but maybe that won't be our logo, and they still have this turtle logo today. Anyway, a little bit of trivia on the side. Okay, do we have time? Anyone have any questions? Maybe we can take one or two quick ones. Alright, well, thanks very much, everybody.

Test, test... where'd it go? Okay, I'm going to get started now. So my name is Joe Conway. I have been with the Postgres project since probably 1998, so for almost 20 years, and I'm one of the committers. I am also the VP of Engineering for a company named Crunchy Data. What I'm going to do is cover an introduction to Postgres, and one of the things I want to make clear to everyone up front: when
I wrote the abstract for this, I was a little bit too ambitious, and there's no way I could cover the number of things I said I was going to cover in one hour; it's more like a three-hour talk. So what I've done in this slide deck, as you can see here, is I've got an appendix section. I'm going to go through 46 slides, which will take us up to the beginning of the SQL area. So this is going to be mostly focused on the administrative part of Postgres, getting it up and running, and a little bit about the data types and how you form string literals and those kinds of things. What I'm not going to get into in this talk is DDL, DML and just standard queries; all of that material is still in this slide deck, and the slide deck will be posted. Just to show you: on the Postgres wiki there's this landing page which lists basically all of the different conferences that the community holds, with a link to each conference, and what we try to do on that page is provide a list of all the talks given at that conference and ask the speakers to link their slides. So my slides I will put up here; this is just the SCaLE 15x page on the Postgres wiki. I'd encourage you to find this page anyway, because, as I said, all of the old conferences going back several years are posted there.

So, to start: how do you install Postgres? Well, most people install Postgres using whatever packaging system their OS has, so either RPMs or Debian packages. There's a Windows installer; there are actually two or three of them now posted on the Postgres site. But you can also install from source, and while this is an introductory talk, I did want to dive a little bit into something that might seem at first a little more complex than an introductory talk calls for, and that is how you would build from source. I want to show you that it's actually fairly straightforward to do, and it enhances your ability to debug: if you end up running into some kind of problem when you're first using Postgres and you go to the mailing list because some query you write crashes, the first thing they're going to ask you is, how do I reproduce this? So I'm going to show you how to do that.

The first thing is that you need to install the build dependencies. This example is going to be based on a Red Hat system, so on Red Hat 7.x you're going to need these packages, and then you're going to need all of the dependencies to build Postgres itself. The other thing about these slides is that I've tried very hard to have everything cut-and-pasteable, so if you go look at this later on, you should be able to just cut and paste this into a terminal. Basically what you need are git, bison, flex, gcc, and some collateral libraries that enable things like PL/Perl to be compiled. The next step is cloning the Postgres source. The Postgres source is kept in Git, so here I'm just making a directory called pgsql-git, cd'ing into it, cloning the Postgres Git repository right here, and then checking out a branch for one of the stable releases. In this case I'm doing REL9_5_STABLE, which is the 9.5 version of Postgres; obviously you could do REL9_6_STABLE, which is the most recent stable branch, or an older one if you wanted an older one. The other thing to note on these slides is that I've also included URLs for a lot of these things, so you can drill in and look at them in more detail. So now the next thing you typically want to do is build Postgres from source.
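A condensed sketch of those dependency-install and clone steps on a Red Hat-style system (the package list here is a representative subset, not the full list from the slide, and the branch name is just one choice):

    # build dependencies (subset; add whatever your configure run complains about)
    sudo yum install -y git bison flex gcc readline-devel zlib-devel perl-ExtUtils-Embed

    # clone the Postgres source and check out a stable branch
    mkdir pgsql-git && cd pgsql-git
    git clone https://git.postgresql.org/git/postgresql.git
    cd postgresql
    git checkout REL9_5_STABLE    # or REL9_6_STABLE for the newest stable branch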
But what I find useful is that you often want to build from source and make the resulting installation of Postgres look just like what you would have gotten from your packaging vendor. So what you can do here is capture the configuration that was used to build the RPM version of Postgres that was installed, and just change a couple of things that aid our debugging. We're going to change this -O2, which is just the optimization level of the compiler, the second level of optimization; we're going to change that to -O0, which is basically no optimization, plus -g3, which is going to include all kinds of extra debug symbols in what we compile, so that when we use GDB we can see exactly what's going on and things aren't getting reordered on us, and if we want to step through code it matches the source code precisely. This command right here, pg_config --configure, gives me exactly the way the installed Postgres was configured when it was built. So by using basically a shell expansion here and replacing the bits that I mentioned I want to replace, I can in one step configure my source code directory to be exactly the same as what was installed from the RPM, other than these small changes, and then I can just do make and make install. So that's really straightforward. It'll probably take you longer to fetch the Git repository than it will to compile the Postgres source; it actually compiles fairly quickly, and if you install ccache, which I think I had on the list of RPMs to install, then after you've built it once, subsequent builds are very fast. I mean, we're talking about 30 or 45 seconds typically. So it's not difficult to do, and it's potentially really useful.
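A hedged sketch of that capture-and-rebuild trick, assuming the 9.5 RPMs are installed so pg_config is on your PATH, and using sed here where the slide used a shell expansion (the exact flags in the captured output can vary by packager):

    # capture the options the RPM build used, swapping -O2 for -O0 -g3
    CONFIG=$(pg_config --configure | sed 's/-O2/-O0 -g3/g')

    # configure the source tree the same way, then build and install
    eval ./configure $CONFIG
    make && sudo make install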
So what does the workflow look like? Normally what you're going to do is log into psql and do whatever it is that causes the problem you're trying to debug; you want to find the backend process ID for that connection, attach with gdb, set some kind of a breakpoint, and then go back and re-execute whatever was causing the problem. I'm just going to show you a quick example of that, because it sounds a lot scarier than it is. I'm just going to log into Postgres, and I can do something like select some nonsense, and I'm going to get an error. Now, what if I wanted to see what was going on up to the point of getting this error? In this case, if you were doing something that was causing Postgres to crash, this is how you would want to look at it. In this other terminal I'm just going to look at some of the processes being run by the postgres user, and you can see this line here where it's got basically "postgres postgres postgres": that's the user that's logged in and the database they're logged into, with this [local], which indicates that I'm logged in over a local connection, and "idle" because I'm not currently executing anything. That's what's called the Postgres backend process. When any connection is made, there's always a backend process associated with that connection; this is how I can identify the one my psql session is using. So this is the backend that's supporting the client connection from psql, and this PID right here is the one I'm interested in, 15192. So now all I'm going to do is run gdb with a pointer to the Postgres binary, which I can see right here, and give it the PID of the backend process that I'm interested in. At this point what I want to do is set some kind of a breakpoint. It could be that, if Postgres is crashing, all I need to do is say continue here, let Postgres run in the debugger, then go make it crash, and we can get a backtrace right from here. For this example I'm just going to set a breakpoint. Now, if you wanted to play with this, you would need to know a little bit about Postgres internals, but there is a function in Postgres called errfinish, which is basically where errors end up being reported. So if I put a breakpoint there, go back into psql, and run my bad statement, you notice it looks like it's hanging, and that's because it's sitting there waiting in GDB. At this point I can do bt, which is a backtrace, and I can see exactly how I got to this point in Postgres. And this is the type of information that... if you go to the pgsql-hackers mailing list, and I don't know how many people in here sat through the last talk about the history of Postgres, but everything is done on that hackers mailing list, so if you had something that was causing Postgres to crash, this is the information you would want to post. So you can see it's not terribly difficult to do, and it can be very, very useful, once you get into using Postgres, to know how to do this.

Sorry, what was that? Right: you don't have to compile from source if you want to do this. Using, say, the installed RPMs, there's typically a set of RPMs associated with the normal RPMs called the debug RPMs; I forget exactly what Red Hat calls it now, something like debuginfo, so it would be postgresql-debuginfo or similar, and Debian has its own kind of equivalent. Basically, the symbols associated with compiled code are normally pulled out into a different package, because you don't run that on a production system normally. And that's the other thing: you probably wouldn't want to do all of this on a production system; you'd want to do it on a test system where you can replicate the problem. So yes, to answer your question, you wouldn't have to compile from source in order to do that. The downside, if you're going to do any kind of really intricate debugging, is the optimization that's normally done: in a normal build, Postgres is going to be built at the -O2 or -O3 optimization level, and what the compiler will do is reorder things in order to make the code run faster. But if you're trying to look at the source code and step through it, it will jump around in ways that are surprising, and there will be symbols, like macros and other things, that get optimized in ways you won't be able to see. If you build it yourself the way I just described, it will go linearly through the source code exactly the way you're reading it, and all of the symbols, including the macros, will be available to you if you want to print them out and see what they are.

Okay, so now I'm going to get into much more mundane things. The first thing is that once you've got Postgres installed, you need to initialize a cluster, you need to create a cluster for the database, and that can be done in a number of ways. How you do it is going to depend on how you installed Postgres, what kind of system you're running on, and maybe also on the policies of your particular environment. Postgres itself comes with a tool called initdb; if you're talking to Postgres hackers on the list, they're always going to talk about initializing Postgres using initdb, and the other tools actually use initdb under the covers. But it's pretty simple: if you wanted to create your own Postgres cluster, you just say initdb -D and then tell it where you want the data directory to be.
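A minimal sketch of that, with an invented data directory path; run it as whichever OS user you want to own the cluster (typically postgres), and the target directory must be empty or not yet exist:

    # initialize a new cluster into the given data directory
    initdb -D /var/lib/pgsql/9.5/data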
Now, something to note here is that whatever user you're logged in as when you run initdb is going to become the default superuser for Postgres. So if you run initdb logged in as joe, then the superuser for Postgres is going to be joe. And one of the things you'll see in a minute, which people tend to depend on, is that when you first log in with psql, if you don't specify a database, it will try to connect to the database that matches your username. So if your default superuser is postgres, which I think pretty much every distribution uses, maybe there are one or two that didn't do that for a while, and I think they recently changed, then that works; but if you just type psql after creating your cluster, and you created it as something other than the postgres user, there is no database associated with that user, and therefore you'll get an error message saying the database doesn't exist, and that may be confusing. So just keep in mind that if you run initdb yourself manually, it's going to run as whatever user you're logged in as; if you want it to be standard with most other installations, you probably want to su to the postgres user before you do that.

If you're using CentOS 7, there's a script that comes with the packages called postgresql95-setup, or if it's 9.6, obviously, that says 9.6 instead, and you just tell it you want to do an initdb and it's going to create a cluster in a standardized location for a Red Hat system. I think Fedora actually has the same thing, because basically Red Hat is based on Fedora; well, Fedora is actually kind of ahead of Red Hat, so what they do is base the released Red Hat on an older version of Fedora, using Fedora as their testing ground. If you want more cutting edge, go with Fedora; if you want more production, use Red Hat or CentOS. If you're using a Debian-based system, they've got a different script called pg_createcluster, and this pg_createcluster 9.5 main is going to create the database cluster in their standardized location; what they do differently than Red Hat is put the config files in a different location, under /etc. So there are some differences between Debian- and Ubuntu-based systems and Red Hat-based systems.

initdb cannot be run a second time into the same location: if you try, Postgres is going to detect the fact that that location is not empty and it's going to throw an error, basically telling you it's not empty and refusing to do it. And, though there's really not enough time to cover them, there are a lot of other options on initdb: you can give it a different location, a different superuser name, different encodings; there are all kinds of options, and I'd encourage you to go look in the Postgres manual if you really want to dig into that.

So, in terms of starting Postgres, again there are a number of ways to do it. At the top here, and this is not something you're normally going to want to do, but it's sometimes very useful, is what's called single-user mode. Basically, running Postgres in single-user mode is for repairing something that is horribly broken; sometimes you can't fix Postgres without going into single-user mode. Really the main thing that would get you into that situation is something called transaction wraparound, which you may or may not be familiar with, but if you're doing lots and lots of transactions and ignoring the log error messages, you could get yourself into a situation
where you really need to go into single-user mode to recover. But this is how you do it; it's pretty straightforward: you run postgres with --single, you point it at the data directory, and you give it a database name, and then only that user can be in Postgres, in that terminal. More standard ways to start it up: there's a utility that comes with Postgres called pg_ctl; you point it at the data directory, give it a location for a log file, and tell it to start. But again, most people are running Postgres on some kind of production server, so you're probably going to want to use the tools that come with your distribution. On CentOS 6 you're going to use the normal service scripts for Postgres, so you can say service postgresql start. Red Hat 7 started using systemd, so this is what it would look like on a systemd machine: systemctl start postgresql-9.5. And on a Debian system they provide another script, pg_ctlcluster, for starting it. So those are the different ways of starting Postgres, depending on the system you're on. Similarly, there are a bunch of ways you can stop Postgres. If you were in single-user mode, you basically just hit Ctrl-D and it exits you out. If you're doing a manual stop of Postgres, you can do it again with the pg_ctl program that comes with Postgres: you have to specify which data directory you're talking about, tell it you want to stop, and tell it the type of stop you want to do; I think I've got a slide further on where I describe what that means. On Red Hat 6 you're going to use service postgresql stop, on Red Hat 7 you'll use systemctl stop, and for Debian-based systems, pg_ctlcluster stop. So here are the shutdown modes. When you stop Postgres you've got a couple of options. Those system scripts typically use the option called fast, which basically says: stop Postgres in a graceful way, but don't wait for all the clients to disconnect. So any clients that are connected, if you do a fast stop, which is what the normal service control scripts do, it'll cut those sessions off at the knees, but it'll bring Postgres down gracefully. That is the way you would normally want to shut down Postgres, because it preserves certain things, like the statistics Postgres is collecting all the time about its operations. This other mode, immediate, just kills everything summarily; if you shut down Postgres that way, it's equivalent to having the plug pulled on the server, or Postgres just crashing. So the next time you start up Postgres after an immediate stop, Postgres will go into recovery mode. It'll be just fine, it'll recover itself, you'll have all your data, but certain things like the statistics get wiped out, so all the statistics you have start over again. You wouldn't normally want to shut Postgres down like that. The smart shutdown will actually wait for any existing clients to disconnect, so that one could potentially wait quite a long time before it successfully shuts down Postgres, but in some scenarios you may want to do that. So now let's talk about how to terminate a particular session: you've got some session that's running a query that has been running for 24 hours and you want to kill it. I'm going to show you a different way to do this in a minute, but right now we want to actually terminate the session that this thing is running in, and a minute ago I was talking about how
with Postgres it's a client-server model: your client, your application, psql, whatever, has a process on its end, and it's talking to a back-end process dedicated to that client. So what we're talking about here is killing the back-end process, not the front-end process, and we can do that in a couple of ways; there are actually more ways than I've shown here. Basically, if we do ps -fu postgres and look for the specific database that we know this session is logged into, we can identify the PID, and then we can just use kill and say kill -SIGTERM; that's how I terminate the actual back-end process. I could also do that in a cleaner way, and these days in a more officially sanctioned way, by running this query here while logged into Postgres: I can look for a particular query in pg_stat_activity, which is the virtual system table that shows me what's running in Postgres right now, and it gives me, among other things, the PID; it also gives me a way to calculate how long this particular thing has been running, and it gives me a state. In this case you can see the state is idle in transaction. Idle in transaction is a bad thing to see: it basically means you've started a transaction, you may have done some work, you may have inserted or updated some data, we don't even know, you may have or you may not have, and then you just kind of went away and took a coffee break or went to lunch or went on a two-week vacation or something. Because that session started a transaction at a certain point in time, the visibility rules say it should be able to see everything that was visible at the time the transaction started, so if I have other sessions deleting records, those records eventually need to get cleaned up by something called vacuum, and that can't happen as long as this thing is open. So it's causing everything to get backed up, and there's basically no good reason to ever have an idle-in-transaction session sitting around, so this is a good example of something you might want to go kill. In this case I can use the pg_terminate_backend function on the PID I got from this query, and that will also kill that back end. I've actually seen environments where the people writing the application and the people managing the database are in two different groups, and the database people can't get the application people to write their code in a way that avoids idle in transaction, so I've seen people write cron jobs to do this: basically, go find everything that's been idle in transaction for more than 30 seconds and kill it. Not very pretty, but it does work. Alright, so if you had something that was just a long-running query and you just want to interrupt that query, you can do a query cancel. If you're in psql you can actually hit Ctrl-C and it does the equivalent of sending the signal, but it's the same basic idea: I'm going to grep for the database process associated with this back end, that gives me a PID, and then I can kill it with SIGINT instead of SIGTERM, and that's the equivalent of a query cancel. So now the running query will just cancel and control will return to whatever client was running it, but it doesn't actually kill that back end, so this is really the preferred way; if you had a long-running query you'd probably want to start here. And the same thing: you can look in pg_stat_activity, find the PID, and there's a pg_cancel_backend function that you can run. Are there any questions about any of that before I move on to configuration?
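As a rough sketch of what that looks like in SQL, assuming a 9.2-or-later server where the pg_stat_activity column is named pid (older releases call it procpid), the queries might look something like this; the PID shown is just a placeholder:

```sql
-- Find sessions that have been idle in transaction for more than 30 seconds:
SELECT pid, usename, datname, state,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '30 seconds';

-- Kill the whole backend (roughly equivalent to kill -SIGTERM on that PID):
SELECT pg_terminate_backend(12345);

-- Or just cancel the running query and leave the session connected
-- (roughly equivalent to kill -SIGINT, or Ctrl-C in psql):
SELECT pg_cancel_backend(12345);
```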
Okay, so now we're going to talk about the configuration of Postgres. First of all, there's a file called postgresql.conf, which is the standard configuration file where you would edit your configuration. There's also something called postgresql.auto.conf, which I believe came in in 9.4, and that's associated with setting these things from the SQL level, which I'll show you, or at least mention the command for, on a slide later on: there's an ALTER SYSTEM command that will let you change these things in a permanent, persistent way, and those changes get written out to postgresql.auto.conf, which basically gets included by the main postgresql.conf. So you wouldn't want to edit that file directly, but that's just so you know what it is. The format is just a name equals some value. And what I've shown you here are some ways you can activate configuration changes. If you go edit that file, you can run the pg_reload_conf() function in a psql session, or you can do it from the command line the way I've shown here: psql -c will run a command, and you just select that function. You can use pg_ctl, which has a reload action. The typical OS service controls also, I think in most if not all cases, have been built to support reload, so you can say reload from the service control and it will do the same thing. And finally, you can use the method we were just talking about with kill: find the PID of the postmaster, the main process, and send it a SIGHUP, and that will also cause the changes to take effect. In terms of making changes, as I said, if you want persistent changes you would modify postgresql.conf. You can add some options into the startup script, but that's probably not something most people are going to do. And as I mentioned, in the last couple of releases there's now ALTER SYSTEM SET, which writes the change into postgresql.auto.conf and therefore makes it persistent across restarts. You can also, and this is something a lot of people probably don't know about, alter a database or a role (which in Postgres is equivalent to a user) and bind certain settings specifically to that database or that role. We'll talk a little later about some of the settings; I won't be able to go through all of them in great depth, but a good example is something called work_mem. work_mem has to do with how much memory the session has available for certain operations like sorting and building hash tables in memory, and if you have one particular user that you want to have more memory than everyone else because they're a power user, you could say ALTER ROLE with that power user's name and SET work_mem equal to some large value, and have that take effect only when that user is logged in. You can also change some of these things from right within a live session. Not all of them will take effect that way: there are some settings that only take effect after a Postgres restart, and there are some that only take effect after a reload, which means you don't have to restart the database but you do have to reload the way I showed earlier, but there are some that you can just set on a per-session basis and they take effect immediately and affect only that session. The commands you use for that are SET, RESET, and SHOW.
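A hedged sketch of those options; the setting values, and the power_user and reporting names, are made up for illustration:

```sql
-- Persistent change, written to postgresql.auto.conf (9.4 and later):
ALTER SYSTEM SET work_mem = '64MB';
SELECT pg_reload_conf();          -- activate settings that only need a reload

-- Bind a setting to one role or one database:
ALTER ROLE power_user SET work_mem = '1GB';      -- power_user is hypothetical
ALTER DATABASE reporting SET work_mem = '256MB'; -- reporting is hypothetical

-- Session-level change, effective immediately, only for this session:
SET work_mem = '128MB';
SHOW work_mem;
RESET work_mem;
```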
There's also a system view called pg_settings, basically a virtual table that includes all of them. You can actually update that view and set a particular value for a particular variable name, so a particular configuration setting can be changed on the fly; again, it only affects this one session, but you can do it that way. You can inquire about a value using current_setting, and you can set a value using the function set_config, and if you do that you can even decide whether or not the change should be transaction-local: in other words, you can make a change that only affects the current transaction, and when the current transaction is over it reverts back to where it was. So now I'm going to try to go through these quickly, because we really don't have time to cover them in great detail, and this isn't even all of the settings, but we'll try to talk about some of the main ones. In terms of connection management, this first one, listen_addresses, determines what Postgres is going to listen on. If you're listening only on localhost, that means no one can connect to Postgres from another server. Depending on your distribution this may not be the default, but if you install from source it typically is, so you may want to go in and change it to the specific IP address of the network interface you want to use for Postgres, or you might put an asterisk in there for all network interfaces. Right here is where you control what port Postgres is running on. The question earlier was, can I initdb more than once? The answer is you actually can: you can initdb into two or three or four different locations, but in order for each of those postmasters to run on a different data directory, they have to run on different ports. So if you needed to do that for some reason, you could edit this; you'll see this all the time, one of them running on 5432, which is the standard port for Postgres, and then a different Postgres installation running on 5433 and 5434 and 5435 or whatever. When you start getting into that kind of thing you typically write scripts, and there are environment variables that can help with this; again, I'm not going to be able to go into that, but if you were switching between different postmasters on the same machine you'd probably have a script that sets everything up for you and makes sure your client is using the correct binaries, the correct libraries, and so on, because if you're doing this you may be running different versions of Postgres. In fact, that's something that people who develop on Postgres, like me, do all the time: if I'm working on a bug fix that needs to get patched into all the supported versions of Postgres, I might be running everything from master back through Postgres 9.2 all at the same time. max_connections: this is important if, say, your app is running a connection pool and you need 200 connections; you're going to have to change this, because it defaults to 100. If you find yourself wanting to set it to a thousand or something, go find professional Postgres support, please, because you probably don't want to be doing that, and if you really do need to, you're probably going to need some professional support anyway. superuser_reserved_connections is the number of connections reserved only for superusers, so that if you did manage to use up all your connections, you still have a couple of connections the superuser can get in on to run queries like that pg_terminate_backend I was showing you earlier.
You can also control here whether or not you want to use SSL on the client connections. In terms of memory management, shared_buffers is the area in Postgres where, when you first read a table off disk, the data gets pulled up into shared buffers, and from then on, if someone else, even in a different session, tries to read that same data, as long as it hasn't been evicted it gets pulled right out of shared buffers. On modern versions of Postgres, on hardware that might have 512 gigs of RAM or a terabyte of RAM or something, you might set this to a much larger value than what it defaults to; if you're just testing on a development server, the default is often just fine. But it's definitely one of the first things you would want to look at tuning, especially on a production instance; there's lots of advice out on the internet about that. If you do go to something very large, you want to make sure you're using huge pages; in recent versions of Postgres there's a try setting for that, which means it'll see if huge pages are available and use them if so, but if you have lots of memory you probably do want huge pages. I mentioned work_mem before. Now, shared_buffers is once per database cluster, so if I have 128 gigs of RAM I shouldn't feel bad about setting it to 32 gigs or maybe even 64 gigs, depending on my workload and other things. work_mem, on the other hand, is not only per session but is potentially used more than once per session: a given query might have to do a sort, and it might have to build some kind of hash table, or maybe even multiple hash tables, so you can end up using multiple copies of work_mem per session. Go back to what I was saying earlier: if you have a thousand connections and you've got this set to some large number, and they're all busy running something and all using multiple copies of work_mem, you can do the math; you could basically denial-of-service yourself pretty quickly if you're not careful. So you don't want to set it to some huge number in general in postgresql.conf, but it might help a lot for a particular query. If you have one particular query, you might change it in the session, run the query, and then it won't affect anything else, or you might do that for one particular user and then use that user when running those queries. maintenance_work_mem is used for building indexes and for vacuum (in recent versions there's actually a separate setting for autovacuum work memory, but it defaults to this unless you change it). This is one you might want to set higher, because if you're building an index, for instance, the difference in speed between building it in memory versus building it on disk, which it'll do if this isn't big enough, can be substantial; I've done some testing and it can be as much as 3 to 1. If you're doing bulk loading and index building as part of some kind of data warehousing work, you may want to change this, and again, it's changeable per session: you could set it to 8 or 10 or 16 gigs and maybe get an index built entirely in memory where otherwise it would spill to disk. You wouldn't want to do that generally, though.
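A small sketch of that per-session approach; the sizes and the big_table / big_table_idx / some_column names are made up for illustration:

```sql
-- Give just this session a big maintenance_work_mem for an index build:
SET maintenance_work_mem = '8GB';
CREATE INDEX big_table_idx ON big_table (some_column);
RESET maintenance_work_mem;

-- Same idea for one expensive sort or hash-heavy query:
SET work_mem = '512MB';
SELECT some_column, count(*) FROM big_table GROUP BY some_column;
RESET work_mem;
```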
maintenance_work_mem also ties in with autovacuum, in that you might have multiple autovacuum workers running: if you change your autovacuum worker count to 30 and you've got a very busy database with 30 autovacuum workers running and this is set to 2 gigs, well, you might suddenly consume 60 gigs of memory just because of autovacuum. So again, you've got to be careful with this one, because there's a multiplier effect on it. shared_preload_libraries is a way to preload certain extensions at the time Postgres starts up. That can be useful from a performance perspective, in that the library doesn't have to get loaded every time someone connects, and in certain cases it's required for the library, the way it's written, to actually work correctly. Write-ahead log: basically, write-ahead log, WAL, transaction log, and xlog are all terms for essentially the same thing. When you make changes to a table inside Postgres, all of those changes first get written out to this transaction log, and later they get written to the main heap, and that transaction log is what's used for durability: if we crash, that's how we recover, we replay the transaction log. It's also used for the built-in replication in Postgres. So all of these settings have an effect on how that works, and the main things to get out of this are, first of all, fsync: you should never be turning that off unless you don't care about your data. However, if you're doing a bulk load of something and you've got all the data somewhere else anyway, and you want the load to go as fast as possible, and if it did manage to crash in the middle of the load you don't care, you can scrap it and start over, then it wouldn't hurt to turn it off and it will probably speed things up; just don't forget to turn it back on before you go to production. checkpoint_timeout, checkpoint_completion_target, and the WAL size settings are all performance related. The bottom line here is that you want to have enough WAL size available, and on a very busy database you probably want your checkpoint_timeout set toward the maximum of 60 minutes, with the WAL size settings set large; your goal is basically to minimize how often Postgres does these checkpoints. A checkpoint is when the data that goes into the main heap for the table gets flushed to disk, and that can create an I/O storm which will slow everything down, so the way you deal with that is to spread the checkpoints out as much as possible. The completion target is basically telling Postgres to spread that I/O over the time between checkpoints in order to even it out. So on a very busy system you'll often go in and set this to 0.7 or 0.9, this to 60 minutes, and these to large values, and basically spread out as much of the checkpoint I/O as you can. I'm going to have to speed up a little bit; I won't even get through the pg_hba.conf stuff. effective_cache_size, seq_page_cost, and random_page_cost: the thing to get out of here is that these drive how the optimizer works, which in turn affects what kind of query plans you get, which may very significantly affect how fast your queries run. The defaults are usually not very useful; they're basically based on a toy instance. effective_cache_size should be set to something like half the available RAM, just as a very rough rule of thumb; it's meant to take into account the amount of shared buffers you have plus how much the operating system's file system cache is going to be. The page costs are relative numbers; the default is basically saying that it's four times as expensive to do random reads as it is to do sequential reads.
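As an illustration only, not a recommendation, applying some of the checkpoint and planner settings just mentioned might look roughly like this on a 9.5-or-later system (max_wal_size appeared in 9.5); the values are placeholders to tune for your own hardware and workload:

```sql
-- Spread checkpoints out on a busy write-heavy system:
ALTER SYSTEM SET checkpoint_timeout = '60min';
ALTER SYSTEM SET max_wal_size = '16GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;

-- Planner settings for a box where most reads come from cache or SSD:
ALTER SYSTEM SET effective_cache_size = '64GB';   -- roughly half of RAM, per the rule of thumb above
ALTER SYSTEM SET random_page_cost = 1.1;

SELECT pg_reload_conf();   -- these particular settings only need a reload, not a restart
```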
Depending on your workload, whether you're running SSDs, and how big your shared buffers are, that may not be true; it may be that random reads from shared buffers are just as fast as sequential reads, or very close to it. So the idea here is that I usually end up setting random_page_cost down to a smaller number, closer to one, to reflect the fact that most of the reads are coming out of shared buffers or coming off SSDs or something like that. There are all sorts of parameters that control logging. You should just know that you can log to standard error, but you can also log to syslog (which includes rsyslog, if that's what you run), you can do csvlog, and you can actually do multiple types of logs at once, so you could log to syslog and csvlog. csvlog will be a local log in CSV format, which is actually very convenient for loading back into a Postgres table if you want to do that; in the docs there's a place that gives you the table definition and the command you would use to bulk load the csvlog into the table, so you can do analytics on your logs if you want to. You can control what the file name looks like, and you can control all sorts of things about what gets logged. The eventlog option is a Windows-specific thing, if you want to log to the Windows event log. These are all different things you can log: you can log every time there's a connection or a disconnection, for example. Probably the most important one to know here is log_min_duration_statement. You set it to a number of milliseconds, and any query that takes longer than that gets logged, along with how long it took. So if you combine that with the csvlogging I was talking about, you can basically say, I want to see all the queries on my production system that take longer than two seconds, take the logs, parse them, and analyze what's going on. There are other ways to get more or less the same information that can also be useful. The one caution is: don't set it to zero on a production system and leave it that way for very long, because, number one, if you set it to zero every query gets logged, so there's a performance impact just from writing out the log, and the log will also grow very quickly on a busy system. But it can be useful even on a production system to turn it on, do some operation you're trying to analyze, and then turn it off with a reload, without interrupting anything else, and then take that captured data and go analyze it. Then log_line_prefix: for every line in the log you can control what the prefix of that line is, in terms of timestamp and other things like the process ID and the user that was logged in; if you do csvlog you get a lot of that for free, essentially, but if you want it in your normal log you can control it, and there are a lot more options than the ones I've shown here. Miscellaneous settings. search_path: this has to do with the fact that Postgres tables are kept in schemas, and actually functions are as well. What tables are visible to you depends on the search path. The default search_path is this $user variable, which basically just maps to the user that's logged in, so if there's a schema with the same name as the user, that's automatically in the search path by default, and also the public schema. So if you just go create tables, they end up in this public schema. For a lot of reasons, you may want to have all your tables in a very specific application schema.
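Two quick sketches of settings from this section; the two-second threshold and the app schema name are assumptions for illustration:

```sql
-- Log any statement slower than two seconds; a reload is enough to activate it:
ALTER SYSTEM SET log_min_duration_statement = '2s';
SELECT pg_reload_conf();
-- ...capture some traffic, then turn it back off the same way:
ALTER SYSTEM SET log_min_duration_statement = -1;
SELECT pg_reload_conf();

-- Keep application tables in their own schema and put it on the search path:
CREATE SCHEMA app;
CREATE TABLE app.orders (id int);
SET search_path = app, public;
SELECT * FROM orders;          -- now resolves to app.orders
```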
Once they're there, you can either refer to the tables as schema_name.table_name, or you can change the search_path to put your schema in the search path, and then you can just refer to them by table name. You can set a statement_timeout, you can set a specific date style, and you can also set what your time zone is. So again, there are a lot of other options and things you can tune, far too much to cover in a one-hour talk, but I think I got the high points in terms of the things you'd want to know as a new Postgres user. Now, this session is normally supposed to end at 11:50, and lunch is after this. I'll go through the rest as quickly as I can; I may go over a few minutes, so if you want to get up and leave at 11:50, feel free to do so, but I'm going to try to at least get through the HBA stuff. So, the host-based authentication file: this is similar to postgresql.conf, except it controls security. It tells you which hosts are allowed to connect, how the clients are authenticated, what user names they can use, and what databases they can access, and the URL is there if you want to look it up in the documentation. It's read on server startup, and it's also re-read on reload, so if you change it you need to do a reload. The first line that matches is what gets used, so if you get confused because you have a line that looks like it should match and you're not getting the behavior you expect, it might be because you're matching an earlier line. And if you don't match any lines, access is denied. This is what it basically looks like: you've got a column for the type of connection we're talking about, a database, a user, the address, what method to use for the authentication, and potentially options. In terms of the connection type, it can be local, which is not TCP at all, just Unix domain sockets; host, which means a TCP connection; and hostssl and hostnossl, which say I specifically need SSL, or specifically don't want SSL, for this type of connection. In terms of the database, you can have the word all, which means all databases. You can say sameuser, which means this particular line only matches where the database name matches the user name, so I can restrict a user to only log in to their own database. I can say samerole, which means the user needs to be in a role with the same name as the database. There's also, when you're setting up Postgres replication, a special replication keyword, and that doesn't match all, so if you're setting up replication you'll have a rule for the replication connections as well. You can list one or more specific database names, or you can have a file that lists the database names you want to match. The user column is very similar: you can have the all wildcard to match any user, you can specify a list of specific user names, you can say a group, meaning users that belong to that group match the rule, or again you can point to a file name. For the address, you can specify either a host or a network in CIDR form, so you can say I will only allow external connections to my database from the same subnet, or from one specific host, so you can lock it down so only your app server can connect directly. You can give specific hosts by host name, and there are also shortcuts that match any of the server's own addresses or the server's own network. Then there are a bunch of authentication methods supported by Postgres: trust basically means no password, and md5 is the built-in MD5 password authentication. MD5 is generally not considered secure anymore, so a lot of companies are trying to avoid it.
Postgres does support SSL certs, it supports various OS methods of authentication, including basically anything that PAM supports, it supports GSSAPI and SSPI, LDAP, and RADIUS, and you can actually have a rule that just says, if something matches this, reject it. Then, in terms of the options, those depend on the specific type of authentication: here's an example with LDAP, where if I set up LDAP my options point to my LDAP server. This line is a trust, basically saying anyone connecting locally I'm going to trust with no password, and anyone who's logging in from the same network needs an MD5 password. So I think we're pretty much out of time, and I apologize for not getting all the way through this, but hopefully you found what I did cover useful. All the slides will be posted basically as soon as I get a chance to sit down with my computer after this; I'll post the slides, and I'm more than happy to answer questions. I'm here all weekend, and there is a Postgres community booth that will be there all weekend with volunteers, so please feel free to stop by and ask questions; I'll be there quite a bit of the time if you want to ask me something specific. So thank you. You don't seem to mention any of that... alright, yeah, thank you; well, sorry for not getting through all the material. That's good. Can you hear me okay? Awesome. So, guys, welcome, it's one o'clock, so I'm going to start. First of all, thank you very much for attending; I know SCALE has a lot of different talks going on, so I appreciate you being here. The title of this talk is Becoming a SQL Guru, and my name is Stella Nissenbaum. I work for a company called Yellow, so this is going to be my really quick 30-second plug for us: we essentially create recruiting software as a service, and our main goal is to eliminate the gaps in the hiring experience for recruiters. We do this primarily by being very client focused; we have great technology and we know the market very, very well. That's all I'm going to say about my company. As far as what I'm going to talk about today, we're going to do a quick overview of SQL syntax. I should also say, because this is SCALE, that this talk is specific to Postgres; usually I don't have to say that at Postgres conferences, but here there are obviously a couple of different tracks, so this syntax is going to be specific to Postgres. A lot of this stuff will translate, as far as concepts, to other database platforms, but probably not all of it, and there might be some actual syntax variations. So we're going to do a quick review of join types; raise your hand if you feel like you're pretty familiar with the different join types in SQL. Okay. We're going to review set operators; again, raise your hand if you're pretty familiar with those. Okay, I just want to get a gauge of how much time to spend on different topics. Filtered aggregates, raise of hands? Okay. Grouping sets? Subqueries, a little more. A lot less: CTEs? Okay. Lateral joins? Okay. And then we'll go over some questions. So, judging by the experience in the room, if you guys have questions, please raise your hand and stop me. I would much rather not get through all of the content but have you walk out of here feeling like you understood the concepts we did discuss, than get through my talk and have you walk out not understanding anything I said. Okay, nod if you agree. Awesome. Okay, first, a quick overview. As we first start learning SQL, we think, oh, what is standard SQL syntax? Oh, we have some kind of SELECT statement FROM some table, maybe we have a WHERE condition,
and then we have an ORDER BY. Great. As we learn a little bit more we say, oh, maybe I can join multiple different tables. Then, as we get a little more experience, we say, oh, maybe I can do various aggregations and have a GROUP BY and a HAVING clause. As you really dig into the documentation, though, the syntax looks like this, and then it continues to that, and then it continues some more. We're not going to review all of that, but that's just to give you an idea of how much variation there really is, from the simplest things you can do to the point where you're doing really complex stuff. So we're going to start with join types. It looked like quite a few people here were familiar with them, so I'm not going to spend much time on this. Basic inner join: we join two tables on some condition, and the result is the combination of things that match from both tables. Then we have outer joins, where we also join on some kind of condition, but now we have the option of returning all of the rows from one or both of the tables and only the matching rows from the other table; any unmatched rows will show up as nulls. And the one that most people are not familiar with is the cross join. How many of you have used a cross join before? Intentionally, yes; how many of you have used it intentionally, and then how many of you have used it unintentionally? Alright. The idea of a cross join is that it creates a Cartesian product between your two tables. What does that actually mean? It means we're trying to get every single combination that can possibly exist between your two tables, and you do this by not specifying any kind of condition on your join. So in my really simple example, let's say you sell several different products in the stores where you do business. If I then want to say, I want to see the total combination of my potential inventory for every single store, how would I do this? I might say I want to do a SELECT * FROM the stores table, CROSS JOIN it to the products table. Now, one thing I want to point out, in response to the accidentally versus intentionally question: in the older syntax, where you just join through commas, it was very common for people to accidentally forget the WHERE condition, and that's how accidental cross joins happen; whereas if you do this explicitly, you can see that you're doing a cross join and you're less likely to make that mistake. So in this case, when I cross join these two tables, my result set looks like this: I have all of my stores, my store in Chicago and my store in Dallas, and each of them sells coffee and tea. Any questions on join types?
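A self-contained sketch of that stores-and-products example; the table and column names are assumptions based on the description:

```sql
CREATE TABLE stores   (city text);
CREATE TABLE products (name text);
INSERT INTO stores   VALUES ('Chicago'), ('Dallas');
INSERT INTO products VALUES ('coffee'), ('tea');

-- Every combination of store and product (a Cartesian product):
SELECT s.city, p.name
FROM stores s
CROSS JOIN products p;
-- Chicago/coffee, Chicago/tea, Dallas/coffee, Dallas/tea
```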
Okay, set operators. I apologize, I don't have handouts, but try to memorize this data set; there will be another one later, you're going to love me. Let's say we have two different tables: a customers table and a suppliers table. Each of them has a name, a city, a postal code, and a country, and in the case of the supplier, let's say we also have their total annual revenue. And I want to say, I want to see all of the cities that exist between my customers and my suppliers. How do I do this? I'm going to do this with a union, and in this case I'm going to start with UNION ALL. So I select the city from the customers and UNION ALL with the city from the suppliers, and what happens is I get my entire list: it runs the first query and appends the output of the second query. One of the things you'll notice here is that Chicago shows up in this list twice; that's because we have both a customer and a supplier in Chicago. Now, if I want to remove all of my duplicates, I do that by specifying UNION rather than UNION ALL, and in this case we can see that Chicago is listed only once. Any questions on UNION versus UNION ALL? Then we move on to EXCEPT and INTERSECT. Much like we can append two data sets, we can also do other things. In one case I can say, I want to see all of the cities where my customers exist, EXCEPT, meaning removing, the cities where my suppliers exist; so in this case we remove Chicago from the list and we're left with just New York, Stockholm, and Minsk. In the opposite direction, I can say I want to see all of the cities that have both customers and suppliers; in this case I want an intersection of the two, and we get Chicago. Any questions on that? Filtered aggregates: this, I think, is actually very specific to Postgres, I'm not sure if other database platforms do this, but oftentimes we have various different aggregations that we want to do in the same query output. In my case, let's say I want to get the total revenue across all of my suppliers, but I also want another column that's just my revenue for the US. In the past I would have to do this with a CASE statement; it's doable, but it's not convenient. Starting in 9.4, in Postgres you can literally specify a FILTER clause within your aggregation: you just do the sum of revenue and FILTER WHERE, and specify your condition. Simple as that; it's just a really neat trick. Question? I'm going to spend a lot more time on grouping, yes. Yep, so the question is, can you do a filter where something is not equal to something, rather than where it is equal to something? In this WHERE clause you can specify any kind of condition you could normally specify in a WHERE clause, so in my case I could say where country is in a list, where country is not equal to something, where it's not in a list: any kind of condition you could normally put in a WHERE clause you can put in here, with the exception of, I don't think you can do a subquery there, I'm not sure if you can. Any other questions?
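Roughly, the set-operator and FILTER examples being described might look like this, assuming customers(city) and suppliers(city, country, revenue) along the lines of the tables in the talk:

```sql
SELECT city FROM customers
UNION ALL                         -- keeps duplicates (Chicago appears twice)
SELECT city FROM suppliers;

SELECT city FROM customers
UNION                             -- removes duplicates
SELECT city FROM suppliers;

SELECT city FROM customers
EXCEPT                            -- customer cities with no supplier
SELECT city FROM suppliers;

SELECT city FROM customers
INTERSECT                         -- cities that have both
SELECT city FROM suppliers;

-- Filtered aggregate (PostgreSQL 9.4+): total revenue and US-only revenue
-- in the same row, without a CASE expression:
SELECT sum(revenue)                                AS total_revenue,
       sum(revenue) FILTER (WHERE country = 'USA') AS us_revenue
FROM suppliers;
```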
Okay, so I'm going to spend a little bit more time on grouping sets, because it's a fairly new feature in Postgres that came out in 9.5, and so a lot of people aren't necessarily familiar with it. How many of you are actually on Postgres 9.5 or later? How many of you don't use Postgres at all? Okay. SQL Server? MySQL? Anything else, Oracle? Okay, so I don't know how this works in Oracle or MySQL because I haven't had to do it there; in SQL Server it works very similarly, and I know for a fact it goes back to SQL Server 2008, I think, so it's been around for a while there. The idea of grouping sets is that it's an extension of the GROUP BY, but it allows you to group and create subsets and subtotals within your data set without having to do it through subqueries or uniting multiple different queries together. So what does that really mean? I'm going to go through each one, but as a quick overview: ROLLUP is just a variation of a GROUP BY, essentially a shortcut that allows you to define a specific hierarchy and then do your subtotals within that hierarchy; CUBE is another shortcut that allows you to define a particular set and then do almost a Cartesian product of the subtotals, going in every direction as opposed to a single-direction hierarchy. So, one more table for you guys to remember, I apologize. Let's say that now, in addition to customers and suppliers, we have an orders table. Our orders table has a customer ID to link back to the customer table, a supplier ID to get back to the supplier table, an order date, and an order amount. Let's say I want a query that gets me the sum and the average of orders, as well as the count of orders, split up by the supplier country, the name, the order month, and the customer the order belongs to. So we have our normal GROUP BY; for anybody who may not be familiar, I'm just truncating the date to get the month (I know this syntax is very Postgres-specific, but it's just pulling the month out of the date). This is how our normal GROUP BY works: our result set would look something like this, with our supplier country, order month, customer name, the sum, the average, and the count. That's pretty straightforward. Now let's say I want to get subtotals for each group individually and display them within the same result set. In my case I want an aggregation by the supplier name, by the month, by the customer, and then across all of them together. In this case I specify GROUP BY GROUPING SETS and list the specific components I want to group by, and my result set will look something like this: first I have the aggregation grouped by my supplier name, then the aggregation grouped by my customer name, then grouped by month, and then across the entire set. What do you notice about this data set that would be problematic? Yep: the fact that we have nulls makes it very difficult to understand what's happening, because a null could be there because there were nulls in your actual data set, or it could be there because we've aggregated across that field. So how do we fix that? There's a very convenient, nifty little CASE statement you can write, where you say: when the GROUPING of that particular item is equal to zero, then use your supplier name; otherwise, meaning it's not zero, which means we aggregated across this field, I'm going to label it as all suppliers. You can put whatever value you want in there that's going to be convenient for your users. I added this CASE statement for all three of my attributes, and now when I run the same exact query I see this: we have labels that say all suppliers and all months, all suppliers and all customers, all months and all customers, and then all suppliers, all months, and all customers. Are there questions on this? So, the question, and I'm going to repeat it for the camera, was: what is this END keyword right here in the CASE statement, correct? When we do a CASE statement, you always have CASE WHEN something THEN something, ELSE something else, and then you have to put an END keyword to end your CASE statement. Any other questions? So everybody knows how to do grouping sets now? Okay, this is promising. Alright, I'm going to fast forward a little bit.
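A hedged sketch of that grouping-sets query (9.5 or later); the join keys and column names are assumptions based on the tables described:

```sql
SELECT CASE WHEN grouping(s.name) = 0 THEN s.name
            ELSE 'All suppliers' END                    AS supplier,
       CASE WHEN grouping(c.name) = 0 THEN c.name
            ELSE 'All customers' END                    AS customer,
       CASE WHEN grouping(date_trunc('month', o.order_date)) = 0
            THEN date_trunc('month', o.order_date)::text
            ELSE 'All months' END                       AS order_month,
       sum(o.amount), avg(o.amount), count(*)
FROM orders o
JOIN suppliers s ON s.id = o.supplier_id
JOIN customers c ON c.id = o.customer_id
GROUP BY GROUPING SETS (
  (s.name),
  (c.name),
  (date_trunc('month', o.order_date)),
  ()                                   -- grand total across everything
);
```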
Okay, so now we're going to go over an example with ROLLUP. By the way, guys, a copy of the actual PDF of this talk is posted online, so you don't have to take pictures. If you go to the Postgres wiki, there's a list of the different conferences, and presenters post their talks; mine is posted there. It will be posted for SCALE if there's a link; if there's not, there was definitely a link for PGConf SV, and this talk is on there as well. So, like I said, ROLLUP is the first kind of shorthand for a grouping set, where we want to specify a hierarchy and be able to drill all the way up through the entire hierarchy. In my case, I first want to get all of my aggregates by country, supplier, and customer name, then drill up to just country and supplier, then country, and then up across everything. So in this case, rather than specifying GROUP BY GROUPING SETS, I have the ROLLUP keyword, and for the sake of space I'm also limiting the result set to the US and Spain, because otherwise it doesn't fit on the screen. So what does our result set look like? First we start out with our most granular level of aggregation: we have our country, our supplier name, our customer name, and our aggregates. Then we drill up one layer: now we have our country and our supplier name, but all customers. Then we go up one more layer, where we aggregate just to the country, across all suppliers and all customers. And then one more layer: all countries, all suppliers, and all customers. Does this make sense? One thing to note, both for ROLLUP and for CUBE: you could produce the exact same result by using a regular grouping set, it would just require a lot more typing; you could do it, you just probably don't want to. Okay, so CUBE: very similar, but it allows you to create the combinations going in every single direction possible, rather than along a single hierarchy. In this case, again for the sake of space, I limited this to the supplier name and the customer name, and I also have a limitation on customer IDs, just to make it a little easier to fit. So what do we see? We start out with our most granular details, where we have our supplier and our customer name; then we have the aggregation, or the drill-up, in one direction, that says, okay, we want each supplier name but across all of our customers; then we go in the opposite direction, where we have all suppliers split by customer; and then we have the aggregation across all of our attributes. Any questions?
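The ROLLUP and CUBE shorthand, sketched with the same assumed schema:

```sql
-- ROLLUP: drill up a hierarchy: (country, supplier, customer),
-- then (country, supplier), then (country), then the grand total.
SELECT s.country, s.name AS supplier, c.name AS customer, sum(o.amount)
FROM orders o
JOIN suppliers s ON s.id = o.supplier_id
JOIN customers c ON c.id = o.customer_id
GROUP BY ROLLUP (s.country, s.name, c.name);

-- CUBE: subtotals in every direction: (supplier, customer), (supplier),
-- (customer), and the grand total.
SELECT s.name AS supplier, c.name AS customer, sum(o.amount)
FROM orders o
JOIN suppliers s ON s.id = o.supplier_id
JOIN customers c ON c.id = o.customer_id
GROUP BY CUBE (s.name, c.name);
```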
I would probably want to do a bunch of shorter statements, or, in whatever higher-level programming language I was working in, an aggregation by customer name or something, and make the data a little easier to read; have you got thoughts on that, whether that's a good strategy or a bad strategy, or a way to work it into what you've already got going? Sure, so the question, for anybody who couldn't hear, was: stylistically, with this kind of approach, does it make sense to break this up in some way programmatically and have it displayed differently. To answer that, I think it really depends on your use case and what technology you have available to you. If you are, let's say, developing a web app and you just want specific metrics one by one, then yeah, it makes sense to split it up for the purpose of display; you could potentially still do the aggregation on the back end in one pass and then programmatically pick out the line you want to display by setting a filter on this. But if you are, let's say, sending out standardized reports to some business areas, then for those purposes, where they are accustomed to, say, seeing Excel outputs or some kind of actual bar charts, you might want to split it up more in this way. So, totally preference and use-case based. Any other questions? Okay, cool. So, subqueries. In Postgres, and I think in pretty much most other SQL platforms, there are two different types of subqueries we can do: uncorrelated and correlated. I'm going to start with uncorrelated subqueries. The idea of an uncorrelated subquery is that it calculates a result set that can then be used by your outer query, and it's executed only once. What's an example of this? Let's say I want all of my supplier names and cities from my suppliers table, but I want to limit it only to instances where the country of the supplier also matches a country we have somewhere in our customer list. How do I do that? In this case, I put the subquery in a WHERE clause: we have SELECT country FROM customers, that's our inner subquery, and then we're limiting the country of our supplier based on that output. Essentially, SQL will run the inner query once, compare each supplier's country to that list, and we'll see something like this when we run it. Once you understand that, the difference with correlated subqueries is that your inner query somehow references your outer query, so rather than being calculated once at the beginning, it gets re-executed every single time, for every row of your outer query. What does that look like? In this case, let's say I want a count of my domestic customers, so customers that are in the same country as my suppliers. I would do a subquery like this, where I have a SELECT count of DISTINCT from my customers table, WHERE the customer country is equal to the supplier country, and the supplier country is obviously in the outer query. That gives us something like this. One thing to note: a lot of the time, correlated subqueries are written as joins; it just depends on your specific use case. Are there any questions on this?
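Sketches of the two subquery shapes, again with assumed column names:

```sql
-- Uncorrelated: the inner query runs once, and the outer query filters on it.
SELECT name, city
FROM suppliers
WHERE country IN (SELECT country FROM customers);

-- Correlated: the inner query references the outer row, so it is re-evaluated
-- for every supplier; here it counts "domestic" customers per supplier.
SELECT s.name,
       (SELECT count(DISTINCT c.name)
        FROM customers c
        WHERE c.country = s.country) AS domestic_customers
FROM suppliers s;
```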
Alright, this is always a fun one. Please ask questions if something is unclear, because I feel like a lot of people look at window functions and think, oh yeah, that makes sense, and then when they look at them again they go, wait, what happened? So please feel free to stop me and ask questions as I go through this. What is a window function? A window function essentially allows you to perform a calculation across a set of rows while still maintaining the row-level detail of the other rows: it allows you to reference the current row and also see other rows within your set while it's doing the calculation. So when would you want to use this? Obviously, if you need to do some kind of aggregation while retaining that detail. This is the basic syntax of a window function; it looks a little confusing if you haven't worked with it before, but at a high level you essentially have your function name, then your OVER clause that specifies your window definition, and it has an optional frame clause. So what does that actually mean? A frame clause determines the set of rows that make up your window frame; the default, if you don't specify one, is RANGE UNBOUNDED PRECEDING. I'm going to walk through a couple of examples and the details. We can specify RANGE or ROWS within our frame clause, and then we can specify a frame start and a frame end. The frame start can be anything from UNBOUNDED PRECEDING, to a value PRECEDING, to CURRENT ROW; the end can be UNBOUNDED FOLLOWING, a value FOLLOWING, or CURRENT ROW. The default for the end is the current row; the default, without specifying a frame clause at all, is RANGE UNBOUNDED PRECEDING. We'll go through the distinction between RANGE and ROWS in a second. Okay, basic example: let's say I want to compare each supplier's revenue to the average revenue of other suppliers within their country. How do I do this? In my case, we have the regular revenue listed, and then we have this average of revenue OVER, where we're partitioning by the country, and this is going to be our result set. Now, notice that because we did not specify a particular frame clause, it did it as a range. We can see, well, we can't quite look at it from this angle, but for Canada, because there was only a single row for Canada, the average is the same as the revenue for that supplier; the same thing happens for Spain; but for the US, we can see that our average is the average of the two suppliers, and we see that right there. A range treats all of the like-valued rows as a single unit, so in my case, when we do an average, and in this case I did a running average and removed the partition just to make the point, our running average starts out equal to the $400,000, then it averages these two rows, and when we get to the US it lumps both of those rows into a single operation to get this average. Does that make sense? Nod or shake your head. Okay, I see some thumbs up. Now, if I change this to ROWS, the thing that changes is that each of the elements within my ORDER BY is treated independently, to get a true running average: so in this case, rather than treating the $300 and the $250 as a single unit, it calculates it over the first three rows to get this number, and then all four rows to get this final number. Does that distinction make sense? Sometimes we have multiple different window functions within the same query, and oftentimes we want them running on the same window definition; that means we have the same OVER section, we're partitioning the same way and ordering the same way. In cases like that, rather than copying and pasting and typing it continuously and potentially making mistakes, we can just use the WINDOW clause. The syntax for that is, at the very end you just say WINDOW, give it a name, and provide your definition, and within your actual SQL, rather than specifying it again, you just say OVER and then the name of the window you gave it. Pretty simple, convenient shorthand.
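The window-function examples being described might look roughly like this, assuming suppliers(name, country, revenue):

```sql
-- Compare each supplier to the average revenue of suppliers in the same country:
SELECT name, country, revenue,
       avg(revenue) OVER (PARTITION BY country) AS country_avg
FROM suppliers;

-- Running average: the default RANGE frame treats peers (equal ORDER BY values)
-- as one unit, while ROWS advances strictly one row at a time.
SELECT name, revenue,
       avg(revenue) OVER (ORDER BY country) AS range_avg,
       avg(revenue) OVER (ORDER BY country
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_avg
FROM suppliers;

-- Reusing one window definition with a WINDOW clause:
SELECT name, revenue,
       avg(revenue) OVER w,
       sum(revenue) OVER w
FROM suppliers
WINDOW w AS (PARTITION BY country ORDER BY revenue);
```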
Yeah, so in this case it's ignoring, from the perspective of that OVER clause, I removed my partition so that it would run across countries and added an ORDER BY instead, in the window clause. So the question was, can you have a filter clause within the window function itself? Yes, you can in most cases. There are some, potentially, I don't remember, I'd have to look at it, because there are a lot of basic aggregate functions you can use as window functions, like sum and average and things like that, and then there are specific functions that are only window functions, which we're going to get into. So, for example, you can't do a window function with a count distinct, right? There might be some nuances of some types of functions where you can't do it, but as a general rule of thumb, yes, you can. Any other questions? So like I said, there are some functions that are specifically built as window functions and don't exist as their own aggregate. The first one is row_number. In my case, let's say I want just a straight row number for each of my suppliers, as well as this average that I had before. So I have my original OVER clauses right here, the same as on my last slide, but now I've added one more column: row_number() OVER, and in this case I wanted it to go across my whole result set, because it's fairly short. Once I have this output, we see that each row has its label of 1, 2, 3, 4; it essentially creates an ID. There's also rank, which is very convenient. One thing I will say: I've seen many people try to use rank as a row number, and I've seen this numerous times, where somebody just doesn't realize the difference between the two and tries to get a unique list of IDs using rank; it may not work. The thing you have to remember, so in this case I'm doing my rank and I'm ordering by country descending, because I'm from the US and clearly the US is the best. One thing you'll notice is that in this case, when we produce our rank, both of the rows that have the US listed as the country have the exact same rank, which means that if you were hoping this would be unique, you would be incorrect. Once you get to the next level, it jumps to 3 and 4, the same way it normally would. Does that make sense, and why you don't want to use this as a row number? So, my ORDER BY is specified within my rank, so the value that appears for the rank is determined by my window clause; I'm actually going to get to that in a second. The thing that determines what value appears here is specified within my ORDER BY here, but in this case I did not specify an ORDER BY for the result set. What I wanted to demonstrate first is that you can also order your overall result set without changing the impact it has on your actual values. So in this case, the determination of my rank is still being done right here, with this ORDER BY country descending, but now I've chosen to order my output by supplier name; the result set is now in a different order, but the values for the rank stayed the same. Does that address your question? Okay. Well, is there an implicit order when Postgres runs its queries, if you don't specify an ORDER BY, some implicit way that it does it? There is, correct; you may not know the order in which your result set appears, but you can still control the values that are generated. So it depends on whether you care about the order in which your result set is displayed, I suppose. Any other questions on this part? If the parameter that you order by is unique, meaning you have no duplicates on, let's say, country? Yeah, just to repeat the question: if the variable used for the ORDER BY within the window clause was unique, would the rank be unique as well, and the answer is yes. Any other questions?
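A small sketch of row_number versus rank on the same assumed suppliers table; with two US suppliers, rank repeats and then skips (1, 1, 3, 4, ...) while row_number stays unique:

```sql
SELECT name, country,
       row_number() OVER (ORDER BY country DESC) AS rn,
       rank()       OVER (ORDER BY country DESC) AS rnk
FROM suppliers
ORDER BY name;     -- ordering the output does not change the computed values
```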
So, just a quick list of the different window functions that are available in Postgres. Again, I'm not necessarily sure how this overlaps with some of the other technologies, because I've only used some of them, but these are pretty cool; I would highly suggest you look some of these up, they're really useful in a lot of analytics cases, and it's nice to know they exist. Most likely they exist for your other database platforms as well. Okay, give me a raise of hands again on who's familiar with CTEs. Very few of you. As a quick disclaimer, there is also another talk this afternoon, I think it's not the one right after this but after that, that does a deep dive on CTEs, so if this is interesting as I go through it, please come see Jim's talk; I think it'll be very useful for a lot of people. So, a quick overview of what a CTE is: a CTE is defined with a WITH clause, and it's essentially like running a subquery, caching the result, and being able to reuse that output throughout your query. One thing to remember is that these can be self-referencing, that is, recursive; we'll do an example of that in a minute. But also remember that, at least in Postgres (I'm not sure about other database platforms), they act as an optimization fence. What that means is that Postgres will optimize your common table expression independently of the rest of your query. That can be really great, or it can be really bad if you don't realize it and you're hoping that it'll look across your entire query and optimize everything given how it all relates; well, your CTE might be running for 5 hours, so I don't recommend that. Basic syntax: you specify WITH, you can specify the RECURSIVE keyword, you give your CTE a name, and then you write a SELECT statement, and you can reference that later on. Let's do a quick basic example. Let's say I want to start out by getting a count of customers by country. I specify WITH; CTE c is going to output my country and my customer count, and this is the query that produces that result. Then I want to create a second CTE that does the same thing but for my suppliers, and then I want to join the two CTEs on country to get my full list of countries, split by the customer count and the supplier count. Does that example make sense? The recursive example, you could, so, I didn't do this in my output right here, because that's determined entirely by this query right here, right? I did a full join, which left the nulls in; I could certainly have put in some CASE statements or done a coalesce. I did a coalesce on the country to make sure that every single row listed a country, and I could have done the same thing with the others, coalescing them with a zero, but I wanted to accentuate the distinction that we were doing a full outer join between the two sets. Any other questions? So, we're going to go through a simple recursive example. Let's say I want to get a list of all the numbers from 1 to 100; how do I do it, other than, you know, hard coding it? We start out with WITH, the same as we do for other CTEs; we have the RECURSIVE keyword, we have our name, and then our output, which I'm just labeling as n. We start out with VALUES (1), because that's how we get our starting point; then I'm doing a UNION and selecting n plus 1 from the CTE itself, a self-reference, where n is less than 100; and then I'm just selecting everything from that CTE.
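Both examples, sketched as runnable queries; the customers and suppliers columns are assumptions as before:

```sql
-- Two CTEs joined on country:
WITH c AS (
  SELECT country, count(*) AS customer_count FROM customers GROUP BY country
), s AS (
  SELECT country, count(*) AS supplier_count FROM suppliers GROUP BY country
)
SELECT coalesce(c.country, s.country) AS country,
       c.customer_count, s.supplier_count
FROM c FULL JOIN s ON c.country = s.country;

-- Recursive CTE generating the numbers 1 through 100:
WITH RECURSIVE nums(n) AS (
  VALUES (1)                    -- non-recursive term: the starting point
  UNION
  SELECT n + 1 FROM nums        -- recursive term references the CTE itself
  WHERE n < 100
)
SELECT n FROM nums;
```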
We start by evaluating the non-recursive term. If we're using UNION rather than UNION ALL, we remove all duplicates, and we take the output of that non-recursive term and place it into two places: into the result set and into the working table. After that, for as long as there is something in the working table, we follow these steps. We evaluate the recursive term, substituting the contents of the working table for the recursive self-reference. What that means is that right here, where we had FROM the CTE name, the first time through the loop it ran with 1, because that's what we had as n. Once we do that, if we have a UNION we again remove all duplication, and we take the output and place it into an intermediate table as well as appending it to our recursive result set. Then we replace the contents of the working table with that of the intermediate table, empty out the intermediate table, and repeat until there's nothing left in the working table. I'm going to go through another example that demonstrates how this works. Say we have a table of car parts: what the whole assembly is, what other parts are in it, and how many of those other parts it contains. In our case a car has four doors, an engine has a cylinder head, you get the picture. Now say I want to write a query that calculates how many screws it takes to assemble the entire car. Visually you would say: there are five screws in a wheel, but a car has four wheels, so we have to multiply; there are also 14 screws in a cylinder head, and a car has one of those, so we add that. Obviously you don't want to have to do this manually every time. So we start with our non-recursive term: select everything from the table where the whole is equal to the car, because that's the grain we want to start with. Then we add a UNION, because we don't want any duplication, and we select (this can be a little confusing) everything from the CTE itself, joining it back to our original parts table where the whole is equal to the part from the CTE, and we multiply the two counts. Like we said, if a wheel has five screws and a car has four wheels, we need to multiply five by four. Once we do this, we go through that process, and at the very end I just want a total from this new CTE, which I labeled "list" right here, with a filter where the part is equal to a screw, and that calculates for us that it takes 34 screws to build a car. Are there any questions on that?
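A sketch of the parts example under an assumed layout of parts(whole, part, qty); the names and the final aggregate are illustrative rather than taken from the slides:

    WITH RECURSIVE list AS (
        SELECT whole, part, qty
        FROM parts
        WHERE whole = 'car'                     -- non-recursive term: start at the car level
        UNION
        SELECT p.whole, p.part, l.qty * p.qty   -- multiply counts as we descend the assembly
        FROM list l
        JOIN parts p ON p.whole = l.part
    )
    SELECT sum(qty) FROM list WHERE part = 'screw';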
So, the question is: how's the performance? It depends on how convoluted this is and how much it has to do. If you're running over a table with a thousand rows versus a table with two billion rows, that's going to be a massive performance difference. It's not as performant as a lot of other things, but think about how non-performant it would be to do this some other way: at least from a SQL perspective, this ends up being more performant than having to keep doing left outer joins without knowing how many times to do it, because that's ultimately the issue. You need to keep joining the table back to itself, but you don't know how many times, so this does that for you and only does it as many times as is actually necessary, rather than guesstimating and usually overestimating to cover the vast majority of your cases. So if you ask me whether it's great, I'm going to say no; but is it better than some of your other options? Yes. Any other questions on that? Okay, another cool thing is that CTEs can be writable. Let's say I want to take some rows out of my main parts table and archive them into a different table: remove them from the first table and write them into another. In this case we're storing our archived rows. The CTE itself is actually doing a DELETE, so obviously please be careful, don't empty your entire table by accident, put a WHERE clause on your deletes. We're deleting from parts where the whole is equal to the car, the CTE just does RETURNING *, and in the final output we insert everything within that CTE into a different table. It's convenient in the sense that you have this cache and can move data around. Yep, correct. No, in this case I'm not; in this case I said I want my parts table to contain only parts of a car, not the car itself. There are just certain rows that I realized belong somewhere else and I want to archive them, for whatever reason. Exactly. Any other questions here? We can also do a writable CTE with recursion in it, using the recursive CTE. In this case I have the same exact query as before, producing my list of all the parts necessary to build a car starting at the car level, but now I'm materializing it into another table, because I decided that for my engineers that's going to be much easier to use than performing this calculation every single time. Make sense?
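A sketch of the archiving pattern being described; the parts and parts_archive table names are assumptions:

    WITH archived AS (
        DELETE FROM parts
        WHERE whole = 'car'        -- always put a WHERE clause on deletes like this
        RETURNING *
    )
    INSERT INTO parts_archive
    SELECT * FROM archived;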
Yes, I did not intend to skip that slide, there we go; I was just looking out at the audience while speaking. So, like the gentleman pointed out, there are some caveats, and I think Jim is going to go into some of these in more detail later today. One thing to note: even though these are called recursive CTEs, they're actually not recursion, they're iteration. They just loop through a part of the code rather than calling themselves; I don't really know what spurred the word choice originally, but just be aware of it. Also be aware of things like UNION versus UNION ALL, which will obviously change your result set, and I already mentioned that these are optimization fences. There are a couple of others, so you just have to know what you're doing if you are using CTEs, especially in production code; if you want more detail on some of these, talk to me later. One last bit: lateral joins. They showed up in 9.3 in Postgres; I don't know about other database platforms. What LATERAL allows you to do is reference other elements of your outer query within a subquery in your join, and you can do this with set-returning functions or with actual subqueries that reference tables or whole blocks of code. A quick example with a set-returning function, and please note that with set-returning functions the LATERAL is implicit, whereas with tables and subqueries it has to be explicitly stated; I'll show you that in a minute. Say I first create a table that generates a series from 1 to 10, and then I want to lateral-join that to the same series, but this time substituting the maximum value with the value produced in my first table. I can do this in two ways: I can specify the LATERAL keyword or I can omit it, and it performs in the same exact way and gives me an output like this. Does that make sense? Nods, yes. Now an example with actual queries, which probably makes a little more sense. Let's say I want to start with all of my customers whose name starts with F, and then get a list of all the suppliers that operate in the same countries as those customers. When I try this with a subquery that limits my customers to names starting with F, and then try to join to a subquery of suppliers matching those countries, I get this error. Postgres knows the subquery exists; it just tells me I can't reference it from this part of my query. How do I fix that? We add this one magical word, and voilà, everything works instantly. There are other ways to do this; you could do something like this. At one of my old companies, when I looked at our codebase, this is what we had everywhere, because it was before LATERAL was implemented: these really awesome correlated subqueries inside the actual join criteria. As you can imagine, performance was just great, and reading it was especially great when you had twenty left outer joins with correlated subqueries in the join criteria. Really human-readable. So it's pretty messy; if you don't have to write it that way, don't, and use LATERAL where you can. Any questions?
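Sketches of the two LATERAL examples described; the customers/suppliers column names are assumptions:

    SELECT t.n, s.x
    FROM generate_series(1, 10) AS t(n),
         LATERAL generate_series(1, t.n) AS s(x);   -- LATERAL keyword is optional for set-returning functions

    SELECT c.customer_name, s.supplier_name
    FROM (SELECT * FROM customers WHERE customer_name LIKE 'F%') c
    JOIN LATERAL (
        SELECT * FROM suppliers WHERE suppliers.country = c.country   -- references the outer query
    ) s ON true;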
Like I said, for people who are trying to take pictures: this is going to be online. If you go to the Postgres wiki, it has a list of the different conferences and you can look at the slides there. Thank you. Okay, shall we get started? Good afternoon, my name is Magnus Hagander and I'm here to talk a little bit about Postgres 9.6. But first a couple of quick words about myself. As I said, I'm Magnus Hagander, and I work for a company called Redpill Linpro, which is an open source services business; we do consulting, training, support and things like that, based in the Scandinavian region, in Sweden, Norway and Denmark. I'm out of Stockholm in Sweden myself. In Postgres I also do a lot of work on the project: I'm one of the core team members, I'm one of the committers, and I'm serving on the board of PostgreSQL Europe. I'm in the European one; let's just leave it at that. That's enough about me, let's talk about Postgres, 9.6 being the latest version. So who's already using Postgres 9.6 in production? Hands up... and down. Who's already using Postgres 10? Nobody, come on. Who's using 9.5? 9.4? 9.3? 9.2? 9.1? Prior to 9.1? Okay, one poor person, who knows that 9.1 and earlier are no longer supported. I think this was probably the best turnout ever, with only one person raising their hand for 9.1 or earlier; usually there are a lot more people on the older versions. So Postgres 9.6 is the latest released version of Postgres, and probably the one you should be using. For those of you who are not following the Postgres development schedule closely, we typically do a major release every year. If we look at the development schedule for 9.6, it started in June 2015, when we branched off what was about to become 9.5. The way we work in the Postgres project is that we have the master branch, which is the upcoming next version, and for each major release we branch off a stable branch that then lives for five years. So what happened on the 30th of June 2015 was that we said 9.5 is done, we're splitting it off, and we opened up for the addition of features into what was about to become 9.6. And then of course we have a development method; everybody has their own name for something that's reasonably similar, and in Postgres we talk about something called a commitfest. The general idea is that we spend a month developing code, writing patches, submitting patches, and then we spend a month reviewing and merging those patches; that review-and-commit month is what we call the commitfest, where we get these things committed and merged into the master branch. The 9.6 development schedule was split into one commitfest in July, one in September, one in November, one in January and one in March. As you can see, the one in March actually went on for about two months instead of one, but that's because it's the last one and we actually have to stabilize everything. Around May or so we started getting beta versions, we got a release candidate, and in September we got the 9.6 release. We've been slipping: we've actually been aiming for September releases for years, and for three years in a row we missed September, because, you know, we're an IT project, you hit all your deadlines. But we did actually manage to get back on schedule and release in September, and the current development work going into Postgres version 10 is also aiming for September, but,
being Postgres, if we're not done by September, we're not going to release in September; we don't release until we feel the project is ready. But we have managed, at least for one release, to get back on track with the schedule; let's hope we can keep that up for the upcoming releases as well. We aim, as I said, for one version every year, and each major version is supported for five years, which is why 9.1 went out of support in, I think, November of last year: it was then five years past the release of that version. 9.6 was released in September 2016, which means it's going to be supported until September 2021, so once you start approaching that, start planning your upgrade. So let's get into what we're actually here for, which is the new features that came in this version, the reasons for you to upgrade. I always try to group these; it's not necessarily easy, but I've split them into sections: DBA and administration, developer and SQL, replication and backup, and then of course everybody cares about performance, so we've got a few things particularly about that. Of course the traditional DBA in my experience doesn't really exist anymore; we're all sort of crossover, call it DevOps, call it what you want, but the general definition I like is that developer and SQL features are things exposed at the SQL query level, and the things I've qualified as DBA are things you do in configuration files and through tools. So let's take a look at some of the DBA and administration features. I'm going to start with a very small feature that has been at the top of the request list for probably 15 years, which is that you can now set a parameter for idle-in-transaction timeouts. Previously we've had a statement timeout, so you can kill a query if it runs too long, but if you have an open transaction that's not doing anything, it would just sit around, and an open transaction that's not doing anything at all can block vacuum from doing a good job, causing table bloat and all the performance implications that will eventually have. Now you can set the parameter that's very conveniently named idle_in_transaction_session_timeout, because we really couldn't think of a longer name, and if you set that to a certain number of milliseconds (in this example I've set it to 5000, that's 5 seconds), then if you leave a transaction idle for 5 seconds, your session gets killed. The difference here: with a statement timeout, if your query runs longer than the limit, Postgres cancels that query but your session still exists; if you run into the idle-in-transaction timeout, the TCP connection from the client gets killed as well. That's because we have no real way to re-synchronize with the client about which transaction state it's in, but if we kill the connection then there is no transaction state and the client can reconnect and try again. It's one of those small things that will save you a lot of the manual scripts or monitoring that used to try to track when this happens.
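In SQL it looks roughly like this, using the values from the example:

    SET idle_in_transaction_session_timeout = 5000;  -- milliseconds: kill sessions idle inside a transaction for 5s
    SET statement_timeout = 5000;                     -- by contrast, this only cancels the running query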
Speaking of monitoring: it's been a while since we broke all your monitoring scripts, so we figured it was time. We've changed the pg_stat_activity view; it has much better wait information now. Previously you had a column called waiting, true or false. This column is gone, therefore all your scripts that said "I'm going to graph how many queries I have that are waiting" are all broken now; you're welcome. What we have instead are two different columns that will tell you exactly what we're waiting for, in much greater detail. Not only will we tell you we're waiting; in this example you can see the two new columns, wait_event_type and wait_event, saying we're actually waiting for a lock. This can be a lot of different things; there are some 20-plus different things we can list, version 10 will have even more, and exactly what they are is described in the documentation, obviously. Your previous query that said SELECT * FROM pg_stat_activity WHERE waiting is now going to be WHERE wait_event IS NOT NULL. If a session is not waiting for anything, these fields are null, but as soon as we're waiting for something you'll see it: in this case we're waiting for a transaction ID lock, meaning we're waiting for another transaction; we could be waiting for the WAL insert lock, for example, meaning we want to write to the transaction log and somebody else is doing that so we can't right now. Many of these are very, very short-held locks. Well, they're supposed to be very short-held locks, and if they show up, that might be a problem. So it's something that will break a lot of monitoring scripts, but we believe it's worth it, because you get much, much more data about what's actually going wrong. Another thing that's useful: has anyone written a query that took this waiting-equals-true and tried to figure out what it was waiting for, by joining in pg_locks and all that? Did you make it work? You probably made it work most of the time, but there were things that were really hard, because we didn't really expose all that data. Well, I have good news for you: we have a function called pg_blocking_pids, which means you can just change your whole query to call pg_blocking_pids and you're done. Going back to this example: my PID 4026 is waiting, it's blocked on something. What I can then do is query pg_blocking_pids for 4026, and it says, well, it's waiting for this guy; that can be an array of multiple processes if it's waiting for more than one thing. Then of course we can take that and join back to pg_stat_activity to see what this other PID is actually doing. This actually looks at all the internal state, so you won't miss something by querying the wrong column in pg_locks, or by querying the right column in pg_locks where the data just hadn't propagated for everything. That gives you a lot more insight into why your system is waiting. We have some other things that have been requested for a long time. We have a new feature called utility command progress, which is a framework, and then we have a single implementation of it, which is pg_stat_progress_vacuum: for every vacuum running in your system, it tells you how far along it is. So if you're running a vacuum, you can see that, again, this 4021 is running a vacuum, it's currently in the phase of scanning the heap, it has scanned 27 pages out of 4425, and you can see how it's progressing, how many pages it has actually vacuumed, and all those things. Now, don't expect this to be linear; you can't just take "it's done 27 out of 4425, I know how long this is going to take", that's not how it works. But it lets you see how it progresses; in this case it lets you see that if it's done 27 out of 4,500, you've probably got a while longer until this guy is done.
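Roughly what those monitoring queries look like; the PID is just the one from the example:

    SELECT pid, wait_event_type, wait_event, query
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL;           -- replaces the old "WHERE waiting"

    SELECT pg_blocking_pids(4026);           -- which process(es) this session is waiting on

    SELECT * FROM pg_stat_progress_vacuum;   -- one row per vacuum currently running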
In this case I set a really high vacuum cost delay so that the actual vacuum would take an hour or so, because the table doesn't have a lot of blocks and it was otherwise really hard to capture, but this will show for all your vacuums. Now, the framework supports other utility commands as well; we don't actually have any views for them, we don't have the data, but we hope in the future to have, for example, CREATE INDEX progress. There's no fundamental reason we can't have create index progress, it's just that nobody's written it yet, and vacuum is something we use a lot. Furthering the topic of monitoring, we have a couple of views and functions: there's a new view called pg_config and a bunch of functions called pg_control-something-or-other. The view outputs exactly the same information you get from the command-line tool pg_config today, and the pg_control functions give you the data you can currently view with pg_controldata. This means you can view these in a SQL session, you can view them remotely, and you don't need access to the server to figure out build flags, for example; pg_config will tell you exactly which configure flags were set when Postgres was built, so now we can view that remotely, and maybe your application needs to adapt to certain things in Postgres. From the pg_control data you can get information about checkpoints and all that; it can save you a lot of steps. The thing that I think is by far the biggest change in 9.6 from a DBA or database-management perspective is one whose biggest benefit goes to anyone who has a database with a large number of transactions. Who has experienced transaction ID wraparound? I'm glad that wasn't very many. Who has experienced the vacuum freeze at a really inconvenient time? A few more. So the feature we have is that we now track all-frozen pages. How this works in Postgres, in a very simplistic way: as you know, every row has a version stamped on it, which transaction did this, and eventually we run out of numbers, because it's a 32-bit number. The way we handle that is that we move something called an epoch: we move what zero means, so that we have an open space of 32 bits, and in between we need to run this thing called vacuum freeze. Vacuum freeze, the standard way, goes over all the pages in your table, looks at the rows and asks: is this row visible to everyone? Because as soon as it's visible to everyone we don't really care who created it anymore, and we stamp it with this magic transaction ID called the frozen transaction ID, saying this one's visible to everyone. We need to do this roughly once every two billion transactions, because we're 32 bits, and when we do it, it takes a long time, it generates a lot of IO, and it usually happens at a bad time, because of, you know, Murphy. So what we do in 9.6 is track which pages contain only completely visible rows, because if they do, then vacuum freeze on those pages is a no-op; it doesn't do anything. 9.5 and earlier would still have to scan the page to find this out, but now we use our visibility map, we track the fact that this is the case, and we can just skip these pages. This means that a vacuum freeze of a large table, say a 10-terabyte table where there's usually only a very small portion at the end that's actually getting inserts and updates and the rest tends to be mostly read-only, means this expensive vacuum freeze operation can just skip to the blocks at the end and
vacuum only those. We still have to do the freeze vacuum, but it becomes much, much cheaper, and if you have a large database that can really save a lot of time and a lot of IO. It works for mostly read-only tables; if you update one row on every page, I'm sorry, we're still going to have to vacuum the whole thing, but that's not a very common use case, and the common use cases can be helped a lot by this. If you're lucky enough to never have run into a vacuum freeze, well, then you just don't run enough transactions in your database to benefit from this, but at least it's not going to hurt you; you can be happy for the other people. We have some changes to the Postgres foreign data wrapper. Who's using foreign data wrappers today? Who does not know what foreign data wrappers are? Foreign data wrappers are the SQL-standard name for remote tables, basically. A foreign data wrapper is like a database driver: we have a foreign data wrapper for Postgres, we have one for Oracle, one for MySQL, one for Twitter, and you can create a table saying this table here uses this driver and gets the table from over there, basically saying this table exists in another instance. Then we'll run queries against the table as if it were a local table and it'll fetch the data from the other side. There are a few things from the admin side. One of the big things with postgres_fdw, the Postgres-specific driver, is that it now has support for extensions. If I run, for example, a simple query: I have a big table, and pgcrypto has a function to calculate an MD5 hash. If I select that function over a big text field from a remote table, on 9.5 and earlier it'll transfer the whole remote table over and then run the MD5 function locally, because the function comes from an extension. If I call, say, the length function, Postgres knows the other side is a Postgres and that length exists there, so it just sends over the length call and runs it remotely. The thing that's new in 9.6 is that you can tell the system, in this case, that on the server foo I have installed pgcrypto already. Postgres will then know that this MD5 function exists on the other side, it'll push the function call to the other side, and it'll only transfer the result of the function call, the actual MD5 sum, back across the network. Of course, you can tell it that pgcrypto exists on the other server and then not actually install pgcrypto there; well, then that's your problem, you'll just get an error, because we'll still push the function down. It's only telling the Postgres planner and executor that this extension, and all the functions you have in it, are also available on the other side. So the question is, do you get a meaningful error message? Yes, you'll get the error message saying that it can't find the function; hopefully that brings meaning to you.
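The extensions option is set on the foreign server, roughly like this, following the example's server name foo:

    ALTER SERVER foo OPTIONS (ADD extensions 'pgcrypto');
    -- tells postgres_fdw that pgcrypto's functions also exist on the remote side,
    -- so qualifying calls can be pushed down instead of pulling the whole table locally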
So let's move on and look a little more at the developer and SQL feature side, things we've exposed through SQL. We've added something called phrase searching. Who has used full text search in Postgres? Just a couple. This is the basic syntax of how we do full text search in Postgres: we have a function called plainto_tsquery, which takes a plain text string and turns it into a full text query (we can also create the actual query ourselves with keywords and operators), and then we have the double at-sign operator, @@, which says do a full text search. Normally we would run that against a column in a table; for this example I'm just creating a value by saying to_tsvector: take this text and imagine it was in a table. So this query asks: does a full text search for quick fox match the string "the quick brown fox jumped"? And the answer is true, it does, because full text search prior to 9.6 would look at individual words, and both the word quick and the word fox are present in this text. The difference with phrase searching is that we just replace plainto_tsquery with phraseto_tsquery, and we get the value false. The idea is that when we say phraseto_tsquery, it's looking for specifically the word quick followed by the word fox: they don't just both have to be there, they have to come directly after each other, and they don't, because somebody put the word brown in between them, so they don't match. That's the simplest form of doing it. Of course, this being developed by our Russian indexing experts, there are many forms that are not as simple and that are really cool. The phrase query is actually shorthand for writing a query that looks like this: notice we're not saying phraseto_tsquery, we're writing the raw tsquery, and it's actually quick <-> fox, less-than, dash, greater-than; that's what it transformed the phrase into. This is functionally equivalent to the thing down here, those two are exactly the same thing, so of course we still get false. But the interesting thing we can do with this is replace that dash with a number. In this case we say quick <2> fox, which says the two words quick and fox have to exist within two words, there can't be more than two words between them, and suddenly it's true. Or you can say they have to exist within 500 words, and then you'll be able to control your phrase searching very carefully. The most common thing is obviously wanting the words right next to each other, but being able to specify this distance operator can help you exploit phrase searching much more; you can, for example, do step-by-step searching, saying there was no match when they were next to each other, so widen the distance and see if there's something there, and give your users some feedback through that process.
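The forms described, using the example string from the talk:

    SELECT to_tsvector('the quick brown fox jumped') @@ plainto_tsquery('quick fox');    -- true
    SELECT to_tsvector('the quick brown fox jumped') @@ phraseto_tsquery('quick fox');   -- false
    SELECT to_tsvector('the quick brown fox jumped') @@ to_tsquery('quick <-> fox');     -- false, same as above
    SELECT to_tsvector('the quick brown fox jumped') @@ to_tsquery('quick <2> fox');     -- true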
And for those of you who are SQL developers, I'm sorry, but that's actually the only real SQL-level thing I'd say was in 9.6. There are of course other SQL-level features in 9.6, but there are only 45 minutes for the talk and there are many other interesting things happening. So instead let's take a look at replication and backup. Given that we broke some of your monitoring scripts, of course we're also going to break your replication scripts. If you set wal_level to hot_standby before to make your replication work: we've removed that, it's now called wal_level = replica. Replica is exactly the same thing as what we used to call hot_standby. You can actually still write hot_standby in your config file and it will work, but if you then query the system for what it is, it will say replica. So if you have automated configuration that checks that these things match, they're no longer going to match, and you need to change it to replica. We also used to have a wal_level called archive; that has been removed, and if you specify archive you will also get replica. I think it took us a few years to realize why we originally added these as two separate levels: we didn't really know if there were scenarios where the overhead of hot_standby versus archive was going to be really bad, and with however many, six, seven, years' worth of experience, we now know that it's not, so there's no need to keep them both. Which also means, if you're running a Postgres prior to 9.6, don't bother setting wal_level to archive; just set it to hot_standby and be done. Minimal is still an acceptable optimization that helps you, but if you're thinking archive, just set it to hot_standby: it gives you more features for very close to no overhead, and whatever little overhead it is, it's going to be worth it. On the replication side we also have a new view to help you monitor, called pg_stat_wal_receiver, which exists on your standby node. Previously we had pg_stat_replication, which you've seen on your master node: all the connected replicas, how far along in the stream they are, how far behind, what they're doing. pg_stat_wal_receiver exists on the replica and tells you about which master it's talking to, so you can monitor it, so you can find out which of your clusters it's connected to, things like that. It will have zero or one rows: the view still exists on a master but will always be empty there, and it will have one row if it's a standby connected to a master; Postgres doesn't have multi-master, so it can only be connected to one. We have some improvements to replication slots. If you're on Postgres 9.4 or newer and you're using replication but you're not using replication slots, go read up on replication slots; you probably want to use them, because they get rid of the problem where your standby dies because it fell too far behind. The news is that pg_basebackup can now use replication slots, so you don't fall too far behind while you're taking a backup, thus rendering your backups broken. In 9.6 it still requires you to create the replication slot yourself, and it requires you to remove the replication slot yourself; don't forget to do that, or you will run out of disk space, because we will never recycle the WAL. Postgres version 10 will actually remove that requirement and will be able to use an ephemeral replication slot that it creates when pg_basebackup begins and destroys when it's done, but in 9.6 you will still need to remember to clean that thing up yourself. If you're using the function pg_create_physical_replication_slot, or one of your tools or scripts is, it is now able to reserve WAL directly. Who is using that function today? Probably none of you. The idea is that it closes a race condition: previously you could create the replication slot and then, as a later step, you had to activate it, and if you had a really high transaction volume on your server, by the time you activated it, it might already be too late. Now you can activate it automatically as it's being created. And finally we have two large things in replication. We have the ability to support multiple synchronous standbys. From 9.1 to 9.5 you could have asynchronous or synchronous mode; if you had synchronous mode, Postgres would pick one of your standbys and ensure that any commit you did was made on the master and on that one standby, and any other standby nodes you had would all be asynchronous. You would write something like synchronous_standby_names = 'node1', which tells Postgres the commit has to be acknowledged by node 1, and that's it. The new thing in 9.6 is that you can say something like synchronous_standby_names = '3 (node1, node2, node3, node4)', whereby it requires three of these nodes to acknowledge the commit before we consider it committed.
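A sketch of those settings expressed as SQL (they normally live in postgresql.conf, and wal_level requires a restart):

    ALTER SYSTEM SET wal_level = 'replica';   -- replaces both 'archive' and 'hot_standby'
    ALTER SYSTEM SET synchronous_standby_names = 'node1';                           -- 9.5-style: one sync standby
    ALTER SYSTEM SET synchronous_standby_names = '3 (node1, node2, node3, node4)';  -- 9.6: any 3 of these must acknowledge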
The way it works is in priority order, meaning that if all of these nodes are connected it will wait for node 1, node 2 and node 3; if node 3 drops out, it will wait for node 1, node 2 and node 4. It lets you say, for example, I want one synchronous replica locally and one in my other data center: now you can do that by just listing two nodes. As always when you're doing this, make sure you have at least n plus 1 nodes. If you're using this syntax you need at least three nodes, because if you have two nodes and the standby node goes down, the system freezes and becomes read-only, since it can't acknowledge the commits. And if you set it up like this with only three nodes (assume node 4 didn't exist) and you say three nodes, then if any one of those three goes down, your system becomes read-only, because we're waiting for all three of them. So make sure you have spare nodes so that your system doesn't stop because you lose a standby node; it's bad enough that it stops if you lose the master node. We also have a new, extra level of synchronous replication. In 9.5 and earlier, synchronous replication in its full synchronous mode would send the commit to the standby node, write it to disk, sync it to disk, and then acknowledge to the application that the commit was done; the actual replaying of it, so that the insert you did was visible in the database, was not synchronous, that was asynchronous. We secured your data, but you might not be able to read it yet. With the new synchronous_commit = remote_apply mode, we wait for it to actually become applied and visible in the database. This is obviously slower, because we're waiting for more things; in benchmarks it's actually been surprisingly little slower, but it is slower. In return it gives you the guarantee that once the commit is acknowledged, it's actually visible on the other side and has been replayed into the database. If you're using this, you need to be careful with max_standby_streaming_delay, whereby we can intentionally delay replication while waiting for local transactions on the standby, because that will then delay your synchronous replication, and that can make everything slow and everybody sad. You can combine this, of course, with multiple synchronous standbys, so you can have multiple replay-synchronous replicas, and you can combine it the same way you could previously: the synchronous_commit variable can be set to different values in different transactions, so you can say these transactions are important, these are semi-important, and these are not important at all.
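A sketch of how that per-transaction control might look; the trade-off between the levels is as described above:

    SET synchronous_commit = remote_apply;  -- acknowledged only once the standby has replayed and made it visible
    BEGIN;
    -- ... critical writes ...
    COMMIT;
    SET synchronous_commit = local;         -- less important work can relax the guarantee per session or transaction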
But let's switch to performance. Everyone likes performance, right? Is there anyone in here who would like the new version of the database to be slower? Not you, I don't trust you. I've had one acceptable answer to that (nobody trusts Jim), and it came from somebody who actually did want it slower, because he was selling hardware; if you're selling hardware, it's good if the new version is slower, because you get to sell more hardware. The rest of us would really like things to be faster, and every new version of Postgres comes with a lot of new performance enhancements. The good thing is that most of them are free, as in you don't have to do anything, it's just faster; that's the best kind. Some of them will require you to tune things, some will require you to change your application a little bit, usually still worth it, but the ones that are free are obviously better. One of my favorites, one I'm surprised still existed: data type output for time-based data types can be up to twice as fast. That is 2x faster; I was very surprised that low-hanging fruit like that still existed in Postgres. This is basically the text output of anything that is a timestamp, date or time. If you had a scenario with a table with a single column, and this column was a timestamp, with or without time zone, and you did a COPY of that to a RAM drive, which is obviously something you do all the time, that is actually slightly more than twice as fast. You probably don't do that very often, but what it means is that every single place where you're reading time-based data using the text protocol is faster; if it's a lot of data, it's a lot faster, and you don't have to do anything, it just runs faster, which is obviously a good thing. We have a whole bunch of locking changes; you could probably spend a couple of individual talks just on that, and other people do. What it basically means is that if you have high-concurrency loads, with many sessions working in the database at the same time, it runs a lot faster: a lot less locking, better multi-CPU support by exploiting low-level functionality in the CPUs. We also have better lock tracing, for example this wait_event thing in pg_stat_activity, so you can see more about it; not only is it faster, we can still monitor it, and it's still faster. This is a pattern we've seen through Postgres releases now: there are major changes to these central locking systems in pretty much every version, because we're seeing more and more massively multi-core machines and we need to handle more concurrent activity, so we're getting there step by step, and again, you don't have to do anything, it comes for free. The relation extension lock is a classic Postgres issue where you would bottleneck when loading a table. The thing is, in traditional Postgres a single query can only use one CPU, so if you're loading a large table, what you do is of course use multiple sessions and load in parallel, and that didn't help at all, because of the relation extension lock. The way that worked was that when Postgres was appending to a table, it would append one 8-kilobyte block at a time. So what happened was: I've run out of space in this table, I have to grow it, I lock it, add one 8-kilobyte block, unlock it, and keep working, which meant that all these processes loading data spent their time waiting for whoever held the lock to add one page. The new thing in 9.6 is that when this happens, the system will detect how many other sessions are currently trying to extend the table, and it will use the magic constant of 20; it is magic, it works. So if 10 other sessions are trying to append to the same table, we will extend by 200 blocks at once, 20 times the number of waiters. The idea is that we don't want to extend too much (if we just want to add 10 rows, we don't necessarily want to grow the table by a gigabyte), but we want to extend by enough that we don't spend all our time waiting on this lock. And so far, interestingly, 20 seems like a good number: in most use cases you get diminishing returns beyond that and it starts becoming overhead instead. For most workloads this has actually just solved the problem of parallel loading; your problem now is that you are not spending enough money
on IOPS, and that's going to be your problem forever. We've done a couple of changes that are related to each other, both very, very low-level. One is checkpoint sorting. Previously, when Postgres ran its background checkpoints, it would just write all the data to disk in whatever order it happened to be in memory, and then flush it all at the end. The new thing is that it will now sort by tablespace, then file node, then fork, and then block, meaning it will generate maybe not perfectly sequential I/O, but much more sequential-ish I/O, because we'll do one table from top to bottom, then go to the next table, top to bottom, so we generate a more sequential workload. This alone actually made things slower, along with the second change, which by itself also made things slower, but when you did both of them, things got much better. The second thing is dealing with the kernel writeback policy. Who in here has a database with, say, more than 16 gigabytes or so of cache? A couple of you; you've probably run into this. The general idea is that, particularly on an untuned Linux system, you would end up with Postgres writing the data to the operating system, the operating system happily caching it because there's lots of memory, a couple of gigabytes pile up, and then when we say we're done, it writes it all out at once, as fast as it can, and you get these wonderful I/O spikes where everything just stops. This is sort of the second generation of the problem: we used to have a similar problem with our own writes, a long time ago, before we had spread checkpoints, and now we've just moved it to the kernel, which has sort of the same issue. You could work around some of this by hacking kernel settings, and some of you may have, but the problem is that those are global, they hit everything. The new thing here is that we configure this in postgresql.conf. Exactly how it works is going to be platform dependent; right now it's only enabled for Linux, but that is the majority platform, and there's no reason we couldn't do it for other platforms, it just hasn't been done yet. The idea is that we flush early: the data we know we're going to write, we write earlier, spread out over time, so that we get control over how it's written out. There's one big exception where this is not a good thing to do: if your workload is bigger than shared buffers but smaller than the operating system cache. In that case it's actually better to just wait forever to write, because you never have to read anything; the problem with these I/O write spikes is that while the kernel is flushing all the data we can't read data, because our I/O is busy, but if our workload fits in memory we don't need to read data. In pretty much every other workload this is an advantage. The default values for these are small: for checkpoints we write back after 256 kilobytes, the background writer flushes after 512 kilobytes, and the backend flush is after 128 kilobytes, compared to the maybe 4 to 8 gigs that piled up before, so this means it's not going to be a storm.
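The knobs being described are the flush-after settings; a sketch with the values quoted in the talk (check the documentation for the exact defaults on your version):

    ALTER SYSTEM SET checkpoint_flush_after = '256kB';
    ALTER SYSTEM SET bgwriter_flush_after   = '512kB';
    ALTER SYSTEM SET backend_flush_after    = '128kB';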
Are these numbers good? Well, we'll find out. There is no tuning advice for these parameters yet; there will be, but it's not there, because we just don't know yet. So if you have a workload where you used to have this problem, please let us know whether tweaking these works. We have updates again to the Postgres foreign data wrapper. It now supports pushing down joins, so if you have two of these remote tables that are both on the same other Postgres server and you join them, we'll actually detect that, generate the join statement on the other node, and bring back only the results; previously we'd have to bring both tables back and join them locally. It only does this for normal joins, not things like NOT EXISTS anti-joins. We can push down ordering, so if you have an ORDER BY query we can do the ordering on the other side and then send it over, which also means this can drive something like a merge join, which needs ordered data: it can run the ORDER BY on the other node and then know the data comes back ordered. And there are direct updates: if you do an UPDATE or DELETE, we no longer first do a SELECT FOR UPDATE and then immediately do the update; we just push the update immediately. These are things that will simply make it faster. And then of course we have the big thing that everybody talks about in 9.6, right? Query parallelism. This is good for CPU-intensive workloads, because if you've been using Postgres for a while you know the general limitation: single query, single CPU core. You have your nice 128-core machine and we use exactly one of those cores for a query. Nobody has a single-core machine, you can't even find them anymore; maybe in a virtualized environment, and even the Amazon free tier I think has two cores, but beyond that, that would be the only case. So the different thing now is that we can use more than one core. Parallelism is made up of a lot of different parts, and most of them still remain to be done, so we can't claim that Postgres is a fully parallelized database, but we can parallelize a lot of simple things, and that can still be very, very useful. At the bottom we can parallelize the sequential scan: basically, if you're scanning a large table, we can just assign four processes to scan your table, and as long as your I/O is fast enough, that makes things pretty much four times as fast. And we can push things down into these workers, as we call them: if you have a filtering function, a WHERE clause with an expensive function, we can run that function in the workers before we bring the results together. Same thing for functions in the target list; the example of calculating an MD5 is something that scales perfectly, so it really will be four times as fast, because we push that work down into the workers, and this works for pretty much every function that is marked parallel safe. So if you have functions of your own and you want them to be parallelizable and take part in a parallel query, you have to mark them as parallel safe, and before you do that, look in the documentation to see what that means. Not everything is parallel safe; if everything were safe, we wouldn't require you to tell us. The parallel sequential scan is also the foundation of everything: every parallel query in Postgres 9.6 starts with a parallel sequential scan and then does something on top of it. Postgres 10 will have parallel index scans, but 9.6 does not. We can also push down aggregates into this parallelism, and that's very good, because aggregates are often CPU bound, they do a lot of calculations. What we end up doing is a partial aggregation in each individual worker, then
we bring the results together and do a final aggregation. For some aggregates that's very simple: if you're doing a sum, you can split the table in four, do individual sums, and then sum the sums. Other aggregates can be more complicated. Postgres has parallel aggregate support for most built-in aggregates, except for string aggregates, JSON aggregates, XML aggregates and array aggregates; it's really hard to figure out how to do those in a parallel way, just based on how the aggregates are defined. We also don't currently support it for ordered-set aggregates; that's probably a solvable problem, it just hasn't been solved yet. But for all your standard aggregates it just works; they will parallelize. And we can do parallel joins, to a limit. They're all based on a parallel sequential scan: we take this big table and split it up into our four pieces (these are not partitions in the partitioning sense, just the four pieces we split the table into), then each piece, in a separate worker, can join to a smaller table, for example, and we bring the results together at the end, into the master process, through what we call a Gather node, and get the results back. We can only do this for nested loop and hash joins; again, it looks like Postgres 10 will probably also have merge joins, that's work in progress. There are a bunch of other restrictions; don't expect all your queries to suddenly parallelize. Many things will prevent them from being parallelized: some of them just can't be parallelized, ever, because it's logically impossible, and many of them are things that are not written yet and that people are working on. There are a few things you need to do if you want to use parallelism, because it's off by default. You set max_worker_processes, the total number of parallel workers in your system that you allow. Then you set the wonderfully named (this is named by committee) max_parallel_workers_per_gather; it's a beautiful name, there was a lot of bike-shedding about the name of that variable, but this is what it is, and it says, for each individual query, how many of these worker processes we can use. Actually, setting that to one means you can use a maximum of two processes for each query, because one is the main process and one is the background worker; we still use the main process too. This one is obviously limited by max_worker_processes, which is a global parameter that you have to set and restart for, whereas max_parallel_workers_per_gather you can set individually per session or per user, so you can say I want this thing to use parallelism and not this. In theory, once the Postgres parallel optimizer is perfect, it will always pick the right choice. We're not there yet: it will sometimes pick parallelism when it's not a good idea, and it may not always pick parallelism when it is, so you may need to control these things; I'd expect every new version to get better. Just remember that there are a few things we can control. We can set parallel_setup_cost and parallel_tuple_cost, which is how we assign these parallel operations a cost compared to the non-parallel versions. And if you set force_parallel_mode, we will parallelize your query even when we think it's a bad idea. Don't put that in your postgresql.conf, but it can be interesting if you're looking at a query and saying "this should be faster parallelized, but Postgres doesn't want to parallelize it": set this parameter, run it, and see if it actually was faster.
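Roughly the knobs just mentioned; the table name in the EXPLAIN is a placeholder:

    ALTER SYSTEM SET max_worker_processes = 8;       -- cluster-wide pool of background workers; needs a restart
    SET max_parallel_workers_per_gather = 4;         -- per session: up to 4 extra workers per query
    SET force_parallel_mode = on;                    -- for experiments only: parallelize even when the planner wouldn't
    EXPLAIN ANALYZE SELECT count(*) FROM big_table;  -- see whether the parallel plan actually helps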
If it still doesn't parallelize when you turn that on, that means it's a query we just don't know how to parallelize. There are things we can't do: if you're calling functions that are not safe for parallelism, we're not going to parallelize those. Otherwise, if you set it, we're going to do it even if we think it's a bad idea. You can also control the number of parallel workers on an individual table: with ALTER TABLE you can say that this table can never use more than this number of workers. By default it's calculated based on the size of the table, but you can, for example, say I have this massive table but I really don't want more than one worker on it anyway, because I want to constrain the performance of things querying this table; then you set it on that table. If you have your own functions, again, ALTER FUNCTION lets you set the PARALLEL SAFE flag; otherwise, as soon as one of those functions is used, the entire query will not be parallelized, and the documentation describes what the flag means. And ALTER FUNCTION can also set the cost: you've been able to set a cost on functions for many years in Postgres, but with parallelism it becomes much more important to set the cost of your functions correctly in order to trigger parallelism. If you have an expensive function and you want it to be parallelized more, you need to indicate that to the system. Okay, that will be enough about that, except I'm going to add one more thing. Who in here has ever used Oracle? A couple of you. ORA-01555, snapshot too old: do you love it? That's the best thing about Oracle. Well, we have it now, so now we like it, right? Postgres 9.6 does implement snapshot too old. It's not mandatory, don't worry about it; if you left Oracle to get rid of snapshot too old, you're still safe. But you can enable it. You configure it by time, by setting old_snapshot_threshold in minutes (notice that most times in Postgres are in milliseconds; this one is in minutes), and it will terminate long transactions in repeatable read, or really long queries if you're in read committed, preventing bloat buildup. Basically, if you try to query data that is older than this number of minutes and has been deleted or updated: normally Postgres lets you do that, because we keep the old data around as long as somebody wants to see it. With this we say we're only going to keep it around for, say, 10 minutes; beyond 10 minutes we're going to kill the person who wants the old data, in order to let vacuum get through and clean up your tables. If you did not have a problem with this before, don't turn it on; it's off by default, and there's a good reason for that. But turn it on if you're having bloat problems, and you'll be able to see exactly how this will help.
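Sketches of the commands involved; my_table and my_func are placeholders, and the threshold value is just an example:

    ALTER TABLE my_table SET (parallel_workers = 1);     -- cap workers for this table regardless of its size
    ALTER FUNCTION my_func(int) PARALLEL SAFE;            -- allow the function inside parallel plans
    ALTER FUNCTION my_func(int) COST 10000;               -- a high cost encourages parallelizing queries that call it
    ALTER SYSTEM SET old_snapshot_threshold = '10min';    -- off (-1) by default; requires a restart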
Obviously there's always a lot more in every version; it's kind of hard to cover it all in just one talk, and there are lots of smaller things. If there's anybody in the room who submitted a patch to Postgres and I did not mention it, I'm sorry; there are many performance improvements and there's no way to cover them all. I have a few seconds left, so I'm going to do my normal questionnaire over all the 9.6 features. Hands up, who thinks that for your workload, for the things you're doing, parallelism is the best feature of 9.6? What about the vacuum freeze change? Snapshot too old? Nobody. Multiple synchronous standbys? The Postgres foreign data wrapper? All things mixed together? The wait and lock monitoring? Anything else, and note you also have to tell me what it is. Okay, cool. I am out of time; I'll be happy to take questions, but I'm going to have to take them outside to give room to the next speaker. Thank you very much. Afternoon, everybody. Ready for beer? Two more talks and then you can drink. So today I'll be talking about common table expressions, CTEs as they're known in the Postgres community, the magic, the mystery. How many people know them? There are really essentially three things that need to be discussed when it comes to CTEs. First of all is what not to do with a CTE: they are really good for organizing your code, and unfortunately they're really, really bad for performance when you do that. The second thing is that you can use CTEs to chain data modification in multiple steps, so you can have an insert that returns rows, and you then use the rows from that insert to do something else; you can build an arbitrarily long chain doing that. And finally, and this is going to be the bulk of the talk, recursive CTEs. How many of you have tried writing a recursive CTE? How many of you got it right on the first go? Wow, okay, you should come up and give the talk then. But yes, recursive CTEs can be quite challenging; there are a few things, though, that once you understand a little better how the recursive feature is actually working under the hood, make it a lot easier to write those queries. So first, what not to do. The problem with using a CTE to organize your code is that currently in Postgres, CTEs materialize everything. When you have a query in a CTE, that query is going to run by itself, completely independently, all of its results are going to be written into a temporary tuple store, and then that tuple store is going to be read from for the rest of the query. This also has the effect that WHERE clauses will not push down into the query and joins will not push down into the query. How many of you have used the OFFSET 0 trick? CTEs are similar to the OFFSET 0 trick in that regard, in that they are an optimization fence, but they go one step further: whereas the normal OFFSET 0 will not let the planner push things past that subselect, OFFSET 0 does not force the planner to run that query, store all of its results, and then hand those results off to the rest of the query. That is what happens with a CTE. So this is an example where you can see there's a fairly non-trivial query; it's obviously a contrived example, so it's not super complicated, and I had to fit it on the slide, but I understand why people like doing this: you have this nice logical block here and this nice logical block here, this is something to do with tables and this is something to do with columns, and then the query itself becomes extremely simple. The problem, though, is that this is going to be completely materialized, and then this is going to be completely materialized, and then this will read from those materializations. Now, there is another way to organize this code to make it more readable without having the same problem. As I was saying, you get these sequential scans, and what does not show up very readily here is that the output of this is being stored into a tuple store; so not only do you have these sequential scans with no indexes being used, it's also materializing all that data. The other thing you can do is use subselects. We still have two independent blocks of query code, so you've got some amount of segregation if you know where you're looking ("AS tables", "AS columns" gives you some idea), and of course you can always
put comments in here to help explain what's going on and you know so this is not quite as nice looking code as the other example but it gets you most of the way there and the important thing is that now we're using index scans it happens to use a sequential scan in one place just because there's no point in using the index everywhere else it's using the index scans not only that but it's not materializing all that data so any questions on that as I've mentioned CTEs they store the results in a tuple store so you can think of a tuple store kind of like a temporary table but you don't have the option of putting an index on that temporary table the only way to pull data out of a tuple store is essentially to do a sequential scan there is a slight advantage to a tuple store though it does have less per row overhead than a temporary table post-guest tables they have a 24 byte per row overhead that you basically cannot avoid and that's true even on temporary tables even though a lot of that is more for MVCC which doesn't apply as much to temporary tables but the structure of temporary tables is still the same so a tuple store I do not remember offhand what the per row overhead is I think that there is one but it is significantly smaller than that so if you have a query where the optimal way to execute the query you've determined is okay I actually need to take this portion of the query and materialize it and I need this to happen first and put it in a store and then the rest of the query will operate off of that a using a CTE and making use of the tuple store can potentially be faster than doing that separately in a temporary table and then having a query at the temporary table but do keep in mind that with a temporary table not only can you create indexes the other thing that you can do with a temporary table is you can run an analyze on a temporary table so if you do that if you create a temp table you run an analyze on it the planner now has statistics on the data that is actually in that table so it can then potentially make a better plan choice down the road the second thing that you can do with CTEs is you can use them with data modification so how many folks have used returning how many folks don't know what returning is so returning in postgres when you do insert update or delete when you do a dml statement you can put a returning clause on the statement it will then return that data back from the command so for example if you're deleting rows out of the table if you wanted to actually move rows from one table to another you can say delete from table where and you put whatever your where condition is returning and you can use that then in an insert statement to insert the rows into another table with a single command so it's a very powerful capability in certain circumstances and you can do that without a CTE but what CTEs allow are you can actually you can do this chaining so in this example I just got a simple invoice and invoice item tables and I'm now going to insert a single row into the invoice table and I'm returning the invoice ID now that I have this information I'm going to run a second insert select from invoice so invoice is the CTE and I'm using invoice ID or I'm sorry I'm inserting an invoice ID I'm selecting the invoice ID line number item and line number item is just coming from this values clause down here but the invoice ID is coming from up here so this single command is going to create both an invoice and the items for the invoice and then because I have a returning 
on this entire command, I'm then going to get the data back when I execute this. So looking at the results of that, we can see that I've created an invoice ID of five and I've got two lines, one for pizza, one for soda — apologies to those from the north who call it pop; I moved to the south, so now I call it soda. So that gives you the ability to do these multiple commands in one shot. The other thing to keep in mind is what happens when you use a CTE to do DML. I'm doing an insert here with RETURNING, and in this case I'm just doing a select, but I'm doing this select with LIMIT 1 — so I'm inserting two rows but only selecting one, and in fact I only get one row back. But if I now look at the invoice table and ask for the max invoice ID, I do get seven, so both invoices were in fact created even though the select only returned one row. That is a difference in operation with the CTE: when you're doing data modification, it is always going to execute the first query in its entirety, no matter what, because it specifically wants to make sure that your DML runs completely. Questions on the DML? So, recursive CTEs. The critical thing with trying to write a recursive CTE is that you really, really have to think in terms of sets. In essence, every recursive CTE starts with a single set of rows, and you're then going to UNION that with a select statement that references the recursive CTE. The first time through the loop it's going to be referencing that first initial set of rows, but after that it's going to be referencing what's being output continuously by the CTE. It is really important when you're writing these — there may be some cases where you don't necessarily need a WHERE clause, but usually you're going to need one, and you will find out that you need it when you go to run the recursive CTE and it just sits there. While it's just sitting there, what's happening is it's generating and generating data, putting all of it into a materialized tuple store, and eventually it will run out of disk space; but until then it's going to keep running, because you don't have any condition within the recursive CTE that will allow it to terminate. So, anybody want to take a guess at what this query is going to output? The numbers between one and five? Any other guesses?
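The slide itself isn't captured in the transcript, but a counting query of the shape being described would look something like this (the CTE and column names here are guesses, not the actual slide):

    WITH RECURSIVE s (sequence) AS (
        SELECT 1                      -- the initial, non-recursive set: a single row
        UNION ALL
        SELECT sequence + 1           -- the recursive term reads what s has already produced
        FROM s
        WHERE sequence <= 5           -- note: this filter tests the *input* row, not the output
    )
    SELECT sequence FROM s;
    -- Returns 1 through 6: the row containing 6 is still emitted because its input (5) passed
    -- the WHERE clause; only when 6 is fed back in does the filter fail and the recursion stop.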
the trick is that this where clause is being applied to the input not to the output so let's walk through this one number at a time so we start with select one and then we're doing the union the query in the union we're selecting from the recursive CTE so we started with select one this from is going to say one, I see a one well I'm going to return that plus one so now it's going to return two and it's doing this because when it ran the from sequence was less than or equal to five because we started with one so then we get two, then three, then four, then five so now we're down to this line so we're selecting we've got five where sequence less than equal to five well that's true so it's going to output sequence plus one, we get six now we run the from we take six, where sequence less than equal to five well six is greater than five so it's false and that's what terminates the recursion so the first time that your where clause doesn't return something it's a little bit more complicated but essentially the first time the where clause returns false that's what stops the recursion so this kind of construct this you know sequence less than equal to five it can often be useful with a recursive CTE to have something that's indicating you know some kind of a level something like that some kind of a very high limit at least then it will trip out on that instead of running forever and you're confused as to what's going on or worse it runs forever in production and runs your server out of space guess is on what this is going to return so here the only real difference is that we're adding two but again because we're pulling the data we start with one we add two we get three five we add two we get five that's less than or equal to five then we output five plus two we get seven well seven is not less than equal to five is that questions I want to make sure that you guys are understanding this because this is the critical part this is what makes writing recursive CTE so difficult is being able to understand what's going on in that second part so this is an example here this is never going to return notice the difference in where the where clause is so just by moving this where clause doesn't work no, no because there's no stack that's involved so prior to having recursive CTEs the way that you would do this is you would write a set returning function and the function would recurse well eventually there is a stack limit on how deep how deeply how deep you can nest function calls which by default I think is a thousand so in that case you don't run into this problem of well why is it just sitting there running because you will eventually hit that stack limit so this is just this is just what what's going on under the covers is you've got this recursive union and this is what's allowing things to work so again going back to the previous example previous example just putting the where clause in two different places you can see this in an explain there's a filter here underneath the recursive union here there's a filter but it's above that recursive union that's why this will run forever so I'm not sure I can't really think of where you might use this in the real world but you can actually have multiple unions as part of a recursive CTE so in this case we're starting with one and two in this addition the recursion only happens in the last portion of the union so this portion is going to be run once and then this is the part that's going to recurse so we're going to start with one and two that's going to be our 
starting condition for S we're going to have two rows now instead of just one row coming in and he guesses as to what this will output so we're getting 11 rows and another example here so this is the same exact query the only difference is I've added this which column that says well where did this come from so one is A two is B and actually yeah we're going to keep that data as we do the recursion so this should make it a little clearer why we're getting this 11 row output we start with one and two A and B and then we get two and three A and B four and five A and B now if I make a slight change to the recursive portion we get this output and now you can see how this is being built up so we start with A and B and then we have an A and B and then we've added C A and B and we've added two C's A and B and we've added three C's four C's and then finally five C's and he guesses as to what the change was that I made here so you can see right here all I'm doing is I'm appending a comma C so every trip through the recursion we're getting disinformation now if you're working on a you're trying to solve some more complicated problem using recursive CTE's this can be extremely useful to try and understand well why is this bloody thing not working this lets you get information on how is the query actually executed so this makes it a bit easier to see well okay why are we getting 11 rows I mean we start with two and then we recurse how did we end up with an odd number and the reason is because we've got the A and B A and B here's our last A and B we've got five and six and we've got five and six because A started with one and we've been incrementing every time B started with two we've been incrementing so on this path through A was four and B was five A is now five, B is now six we've added one more C we make one more pass we get one row because B would now be seven if we incremented it to be more correct now that I've just tripped myself up we were using the where condition sequence less than or equal to five so because of that where condition on this path through only A matches that condition by the time we're outputting this row the B the initial B that we're being fed in that's been incremented up to six so it fails the where clause so we get one row two rows and then we're done we can get in the homework and talk y'all should be excited it's just like school so this is you know this is an example that I wanted to put together that's something a little more realistic not necessarily realistic but it's at least dealing with tables at this point so and we do have we've got a good another 15 we've got 25 more minutes so we do have time to actually go through this if any of you would like to go through this on your own I will get this started we can take Q and A's and then anyone who wants to stick around I'll actually walk through this but what we've got here is this recurse table if the the example that always comes to my hand to my head of doing recursion is if you think of a web form like a you know bulletin board form where you can have nested conversation threads happening so that's essentially what we're sitting up here is we've got a surrogate key and we've got this parent ID and it just references the same table now of course you'd have a bunch of other fields I just didn't put that in because we don't actually need that for what we're doing so create the table and then I run three separate inserts to load data in there notice that we're using CTE's and chained DDL so question how many rows 
after this very first command how many rows do we end up with in the table we're doing an insert and we're returning the values we're returning the results of that insert we're then using that return as I we're using that in this select well that select is going into an insert so we've got this status command well that status is coming from this insert it's not coming from this insert so it's saying inserted one row that zero is if there were OIDs in the table please don't create tables with OIDs by the way so actually there's two rows in the table at this now at this point we're running a similar command so again I'm starting off I have this DML CTE it's inserting a single row I'm then using that row to I'm using the surrogate key that was just created in here I'm inserting that into parent ID the only change is I'm now using generate series to create three rows so when I run this and then I've got the returning just so I get the data back so recurse ID is starting at four from the data that's coming back better with two rows in the table so we would have had one and two in there and now we're starting with four anyone see where the extra row is coming from because this insert is being referenced in this insert and remember we're doing the returning here so it's being referenced and it's being referenced recurse ID is being used as parent ID well notice that the parent ID is three for all of these that's the extra row but you don't see it in the output because we're only this insert is only creating three rows does that make sense alright so I'm more than happy to keep walking through the homework there's another six slides if you'd rather go through it on your own if you're now just sick and tired of CTEs and you know them or want to hear about them again if there's any questions I'm happy to take them now that is where this presentation is at if you do want to work through it on your own later did I do that good of a job or is everybody just no well I'll keep going please if I've beaten it into your head enough feel free to take off I will not be offended another insert and we're doing the same thing and again we've got this inner insert in the CTE that does not show up in the output but we can tell that it inserted ID 5 because that's what we're using for our parent ID so at this point we now have actually ten rows sitting on the table so now we now we want to actually recurse through this data so what is the thought process that I use to start creating the CTE well the first thing that you need is you need your initial set of so the query that I'm writing here I want to just start I'm just going to run through everything in the table so my starting point is going to be from what well from the table but that's the critical bit right there where parent ID is null yes sorry I probably should have spelled it out as level yes that is an L so I'm selecting from that table that we've created and populated where parent ID is null so the very top level if you think of the first post on a thread we're going to have a parent it's the first post on a thread so we've got our initial set we're going to do our union what comes next we're going to be selecting certainly from L and we know that we're going to be pulling data from the table itself so I know that I need to select from the recursive CTE and I know that I want to join back to the table because ultimately I want the data coming out of the table so now I've got my from and I've got my join that was that's kind of the easy part now I need to 
decide okay what's the join condition supposed to be so I'm trying to find up here I've got my top level stuff I've got the first post the ones that have no parent well down here I want to find our children so the way to find the children is hit the recursive table where the parents in the table is equal to the recurse in the CTE so I'm starting with parent null and one of the fields I'm getting back is the recurse ID and now I'm using that recurse ID to say hey I've got this recurse ID give me all the rows where that is their parent I'll take a second to also mention some of the other fields that I've added here I added one as level so I've got just a counter essentially so this is going to tell me hey these are the these rows they're the top level they came their level one pardon yes yes next time I'll make sure to find a font that just uses the same glyph for both so it's even more confusing yes that is a one that is an L so the other thing that I'm doing is I'm saying okay I'm going to keep track of who the top parent for each child post is and of course at the top level it's just the recurse ID so this is just the parents so now so I've got this select clause figured out and by the way I did kind of iterate on this I started off with just the star and then I added stuff as I said oh hey I can also do this so this is how we then go and reproduce this clause down here now the recurse ID and parent ID that's I was lazy I used star don't do that but the recurse ID and the parent ID that's just duplicating what star is doing but notice that these are coming from the table they're not coming from the CTE they're coming from the table level plus one so every time we run through the loop we're just going to add one and then top parent this just stays the same because we don't we always want that to be saying oh hey here's the parent that I came from okay so once we've got that and we've got it written go run it and this is what we get and we can walk we can actually as it happens this query and by the way you cannot depend on this this query happens to be giving us output in the order in which the CTE is actually executing if oh the part that I haven't shown is the actual select but it's just the plain select from I'm just doing select from L if that select from L was doing something more complicated like maybe I was grabbing this result and then I'm joining to something else the optimizer is free to choose a different query plan and these results may no longer come back in order so if the ordering is important to you that's where level comes into play you can also you can do a row number so there's a a window function I think it's called rows but it will give you a sequential number is it row count you could do that here it's going to give you a count but then doing it down here I think you'd have to do the row count but then you'd need to add the max from L so you'd need to find the max row count that you originally created and then you add that to the row count here because I believe the row count is going to start over from scratch every time but if you did need to do something more complicated like downstream and you wanted to know what the execution order in the CTE was that's that is one way you could do it the other way that you could that you could do it and I'm fairly certain this would just flat out work because you can create a temporary sequence and then you can just call nextval on the temp sequence in the recursive CTE and every time it hits the sequence it's just going to get 
the next number and that will tell you what the ordering of rows being generated in the CTE was in our case though none of that's a concern so we start with the top levels and the recurse IDs for the top levels now if we go back so that parent's ID is 5 that parent's ID is 3 and then the very first one was 1 so the first one was 1, 3, 5 oh this is just I think this was just a left over one but these these are our top parents so they have they have no parents themselves and then it just started walking through the CTE so we get the top parent of 1 we get its single row and then we get everything for 3 but you'll notice that we have some level 2 so we have 4, 5, 6 and then we have 8, 9, 10 well 8, 9, 10 their parent is 5 and 5 is right here 5's parent is 3 3 is up here so if you you can use this to you can study this to create your own to see how was the data actually flowing through the CTE and that is all I have that is the last slide so any questions well thank you very much for your time there's a reason that we have documentation exactly thank you you know one of the things you mentioned was the fact and the person one of the previous keep in mind that doesn't play for CTE that's only for function I think I'm just confused one of the previous presenters mentioned something about being done iteratively anyway and not recursive the right so I'm going to read up more on that yeah thank you okay are you done? yes great testing 1, 2, 3 testing 1, 2, 3 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 yeah so huge crowd last talk of the day woo hoo so I'll get 30 seconds before we start but I'll get started and maybe we can all get to the bar a little early so we'll see how that goes my name is Scott Mead I'm a senior architect with OpenSCG the open source consulting group we do a lot of work in and around Postgres and what I'm here to talk to you about today is strategic auto vacuum strategic auto vacuum if you've been in, on, or around Postgres at all you've heard a vacuum and you've probably cursed at it quite a few times and you've probably heard of auto vacuum and really cursed at it very hard a couple of times so what I want to do is walk you through the auto vacuum I want to give you some of the basic understanding of what's going on in the system why it's so important and then I have a strategy that's really going to help simplify your life when it comes to working with auto vacuum so why do we have to vacuum what in the world is it now to do this the preamble to the vacuum discussion is like a three day training class in postgres so I'm going to try and do it in 15 minutes we are going to gloss over some stuff and there's a couple of you in here who know a lot more about it than I do and there's going to be things that I intentionally kind of gloss over but the point is I want to show you why vacuum is so important and that's just going to help give you the understanding and as we get to tuning auto vacuum and kind of procedures for it you'll get to see some lines so why do we have this vacuum thing at all it's something that DBAs deal with on a daily basis in postgres and it throws a lot of variability into the system so why in the world do we have to vacuum so everything that postgres does all our SQL statements are in transactions it's a transactional relational database it's an acid compliant database and it's a tool that each transaction must follow in order to be acid compliant so if is there anybody that's not familiar with acid so Adamicity Consistency Isolation and 
Durability. Basically, this simple transaction right here inserts two people into my little users table: either both of those are going to go into the system or none of them; we're going to be isolated from other transactions; we're going to be consistent; and once we get to that end, or that commit, it's going to be on disk — unless we've configured Postgres to be irresponsible with our data — so if we crash right after the commit returns, we'll be good. So the ACID properties apply to each of the transactions, and you can actually tune them; there are some tunables for it. The way Postgres implements this — the way concurrent transactions are handled, the way MVCC happens — is actually what causes vacuum. We guarantee you are ACID compliant, and the really cool thing is that if I'm writing data and somebody else is reading that data simultaneously, we're not going to block; we're not going to prevent one another from doing our work. And the way that we do that is exactly what causes us to have to vacuum. So when you do an update or a delete, we're changing the data in the database while I'm reading it concurrently — how does Postgres actually deal with that, right? This is one of the more complicated parts of any relational database system. To illustrate it, let's create a little users table, a very simple one. We'll put three rows in: we'll put myself, we'll put my dad, and we'll put Sally Smith — solid sweetheart Sally Smith. So when we get into the table — and you can actually run this query; a lot of people don't know this — you have a number of hidden columns on every single Postgres table. There are more than just these, but you have to ask for them explicitly: if I say SELECT *, I'm not going to see the xmin and the xmax, but if you ask for them, Postgres will show you. So, a couple of definitions here. An XID is basically your transaction ID; in Postgres, for the most part, when you see an X it means transaction — things like pg_xlog is the transaction log, XID is the transaction ID, etc. So the xmin, what is that? Basically, the minimum transaction ID at which this row was alive in the database; it's the XID that placed the row in the database. In our case, when we did our little insert statement here, you'll see I used a multi-value insert statement; it ran in one transaction and they all went in like a bulk load, so they all have the same xmin. The xmax is the transaction ID of the transaction that deleted the row or otherwise made it dead — when that happens, the xmax gets populated. So what does that actually look like? Let's do the "marry me" transaction: we're going to update users, we're going to change Sally's last name — she now has the name of a paper company, but we're not related, so none of the money associated with the Mead paper company. And then I just used a standard query. So what happened? The xmin — the transaction ID of the transaction that put the row in the database — is now 1,000 for Sally's row; that's when Sally's update took place. The xmax is still zero, or null, which means these are all live rows, and we don't see anything else. Great, that's what we expect from the database; you don't even think about this kind of stuff. But here's the monster lurking, and let me tell you about this a little bit: the old row is still there. Postgres has a non-overwriting storage manager, so that old row is actually still in there as rollback data. In Oracle you'd be using undo segments or something similar to store this rollback data.
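A quick sketch of what's being shown on the slides — the table and values here are stand-ins, but xmin and xmax are the real hidden system columns:

    CREATE TABLE users (first_name text, last_name text);
    INSERT INTO users VALUES ('Scott', 'Mead'), ('Dad', 'Mead'), ('Sally', 'Smith');

    -- SELECT * hides the system columns; you have to name them explicitly
    SELECT xmin, xmax, * FROM users;   -- all three rows share one xmin (same inserting transaction)

    UPDATE users SET last_name = 'Mead' WHERE first_name = 'Sally';

    SELECT xmin, xmax, * FROM users;   -- Sally's visible row now carries the updating transaction's
                                       -- XID as its xmin; the old version is still physically in
                                       -- the table, just invisible to queries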
In Postgres it's actually stored in the main table. Now, you can't just run this query and see that old data — this slide, with the old Sally Smith row sitting there with its xmin of 500 and an xmax of 1,000, I had to make in a text editor. I do have a patch — I lost it years ago — that one of the core team members gave me, which you could flip on and all of your data is completely inconsistent and the database is unstable and everything else, but you can see this; I believe it was only like three lines of code. That's kind of cool. The idea of what's happening here is that the database is basically saying: transaction ID 1,000 made this change, and every transaction that runs is going to look at these xmin and xmax values. Think of it this way: the xmin is the minimum transaction ID that an operating transaction needs in order to see this row, and the xmax is the maximum transaction ID that the row is visible to. So let's put some concurrency on a little timeline here. What are our transactions? We're going to read all users: transaction A is just SELECT * FROM users, and transaction B is going to be the marry-me example, so we're going to go from Sally Smith to Sally Mead, right? So what happens when we do this? Now, we've only got three rows, so is it really going to take this long? No, it's not — this is going to be a fast transaction — but if we're talking millions of rows, this is where you would see it play out. So transaction A starts — I'll see if I've got my laser pointer working — transaction A starts here, and that's when I start SELECT * FROM users, and it's going to start reading. When it starts, its transaction ID is 600, so when it looks at our table it's going to say: okay, I can see all rows that have an xmin less than or equal to 600 and an xmax that — sorry — an xmax showing the row hasn't expired yet. So we start reading with an XID of 600; when the update runs, it gets a transaction ID of 1,000, which is greater than 600, so transaction A basically can't see that change — we still see the original data. This is how Postgres manages multi-version concurrency control. Now again, I know I'm going fast, but the point of this talk isn't multi-version concurrency control; the point of this talk is vacuum. So after all that we talked about inside the database, why do I have to vacuum? Well, that original Sally Smith row is still in the main table. Nobody can see it — as soon as we move off of its transaction ID it's completely invisible; you can't see it unless you open up the blocks in a hex editor and go digging for it — and it's taking up space, it's taking up time, it's part of the index and everything else. So it's fragmentation, it's bloat at this point; it's extra data that I don't need laying around. A delete will do the same thing; it just doesn't create the new row, so it leaves that old slot laying around. By the way — I didn't say this, and now there are some more people here, thank you — if you have questions, please ask; I really hate being a talking head. And when I said that I realized I had my TR on, this is a surprise. Alright, so it's a dead row. So why are we vacuuming? Well, vacuum takes care of these dead rows for us: it marks them and can make them available for reuse later. The other reason that we need to vacuum is something called transaction ID wraparound. We're using these transaction IDs — every single transaction picks one — and think of them very simply as: I start at zero; it actually starts at three, I believe, as the first available transaction ID
and it goes on it's a counter but wait a second how big is this counter well the problem is it is an unsigned 32 bit integer so you've got four billion transactions until it has to start again and if we don't protect ourselves against that if we did the update of Sally Smith at transaction four billion and then the counter wrapped back to zero I can't see that anymore I have complete and total corruption and failure and my database is useless so Postgres has a method for dealing with this it's very complicated under the covers but what you need to know about it is you need to do a vacuum freeze that takes care of it for you so vacuum is the reason why we have the vacuum is because of the way the storage manager works for deletes and updates it leaves the old data there and to deal with our transaction ID wrap around issue if you've ever run into transaction ID has anybody ever had a transaction ID wrap around problem? couple of you it's really nasty the database stops accepting connections you have to shut it down load it in single user mode and vacuum and you usually have five levels of executives right here staring at your keyboard while it's happening there's no progress bar on that you don't know how long it's going to take so making sure you don't hit it is important so what is vacuum so it does all this stuff but as a DBA I love free space systems administrators DBAs we're doing storage we deal with data and expense and everything else I love free space because I don't have to ask for more I don't want to go back to the opportunities I don't want to go back to the disk and allocate it free space is great too much can be a problem I mean this one row in a table but the table is reading as 150 gig that's a problem when the row is a few bytes that's a big issue that's a little too much free space even for me but we do want free space because as we're doing really high transaction volumes we don't have to allocate it again so vacuum finds these dead rows that are no longer visible to anybody and it marks them, it tombstones them I don't remember the guys that wrote the paper on this but the actual paper for the algorithm itself refers to it is tombstone it finds the tombstone row and it marks it so that future transactions future updates and inserts can actually use that space it uses something called the Flying Spaghetti Monster the FSM the free space map so every table and index in Postgres has a free space map it's basically a small bit map that says this block has free space so future updates don't have to go back to the OS and get more we can take that fragmentation and use it and that usually if you look at databases you kind of load them up you get to a certain size and then as you start doing updates the size will stop growing it will kind of hold steady and there's always going to be a little bit of bloat and fragmentation in there especially for update delete workloads but that's good, that's healthy because you're not going back to the well the other thing that vacuum does it's a special operation special mode, it's called freeze when we do a freeze that's actually what fixes up our XIDs so that when we hit 4 billion and change we don't lose all of our data the actual way it does this it's mindbending it's not mindbending but essentially there's a special XID called the frozen XID so I told you that but we don't start counting at zero the first transaction ID I believe is number 3 integer 3 and 0, 1, and 2 are reserved for special things and I believe the last time I checked 
the number 2 is the frozen XID so when I do a vacuum freeze I read every single row and that Xmin value gets set to the number 2 so when I wrap from 4 billion I wrap to 3 hey, 2 is less than 3 I can see the data Postgres protects you because if you get to that wraparound situation it shuts down so you don't lose your data but you do have to do maintenance to make sure that you can get the system up and query it again, I know I'm going really fast through the transaction ID these are a little more complicated than that because they use module 2 arithmetic so there's always a certain number of transactions in the future always in the past and then in the middle there's this weird dead zone and all kinds of stuff of time with you guys, so we're going to skip over that but what you need to know for vacuum is freeze is the magical way to save your job because transaction ID shutdown is usually a resume generating event and we don't like those so we've talked about vacuum we've talked about the fact that when Postgres does updates and deletes we're fragmenting the database, we're creating this dead space for these old rows we've talked about how vacuum finds them and makes them available for new data, that's great we're also going to talk about transaction ID wrap around, that's great and in the past way in the past, I'm getting old now but back in the 8.1 series I don't remember when auto vacuum got flipped on by default, you're basically responsible for vacuuming yourself there was auto vacuum but it didn't work that well, it was disabled by default and there wasn't a lot of tuning for it so you were basically responsible for cron jobbing a vacuum yourself for every single table and every single one of your databases now we have auto vacuum so what is auto vacuum? 
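As a side note to the freeze discussion above, the wraparound headroom being described can be checked with standard catalog queries. The columns and functions below are real; the table name in the last statement is made up:

    -- how close is each database to wraparound?
    SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database ORDER BY xid_age DESC;

    -- same thing per table, worst offenders first
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY xid_age DESC
    LIMIT 10;

    -- the manual hammer, if autovacuum isn't keeping up on one table
    VACUUM (FREEZE, VERBOSE) some_big_table;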
this really what we're here is to be strategic about our auto vacuum strategy well auto vacuum is basically a pretty lightweight wrapper around the vacuum command it's a background process that runs when Postgres starts up and it can actually kick off a certain number of variable number of vacuum tasks for you and it runs them as we go there's a lot of configuration available auto vacuum does use some algorithms for figuring out which tables need to be vacuumed it's not the most intelligent thing in the planet but I'm going to show you what it does and how we can kind of tune it tune with it why am I doing this talk these are the things I hear about auto vacuum all the time people are like man I saw this auto vacuum I was running for like 5 days I didn't know what it meant so I just killed it like oh that's not good a really common one people will say hey my system started running slow all of a sudden I do a lot of updates things started to slow down that was awful we had an outage where none of our queries were returning so I just did a dump and restore I dumped the database, deleted it, restored it for 3 months well that's not a great strategy it actually is a good strategy for smaller databases I'll use that because you don't have to deal with any of this vacuuming stuff but it really stinks in general so these are the kind of things you hear this is what makes vacuuming auto vacuum very frustrating you don't want to disable it the big thing is if you're not vacuuming frequently enough with that XID wraparound it's really bad so you do want to be able to work with it we don't need to be scared of it because it really is our friend it's maintaining the database it actually it uses usage metrics but it also gathers new metrics so as my tables grow and get bigger the optimizer is using statistics the histogram about how big the index is what's the cardinality in the count and everything else and keeping our stats healthy that's important it can be throttled I'll show you the throttling mechanism it's kind of its own little Goldberg machine it's kind of crazy but it works and again it'll do that freeze for you as you need to a couple of misconceptions about vacuum it doesn't actually take locks so vacuum in general is heavy it's using a shared resource so it's going to beat up your disks but it's not locking anything it could slow stuff down there is one mode that we're not talking about which is called vacuum full that will lock your database and kidnap your children and steal your car but auto vacuum doesn't do that it can't kick that off it can however prevent others from taking exclusive locks so if you're doing a vacuum and your application workflow depends on an explicit lock table or if you're doing a vacuum and you want to alter column or alter table add column something like that auto vacuum running on the table will prevent those things from happening but it's not going to block it's not going to lock out your normal everyday traffic what we're not talking about again is we're not talking about space reclamation if we do a vacuum on a table as Postgres is getting more modern in 9.6 especially there are some optimizations where it actually can in certain cases I don't know them all but it can actually make the size of your table smaller but in general auto vacuum doesn't do that so back to the example of a one row table a table with one row that's 150 gig it's not going to be able to reclaim that space that would be a vacuum full or PG repack PG repack is a really great module if 
you haven't used it and you're using Postgres, you should use it — it saves the day and can online-repack your stuff without locking; it's really nice. So what is autovacuum? It's a DBA in a box. It's basically: what would a DBA do if they were tasked with manually vacuuming the whole database? You're going to look at your busy tables and say, who's the busiest, who needs to be vacuumed right now? So what is it that autovacuum is doing? We have a couple of defaults, and I think they're extremely conservative — this is normal with Postgres in general; it's very, very conservative by default; by default it doesn't even listen on the network for connections, right? So what do we have? We've got the vacuum threshold: this is basically the number of changes, updates or deletes, that have to take place in a table before the table is even going to be considered for vacuuming. So if you have a table that receives one update, autovacuum is not going to consider it until you hit the XID wraparound case; it's just going to ignore it completely. If I want autovacuum to be active on a table, that's what's going to start it. And the scale factor: it's actually 50 changes plus 20% of the table, so it's not quite as low as 50 — these two things play in together, and once you configure them, that's what kicks it off. And then the cost delay of 20 milliseconds. I'm going to show you how the throttling works, but this is actually how autovacuum is throttled: the vacuum cost delay basically says do some stuff and then sleep for 20 milliseconds. If you set 20 milliseconds, it's going to throttle it pretty hard and it's going to be slow; if you set that to zero, it's going to go right at the disk and vacuum away. There are a lot more configuration options — we're going to get to them in just a minute — but what kinds of things do we configure with autovacuum? When to vacuum: I just showed you how many changes to a relation have to take place before a vacuum. How to vacuum: the number of simultaneous vacuums I can run — this is a really nice feature; instead of autovacuum doing one thing at a time it can do N, and I get to configure that — and the rest of the how-to-vacuum settings are throttling. And then the most important one — since I'm giving the last talk of the day, I felt the last slot deserved the most important item — logging. Logging and monitoring are not only key to the algorithm that I'm going to show you, but personally I think they're the most important thing you can do. So all of these are highly configurable in autovacuum, and there are a lot of other settings. Take this query — these slides will get posted online — and if you want to actually start playing around with this, you'll find them; this query will show you all the settings your instance has available for vacuum. There are more controls available, things that I haven't shown yet: when to analyze, when to freeze, how much memory to use; again, there's a lot of throttling stuff that we'll talk about. So again, when to vacuum: we talked about these, right? Fifty updates or deletes to the table, and then we scale it by 20% of the total relation — that's the number of things that have to change before autovacuum will put a table on the hit list. As it runs, it basically wakes up once every 60 seconds by default, looks, and says: show me the list of tables that have been updated; has this one hit its threshold? If it has, great, I'll consider it for vacuuming, and away it goes. How to vacuum: the default is 3 workers, which seems a little low, but you have to be very careful with that number.
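These are real settings; the pg_settings query below is one way to do what the speaker describes (his exact slide query may differ), and the defaults in the comments are the ones he quotes for this era of Postgres:

    SHOW autovacuum_vacuum_threshold;     -- 50 row changes before a table is even considered
    SHOW autovacuum_vacuum_scale_factor;  -- plus 20% (0.2) of the table
    SHOW autovacuum_vacuum_cost_delay;    -- 20ms sleep each time the cost budget is used up
    SHOW autovacuum_max_workers;          -- 3 simultaneous workers
    SHOW autovacuum_naptime;              -- wake up and look for work every 60 seconds

    -- one query to see everything vacuum-related on your instance:
    SELECT name, setting, unit, context
    FROM pg_settings
    WHERE name LIKE '%vacuum%'
    ORDER BY name;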
And then the nap time: how frequently does it wake up and look for the list of tables? Why do I have to be careful with that number 3? Remember, vacuum is actually scanning your tables. It's going to start up and say: in my list I see that the 1-billion-row table got 10 million updates since I last looked at it, I'm going to go scan it — and it starts reading the relation. There's an optimization in Postgres so it doesn't read every single row every time it runs, but you are hitting the disk, and what's going to happen is you're going to throw the buffer cache out of whack; you're doing a lot of things when it happens. So the number of workers: a lot of people set that to 10 — hey, I've got a thousand tables, or a hundred tables, I'll set that to 10 — but they could all be running on very large tables at the same time, so you do have to be careful with that. And then the nap time is 60 seconds. So when I say you have to be careful — okay, I'm not to that slide yet. So, vacuum isn't free; let's talk about the throttling. When we throttle vacuum, we don't have one gas pedal or one twist grip — we have a twist grip, a pedal, a whirligig, and a beanie all at the same time; we've got millions of different configuration options for it, and you really get to see what Postgres is doing down at the core. Essentially it uses a cost-based throttling system. So what happens? If vacuum is running on a table and the page it needs is already in memory — a hit — okay, that costs me one unit, one widget. If I'm vacuuming and that page is a miss, that's going to cost me ten units. So if one page is in memory, great, that's one; the second one is a miss, that's ten, so I'm up to eleven — it's accumulating, it keeps adding every single page together. And then the cost of a dirty page — a page that's in memory but has been modified — is twenty, because I've got to make sure that we're consistent and we're safe, so that's a very expensive operation. You can change these: if you've got some wicked-fast SSDs underneath, you could lower the cost of a miss and make it less; if you've got some horrible, nasty 5,000 RPM data drives, you could increase the cost of these and mess with the algorithm a little bit. But as we're going along, what do we do? I'm adding these numbers up — I get to ten, I get to eleven, I get to fifty-two, whatever — what do I actually do with that? Once I hit two hundred, which is the default, then I'm going to sleep for this many milliseconds. So vacuum is operating, it gets to two hundred — okay, now it's time to take a nap for, by default, twenty milliseconds. That's very heavily throttled. You start dialing this value down and your autovacuums are going to run faster. Just as an anecdote: I've taken vacuums that were taking two or three days at twenty milliseconds, dropped that to zero, and they ran in like an hour. So it's a pretty significant throttle — it really depends on the other workload and everything else that's going on, but that's why you've got so many configuration options for it. Okay, so that's how we throttle it. Logging — again, super, super important. The min duration: basically what this setting says is, when an autovacuum has run for this many milliseconds, I want to write a record of it in the log. I always set this to zero: log every single vacuum that you do. It's not that much data in the Postgres logs — it's not creating a whole huge log stream — and the data there is invaluable; it's really nice. If you've never used PG Badger,
it's a really great tool for analyzing Postgres logs it picks all that stuff up and shows you graphs and charts and averages and everything else it shows you the tables, how they've been vacuumed so having this data is well worth the few extra bytes or kilobytes that you end up with in your log after a month and personally I think it's really important and that's off by default it's set to negative one which means log nothing for auto vacuum so just go immediately log into production and set it to zero trust me it's the right thing to do okay so we've talked about questions first of all because we've been talking about different options that we have but we haven't actually talked about the strategy yet that's the whole point of this any questions about any of these tuning things that we've talked about this is a good question just for the video the question was with that default of 50 don't you have to be careful about system resources because you have a lot of system resources and it seems like it's a very static number so yes you do and actually as we talk through the strategy you'll see how to not just set a static number but to pick the right number so that's one of the bigger tuning ones just give me a few minutes if I don't answer your question make sure to bother me again but I think we're going to answer it here for you any other questions before we move on okay so what's the magic strategy the four step plan it was vacuum again basically start with the default if you've never, if you've got nothing going on you don't have anything set just start with the default the default is the starting point you want to monitor your workload and there's a script in here there's a million scripts to do it the Postgres system catalogs because vacuum needs it tell you exactly how many inserts updates, deletes, index selects everything you could ever want to know about your table it's all there and you can analyze that and I'll show you some query support and you guys can pull this deck later but you want to monitor those workloads and look for, basically sort it by the number of people that are doing the updates and deletes at that point I've never adjust the globals to start, I never do I always start per table so I look at my workload analysis and what I've found I've done hundreds of health checks with Postgres databases I've looked at tons of them and there's a pattern that develops and usually you have between 3 and 10 tables that are the top 1% of your traffic and then everything else lags by orders of magnitude it's just the way that systems work for the most part if you've got something where every single table in your database is taking all of the update and delete traffic then I'm sorry sounds horrible but we're going to go per table so we're going to start taking these high flyers I call them and we're going to start adjusting the thresholds for not 50 let's make it more let's adjust for each table once we've made those adjustments per table I'm going to go back to monitoring the workload again but what's really critical now is to keep an eye on my vacuum behavior and the logs as well as my bloat behaving with this table as it's getting all these updates so I've done one round basic workload analysis make some adjustments based on that and then I'm going to go back watch that workload make sure that it's still matching up and then see how my bloats behaving and start fine tuning from there so I'll go back adjust it further and then start over and we just kind of do this and it takes a 
while but after you get into it all of a sudden you're not worried about the great thing is is once you get this down into a science and you look at those auto vacuum reports hey do I have any long running vacuums if you see one you can usually tie it to like a new application deployment oh they updated the app and now they do 10 million updates where they did zero before usually that's what you see so number one monitor your workload how we do that so the big reason to do it again we want to single out those heaviest modified tables I say they monopolize auto vacuum if you've got a table that gets three updates a week and you've got a table that's getting thousands of updates a minute then that table that's busy is kind of it's always going to be stealing vacuum time it's going to be getting vacuumed a lot it's going to be spending a lot of time in vacuum your other little tables are going to get ignored and they're actually going to be the ones that cause your XID wraparound problem because they're going to be ignored so it's important we want to basically take those guys and treat them special you get your biggest client they always get special treatment why not our biggest tables too so we're going to go through we're going to look at our table report and we're really going to look for the high flyers it's usually typically it's an order or two magnitude as far as the updates and you'll actually see it with PG Bench in a second once you do that you say okay there's a couple of strategies to deal with I could say I have one table that is a hundred or a thousand X busier than every other table in my system what's the strategy for dealing with that I could go from three auto vacuum workers to four just by increasing at that one I'm not really dedicating but I'm kind of dedicating an auto vacuum worker to that one table and then it can deal with it that might be a simple strategy so maybe I want to increase to say go up by the number of large ones there this is the one that usually gets people a little bit the other thing is to increase the vacuum threshold for that table so if I've got one table if I can see very clearly in my workload analysis it does a thousand updates I could say okay that's pretty consistent why don't we move that threshold from 50 to 10,000 or 20,000 it's so busy that we're going to spread it out a little bit increase that threshold just for that table it's going to basically it'll get vacuumed as it needs to be but it's going to give auto vacuum time to deal with all the rest of the tables in your database so it gives you some breathing room so so then we're going to go back to monitoring so it's like okay we're going to say I'm going to adjust this one table he's not going to get vacuumed as frequently or this table she's going to get a dedicated auto vacuum back end basically so then I'm going to continue my workload monitoring the next thing to look for is tables that aren't getting vacuumed this is explicitly in the next step but I want to say hey that's a good question from auto vacuum so I've looked at my high flyers and now I'm going to say let's just say 7 days it's an arbitrary number but it's a week we operate on business weeks the database is kind of follows for the most part so let's just say hey show me all the tables that haven't been vacuumed in the last 7 days those are the ones that are being completely ignored we have to figure out why they're being ignored and optimize the algorithm so that they get vacuumed the other thing we're going to do is 
look at the logs. We want to see from our PG Badger reports how long the autovacuums are taking: are they behaving as they should, are they taking too long, are they too frequent? If you've never used PG Badger, do it — it's amazing. We're also looking at bloat: I might have a smaller table, but it's getting updated so frequently that bloat is getting out of control, and we want to watch that, along with long-running autovacuums. The other easy thing you can do is just watch the I/O subsystem: am I hitting the disk too hard, and does it correspond with some bad vacuum behavior? Because if it does, then we need to tune that somehow. And then we're going to start adjusting. We'll say: okay, let's take a look at our workload analysis — everything looks good, our thresholds are okay, I'm not really pushing the I/O subsystem too hard; my table, I moved it to a thousand updates before it kicks off, my vacuums are happening reasonably quickly, and I don't have a lot of I/O taking place on the system — maybe I want them to happen a little faster, so lower the throttle for that table, open it up a bit, give it some more gas. And then you go back, you watch your I/O, and you see how autovacuum is behaving. If you notice at that point — oh man, my big table went from being vacuumed as frequently as it needed to now getting ignored — lower the threshold; you went too high, so dump it back down. That's the basic strategy. Super-ultra high flyers: if you're at like 10,000 or 100,000 or a million times the number of updates over everybody else, they do need special attention, and usually you're going to pull them out of autovacuum and handle them manually with a cron job or something else that you've written. The best man at my wedding has a strategy for this; it's VLAMF — vacuum like a mad fellow — just vacuum constantly. And what you'll see is there are some tables where, if I vacuum it back to back to back, the vacuum on this table takes 5 minutes, great, no problem; if I miss a sweep for 15 minutes, now that vacuum could take 2 hours. You've seen that kind of behavior. So basically, once you've gone through a couple of rounds of adjusting those thresholds — adjusting the number of updates it takes before vacuum will run on a table, and adjusting the cost delay — and you can't really get it to do what you want, it's probably time to look at an outside strategy for dealing with that one table: take it out of the list, let the rest of your tables be managed with automation, and just put that sucker on a cron job to do its thing. And when you're talking about cron jobs: my business ends at 6 p.m. California time, so at 8 p.m. I'm going to start a vacuum on that table and let it run until the morning. The nice thing about vacuum is that if you start a vacuum and you kill it after some period, even if it doesn't complete, you're not losing everything. So you can do that: if you've got a long-running vacuum you can cancel it — you don't want to continually cancel it — but you can cancel it and not completely lose the effect of what was going on. Other tables: logging tables. I really hate logging tables — well, it depends on the logging table. If it's pure insert, woohoo, I don't have to vacuum it — you do for XID wraparound, but it's not going to be as big an issue. The problem is most logging tables have a purge associated with them: you're making millions of rows a day, and then once a day you
Other tables: logging tables. I really hate logging tables. Well, it depends on the logging table. If it's pure insert, woohoo, I don't have to vacuum it. You do for XID wraparound, but it's not going to be as big an issue. The problem is that most logging tables have a purge associated with them: you're writing millions of rows a day, and then once a day you delete a ton of rows. Now you've got all that extra space that needs to be vacuumed, and it can cause some problems. So how do you deal with that? Partition, if you can, for logging tables. If you get them into partitions, instead of using a delete statement, which creates all this churn and all these extra dead rows, you can just truncate from 90 days back, or just drop the old tables and they go away, no vacuuming required. It's a great, easy way to do it.

So that's how you kind of do it. Let's look at the strategy just one more time. From a high level: monitor, adjust per table, monitor your workload, your vacuums, and your bloat, and then go back and do some more adjusting and throttling. Once you've gone through this a couple of times, then it's time to start looking at things like logging tables, and at your really high flyers, and move them to separate jobs. But by the time you get there, you're going to find that you're a lot less worried about vacuuming in general.

So that's the strategy. That's wonderful, it's all sunshine and rainbows, but how do I actually do it? I keep talking about all this monitoring. So remember, with vacuum, holes are created in the database, fragmentation is created, for update and delete. It doesn't happen for select, it doesn't happen for insert. So what I'm really worried about is delete traffic and update traffic. How do I see that? Take a look at a couple of tables in the database. pg_stat_all_tables: this is my second favorite table in all of Postgres. It shows you the number of tuples inserted, the number of tuples updated and deleted, and there's a lot more in there: sequential scans, index scans, everything against the table you could ever want to know. But this actually gives you the data that's important.
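To make that concrete, here is roughly the kind of workload query being described. This is a sketch only: the column names come from pg_stat_all_tables, but the sort by updates plus deletes and the LIMIT are my own choices to match the point being made.

    SELECT schemaname, relname,
           n_tup_ins, n_tup_upd, n_tup_del,
           n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_all_tables
     ORDER BY n_tup_upd + n_tup_del DESC   -- the traffic vacuum actually cares about
     LIMIT 20;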
Now, the other note that I make here: there are actually two of these, there's pg_stat_all_tables and then there's pg_stat_user_tables, which is a view on top of all tables. What's the difference? The system catalogs, pg_class, things like that. Why does that matter? Because every time you do an ALTER TABLE, it's an update to the Postgres catalog. If I create a table or drop a table, any of those operations do updates and deletes in the Postgres catalogs, and guess what, they're not special: they take bloat too. My favorite was the guys who were doing transactional DDL at the rate of thousands of tables per second, creating and destroying, and their catalog, their pg_class table, was a terabyte. They didn't have that much data in the database, but their metadata was that huge because they were going so fast. So don't ignore your catalogs, it's important. And again, I'm sorting here by the number of updates and deletes added together, because that's what really matters.

So when you get that data set back, you take a look. I did some pgbench runs just to see what was going on, and with our pgbench tables, again, this is the standard setup, and yeah, I had an empty database and it was quiet, I wasn't doing anything else. But it's very common to have somewhere between 2 and 10 tables that are orders of magnitude busier than everybody else. So you want to look at that and then start making your adjustments based on it. One of the things you'd say is, okay, I've got three busy tables here and I've got three autovacuum workers; maybe I just need to go to four autovacuum workers to take care of all the stragglers and let those three do their thing, right? That's one of the ways to deal with it.

So how do we actually modify things? Maybe we want to adjust the threshold, maybe the number of workers, whatever. Instead of going in and doing it globally, it's a storage parameter, and you can say, hey, the autovacuum threshold for this table is 1,000, and now you've not had to modify the global setting, you've just modified that one table. Take a look at these ALTER TABLE storage parameters in the docs; this is how you do all of that per table (there's a rough sketch of one below). It's kind of nice, it's easy to do, you can see it inside the database, it's not in config files, right? And then again, you could add another worker or two or three, because if I make the threshold too big and vacuums don't run frequently enough, when a vacuum does run it's going to take a long time, because I've done a lot of updates since the last one. It's going to take longer and put more demand on the IO system, so you've just got to be careful with it. And the number of workers is just the number of concurrent vacuumers that can be sucking up your IO at the same time.

I only have five minutes, so I'm getting close here, but what are the things you want to monitor? Again: your workload; tables that haven't been vacuumed within seven days; the logs, the logs, the logs, please look at your logs; long-running autovacuums; and bloat. So how do we do that? For the workload, I already showed you that query: look at all of the data from your tables. How do I see tables that haven't been vacuumed in seven days, how is that even a thing? It's not gorgeous, but you can pull it right out of my second favorite table, pg_stat_user_tables or pg_stat_all_tables. It actually gives you the timestamp of the last time a vacuum took place, it gives you a counter of the number of times it's been vacuumed, there's a lot of stuff there. Your autovacuum logs, I don't think I have an example in here, but they show you how long the vacuum took, how much it vacuumed, how much space it found; they give you a lot of nice, rich data.
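Going back to those per-table storage parameters, here is a minimal sketch of what that ALTER TABLE can look like. The table name and the numbers are made up for illustration; the parameter names are the standard autovacuum storage parameters.

    -- raise the trigger point and loosen the throttle for one hot table,
    -- without touching the global settings in postgresql.conf
    ALTER TABLE pgbench_accounts SET (
        autovacuum_vacuum_threshold    = 1000,  -- dead tuples before autovacuum kicks off
        autovacuum_vacuum_scale_factor = 0.0,   -- rely on the flat threshold, not table size
        autovacuum_vacuum_cost_delay   = 2      -- milliseconds; lower = more aggressive
    );

For completeness: the worker count is a global setting, autovacuum_max_workers, and as far as I know it can only be changed with a server restart, so that one does live in the config file rather than per table.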
Long-running autovacuums: this one is pretty easy. This is my number one favorite table in the database, pg_stat_activity. With this one you can just say, hey, show me everything in the database that's been running for more than an hour, and then I can say where it's like autovacuum and actually see them. Just watching those as they're running is important; that's going to give you feedback on your system (both of those checks are sketched at the end of this section).

Bloat. This one I'm probably going to run right up against my time with, because it's really important. Calculating bloat sucks, not just in Postgres but in general. The problem is that if I want an accurate view of the actual bloat in the database, I have to look at every single row in the table, count it, and then I get a report. That's great for small tables, and the pgstattuple module in Postgres will do it for you: it scans everything, it sees all the dead rows, hey, here's what your bloat is. It can take a really long time, so I usually don't use that, I stick with the estimate method. Now, there are multiple ways to estimate; as a matter of fact, after I put this together I realized pgstattuple has an estimate mode too. But my favorite is a query, and by the way, these estimate queries are horrible and scary looking, and I don't even know who wrote them, well, I do know who wrote them, and I give them much love and joy because I could not write one. What it's basically doing is looking at the catalogs and trying to figure out how much bloat it thinks is there based on the statistics in the system. So take a look: this is my favorite one, but there's also the de facto Nagios plugin for Postgres, check_postgres.pl, which has a bloat estimator in it; the query for that prints out over something like four pages, it's horrifying. But run those, and this top one especially will very quickly come back and say, hey, it looks like this table is bloated by approximately this much, and it shows you right there what's going on. Stick to estimates, but remember they are estimates; if you want to actually calculate it, you've got some work to do and some tables to scan.

And then again, looking at your tables, I say seven days, but you're going to get more sophisticated with that number as you tune these things and feed the results back: maybe I don't care about seven days, maybe it's more like 20, whatever your number is. And you can adjust, as I showed you, per table: you can adjust not only when it should vacuum, say after a thousand updates, but also the delay, the throttle, for just that single table, so hey, for this table, you've got the disks, go, man, take it and run as fast as you can, and a couple of other things.

And then, I've got 30 seconds left, this is basically it. For extremely high bloat, or for space reclamation, use pg_repack, or you've basically got to dump and restore the table, or do a CREATE TABLE AS SELECT, or VACUUM FULL, which is horrible, or the CLUSTER command, which is horrible. So use pg_repack, it's a great module for doing it and it works very easily. The funny thing about high bloat is remediation, right? I'm a DBA, I'm a sysadmin and a DBA, so what does that mean? I hate developers, I really hate them. So the number one remediation is: stop doing that, do something else, man. That can be a problem, right? But lower the throttle, get some separate disks, move those tables or those indexes to faster disks, they'll speed up your vacuum. And again, disabling autovacuum and moving to cron is actually a good strategy, it's useful.
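Tying back to the two monitoring checks mentioned above, here is roughly what they can look like. Both are sketches; the seven-day and one-hour windows are just the example numbers from the talk.

    -- tables that haven't been vacuumed, manually or by autovacuum, in seven days
    SELECT schemaname, relname, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE COALESCE(GREATEST(last_vacuum, last_autovacuum), 'epoch'::timestamptz)
           < now() - interval '7 days';

    -- autovacuum workers that have been running for more than an hour
    SELECT pid, now() - query_start AS runtime, query
      FROM pg_stat_activity
     WHERE query LIKE 'autovacuum:%'
       AND now() - query_start > interval '1 hour';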
Usually when people get to that step, they're like, man, I hate this autovacuum thing, but if you think about it, at that point you're dealing with one table and autovacuum deals with the rest. It's kind of beautiful. So again, the four-step program of success: start with the defaults, monitor your workload, go per table, don't try to go global at first. Eventually you'll be comfortable enough to say, you know what, I can go global, and away I go. But it's monitor, adjust, monitor, adjust, and then be happy, because you don't have to worry about vacuum. That's it. Any questions? Awesome. May the vacuum be with you. Thank you, everybody.