Alright, so I'm going to cover this slide again, but I've got to make some corrections to it, because apparently I'm wrong about this slide, which is great. So I'm the CTO at Crunchy Data, and I'm a committer and major contributor. Row-level security was 9.5, column-level privileges were 8.4, but the role system was actually 8.1, which Chris pointed out to me — I updated some of my other slide decks, but apparently not this one, so oh well. But yeah, I've been hacking on Postgres for 15 years or so, something like that — a long, long time. So as we go through this, feel free to reach out to me with any questions; I'm happy to take questions as we go. Basically, I'm just going to run down Postgres from a systems administrator's viewpoint.

So first off, let's talk about some terms. In Postgres, you have what are called clusters or instances. We use those terms interchangeably in the Postgres ecosystem, even though they aren't really the same thing in other systems. What we mean is one Postgres server listening on one port — which, by the way, is a Postgres limitation that's kind of obnoxious; I wish we allowed it to listen on multiple ports, but we don't, though you can listen on multiple addresses. It's one set of data files, including all tablespaces, and one stream of write-ahead log (WAL). What this means is that if there's a crash, we have one set of write-ahead log to replay across all of the databases included in that cluster. There are some things that you can do at the cluster level. There's basic initialization — just creating it to begin with — and starting and stopping the cluster. That happens at the cluster level; you can't start or stop individual databases, it's the whole thing or nothing. The file-level backup tools, things like pgBackRest, which I'll talk about later, operate at the cluster level because they involve the write-ahead log.
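As a concrete sketch of those cluster-level operations — the data directory path here is just an example, not anything from the slides — initializing, starting, and stopping always happen for the cluster as a whole:

```shell
# Initialize a brand-new cluster: one set of data files, one WAL stream.
initdb -D /var/lib/postgresql/11/main

# Start and stop the whole cluster -- there is no per-database start/stop.
pg_ctl -D /var/lib/postgresql/11/main start
pg_ctl -D /var/lib/postgresql/11/main stop
```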
You have to save the write-ahead log in order to have a consistent backup, and because the write-ahead log operates at the cluster level, all your file-level backups end up being at the cluster level as well. The same thing goes for streaming replication. Postgres has physical streaming replication support, and that's done through the write-ahead log, which means the system you're replicating to has to be a complete copy of the entire cluster. We don't have the ability to do physical streaming replication for individual databases. That is something you can do with the logical replication that Postgres supports today — you can replicate individual tables, even, or individual schemas.

You then have sets of objects that are defined at the cluster level. Tablespaces are a big one, and users and roles as well. So even though you might have multiple databases inside of Postgres, they all have to take their ownership from the roles that exist in the system, and all the roles are defined at the cluster level. Tablespaces, because they're defined at the cluster level, can actually contain objects from multiple independent databases, which can be handy at times but can also be a little annoying, because if you want to drop a tablespace, you have to connect to every database and make sure that database doesn't have any objects in that tablespace. If it does, Postgres will complain — we won't let you drop a tablespace with objects in it — but it's a little obnoxious having to check. What all of this boils down to is that I tend to recommend to people — and, oh, Chris, maybe I need to hear your comments on this — I tell people to treat Postgres almost as if databases didn't exist, right?
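A rough sketch of what that per-table logical replication looks like — the database names, table, and connection string here are invented for illustration:

```shell
# On the publishing cluster: publish just one table, not the whole cluster.
psql -d appdb -c "CREATE PUBLICATION orders_pub FOR TABLE orders;"

# On the subscribing cluster: pull in only that table.
psql -d reportdb -c "CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=primary.example.com dbname=appdb'
    PUBLICATION orders_pub;"
```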
One cluster is like one database, right? I would tend to argue that having multiple databases underneath a single cluster is not something I've found typically ends up being a good solution, because you can't do things like join across tables in different databases — unless you use something like postgres_fdw, and then the performance just sucks. So I tend to encourage people to think about things in terms of databases and have one, or just a few, databases on an individual cluster. That way you can do replication, and you can do things like a file-level backup and restore, so you can restore just that one cluster; you don't end up doing a restore that restores, you know, 50 different databases that you've got running. One big downside is that every individual cluster chews up shared buffers: it allocates however much shared_buffers you set for that cluster. So that is one of the downsides of the trade-off. If you've got a server that isn't hit very hard, and you want to have one set of shared buffers, and you don't care that stuff sometimes has to come off of disk because it's been pushed out, then maybe, okay, fine — I can see arguments where having them on the same cluster is okay. But a lot of times it makes just as much sense to reduce your shared_buffers size and let Postgres use the Linux kernel cache for a lot of its buffering, especially on a relatively low-performance system.

Databases. So Postgres does have databases, and these are basically the container objects for all of the schemas and what are called database-level objects. Whenever you connect to Postgres, you really connect to a database, right?
So you connect to the cluster — the listening process, also called the postmaster — but then you tell it, "I want to connect to database X," and the postmaster will go open database X, make sure you have connect privileges on it, and start you up, and once you're started, you're working inside database X. If you want to go talk to another database, you have to disconnect and reconnect, or you use some kind of inter-server mechanism like postgres_fdw. There are a number of permissions that exist on databases: you have the CONNECT privilege, which controls whether you can connect to the database at all; CREATE, which basically means you can create schemas; and TEMPORARY, which allows you to create temporary objects.

So inside of a database, you have schemas. Pretty much everything lives inside of a schema — all of the normal objects that you work with: tables, indexes, functions, views, all of these exist inside of a schema — and then there's a set of permissions on schemas. You can think of schemas kind of like databases... or sorry, kind of like directories. You have the option of being allowed to create new objects inside of a schema, and separately, whether you're allowed to use objects inside of a schema at all. You then have the ability to have further granular permissions on the objects inside the schema itself, which we'll talk about in a minute.

I've talked a little bit about tablespaces before, but basically a tablespace is an alternative directory — usually on, like, a different file system, because that provides you with additional I/O channels that are available to the system. You create them using CREATE TABLESPACE, which is pretty handy. It also means that you can expand your Postgres instance across multiple file systems. Basically what goes in tablespaces are tables and indexes.
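Just to illustrate those database- and schema-level privileges — the role, database, and schema names here are made up for the example:

```shell
psql -d appdb <<'SQL'
-- Database-level: let the role connect, but not create schemas
-- or temporary objects.
GRANT CONNECT ON DATABASE appdb TO app_reader;
REVOKE CREATE, TEMPORARY ON DATABASE appdb FROM app_reader;

-- Schema-level: USAGE lets you reach objects inside the schema,
-- CREATE lets you add new ones.
GRANT USAGE ON SCHEMA reporting TO app_reader;
REVOKE CREATE ON SCHEMA reporting FROM app_reader;
SQL
```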
Things like views don't actually have storage; materialized views can go in a tablespace. But one thing to realize, as I mentioned before, is that tablespaces are at the cluster level and therefore can contain objects from multiple different databases.

Postgres has a write-ahead log. This is really, really important, and we've had some changes here: prior to version 10, we called this directory pg_xlog; in version 10, we changed it to pg_wal, because we found that sometimes people thought those were logs that could be deleted. And that's very bad, because we need the write-ahead log to do things like crash recovery. If Postgres crashes, what happens is that we come back up, look at what the last checkpoint was in our control file, and then replay the write-ahead log from that last checkpoint up to the end of the write-ahead log, and that's how we reach a consistent state. So if you delete it, bad things happen. Don't do that. Every single query that comes into Postgres that does any kind of write — when you go to commit that query, we will only acknowledge it once we've written that data out to the write-ahead log and synced it. This is how Postgres guarantees that we don't ever lose any of your data. All of those changes are written into the write-ahead log with a CRC, so that we can tell if we have a partial write into the write-ahead log. One of the things that can happen is that the write-ahead log can become a point of contention under high write volumes, and this is why it often makes sense to split the write-ahead log off onto its own file system, its own I/O channel, its own disks — or SSDs, or NVMes, or whatever you're running. Have it be independent, because it can become a contention point as we have lots and lots of data being written into the write-ahead log.
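One way to do that split, sketched here with made-up mount points: `initdb` can place the WAL on a separate device at cluster-creation time (on Postgres 10+ the option is `--waldir`; on older versions it was spelled `--xlogdir`):

```shell
# Data files on one file system, WAL on another, so heavy WAL
# writes don't contend with heap I/O on the same device.
initdb -D /data/pgdata --waldir=/ssd/pg_wal
```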
One of the things to be aware of is that it contains both what are called full-page images and incremental changes. We'll talk about those a little more when we talk about checkpoints. But just realize a full-page image is a full 8K page. Everything in Postgres operates on these 8K pages: we read off the file system in 8K chunks, we write out to the file system in 8K chunks — everything happens in 8K. Periodically, we will take an 8K page that's been modified and write the whole thing into the write-ahead log. Once we have that written out, subsequent changes are written out in essentially a binary-diff fashion, where we track exactly what change was made to that page at a binary level, and that's how physical replication in Postgres works.

Checkpoints. We have this bunch of data in memory in our shared buffers. We go and make some update to that data in memory, and that modification dirties the page. Someone then wants to commit that change. What do we do? We make sure that that page — or that change, whatever that change is — has been written into the write-ahead log. But we also need to get that change out to the heap at some point, and that process is called checkpointing. Checkpointing is the process that makes sure a dirty page gets written out before our next checkpoint. Why does this matter? Because during crash recovery, we have to go back to the last checkpoint and replay all of the changes from there; that's how we can, again, reach a consistent state. So this is a really important thing to be doing periodically. By default, it happens every five minutes. I know people who change that, and one of the reasons is that you may want to accept a longer recovery time in the event of a crash in order to reduce the write-ahead log volume. How does changing checkpoint_timeout reduce the write-ahead log volume?
After each checkpoint, the first modification of an 8K page results in that full-page image going into the write-ahead log. Subsequent changes are only the modifications to the page, and those are much smaller when they go into the write-ahead log. But if you have checkpoint_timeout set to five minutes, then after five minutes we will have checkpointed, and the next write to that page will, again, put a full 8K page into the write-ahead log. It's a trade-off. People go as high as, like, half an hour on their checkpoint_timeout. What that means is that you could potentially have as much as half an hour of changes pending in the event of a crash that you then have to replay during crash recovery. So it's a real trade-off to consider. I'll talk more about those configuration options a little later.

The community provides a lot of really great packages for various distributions of Postgres. One of the really nice things is that the Postgres packages allow concurrent major-version installation, which also means you can have relatively smooth upgrade paths. So if you're doing a major version upgrade — from 9.6 to 11, or from 10 to 12, or what have you — the Postgres PGDG packages make that much easier. Those packages are also supported through the community mailing lists, and updates to those packages are coordinated with the Postgres release team whenever Postgres does a minor or major version update, so you know those packages are always current with the latest Postgres release.

If you're installing on a Debian- or Ubuntu-style system, apt.postgresql.org is the repository for those packages. You add a PGDG sources list under sources.list.d, which is pretty straightforward — you can see it here, and this is all documented if you just go to apt.postgresql.org.
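The repository setup being described looks roughly like this — a sketch from the apt.postgresql.org instructions as I recall them, so check that page for the current key-installation step and exact repository line:

```shell
# Add the PGDG repository for your Debian/Ubuntu release.
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt \
    $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

# Install the repository signing key.
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc \
    | sudo apt-key add -

# Update and install the server, client libraries, and default clients.
sudo apt update
sudo apt install postgresql-11
```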
Then you have to install the key for that repository, which is pretty straightforward, and then it's apt-get update (or apt update) and apt install postgresql, and boom! Postgres is installed. That'll install the server, the client libraries, and the default clients. The Postgres packages for Debian do split the client out into separate packages, so if you want, you can install just the psql client and libpq and whatnot onto an individual system; you don't have to install the server everywhere you might want to access Postgres from.

When it comes to configuration, between the two systems — Debian and Red Hat — Debian follows the FHS, the Filesystem Hierarchy Standard, which means the config files go in /etc. So in Postgres land, the config files are in /etc/postgresql/X, where X is whatever the major version is — 9.6 or 10 or 11 — and then "main" is the cluster name. One of the really cool things on Debian is that it has a bunch of tooling for cluster management, so you can have multiple independent clusters — different major versions, the same major version, whatever you want — on one Debian instance. There are then a bunch of wrappers that make working with these different clusters easier. So psql on a Debian system is not actually the psql binary when you run it; it's a Perl script. It's a Perl script that looks for a --cluster option that you provide, figures out what cluster you want to access and what port it's running on, and then sets the appropriate environment variables so that when psql is executed by that script, it connects you to the correct Postgres instance. It will also pick the right psql, because psql changes across major versions too, and it will pick the psql version that matches, which is kind of interesting.
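In practice, the Debian wrappers being described look something like this — the cluster name and version here are just examples:

```shell
# List every cluster on the box: version, name, port, status,
# owner, data directory, and log file.
pg_lsclusters

# Start, stop, or restart one specific cluster.
sudo pg_ctlcluster 11 main restart

# Connect to a specific cluster; the psql wrapper picks the
# matching psql binary and port for you.
psql --cluster 11/main
```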
The actual binaries live in /usr/lib/postgresql, and the logs all go into /var/log, which is pretty straightforward. There are also some startup logs, although some of that's now changed with systemd magic, and then we have either an init script or, now, a systemd unit that will start all the major versions and all the clusters for you automatically.

So those are the Debian-provided wrappers and helper scripts. pg_lsclusters and pg_ctlcluster are the ones that really might be unfamiliar to you, and here you can see an example of running pg_lsclusters on my laptop, where it shows all the different ports things are running on, whether each cluster is online or not, and what its name is. Here you can see I have two different clusters running Postgres 11 with different names. They live in different directories — this is the data directory for the app1 cluster, this is the data directory for the main cluster under 11 — and they've got different log files and whatnot. Pretty straightforward. So that's how things look for Postgres on Debian, and like I said, you pass — it should be two dashes — --cluster, and then something like 11/main, and that would connect you to the Postgres 11 cluster named "main" on a Debian system.

Alright, now let's talk about Red Hat. For Red Hat, CentOS, RPM-based environments, we have yum.postgresql.org. It's got a script that initializes the cluster and sets things up for you, and you can run multiple major versions in parallel. I will say that, unfortunately, the Red Hat packages don't have all of the nice cluster tooling that the Debian ones do, but you can still install and run it, and with that postgresql-11-setup script you can specify different data directories, which would allow you to have multiple different Postgres instances of the same major version running under a Red Hat or yum-based environment. You can then enable starting Postgres on boot using
systemctl. Otherwise, it's pretty much the same as running Postgres on other systems. One thing I will mention: if you're thinking about running multiple clusters in a production environment, my preference is to run those independent clusters under different Unix user IDs, or to run them in independent containers, just to keep the separation there. Containers provide cgroup separation, which is nice. If you are running them inside one container, or inside one Linux instance, then having different users means you won't have any chance of confusion about which shared buffers belong to which cluster — that can be bad if you mess it up. Postgres has some checks in place to try to avoid that happening anyway, but I like to be extra paranoid and run them as independent Unix users.

So, Red Hat configuration. By default, the data directory is /var/lib/pgsql, and all of the configuration is in the data directory, which I hate. The reason I hate it is that it means you're going into the data directory and vi-ing files, and potentially mv-ing files and rm-ing files, and doing other things that are very, very risky to do inside your Postgres data directory. Different people do different things, but it's not my recommendation. The logs are also inside the data directory, inside the pg_log directory. I honestly think that's part of the reason people went and deleted things like pg_xlog: the log directory on Red Hat was pg_log, and that's pretty freaking close to pg_xlog. You do need an independent init script or an independent systemctl setup for each cluster on Red Hat; it's not like Debian — it doesn't have the nice one-service-starts-them-all thing, and it doesn't have the helper scripts that are on Debian. So pick your poison. That's all I'd say about Red Hat. I like Red Hat in general, but I'm not a huge fan of how they do that installation.

Alright, so let's talk about some Postgres
config files. postgresql.conf is your general server configuration; it holds things like what IP address to listen on, what port to listen on, and how big shared_buffers is going to be. You then have the pg_hba.conf file, which configures how authentication to the server is done; pg_ident.conf, which allows you to do user mappings; and then there's that pg_log directory for Red Hat, as mentioned. On Debian, there are also some Debian-specific config files, which are kind of cute. There's one called start.conf, which allows you to configure whether or not the cluster is started automatically — I think there are ways of doing this inside systemctl now too, since we're using systemd. There's also a pg_ctl.conf, which allows you to configure options to be passed to pg_ctl when it's run via the pg_ctlcluster command, and then there are some environment settings you can set, though you generally probably don't need to monkey with those. We also have a postgresql-common package on Debian, which is what gets used when things like new clusters are created, so you can change what the defaults are for a new cluster. One of the things you might want to do in particular is specify some options to initdb when creating new clusters. My favorite is -k. Does anybody know what -k does to initdb? No, it is not the fsync option. You should know this — you should care about your data enough to know this. That's right: checksums. The -k option to initdb enables page-level checksums. It's off by default: Postgres supports page-level checksums, but you have to explicitly enable them, and you have to enable them at initdb time, which sucks. It also means that you can't pg_upgrade from a non-checksum cluster into a checksum cluster — not yet, anyway. There is work going on to improve that situation, but I think it just got punted out of v12, unfortunately. Hopefully, at some point in the future, we will have support for
online enablement of checksums, but it's not there today. But I did want to mention that -k is a great option to specify in your createcluster.conf for initdb, so that all of your new clusters are automatically created with checksums enabled, because it's really good to have.

There are some other things you can do around user-specific information and controls for clusters. This is kind of cute, because you can set it up so that users A through F, or whatever, connect automatically to this cluster, whereas everybody else connects automatically to a different cluster. pg_upgradecluster has a directory for scripts to be executed during a pg_upgrade — pg_upgrade is how you go from one major version of Postgres to another — and you can have pre and post scripts in there for doing whatever you need done during that upgrade. Red Hat has a few specific configuration files, such as the init script, and there have been changes to reduce the need to modify them, which is nice; for example, the port is no longer specified in the init script, which is handy. So that's a little nicer, but otherwise I don't tend to modify the Red Hat-specific configuration files very much.

Alright, so let's talk a little bit about the initial configuration of Postgres. The first one is listen_addresses, which you may need to configure to allow external access. Then max_wal_size is something you probably want to consider increasing if you have any expectation of a write load on your environment. Remember we talked about checkpoints and having to write those 8K pages out? Well, you can have the checkpoint happen due to time, which is typically what you want — meaning you checkpoint every five minutes or so — but checkpoints can also be forced if we run out of write-ahead log space. That's max_wal_size: if we write more than that amount of data in five minutes, we are going to checkpoint immediately, because we need to
get back underneath that max_wal_size, in order to do what we're supposed to be doing, which is respecting the parameters. But that means we checkpoint faster, and checkpointing faster increases the amount of WAL you generate — because we have to write a new full-page image for each page after every checkpoint — and it just overall ends up slowing down the system. So you really want to increase your max_wal_size if you have a high write load.

checkpoint_completion_target — for reasons that still baffle me, we have a checkpoint_completion_target parameter that basically says, well, if you want, you can make it checkpoint even faster than what you actually set the timeout to be. Why we have these two things is unclear to me, because if you wanted to checkpoint faster, you could just decrease checkpoint_timeout and we would do it more often; but we also have checkpoint_completion_target. I tend to recommend upping it to 0.9 — that way, we actually use the entire time between checkpoints to do writes. By default, it's set to 0.5, which means we're doing a checkpoint every five minutes, but we're only going to write the data out in the first half of that window: we'll try to write all the data out and then do, like, nothing for a while. So it's a little interesting, and it also means you end up with discrepancies in latency and things like that, because whether your write happens in the first half or the second half of the checkpoint interval can make a real difference in how long it takes to actually get it out.

effective_cache_size is something else you may want to adjust, along with max_wal_senders. I think the max_wal_senders default in 11 — maybe we raised that already — but for a while there, it was set very low, so you may want to consider increasing it. The reason that matters is that things like pg_basebackup use a WAL sender, replicas use WAL senders, pg_receivewal
uses a WAL sender — so there are different processes that each potentially want a WAL sender, and you want enough to cover all of them.

Alright, logging configuration. These are things that you should enable. The default logging configuration in Postgres sucks — it's terrible. We actually do have a lot of good logging options in Postgres, but they're all disabled by default. So: logging of connections and disconnections; logging of lock waits, which is really important; logging DDL — I always recommend logging all DDL unless you really can't for some reason; log_min_duration_statement, which is in milliseconds, so if you want to log every query over 100 milliseconds, you can enable that. If you were here for my last talk — I'm probably going to update this for v12 to say set log_min_duration_statement to 0 and set a sample rate instead, because v12 will have the ability to set a log statement sample rate that gives you a sampling of queries rather than logging all of them. log_temp_files = 0 means log every temp file that's created — the value is a size in bytes, so you could skip logging really small ones if you wanted. I find that temp files don't get created often enough to be an issue, so I like to log all of them, so I can see what's creating temp files. log_autovacuum_min_duration basically gives you all the information about autovacuum runs — whenever autovacuum runs, whatever it does, give me all that information in the logs. Really, really handy. For log_line_prefix, there's a pgBadger one — if you're using pgBadger, which you should be, because it's a really awesome log analysis tool, you can just go pull their recommended log_line_prefix. Mine here might be slightly different from that, but it's based on it. The default log_line_prefix is nothing, which is insane, because it means your log doesn't have things like the time. So if you're using Postgres's built-in logging, go modify your log_line_prefix.

Here are some basics about pg_hba.conf configuration. You essentially have what kind of connection it is — whether it's a local connection
or TCP, or TCP over SSL or not over SSL — then what database you're connecting to, what user you're connecting to, where you're coming from in terms of address space, and then what method to use for authentication; we'll talk about that in a minute. The big important thing here is that this file is read in order. You can have a special database called "all", or "sameuser"; you can have a special user called "all"; the addresses can be either v4 or v6; and there's a special method called "reject", which denies access. So again: this is top to bottom, first match wins. If the first match is a reject line, then the connection is rejected; otherwise, it uses one of these authentication methods.

These are the methods I would recommend you use. peer is a great authentication method: it basically punts to the Unix socket and says, "Unix socket, tell me who the user is," and whoever the user is, that's who they'll be considered to be authenticated as inside of Postgres. That's great — it's very secure. GSS — also known as Kerberos, or SSPI if you're on a Windows platform; if you're coming from a Windows platform into Postgres on a Linux platform, you can still use this, and actually vice versa too, but why would you run Postgres on Windows? I do recommend using GSS: it's a really great authentication method, it integrates great with Active Directory — I just wrote a blog post about how to do it, so if you're not sure, go read my blog post — and it's just a good, strong authentication method. Certificate-based is also acceptable, but it means you have to maintain this whole certificate hierarchy: you have to have a certificate authority and the whole bit, you have to roll your own certificate authority, and it's a whole complicated mess. But it's very secure, so if you want a lot of security, using client-side certificates is definitely a good solution as well.

Then there are the methods that are alright — not as good as the first list, but they can be okay. The first one is SCRAM.
SCRAM is a new option that was added in Postgres 10 — I think it stands for Salted Challenge Response Authentication Mechanism. It's password-based, but otherwise it's quite strong. Password-based authentication methods suck, but at least this one's pretty strong. For one thing, the server never knows the user's actual password, which is really, really nice, because it means that if your server gets popped, the attacker does not automatically get your password. And the client never knows the server's secret — the server has a secret as well, and the client never learns it — so that's also really nice. So there are some really good properties to SCRAM.

PAM, RADIUS, and password are all also okay. All of them mean that the user's password is sent to the Postgres server in some way, shape, or form, so you have to be using SSL, and you have to trust that your Postgres server never gets popped — or the attacker can monitor that server and discover your passwords as you authenticate to the system. RADIUS has a caveat: typically, with RADIUS, you have a one-time-password token, in which case the attacker only gets your PIN; they don't actually get the one-time password code, so that's a little better.

The ones that I really recommend you avoid: MD5 — that's old and really deprecated now; use SCRAM instead. LDAP, which has the same problem of the password getting sent over the wire — yes, you can encrypt it, but it's still not ideal. And not only that, but if you're using LDAP, it probably means you have a single-sign-on environment, which means that if that Postgres server gets popped, they get the authentication password to your account on the entire domain, which means they can access any other resources as you. At least with password, or with PAM, you could have a password that is specific to that server, so that if that server gets broken into, the damage is contained. With LDAP, you end up not doing that, and you're typically running
LDAP in an environment where you have Active Directory and Kerberos anyway, and you should just use those. IDENT — I think we actually ripped that out finally, but it's terrible, don't use it. And TRUST is just bypassing all authentication; there are some very specific use cases for it, but I'm not going to go into those.

pg_ident: pg_ident.conf is a way of mapping from whatever user you authenticated as to a Postgres user, and you can provide a map name that you configure in your pg_hba.conf. So basically, if you're using some of the more advanced methods like GSS or certificate-based authentication, this is how you create a user mapping from the authentication user to a simpler username in the database, which is nice.

Alright, so let's talk a little bit about running Postgres. Is Postgres up? You can just use service status, and that gives you the status from systemd. You can also call a binary named pg_isready, and again, if you're on a Debian system, you can pass in the --cluster option to have pg_isready called for a specific cluster. You can also just connect to the database using psql to discover if it's up and online — this is what that looks like on a Debian-based system, where I've passed in --cluster. Pretty straightforward.

psql is typically your user interface to Postgres. It's really powerful and really handy. All of the psql commands start with a backslash; anything else that you send into psql will be sent to the server as a query. Backslash question mark (\?) gives you all of the different backslash commands. If you want SQL syntax help, backslash h (\h) will give you the SQL syntax for SELECT, CREATE TABLE, CREATE INDEX, whatever you want — it's really, really nice. Backslash q (\q) or Ctrl-D will quit; I believe now you can also just type "quit" and psql will quit — that was some new addition in 11 or 10 or something. All queries return tables or command results, and you can see expanded output
by doing \x. So if you want to see who's connected to your Postgres server, \x gives you the expanded view, which means you get a separate row for every column. Then you can run the TABLE command (TABLE is part of the SQL standard, by the way): you just do TABLE pg_stat_activity, which is like a SELECT *, and you can see a record for every user who is currently connected to the database. You may also see the different backend processes that show up in pg_stat_activity, which is nice. You can see a list of what databases exist using \l, and there's a whole bunch of other commands inside psql that are really, really handy.

Here you can see I've got a postgres database and then a template0 database. What are templates? Whenever you run the CREATE DATABASE command, Postgres basically copies an existing database and makes the copy into the new database, whatever the new name is. The source database is considered a template, and by default we use one called template1. You can actually connect to template1 and add new objects to it if you want, right? You can install extensions, and then when you run CREATE DATABASE, we'll copy those along out of that template. This can be pretty handy. If you screw it up badly enough, you might have a case where you can't run CREATE DATABASE anymore, or your results are garbage, and this is where template0 comes in: if you've screwed up template1 badly enough, you can explicitly do a CREATE DATABASE from template0. This does require that you never modify template0. Don't do that. It's a little bit hard to do; you'd have to set it to allow connections and then connect to it and modify it, but if you do that, sorry, good luck: if you break it, you get to keep both pieces.

All right, so there are two main ways of creating users in Postgres. There's the createuser Unix command, and then if you
actually connect to the database using psql, you can also run CREATE USER, which is pretty handy. You can also run CREATE ROLE, or CREATE GROUP, which I think also still works. CREATE GROUP is basically CREATE ROLE, and CREATE USER is CREATE ROLE with login rights. So the only real difference between users and groups in Postgres is which one is allowed to log in; otherwise they act basically the same.

All right, so here are the different role privileges. If you're standing up some new Postgres instance, don't give people superuser access, and don't give your application superuser access. There should only ever be one superuser, and you should only be able to access that superuser by su-ing to the postgres user on the box and connecting; at least that's my opinion, other people have other opinions. But be very careful with SUPERUSER. CREATEROLE actually gives you both creation and modification of roles, so be very careful with that one. CREATEDB: again, I don't really like having lots of databases on my server, and handing that out to just anyone seems like a bad idea, so I would make that a superuser job. We then have LOGIN, which you can change to say whether or not a given role is allowed to log in; REPLICATION, which says whether a given role is allowed to do replication; the ADMIN option, which says whether a role is allowed to change role membership; and then INHERIT, which basically controls whether the role automatically gets the privileges of its groups, or whether you have to actually do an explicit SET ROLE to change to a different role that has different privileges. By default, inheritance is enabled and you automatically get access.

The other thing about this is a little bit odd: users are roles, and groups are roles too. CREATE USER is just CREATE ROLE with LOGIN, and you can grant roles to other roles. By default inheritance is on, so you automatically get access to the privileges of any role you're a member of; if you set up a NOINHERIT role, users have to do a SET ROLE. So if
you want to have sudo-like administration with SET ROLE, you can create a role called admin with NOINHERIT, grant postgres to admin, and then create some user joe and grant admin to joe. On login, joe has the rights of joe and the admin role, but not the rights of postgres. Joe can then SET ROLE to postgres, and once that SET ROLE happens, he becomes a superuser, kind of like sudo to root. Currently there isn't any way to require a password for SET ROLE, though, unlike sudo, so that's something to be aware of.

I'm not going to cover all the different permissions, but they're here for all the different types of objects that exist inside Postgres. By default, most things in Postgres are secure: when you create a new object in Postgres, it typically doesn't allow anyone else to access it. The one exception to that is functions. When you create a function, the EXECUTE privilege is granted to PUBLIC by default, which is a little bit scary, especially if you're playing around with creating essentially setuid functions, which you can do in Postgres; they're called SECURITY DEFINER functions. The way around this is to start a transaction, create your function, and revoke EXECUTE from PUBLIC before you commit that transaction; that way no one else can have seen that function until after you've revoked rights to it. There are some other handy things: you can set default privileges, and you can grant access on all objects in a schema, so those are pretty nice to be aware of as well.

There's database size information in Postgres, so you can query a database and ask how big it is. You can also get this information with \l+, which will give you the size of every database. You can get the size of individual tables using pg_total_relation_size, which includes all of the components of the relation: the table, all the indexes, the TOAST table, everything. There are ways of getting information about the individual pieces too if you want; there's a pg_relation_size that you can
pass in what you want to. So if you wanted to get the total size of all tables in a given schema, there's a query that will do it; it's a little bit complicated, but it's not too bad.

Creating tablespaces is pretty straightforward. If you're running out of space on your main file system and you want to create a tablespace, you can basically run this CREATE TABLESPACE command. The directory should already exist, it should be empty, it should be set to 700 for permissions, and you do need to specify the full path. Do not put tablespaces directly on mount points; number one, always create a directory underneath. And number two, please don't ever put a tablespace in your data directory. I don't understand why people do that, but they do. Don't do it. It's complicated, and Postgres even tries to stop you from doing that, for various reasons, including things like what happens when you back up. If you want to see the information about a tablespace, \db will give you all the information you need. You can also get the tablespace size using the pg_tablespace_size function, which is pretty handy. And if you want to drop a tablespace, that's pretty straightforward: just DROP TABLESPACE whatever. But you might have to connect to every database and issue drop commands for all of the objects in that tablespace before you can drop it.

All right. If you want to do backups with Postgres, and everybody should care about backups, the first option is pretty straightforward: pg_basebackup. It's actually a very good tool. I recommend that if you're going to use it, you do your pg_basebackup using the tar format; that way it'll be compressed and checksummed, because zlib automatically does that for you, so you know if the backup's any good or not. If you do a pg_basebackup into a plain directory, you don't actually know if that backup has later been modified or corrupted due to some on-disk
corruption, because there are no checks for that. The only thing you might have are page-level CRCs, which are good, but verifying those involves restoring the database, starting up Postgres, and then accessing the pages, which is going to be a little bit obnoxious. Note that you have to have your WAL files: when you run pg_basebackup, you can tell it to include the WAL files with the backup, and you have to have the WAL files for a consistent backup. If you don't have the WAL files, it's not consistent, so be aware of that. It also, nicely, includes indexes, not just tables, which means you don't have to rebuild your indexes.

That's different from the logical backup tool, pg_dump, which gives you an essentially logical or text-based backup. This is kind of like exporting all the data, but the indexes are not included, and the foreign keys are not included directly; they're included as the commands to re-create them. What that means is that restoring a logical backup ends up being very, very slow, much slower than restoring a file-based backup. So think about this in terms of your restore timeline. If that deployment of the application went south and you needed to run to your latest backup and restore it, what are you going to do? You probably don't want to re-import all the data from your pg_dump, re-create all of your indexes, and re-validate all your foreign keys, because it's going to take forever if you have a database of any size. So instead you should be thinking about having file-level backups. That also gives you the ability to have restore points and do point-in-time recovery, with all your indexes and whatnot already in place, so that recovery cycle becomes much, much shorter when you're using a file-based backup.

Be sure to test your restores; don't assume they work. I don't care what your backup methodology is: if you are using pg_dump, you have to be testing that you can actually restore it. If you are not testing that your
backups are actually good and that you can restore from them, you don't have any backups. That's what I tell people. You have to be testing these things, and you should be thinking about things like restoring from off-site, failover, failback, and how much data loss and how much downtime are acceptable. To restore a pg_basebackup, it's pretty straightforward: you basically just have to extract the tar ball.

pgBackRest. This is a tool that I strongly recommend for anybody who wants to run file-based backups with Postgres. It's got a lot of really cool features, it's built to scale, it handles petabyte-sized databases, and it's actually very, very fast. So you really want to be thinking about pgBackRest as your go-to tool, in my opinion, if you're doing file-based backup. It has compression, it checksums all the files it backs up, it does actual validation of all the page-level checksums, and it's just great, so I strongly recommend it. There are some changes you have to make to your postgresql.conf in order to be able to utilize it. How am I doing on time, 3:30?
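The pg_basebackup workflow described above can be sketched roughly like this; the host name, user, and paths are made up for illustration:

```shell
# Take a compressed tar-format backup, streaming the WAL files
# needed for consistency into the backup itself
pg_basebackup -h db1.example.com -U replicator \
    -D /backups/base_20240101 -F tar -z -X stream -P

# Restoring is basically just extracting the tar balls
mkdir -p /var/lib/postgresql/restore
tar -xzf /backups/base_20240101/base.tar.gz -C /var/lib/postgresql/restore
```

These commands need a running cluster and a role with REPLICATION rights, so treat them as a sketch rather than something to paste verbatim.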
So I'm going to go a little fast here. This is what you have to put in your postgresql.conf to set this up; basically the main thing is the archive_command. There are a couple of steps for configuring pgBackRest, and there's actually a whole user guide on pgbackrest.org that you can go through that's really straightforward. There's an info command to see all the information about your latest backups, and these days it also includes information about your write-ahead log.

If you want to do monitoring and you're using Nagios, Icinga, or MRTG, the check_postgres package is really nice for giving you lots of things to check; these are the things that I recommend checking. There's also pgMonitor: if you're running Prometheus, which is a great set of metrics collection tools, pgMonitor is really good for that. If you want to monitor your log files, there is the tail_n_mail program, which can understand the Postgres log format, which is pretty nice. You can also have your log files go to CSV and then process those CSV files, and you can also configure Postgres to send logs to syslog, which can then go to Splunk or whatever your enterprise solution is for log analysis.

So, I'm going to cover these things really quickly: increase shared_buffers, increase work_mem, increase maintenance_work_mem, and make autovacuum go faster, because by default autovacuum doesn't go fast enough most of the time. If you see autovacuum running, you probably want that to happen, number one, and if you see it happening a lot, it might be because it's not going fast enough. If you have a lot of writes, you should be thinking, hey, maybe I need to make autovacuum more aggressive, because you don't want it to fall behind; if it falls behind, you can get into some really bad states. You can increase the number of max workers, and you can decrease the cost delay to make autovacuum go faster. The defaults are really only good for a very low transaction rate system, so be aware of that.
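The postgresql.conf changes discussed above might look roughly like the following; the stanza name and all of the values are illustrative, not recommendations for any particular workload:

```ini
# Ship WAL to pgBackRest ("demo" is a made-up stanza name)
archive_mode = on
archive_command = 'pgbackrest --stanza=demo archive-push %p'

# Memory settings (tune to your hardware and workload)
shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 1GB

# Make autovacuum more aggressive than the defaults
autovacuum_max_workers = 6
autovacuum_vacuum_cost_delay = 1ms
```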
Managing connections: consider using a pooler. If you have to increase max_connections over 100, you should be using a connection pooler. PgBouncer is my favorite; it's fantastic. You can use PgBouncer in transaction mode, which also gives you things like the ability to do failovers that are completely transparent to the application, which is freaking awesome. So be looking at PgBouncer for doing that; it's great. Pgpool is all right, but I consider it to try to do too much, so I like PgBouncer a lot.

Managing locks: Postgres has a limited pool of locks, configured through max_locks_per_transaction. You may want to increase that if you have a lot of objects in your database system. We talked a lot about checkpoints already.

pgBadger: pgBadger is a really great log analysis tool that produces some really pretty reports. So if you're doing a lot of logging with Postgres, seriously consider installing pgBadger and having it run for you so you can check out what's going on.

Questions?
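The PgBouncer transaction-mode setup mentioned a moment ago might look like this minimal sketch; the database name, ports, and paths are assumptions for illustration:

```ini
; pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction   ; server connection released after each transaction
max_client_conn = 1000
default_pool_size = 20
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

The application connects to port 6432 instead of 5432; PgBouncer multiplexes those clients over a small pool of real server connections.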
I've got 5 minutes. Yes, go for it, Chris.

Just kind of a question and comment: you mentioned not using LDAP authentication. In cases where people are using LDAP as single sign-on but they're not using Active Directory, would you recommend pushing them in the direction of managing a certificate authority and certs instead?

If it were me, I would say install MIT KDC and run a Kerberos environment. Once you understand the basics of Kerberos, it's actually not that hard, and running it yourself is a heck of a lot simpler than Active Directory. So that's the direction I would go in. Yes, you could also do a certificate authority and deploy client-side certs and server-side certs and all of that, but Kerberos is easier. The other nice thing about Kerberos, for those that don't realize it, is that connections to Postgres using Kerberos do not involve the KDC, right?
There is a KDC, and the KDC issues tickets to the clients, but it only does that occasionally, because tickets last 8 to 10 hours by default. Once you have a ticket, it's just a couple of back-and-forth messages to the server. It's actually really fast, and it doesn't involve the additional outside network connection garbage that the LDAP approach does, so it's much more reliable because of that.

Another question I wanted to ask: what is your opinion on running Postgres in Linux cgroups and namespace containers, such as Docker? There are varying opinions: some people don't see a problem with it because it makes maintenance easier, some see performance problems, but sometimes I feel like the reasoning behind those opinions is, of course, just opinion.

Sure, I'm happy to talk about that. Crunchy actually has a whole suite of containers for running Postgres in containers. I don't recommend Docker, because I consider Docker to be less than production-ready, if you want my two cents on it, but in general I don't see a problem with using cgroups. In fact, I was one of the people running Linux virtual servers way, way back, the precursor to cgroups, and I did it with Postgres. The really big, important thing you need to be thinking about when you're running a container-based environment is typically not the fact that you're running cgroups; I don't see performance issues with cgroups. Where you run into issues is with the storage underneath, because when you're running containers, oftentimes you're running them on some kind of network file system, and that's where you end up having pain. Those network file systems, number one, may not be fully compliant, or you may not have configured them to respect things like fsync and locking, in which case you end up with corruption. And if you do configure them correctly, with the appropriate locking and making sure they respect fsync and all of that, they're slow, and people complain about them being slow. There's high latency,
but yeah, you're sending an fsync call to a remote system and waiting for it to come back; there's going to be some latency involved, and Postgres does that a lot. Why? Because you are committing a lot of data and you want us to guarantee that that data has been committed out to disk, and to do that we have to call fsync, so that ends up being expensive. So I would say containers themselves are not the issue when it comes to Postgres, and certainly OpenShift or whatever is, I think, perfectly reasonable to run Postgres inside of, as long as you have an underlying file store that provides all the guarantees that Postgres needs to operate properly.

Yes, just to add to the conversation, since you said that Docker is not production-ready: we run several Docker containers at my client's, and there's a script that moves and copies data around and checks the exit code of cp to make sure there are no errors. We ran into a problem where one of the files ended up at the destination with zero bytes, and cp never returned an error.

Wow, yeah. Well, I'll tell you one of the other fun things about cp: it does no fsync, right? There's no guarantee that the data is on disk after you cp it, which is probably what happened: the cp itself executed just fine, but the data never actually made it to the file system underneath. That's a whole other problem with the Linux kernel and fsync, and that whole monstrosity of a mess that is the way the Linux kernel can lie to programs about whether fsync worked or not, which is just ridiculous. Anyway, next. Other questions?

Does Postgres do in-place upgrades? Right, well, we've supported that for years and years. What do we support back to right now? I think 9.1. Is it 9.1?
I forget. Yeah, something like that; I think it's from 9.1 or 9.2. If you go back far enough, and I hope you don't have to deal with this, there is a point where you have to do a stepwise pg_upgrade: I think you'd have to go from 9.1 to 9.2 first, and then you could go from 9.2 to, like, 11. There's a version boundary somewhere in there, but it's way back. Since 9.3 or 9.4, I think, you can just pg_upgrade directly to whatever your target is, and you can do it in place. If you're interested in how, it's what's called link mode, basically a mode in pg_upgrade where it will hard-link all the files between the two major versions, and that's how it does the in-place upgrade. Other questions? No? Good, I'm out of time anyway. Thank you all.
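The pg_upgrade link mode mentioned at the end can be sketched like this; the version numbers and paths are made up for illustration:

```shell
# Hard-link files between the old and new clusters instead of copying,
# which is what makes the in-place upgrade fast
pg_upgrade \
    -b /usr/lib/postgresql/11/bin  -B /usr/lib/postgresql/15/bin \
    -d /var/lib/postgresql/11/main -D /var/lib/postgresql/15/main \
    --link
```

Both clusters must be shut down first, and once the new cluster has been started after a link-mode upgrade, the old cluster can no longer be used.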