We have a lot of material to cover here; this is actually a condensed version of a tutorial I've given that runs three hours, so we're going to move through it a little quickly, but feel free to ask questions. I'm going to go ahead and get started now so we can get a little more time in, because nobody cares about the front-end stuff anyway. I'm Stephen Frost, I'm with Crunchy Data Solutions. I'm a committer and major contributor: I implemented row-level security in Postgres 9.5, which was just recently released, I did column-level privileges, and I implemented the role system back in 8.3, so I've made a lot of different contributions to Postgres over the years.

So we're going to jump right into it. The first big thing is terminology. Postgres isn't quite the same as every other system, so it's important to understand the terminology so we have a baseline. Yes, sir? Slides will be published, yeah. If you look for the Postgres Open 2015 slides, they're there. They'll also be posted on the wiki for this conference once I'm done.

In Postgres you'll come across terms like "cluster" or "instance". This is essentially one running Postgres system. We view that as a single postmaster listening on a single IP address and a single port. Of course we can listen on multiple IP addresses, but the idea is that it's one thing answering the connections that come in, and it's also one set of data files. Those data files can be spread out using tablespaces; we'll talk about that a little later. The other big thing is that it's one write-ahead log, and I'll go over what the write-ahead log is later too. The general operations you can do at a cluster-wide level are start, stop, and of course initialization. It's also the level all of your file-level backups are done at: whenever you're doing a file-system backup, you have to back up an entire instance, an entire cluster. There are other tools that let you back up things smaller than that, but those are all logical backups. Streaming replication: because there's one write-ahead log, streaming replication also happens at the cluster level. And then there are a number of objects defined at that level: users, roles, tablespaces, and databases all exist at the cluster level inside of Postgres.

As for what an actual database is in Postgres land: a database lives inside of a cluster, and you can have more than one database inside of a cluster. There are a few specific permissions you're allowed to grant at the database level, which include CONNECT; CREATE, which is about the ability to create schemas, which I'll talk about next; and the ability to create temporary objects, via the TEMPORARY bit. Schemas are namespaces inside of databases, so this is another layer of the onion we're working through: on the outside you have the instance or cluster, inside that you have databases, and inside databases you have multiple schemas. Schema-level permissions include CREATE and USAGE. CREATE means you can create objects, what I'll call principal objects, inside the schema, and USAGE gives you the ability to use the schema.
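To make that concrete, here's a minimal sketch of those database- and schema-level grants; the database, role, and schema names (mydb, app_user, app) are just placeholders, not from the slides:

    -- database-level privileges: CONNECT, CREATE (schemas), TEMPORARY
    GRANT CONNECT, TEMPORARY ON DATABASE mydb TO app_user;
    REVOKE CREATE ON DATABASE mydb FROM app_user;   -- no creating new schemas

    -- schema-level privileges: CREATE (objects) and USAGE (look inside)
    CREATE SCHEMA app;
    GRANT USAGE ON SCHEMA app TO app_user;    -- can "step into" the schema
    GRANT CREATE ON SCHEMA app TO app_user;   -- can create objects inside it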
So those schema permissions: think of a schema like a directory. Whether you're allowed to modify the directory, add things into it, or step into it, those are what's analogous to CREATE and USAGE when you're talking about schemas.

Tablespaces are locations for additional data files in Postgres. If you've got five different mount points on your system, you could have five tablespaces corresponding to those mount points, or maybe four additional tablespaces alongside the default tablespaces that exist in the system to begin with. They're cluster-level, which also means they can hold contents from multiple databases: you can have one tablespace containing tables or indexes that belong to different databases on that system. Tablespaces cannot be shared between instances or clusters, however, nor can they be moved between them; they're associated with one particular instance.

All right, now I told you I'd tell you about the write-ahead log. This is an extremely important part of any database that offers write consistency and guaranteed writes. The write-ahead log is where every write to the database goes first. Whenever you do an INSERT or a DELETE or anything that modifies anything whatsoever in the database, we write it into the write-ahead log first. Then, when you actually commit, we make sure those changes have been committed to disk, meaning they've been fsynced all the way out, and only once that happens do we acknowledge the commit and tell the application that asked for it that we've accepted those changes, committed them to disk, and will not lose them. All of the changes in the write-ahead log are CRC'd, naturally. The changes to the actual data files, meaning the tablespaces and the other files in the system, generally happen in the background, and that's what we want: we accept the changes through the write-ahead log, and they get written out to the individual files in the background over a period of time during what's called a checkpoint, which I'll talk about in just a minute.

One thing to watch with the write-ahead log is contention, because you have all of these different processes trying to write through the write-ahead log system. That's why Postgres supports putting the write-ahead log in another location. A lot of times, especially if you're on spinning rust, you'll want to put it on a dedicated set of RAID 10 disks so you get as much performance out of it as you can. Another interesting aspect of the write-ahead log is that there are two types of writes we do into it: full-page writes and incremental changes. A page in Postgres is 8K by default, and the first time we write a page into the write-ahead log after a checkpoint, that's a full-page write; any subsequent changes to that same page before the next checkpoint are incremental changes.
In 9.5, which we just finally got out the door, we can compress those incremental changes as well, or at least there's now an option (wal_compression) to have them compressed inside the write-ahead log, so your write-ahead logs get smaller.

Just to hit on it again: what happens after a crash? Because Postgres has written everything into this write-ahead log first, when the database system gets killed or goes down for whatever reason and then starts back up, it looks for the last point in the write-ahead log where we had a checkpoint. A checkpoint is the point at which we've actually written everything that was pending in the write-ahead log out to the heap files and fsynced them all. Once that's done, we make a mark in the write-ahead log saying everything is synced up to this point. Anything after that checkpoint is what we replay when the database system starts back up: we read through the write-ahead log and write all of those changes out to the heap files, because we can't trust that they made it there; we didn't fsync them ourselves. We only told the kernel to write that data out, so we don't know that it was done.

A checkpoint by default happens in Postgres every five minutes. What it's doing is going through all of the dirty pages we have, pushing them out to disk, and fsyncing them, and we actually try to spread those fsyncs out over that five-minute period. A checkpoint can also be forced, for a couple of reasons. One is that we've run out of room in the space we've been given for write-ahead log. There's a checkpoint_segments option in Postgres that I'll talk about a little later, but the idea is that we don't want to write out as much write-ahead log as the disk will hold, to the point where we fill the disk, because if the disk holding your xlog volume fills up, we can't accept any more transactions: we can't write to the write-ahead log first, which is what we have to do. So there's a configuration option that says we're going to work within a subset of the available disk space, and once we've used that up, if we haven't hit the five-minute mark and aren't able to reclaim xlog segments that were already written, we will forcibly start a checkpoint. That can be very expensive: it immediately starts fsyncing everything out to disk so we can reclaim write-ahead log space and start accepting transactions again.
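If you want to see where that last checkpoint is for a given cluster, pg_controldata will show you. A sketch, with output trimmed; the LSN values are illustrative, and the paths are the Debian defaults for a 9.4/main cluster:

    /usr/lib/postgresql/9.4/bin/pg_controldata /var/lib/postgresql/9.4/main
    #   Latest checkpoint location:           0/3000060
    #   Latest checkpoint's REDO location:    0/3000028
    # after a crash, recovery replays WAL forward from the REDO location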
So again, if you have questions, please don't feel afraid to stop me, but we do have a lot to get through. All right, the PostgreSQL Global Development Group provides sets of packages, RPMs for Red Hat-based systems as well as packages for Debian-based systems. Those are really the principal ones we have now. There are also the Windows builds, which I'm not going to talk much about, because we're at SCaLE, just saying.

Both sets of community packages allow you to have multiple major versions of Postgres installed concurrently, and they allow for smooth major-version upgrades using the pg_upgrade tool, which is really quite handy. All of the systems I run today use the PGDG packages, because they're up to date, well maintained, supported through the community mailing lists, and their releases are updated in coordination with the Postgres team. So whenever a security release is coming out, we coordinate it across all of the packages, for Debian and the Debian-based systems, Red Hat and the Red Hat-based systems, and the Windows builds.

On a Debian or Ubuntu-based system, the repository is apt.postgresql.org; you add that to your sources list, and you can use lsb_release to fill in the distribution name if you want. From there it's pretty straightforward: apt-get, apt-get, apt-get, done, and we've got the version installed. These slides show 9.4, but of course 9.5 is out now, so just changing that to 9.5 should work fine and pull in the latest version.

In terms of how the configuration layout works once you've done this: all the config files for a given instance, a given cluster as I was saying, live under an x.y directory, and in Postgres land x.y is a major version: 9.3 is a major version, 9.4 is a major version, 9.5 is a major version. Then inside of that directory under /etc/postgresql there's an instance name, because on Debian you're able to have multiple instances installed as well. In this case it's "main"; that's the default, so whenever you do this apt-get install you're going to get an instance of whatever version, 9.4 in my example, named main. All the data files end up in /var/lib, naturally, but it's /var/lib/postgresql, then that x.y major version, then the instance name. The binaries are common to all of the instances, since there's only one set of binaries per major version, and those live in /usr/lib/postgresql/x.y/bin. And then there are a bunch of wrapper scripts. So if you ever go looking at psql on a Debian-based system and you're confused about why it looks like a Perl script: because it is. It's actually a wrapper for psql, because this whole notion of multiple instances installed on one system is not something the Postgres utilities themselves directly support. The Debian maintainers, and I think this is fantastic, have come up with a number of wrapper scripts that make it very easy to have multiple clusters on the same box and work with them.

Logs are pretty normal. Note that there's a difference between what I'll call normal-operation Postgres logs and the startup logs: the bootstrapping of getting Postgres running has its own set of log files. On a Debian-based system all of those logs go into /var/log/postgresql. It's a little different on Red Hat; I'll talk about that in a minute. And there's one init script that starts everything on a Debian-based system for us. Don't ask me about systemd. Yeah, let's not go there right now.
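Here's roughly what the Debian install and the wrapper tools look like, as a sketch of the 9.4-era PGDG instructions; check apt.postgresql.org for the current repo line and key, and more on the cluster tools in a second:

    # add the PGDG repo (distribution codename via lsb_release), then install
    echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" \
        | sudo tee /etc/apt/sources.list.d/pgdg.list
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc \
        | sudo apt-key add -
    sudo apt-get update
    sudo apt-get install postgresql-9.4     # creates the 9.4/main cluster

    # the Debian wrapper scripts work per-cluster
    pg_lsclusters                           # list clusters, status, ports, logs
    sudo pg_ctlcluster 9.4 main reload      # signal 9.4/main to reload its config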
So on Debian these different instances are called clusters, and the tools all have "cluster" in their names. pg_lsclusters gives you a listing of all the clusters on your system, across major versions as well as multiple instances within a given major version. pg_ctlcluster is what you use for controlling Postgres: on a system like Red Hat, without these wrapper scripts, the command is pg_ctl, and I'll talk about that in a minute, but on Debian-based systems you want to use pg_ctlcluster. That's what you use to start, stop, and signal the database; generally you signal it to reload the configuration files. Then there's a --cluster option (that's two dashes, sorry) which specifies which cluster to work on on a Debian-based system. Here you can see I've got --cluster 9.4/main: the way you specify it is always x.y, a slash, and then the instance name, for basically all of the tools. If you don't specify it, a default is picked up and used, which is generally whatever's running on port 5432, the standard Postgres port. And you can see pg_lsclusters also tells you the data directory and the log file for each cluster. There are a bunch of configuration options available for Debian-based clusters; I'll talk about those in a minute.

All right, now for a Red Hat-based installation. On Red Hat you're using yum.postgresql.org with whatever RPM-based system you want. You can run multiple major versions in parallel; it's a little ugly in my opinion, but it does work. You can do a group install that pulls in all the dependencies you want. And on a Red Hat system, installing the packages does not initialize the database for you: you have to run initdb yourself after the packages are installed, and that creates the cluster, the instance, and then you can use pg_ctl to start it up.

On a Red Hat-based system, all the data files live under /var/lib/pgsql, so it's not quite as convenient to have multiple clusters, and you don't get all the cute wrapper scripts you get on a Debian-based system. All of the configuration files I'll talk about in a minute live inside the data directory on Red Hat, which personally I don't like, although that's also how Postgres operates when you install it from source. Binaries go into /usr/pgsql-x.y/bin, which I find bizarre. And the logs are in /var/lib instead of /var/log, and there are actually two locations: a startup log up in the major-version directory, and the regular operational logs in data/pg_log on a Red Hat-based environment. I will fully admit I'm not a Red Hat guy, so take all of this with a grain of salt; I went through it and did it, and that's where I found everything, but just be aware of it.
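A sketch of the Red Hat side, assuming the 9.4-era PGDG RPMs on a sysvinit-style box; the repo RPM and group name vary by platform and version, so treat these as examples and check yum.postgresql.org for your distro:

    # install the PGDG repo rpm for your platform first (see yum.postgresql.org),
    # then pull in the server packages as a group
    sudo yum groupinstall "PostgreSQL Database Server 9.4 PGDG"

    # unlike Debian, the RPMs don't initialize a cluster for you
    sudo service postgresql-9.4 initdb      # runs initdb as the postgres user
    sudo service postgresql-9.4 start

    # data, config, and logs all live under the data directory here:
    #   /var/lib/pgsql/9.4/data             (postgresql.conf, pg_hba.conf, ...)
    #   /var/lib/pgsql/9.4/data/pg_log      (operational logs)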
Oh, and you also need an independent init script on a Red Hat-based system for each major version you want to run.

All right, so when it comes to configuring Postgres, there are really four major configuration files. There's what I call the main configuration, postgresql.conf, where you configure things like how much memory Postgres uses, what port it listens on, all of that. Then, when you get into the question of how users connect to the database, you use pg_hba.conf and pg_ident.conf. pg_hba.conf controls what authentication method is used for a user to connect. pg_ident.conf is for mapping users: you may have multiple system users, so imagine a Unix box with ten Unix users that should all map to one Postgres user; you can specify a mapping like that using pg_ident.conf. And then there's pg_log, which is not a config file; I think the point I was getting at is that inside the data directory there's a pg_log directory on Red Hat systems.

On Debian-based systems all of these configuration files are in that /etc/postgresql directory like I was talking about. One thing to be really careful about on a Red Hat-based system is: don't modify the other stuff in the data directory. There are things in there that are really important, like your data, and you don't want them destroyed. So be careful whenever you're working inside the Postgres data directory and modifying things in it.

As I mentioned, there are some Debian-specific configuration files. These only exist on Debian, and they're for working with these clusters. There's a startup config, start.conf, which lets you say whether you want the cluster started automatically, disabled, or started manually. pg_ctl.conf just tells what options to pass to pg_ctl; generally you don't need it. Same with the environment file on Debian: I don't generally change it, but it's there if you need to set an environment variable before starting the database. There's also cluster configuration information in the common directory: on Debian there's a postgresql-common package installed alongside all the Postgres packages, and inside its directory you can set defaults for how clusters get created. You can also control which cluster is the default for particular users; different users can have different default clusters so they don't have to specify --cluster all the time, and users can set that themselves through environment variables that the wrapper scripts respect as well. And there's a pg_upgradecluster.d directory, which isn't used very much, but it can be populated by extensions to handle work during upgrades across major versions of Postgres.

On Red Hat systems there isn't as much in the way of distribution-specific config files, simply because Red Hat doesn't have all those same cluster helper scripts.
Mainly it's just the init scripts, which generally you don't need to modify much; thankfully some of the things you used to have to modify have been moved out of there, so you no longer have to edit the init script just to change the port on a Red Hat-based system, for example.

All right, so let's talk about configuration. These are configuration items inside postgresql.conf, the kind of thing you might want to do when first installing a system. This is for 9.4 and earlier; 9.5 is a little different in this area, and unfortunately this deck hasn't been updated to account for that, so just be aware of it, and read the release notes if you're looking at 9.5; they cover the differences between the releases.

The big one: if you want to allow other people to access the system (on a Debian-based system Postgres starts up listening only on localhost) you'll have to change listen_addresses to '*' to allow people to connect in from remote hosts. So, of course, a Debian-based system starts up secure by default. checkpoint_segments is what I was talking about with the write-ahead log. In 9.5 it's replaced by min_wal_size and max_wal_size; that's one of the big differences, there is no more checkpoint_segments. It controls how much of that write-ahead log space Postgres is allowed to use, so it impacts whether you end up checkpointing more frequently than you should. checkpoint_completion_target: the default is 0.5, which means that with a five-minute checkpoint timeout we'll try to complete the checkpoint, writing all those changes out to the heap files in the data directory, within two and a half minutes. In my experience there's no point in being that aggressive, so you might as well set checkpoint_completion_target to 0.9, which says: across five minutes, use about four and a half minutes to complete the checkpoint. My two cents is we ought to get rid of this option, but for now, at least set it to 0.9.

effective_cache_size is never actually allocated; it just gives Postgres some idea of how likely it is that the pages of a particular relation, a particular table or index, are in memory. If they're likely to be in memory, great, we're probably going to use an index-based scan; if they're less likely to be, we may move to a sequential scan instead. Generally I'd set this to about half of RAM, but you can actually go look at how big your Linux file system cache is, and if the box is dedicated to Postgres, you can just set it to that. max_wal_senders is set to zero by default because it takes up a small amount of resources. I generally bump it up to three, because you have to restart the system to change it, so I find it's better to just get it set up ahead of time. That's what allows you to do things like pg_basebackup, or to have a follower, a replication standby.
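Pulling those together, here's a sketch of that first-pass postgresql.conf; the numbers are placeholders that depend on your hardware, and these are the 9.4-era names (on 9.5, swap checkpoint_segments for min_wal_size/max_wal_size):

    # postgresql.conf -- initial setup, 9.4-era names, illustrative values
    listen_addresses = '*'               # default is localhost only
    checkpoint_segments = 32             # 9.5+: min_wal_size / max_wal_size instead
    checkpoint_completion_target = 0.9   # spread checkpoint I/O across the timeout
    effective_cache_size = '8GB'         # ~half of RAM, or your FS cache size
    max_wal_senders = 3                  # requires restart; needed for pg_basebackup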
All right, logging. The default logging in Postgres sucks. This is what I do as the real basics for an initial setup. Log every connection to the database and every disconnection; those are pretty obvious. log_lock_waits is not as obvious, but it's ridiculously helpful. Whenever the system is stuck between two processes due to a lock, and it could be any kind of lock, a table lock or a row-level lock, after a few seconds we run what's called the deadlock detector, which asks: these processes have been stuck for a while, are they deadlocked? Assuming they're not deadlocked, which is generally what you're hoping for, Postgres will, if you have this enabled, kick out a message saying: hey, I've been sitting here waiting on a lock for a while, just FYI. That can be indicative of someone having started a transaction, acquired some locks, and then just sitting there idle in transaction, not doing anything. Those are the people whose desks you want to walk over to, kick their chair, and say: get out of your transaction so processes can move forward again.

log_min_duration_statement is important: you specify, in milliseconds, the threshold past which a query gets logged, so you can review it and figure out why it took so long. 100 milliseconds is actually a really long time, especially when you're trying to load web pages; you really want databases to be more responsive than that. log_temp_files is really handy, because there are operations in Postgres that can end up using a lot of disk space, like an on-disk sort. If you have log_temp_files set to zero, then any file we create for temporary purposes, an on-disk sort for example, gets logged. That can also be very indicative of why a particular query was slow: maybe Postgres had to sort two gigs worth of data on disk, and that's going to take a little while. And log_autovacuum_min_duration set to zero basically says: log every single thing autovacuum does, which I find very helpful as well, because in a lot of cases you see some kind of performance impact and maybe it's autovacuum kicking in. This logs all that information about which tables autovacuum has been vacuuming and what it's been doing.

There's also a really important setting called log_line_prefix, which you would think would default to something reasonable, but it really doesn't. The one I use is pretty complicated; the slides will be posted, and I'm not going to go through all of it since we have a lot to cover, but it collects all the important information. One of the newer fields is application_name, which is one you may really want to look at: if you're already running Postgres, you may not have application_name logged, and it's helpful. All the Postgres applications will set it for you, and you can set it yourself whenever you're connecting through libpq; set your own application name in the connection string and that information shows up in the log, as well as in some of the informational tables inside Postgres.
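As a sketch, a logging block along these lines; my exact log_line_prefix is in the posted slides, so the one below is just a reasonable illustration:

    # postgresql.conf -- logging basics
    log_connections = on
    log_disconnections = on
    log_lock_waits = on                   # warn after deadlock_timeout (1s default)
    log_min_duration_statement = 100      # ms; log anything slower than this
    log_temp_files = 0                    # log every temp file (on-disk sorts, etc.)
    log_autovacuum_min_duration = 0       # log everything autovacuum does
    log_checkpoints = on                  # why each checkpoint started, plus stats
    log_line_prefix = '%m [%p] %u@%d app=%a '  # time, pid, user@db, application_name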
All right, here's pg_hba.conf in a nutshell. Everything is read top to bottom. You have the different connection types on the left ("local" means a local Unix socket), then you can specify a database, or all databases, then the user, and then the method. If it's over the network, there's an address column, and then a bunch of options you can set. Addresses can be either v4 or v6, and they can use CIDR notation if you want. There's a special method called reject: if you hit that line when reading through the file, Postgres just kicks you right back out, and that lets you put cut-outs inside of your pg_hba.conf.

In terms of the actual methods listed there, these are the ones supported by Postgres. peer is what I'm talking about for the Unix socket: we just ask the kernel who this person connecting is, get back the Unix username, and that's who we allow you to authenticate as; you can map that to multiple different users using the pg_ident.conf file like I was talking about. gss is Kerberos. It's also SSPI on Windows, so if you're setting up a pg_hba.conf on Windows you want to set it to sspi, but if you're using it on a Unix box, which hopefully everybody is, you use gss. It works really well with MIT or Heimdal Kerberos; I've played with both and been happy with them. You can also integrate it with Active Directory, because Active Directory is actually Kerberos underneath: you can have a Linux box join your Active Directory domain, or have an independent realm with a trust relationship between your MIT or Heimdal realm and the Windows realm, and that allows you to do single sign-on with Postgres, which is really, really handy. There's also the ability to do certificate-based authentication, using SSL client-side certificates. When you're doing that, the "system username" you map in pg_ident is the certificate's common name, and you map that to whatever Postgres username you want that user to be allowed to log in as. One interesting thing, which I've never done myself to be honest, but it works just fine, is that with gss and Kerberos the system name is user@realm, something like frost@mit.edu. You can have that as a Postgres username inside the database if you want; you don't have to map it to a simpler name. I typically do, but I've seen people keep the full principal, and it works well in large environments where you have a lot of people across a lot of different realms.
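For illustration, a pg_hba.conf along the lines of what I'm describing; the networks, databases, and map names here are made up:

    # TYPE   DATABASE  USER  ADDRESS         METHOD  [OPTIONS]
    local    all       all                   peer    map=unixmap
    hostssl  all       all   10.0.0.0/24     gss     map=krbmap   # Kerberos, SSL only
    hostssl  all       all   10.0.0.0/24     cert    map=certmap  # client certificates
    host     all       all   192.168.1.0/24  reject               # cut-out: never allow
    hostssl  all       all   0.0.0.0/0       md5                  # password, over SSL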
Then there are methods that work just fine and are probably most of what you're using, though I don't like them; I much prefer the stronger authentication methods we just covered. md5 is your stock username/password, probably what most people are using. It works; it's not what I'd recommend, but it does work. pam is complicated, and the reason is that Postgres doesn't run as root, so you have to set up things like saslauthd if you want Postgres to be able to authenticate users via PAM against your password files, /etc/passwd and /etc/shadow. Obviously with that, and with md5 really, you want to be using SSL encryption; I'd say generally you want to be using SSL, but that's just me. We also support RADIUS, for anybody who's got some old-school stuff still doing RADIUS authentication; Postgres has support for that now. And there's the traditional password method, where you feed the password over to the server in the clear and the server checks it against its password storage, so definitely only use that with SSL.

Then there are methods I really don't like. LDAP gets a bad rap from me simply because if you're running LDAP, 99% of the time you're running Active Directory, and you should just be using Kerberos with Active Directory instead of LDAP authentication. There are some environments where you really are running OpenLDAP and you want to do binds against it for authentication, but the other problem is that, again, you're passing the user's credentials through the system: with LDAP or password or md5, the user's password gets exposed to the server; it has to be. With Kerberos and GSSAPI that doesn't happen, which is one of the really nice things about it; same with SSL certificates. ident just shouldn't be used, and trust really shouldn't be used either; they're not secure in any way, shape, or form, in my opinion. I know some people have used trust in the past because they think it's faster than md5. Maybe it is, but I wouldn't recommend it at all, because it literally bypasses all authentication and potentially allows you to log in as any user. So if you have trust configured, go change it.

All right, pg_ident.conf. Here's what I was talking about before. You have a map name, which is what you specify in your pg_hba.conf, and you can have multiple different mappings depending on the authentication method you're using; then what's called the system (or auth) username; and then the Postgres user. So, for example, this "joe" here would be a joe that exists on the Unix system, for peer-based authentication with a peer map, and that Unix user can log in as bob. Realize that with this configuration joe can not log in as joe: if you want joe to be able to log in as joe, you have to add another row allowing that, or use a regular expression. Here's a regular expression for the Kerberos mappings: whatever your match group captures is what's allowed on the right-hand side. So if I'm sfrost@snowman.net, as shown here, the \1 with this regex is "sfrost", and that allows any user at snowman.net to log in to the system under their short name. This particular mapping also allows me to log in as the postgres user. And for a certificate mapping, the system username is the common name inside the SSL certificate I'm using to log in, which here lets me log in as this sfrost Postgres user. Then you specify map=whatever-map-name on the method line inside pg_hba.conf.
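A sketch of the pg_ident.conf patterns just described; the names and realm are examples:

    # MAPNAME  SYSTEM-USERNAME           PG-USERNAME
    unixmap    joe                       bob       # unix user joe may log in as bob
    krbmap     /^(.*)@snowman\.net$      \1        # anyone@snowman.net -> short name
    krbmap     /^sfrost@snowman\.net$    postgres  # and sfrost may also be postgres
    certmap    sfrost                    sfrost    # cert CN "sfrost" -> pg user sfrost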
All right, now we're going to talk about running Postgres. The big question is always: is Postgres up? There are a couple of utilities for that. You can use the service command to check status, which on a Debian-based system will list each instance and whether it's online. There's a pg_isready command you can use; it's supported by the wrapper scripts on Debian, so you can pass the --cluster option to it. And of course you can always just connect with psql, which is very handy. pg_isready is also available on the RPM-based systems, but there it doesn't accept the --cluster option.

All right, using psql. If you're working with Postgres a lot, at least if you're a command-line guy like I am, a lot of what you're doing is in psql. There are a number of commands psql takes once you're inside it, inside the psql shell: anything that starts with a backslash is a command interpreted and handled by the psql binary itself; anything else just gets sent to the server, and whatever the server comes back with, psql displays. That's essentially how it works. The big ones: \? is really helpful to get a listing of all the different backslash commands. \h is ridiculously helpful: all of the Postgres documentation contains the syntax for everything the server accepts, all the SQL, and that syntax is extracted from the documentation and included with psql. So you can do \h SELECT, or \h INSERT, and you'll get the full syntax for that command. Very, very helpful. Control-D or \q exits. Any query that returns information gets displayed, and \x toggles what's called expanded display, where you get a separate line for every column in the result.

For example, this is what \x output looks like. TABLE is a really handy command if you're not familiar with it: TABLE foo is the same thing as SELECT * FROM foo, and it also accepts LIMIT; it's just shorter and simpler. pg_stat_activity is a view that tells you what users are logged into the database right now and what they're doing. Here I've got expanded display on, so each line is a column and each record starts with a record header like this. This is the information you can access through psql about connected users: what user is connected; their application name, if they're using one of the applications that sets it; client information (there's no client address here because I was connected over a local Unix socket); when their process started; whether they're waiting on a lock; the state, which is what they're doing right now, whether they're running something, idle, or idle in transaction; and query, which is the current query they're running or, sorry, the last query they ran.
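A quick sketch of that inspection workflow in psql, using the 9.4-era pg_stat_activity columns:

    \x on
    SELECT pid, usename, application_name, client_addr,
           state, waiting, query_start, query
      FROM pg_stat_activity;
    -- "waiting" exists through 9.5; later releases split it into
    -- wait_event_type / wait_event
    TABLE pg_stat_activity;   -- TABLE is shorthand for SELECT * FROM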
All right, if you want to look at what databases exist, \l gives you the list of databases. I'm not going to go through all of the different psql commands, because there's a huge slew of them, but that's what you see with \l: the different databases, pretty straightforward.

Yeah, when you run psql you go into a shell, a psql shell, and that's where you can issue backslash commands and run selects and queries and whatnot. If you want to send a command to psql from bash, -c is the option you pass: on the bash command line you can do psql -c and then, in double quotes, SELECT * FROM whatever. And there are a whole bunch of options for dealing with formatting, table alignment, and headers and footers, so you can strip everything out, have a SELECT query run under bash via psql, and get back just the data element you want, which is a lot of times what people are doing, and then you can shove that into a bash variable if you want. Works just fine. Yes, it's a toggle: once you've turned \x on, it stays that way until you turn it off. And \?... I'm not sure which command you're looking for, but if you're not sure what commands exist, I'd say do a bare \h: you'll get a list of all the top-level SQL commands Postgres supports, and from there you can go to the Postgres documentation and look them up.

All right, templates. I'm going to cover this really quickly, because we're already at 40 minutes and I've got a bunch of slides left. Templates are databases you can create such that, when you do a CREATE DATABASE using that template, we basically copy all the files from the templated database into your new database. So if you're creating lots of databases for whatever reason and you have a common set of stuff you want in all of them: create a template database, fill it with that stuff, and then whenever you create a new database, specify WITH TEMPLATE on the CREATE DATABASE command and all of it gets pulled over.

Creating users is very straightforward: it's createuser on the command line, or CREATE USER inside of psql itself (this example is inside a psql shell), and then you can set a password and whatnot. User privileges: there are a lot of different user privileges that are really important. SUPERUSER is a big one. Don't give it out to just anybody; it allows you to do lots of stuff and to access the database in ways you really shouldn't be allowed to, so be very cautious with that one. CREATEROLE is more powerful than just being able to create roles, because it allows you to modify roles as well, so that's another one you don't want to hand out freely. LOGIN: by default, if you do CREATE USER you get the LOGIN privilege; if you do CREATE ROLE you don't, because a role is like a group, so it won't have the login bit. So if you've done a CREATE ROLE and you don't know why you can't log in as that new user you created, it's because it doesn't have LOGIN rights; you need to change that, and you can use ALTER USER to do it. If you're curious how to use ALTER USER, do \h ALTER USER and you'll get that information.
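A short sketch of both of those, with placeholder names (app_template, newdb, alice, reporting):

    -- build a template once, then stamp out databases from it
    CREATE DATABASE app_template;
    -- ...connect to app_template and create your common tables, extensions, etc...
    CREATE DATABASE newdb WITH TEMPLATE app_template;

    -- users vs. roles: CREATE USER implies LOGIN, CREATE ROLE does not
    CREATE USER alice PASSWORD 'change-me';
    CREATE ROLE reporting;         -- a "group": no LOGIN bit
    ALTER ROLE reporting LOGIN;    -- ...unless you add it later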
I'm not going to hit on the rest of this, but there's also the REPLICATION privilege, which controls which users are allowed to connect to run pg_basebackup, or to connect as a standby and pull data down, like a hot standby or read-only replica.

All right, roles. Roles are kind of complicated in Postgres: roles are roles, users are roles, groups are roles; everything's a role inside of Postgres. So when you're thinking about how to do group membership: use CREATE ROLE to set up your groups, and set up your users with CREATE USER so the login bit gets set the right way. Then, when you want to add someone to a group, that's the GRANT command: you say GRANT, the group name, TO the user, so GRANT admins TO joe, and that gives joe whatever rights the admins role has. INHERIT is really, really important: it controls whether you automatically inherit the rights of a role you're a member of. For example, you can set up the admin role as NOINHERIT, which means that when you connect you don't automatically get those privileges; instead you have to do an explicit SET ROLE. This is how I like to set things up, for a kind of sudo-like mentality. Note that it doesn't ask you for a password; there's no way right now to make Postgres ask for a password on SET ROLE. I think that may change one day, but don't get your hopes up. So: you create this role admin with NOINHERIT, GRANT postgres TO admin, then CREATE USER joe and GRANT admin TO joe. Now joe can log in as a regular user, without any particular special rights, but he can do SET ROLE postgres when he's ready to. That's how that works; see the sketch below.

All right, GRANT and REVOKE. This covers what I recommend changing on Postgres. By default Postgres comes with what's called the public schema, which anyone can create objects inside of. I don't generally recommend that, because that schema is in everybody's search path, so it can just get obnoxious. I always revoke CREATE on that schema when I set up a new system, and have per-user schemas instead: whenever I create a user, I create a schema for them too, they get rights on that schema and can do whatever they want with it, and if there's something I want available to everybody on the system, I'll put that in the public schema.

There are different privilege options depending on what type of object it is, and you can go all the way down to individual columns for SELECT, INSERT, and UPDATE rights. Functions are important to think about: when you do a CREATE FUNCTION, by default everybody is allowed to execute that function, assuming they can see the schema the function is inside of. And then there are what are called SECURITY DEFINER functions. Those are like functions with a setuid bit: whoever calls the function, the function runs with the rights of whoever owns it. So be aware of that; that's how SECURITY DEFINER functions work inside of Postgres.
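Here's that sudo-style pattern plus the public-schema lockdown as one sketch, using the role names from the slides:

    -- sudo-like setup: membership without automatic inheritance
    CREATE ROLE admin NOINHERIT;
    GRANT postgres TO admin;
    CREATE USER joe;
    GRANT admin TO joe;
    -- joe logs in as a normal user, then escalates explicitly:
    --   SET ROLE postgres;    (no password prompt!)
    --   RESET ROLE;           to drop back down

    -- lock down the public schema; give joe his own schema instead
    REVOKE CREATE ON SCHEMA public FROM PUBLIC;
    CREATE SCHEMA joe AUTHORIZATION joe;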
All right, I covered a little bit of this already: default permissions, EXECUTE for functions rather, and everything else is basically secured by default. You create a table, and nobody else has rights to it until you grant them access, unless you've set up default privileges on the schema. You can set up default privileges for a role, for a schema, or for a role inside a schema, and that lets you arrange it so that when you create a new table it automatically carries some set of grants. You can also use GRANT ... ON ALL: if you want to grant access to all the tables in a schema, that's the command to use.

If you're wondering how big your database is, it's pretty straightforward: pg_database_size is a function you can call, passing the database name, and pg_size_pretty just makes the result look nice. You can look at the size of individual tables using pg_total_relation_size. Note that that includes the size of the indexes and the heap and the TOAST tables and everything, so it gives you the total size, and it can cross different tablespaces; both of these functions will go across any tablespaces involved and collect all that up for you. If you want the size of just the table itself, without the indexes associated with it, use pg_relation_size: that gives you just the size of that relation, not including the indexes. And there's a handy little query if you want to look at the size of all the tables inside a particular schema, meaning how much disk space is this user using up, assuming you have per-user schemas; see the sketch after this section.

All right, creating a tablespace is kind of tricky; there are a bunch of options you have to set up and things you have to get right. Some of the big ones: you have to have created the directory already, it must be empty, permissions have to be 700 on it, and you have to specify the full path to the directory when creating the tablespace. Do not ever create tablespaces inside of your Postgres data directory. Don't mess with the data directory at all; create tablespaces somewhere else. Postgres feels like it has control and domain over everything in the data directory, so don't be creating tablespaces in there. I've seen it too often, and I just have to bring it up. I also strongly recommend you don't use a mount point directly: create a directory underneath the mount point and use that as your tablespace. In my opinion that's just best practice, and I think there's good reason for it.

If you want information about your tablespaces, that's \db at the psql command line, and that gives you the locations. These ones don't show location information because they live in the Postgres data directory, which is like /var/lib/postgresql/9.4/main on Debian. You can use pg_tablespace_size to get the size, as Postgres sees it, of all the objects inside a tablespace. Dropping a tablespace: it has to be empty before you can drop it. There are some handy commands that let you move sets of objects from one tablespace to another, but you may have to connect to multiple different databases to move or drop all the different objects; once you've dropped or moved everything inside the tablespace, then you can go and drop the tablespace itself.
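Sketches of those size checks and the tablespace steps; the paths, names, and the schema being measured are examples:

    -- whole-database and per-table sizes
    SELECT pg_size_pretty(pg_database_size('mydb'));
    SELECT pg_size_pretty(pg_total_relation_size('app.orders'));  -- heap+indexes+toast
    SELECT pg_size_pretty(pg_relation_size('app.orders'));        -- heap only

    -- all tables in one (per-user) schema, biggest first
    SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
      FROM pg_class
     WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'joe')
       AND relkind = 'r'
     ORDER BY pg_total_relation_size(oid) DESC;

    -- tablespace: directory pre-created, empty, mode 700, owned by postgres,
    -- full path, and not under the data directory or directly on a mount point
    CREATE TABLESPACE fast LOCATION '/srv/ssd/pg_fast';
    SELECT pg_size_pretty(pg_tablespace_size('fast'));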
All right, file-based backups. pg_basebackup with WAL streaming is really, really handy for doing simple backups; it does require that max_wal_senders setting I talked about before. It's one-shot and it covers the whole database; there are no incrementals or anything available with it, which isn't great. And you have to have the write-ahead logs that were created during the base backup in order to restart the system, so when you run pg_basebackup, there's an option to stream or to fetch the xlogs: make sure you use it, and I recommend the streaming option (there's a sketch of the command below). JD gave a talk yesterday, sorry for those who weren't here, that went into more detail about backing up the database and using pg_dump; I'll talk about that in a minute. One of the things that's also important is that a file-based backup includes all the data files and all the indexes and everything, so when you do a restore you don't have to recreate indexes, which you do when you're using something else, like pg_dump.

The other main approach is logical backups, using pg_dump and pg_dumpall. These end up being essentially text-based dumps: they don't include any indexes or anything, so the entire system has to be repopulated and all the indexes recreated when you do a restore, which is kind of painful.

Backups don't work unless you can restore them. So, just as best practice: make sure you're testing your restores, make sure you're actually doing restores and verifying that they work, and consider multiple options for how to do a restore. If you're restoring from pg_basebackup it's pretty straightforward: pg_basebackup can create a tarball for you, and all you've got to do is extract it. All the data should be included, and if you used the WAL options like I was talking about, all the WAL files should be there as well. You'll have all the data as of the backup; anything created after the backup will be gone, unless you saved the write-ahead log and can replay it. You could talk for an hour just about backup with Postgres, so if you have questions about that, feel free to ask me.

All right, pgBackRest is a really, really nice backup utility; it's available at github.com/pgmasters/backrest. It's for file-based backups, and all of the craziness about write-ahead logs and everything else is handled for you. It also supports full point-in-time recovery, it's parallel, and you can do differential and incremental backups. Really nice utility; if you're looking at a large system, I strongly recommend pgBackRest as your backup solution. There are settings you need to enable inside postgresql.conf to use pgBackRest; you can find all that information on the pgBackRest web page as well, so I'm not going to cover too much of it. This is how you configure pgBackRest: there's an /etc/pg_backrest.conf file, with a repo path, which is basically the path where all your backups and all your write-ahead logs go, and then you need to tell it where the database lives, and you can have multiple different clusters supported inside one configuration file.
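Two quick sketches, assuming 9.4-era tooling: a pg_basebackup with streamed WAL, and a minimal pgBackRest config. The stanza name, paths, and archive_command are illustrative; check the pgBackRest docs for the exact form your version wants:

    # base backup with WAL streamed alongside (plain format; on 9.4, -X stream
    # requires plain format -- use -Ft -z -X fetch if you want a tarball)
    pg_basebackup -D /backups/base -X stream -P

    # postgresql.conf, so pgBackRest receives every WAL segment:
    #   archive_mode = on
    #   archive_command = 'pg_backrest --stanza=main archive-push %p'

    # /etc/pg_backrest.conf
    [global]
    repo-path=/var/lib/pgbackrest
    [main]
    db-path=/var/lib/postgresql/9.4/main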
That's what "main" is here inside of pgBackRest: a stanza named after my 9.4/main cluster. There's an info command with pgBackRest that tells you about the last backup, and there's also a JSON version of it that gives you lots more data you can extract and parse and stick into a Nagios check if you want.

And speaking of Nagios: check_postgres, for monitoring. Monitoring is obviously a really important aspect of database systems. check_postgres.pl is a great monitoring tool for Postgres. It integrates with Nagios, and you can get it to dump out statistical information and have it graphed with pretty graphs through Nagios, or with Icinga or whatever you're using; it works pretty well. It also supports custom queries, which are really nice. These are the ones I typically use as a minimum recommended set of checks to run with check_postgres.

Monitoring log files: the same person who wrote check_postgres also wrote tail_n_mail. If you have Postgres log files, and you do, right?, it's nice to be able to monitor them and make sure you're catching any errors, and I like to use tail_n_mail for that. It's pretty straightforward and works quite well. You can also reconfigure Postgres to use what's called the CSV log, which dumps the information from the Postgres logs into CSV files that you can then load back into the database. And there are lots of other options for doing your own rotation if you don't want to use logrotate.

All right, configuring and tuning Postgres; I never seem to have enough time for this section. shared_buffers is this massive pool of memory that we allocate at startup, and it's essentially the Postgres-local cache: it lets us avoid going back and forth to the kernel whenever we need data off of disk or out of the cache. It's really tricky to tune. If your working set is less than the amount of memory in the box, I generally recommend setting shared_buffers high enough that your entire working set, or your entire database if it's small, fits in memory; that works out really well. If it's larger than that, I find it can actually help to make shared_buffers significantly less, down to maybe only a gig. That means we go out to the kernel a lot more, but it also means less double caching: whenever we have our own cache and the kernel has its cache, there's a risk of caching the same pages twice, which just wastes memory, memory that could be better used for running queries and doing other things.

work_mem is the setting we use to figure out how much memory a query is allowed to use. It's more complicated than that, but just realize that if you raise work_mem, queries will be allowed to use more memory for things like building hash tables or doing sorts, so it can be really, really helpful. The default used to be 1MB; we've upped that a little, I think it's 4MB now, which is better, but still pretty freaking small if you ask me, so I tend to bump it up pretty well.
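As a sketch, what those two might look like for, say, a 32GB box with a mixed workload; purely illustrative numbers, not a recommendation:

    # postgresql.conf -- memory (illustrative values for a 32GB box)
    shared_buffers = '8GB'    # or down near 1GB if the working set is much larger
                              # than RAM, to cut double caching
    work_mem = '64MB'         # per sort/hash, per query node -- think about
                              # max_connections before picking a number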
The only downside is that if you have lots and lots of connections, you run the risk of running the box out of memory. Realize, though, that Postgres is actually very good at handling memory problems. All right, I've got five minutes left. So it's actually not the end of the world if Postgres ends up using all of the memory in the system, because it'll get a memory error and clean everything up very nicely; it's not a huge problem. Don't run out of room for pg_xlog, though.

maintenance_work_mem is what we use for creating indexes, so you want to bump it up whenever you're creating indexes, but realize that we're going to use all of that memory, so be careful in that regard. It can make index builds go much, much faster, because we basically have to sort everything. I've talked about effective_cache_size a fair bit already, so I won't cover it again.

Autovacuum: this slide really washes out on the screen, I'm sorry about that. The autovacuum defaults are generally too low, in my experience, for a high-transaction-rate system. Usually what you want to do with autovacuum is make it more aggressive, not less: turn it up and get it running more frequently, because those more frequent runs will each run for less time, and it'll clean things up in a more expedient fashion. Some ways to do that: decrease the cost delay, or just completely turn off the costing model, and increase the number of workers. Definitely, if you have a very busy system, increasing the number of workers lets autovacuum work in parallel across more tables. See the sketch at the end of this section for the knobs involved.

Managing connections: you can bump max_connections up; I've had some folks doing as much as 500, which is really too much in my opinion. Where you really want to be is with the number of connections actively working in the database on par with the number of CPUs in the system; generally speaking, that's the most efficient way to run a database server. One way to get there is pgbouncer. pgbouncer is a connection pooler: all your applications connect to pgbouncer, and pgbouncer manages the connections to the database and basically multiplexes on top of them. It's a very effective tool; pgpool also works. And watch for idle, and especially idle-in-transaction, processes. Idle-in-transaction is bad because it means there are things we can't move forward with, like autovacuum vacuuming away dead tuples. Almost done, folks.

Managing locks: generally speaking, if you have a lot of objects in your system and you're using a pg_dump-based backup, pg_dump has to connect and then lock every object it's going to dump, and that can end up being expensive. So a lot of times I'll bump max_locks_per_transaction up some, so there are enough locks available for a pg_dump-based backup. A heavyweight lock is taken for every object accessed by a given process while it's connected, so not just pg_dump, but your applications as well. Generally your application isn't accessing that many different tables at a time, so it's not that big a deal, but be aware that we do take out a lock. I say "heavyweight" here, and that's a Postgres terminology thing: heavyweight in this sense means it's a full lock in the lock manager, not that it actually blocks anything. ACCESS SHARE locks don't block anything, except somebody who wants to, say, drop the table.
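The knobs from the last few paragraphs, sketched with placeholder values:

    # postgresql.conf -- autovacuum, connections, locks (placeholder values)
    autovacuum_vacuum_cost_delay = 0       # turn off the costing model entirely...
    #autovacuum_vacuum_cost_delay = 2ms    # ...or just reduce the delay
    autovacuum_max_workers = 6             # default 3; parallel across more tables
    autovacuum_vacuum_scale_factor = 0.05  # vacuum after 5% churn instead of 20%
    max_connections = 200                  # pool with pgbouncer rather than going huge
    max_locks_per_transaction = 256        # default 64; helps pg_dump with many objects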
Checkpoints: I talked about this a little earlier, but just make sure your checkpoints are happening due to time. If you have that log_checkpoints option turned on, like I was talking about earlier, every time we do a checkpoint it'll say why it started, say when it finished, and give you a bunch of stats: how many pages it had to write out, how much write-ahead log there was, that kind of information. The big thing is to make sure it's always starting based on time. If it's starting based on xlog, you're writing more xlog than you've configured the system to be able to write out to disk in time, and generally speaking that means you should bump up checkpoint_segments. The larger checkpoint_timeout is (and a lot of people raise checkpoint_timeout, because it reduces the number of full-page writes, and it does some other nice things in that regard), the more write-ahead log you may have pending when your database crashes and has to restart. That means the database will take longer to restart, because it has to replay everything from the last checkpoint. So if your database crashes with checkpoint_timeout at five minutes, it shouldn't take more than about five minutes to restart; at 30 minutes, it could take as much as half an hour.

pgBadger is really helpful and handy for generating reports about queries: if you're curious what the query-time picture looks like against your database, I strongly recommend pgBadger. There are some configuration options; you can find it all on the web. All right, I have five seconds for questions. I will be at a bar somewhere, or you can find me after this if you have questions; I'll be happy to chat. Thank you. The slides will be up; I'll put them on the wiki here in a few minutes.