Over to you. Oh, this slide is actually about installing OpenShift and such — I'm going to get rid of that. I'm going to turn it over to you here, because you're going to talk about how you built the Docker container. Okay, so I'm going to change this to "Building Docker." And should I leave your — is that the CPM thing? Do you want to give the architecture first? Yeah, let me give the architecture, and then I think I'm going to delete the commons slide. Should I leave it? I think that's way too much propaganda. So I need to get rid of this. I'm going to leave "upstream first," because you're an upstream-first company as well.

So I tried the secrets one — that one doesn't work unless you've already defined the secrets, right? Yeah, you have to define the secrets first. This one works with the environment variables; you don't have to define the secrets first, and it just worked out of the box. Oh, that's the other thing I should have open here: I'll go to github.com, then crunchydata, and it's the first repository there. It's actually online now — I should put a link into the slides.

Okay, now I need to do my font checks. Is that font big enough? Is it okay with everybody if we leave the lights like that? It gets rid of the glare and makes the slides easier to see. Yeah, that line, with my old eyes, is hard to read — can you increase the font size? Oh yeah, just Control-plus. One more. That's good. That's much better. What time are we at? Oh, we've got 20 minutes.

All right. The mic's on now. All the other questions that were asked so far — I'm looking at the camera now — you didn't need to hear those. The question this young gentleman just asked is: where does all the magic happen? In normal streaming replication there are SSH keys and all this other stuff that has to be set up — how is that happening here? And that's why I said: OpenShift doesn't know any of that.
That's not part of the OpenShift part. OpenShift just knows how to spin up the replicas. It's the Docker image builder, Jeff, who made it so that when they spin up, they can do this kind of stuff. So now, Jeff.

Yeah, this is really simple, in that there's no SSH key generation required for this type of Postgres streaming replication, because it's using just the Postgres protocol. What happens is, before a slave connects to the master, all I do is execute pg_basebackup to create the replica for that slave container, and then at that point just start up Postgres, and it's in replication mode. The master then speaks the replication protocol with that slave. So there's no SSH key generation required in these examples.

The question was — maybe this is ignorance on my part — and I've got to repeat it, because it's important: so there's no encryption between the master and the replica in this case? And Jeff said no. But then Josh looks concerned about that statement. You're not using an SSL connection? There is not an SSL connection between the slave and the master; I'm just using a straight SQL connection. It seems easy to fix. It's not currently doing that, but as soon as Jeff leaves here, he promises me that he will make a new Docker container. It sounds like an easy fix — you'd basically just have to put the certs in the Docker image itself. I believe Postgres gives you that option, but I didn't choose to do that with this example. The way I would fix it is to go into the bash script in that container, and at the point where it sets up the connection, that's where I could tweak the replication configuration.

By default this is also using asynchronous replication instead of synchronous. Those kinds of configuration changes are possible either by tweaking the way I wrote the bash script, or — and I give you another avenue here — I let you mount a local volume that contains your own postgresql.conf and pg_hba.conf files. So you can override the entire Postgres configuration using a set of external files. That's just a Docker volume, a config volume, that I give you the option to use: you can mount it and put something in it, and if something's in there, it'll use that; otherwise it'll use the one that gets pre-generated. The environment variables that Steve pointed out are another way of tweaking the configuration that is generated by default. But you have the option to completely override the Postgres configuration using your own local config files.

Yeah. It's not public — by default, the only traffic we route in is HTTP traffic. Oh, right, right. Typically you would mount the certificate store as a Docker volume so that you've externalized all those keys. And we also have one of the volume types that we've put into OpenShift — it'll probably get pushed back up into Kubernetes soon enough — a secret volume. So you can mount that volume, everybody can make a claim to it and get those secrets inside; it's a separate volume claim. So we have that as well. You know our secrets now? Yeah, we'll leave now. I'm just kidding.
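To make that concrete, here's a minimal sketch of what a slave container's startup logic could look like under those assumptions. The environment variable names, paths, and replication user are illustrative, not necessarily what the Crunchy image actually uses:

```sh
#!/bin/bash
# Illustrative replica bootstrap; assumes these variables are injected via the pod spec.
: "${PG_MASTER_HOST:?master service hostname required}"
: "${PG_MASTER_PORT:=5432}"
: "${PG_REPLICATION_USER:=replicator}"
: "${PGDATA:=/pgdata}"

# Clone the master over the normal Postgres protocol -- no SSH keys,
# just pg_basebackup pointed at the master service.
pg_basebackup -h "$PG_MASTER_HOST" -p "$PG_MASTER_PORT" \
              -U "$PG_REPLICATION_USER" -D "$PGDATA" -X stream -R

# -R writes a recovery.conf with standby_mode and primary_conninfo, so starting
# Postgres brings it up as a streaming replica. It stays in the foreground;
# if it exits, the script exits and the pod gets restarted.
postgres -D "$PGDATA"
```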
So he said something that was inconsequential — I know you couldn't hear it, but it was... The young gentleman here was also making a suggestion about how we might want to tweak it so that the path was an environment variable rather than hard-coded in the image. I want to keep going — I don't know where we are on time. Oh, we have a half hour. That's still a lot of time; we're good.

So now this young gentleman in the front — you couldn't hear it because I didn't repeat it — asks, what happens when you scale up to three? Funny you should ask. Let's do that. Look at the screen. Oh, I know why — because I increased the font so that you could see it. There we go. Overview. Ready? Here's how hard it is to scale up to three read replicas. What we're waiting for here is that these are Docker images, so we have a master and then a bunch of nodes that actually run the Docker images; if a node has not pulled that Docker image down already, it's got to pull the image down and then spin it up. So we're at two pods running — or did I only go to two? Did I not go to three? Okay, there we go. Now I've got to wait for it to scale up to three. That's what we're waiting for right there. "The object has been modified" — I've never seen that error before. Welcome to the live demo.

Is there any way in OpenShift to ensure that my replicas end up on different physical hosts? Yes — so the question was, is there a way I can make sure that my replicas end up on different hosts. Now we have three pods running, so that's all it took to get three running. I'll answer your question in a second, Josh; I just want to show one thing first. Here are the two that just spun up. They have to now spin up Postgres, right? So I want to show the logs while it's going, because I can't show that it's replicated until that's actually finished.

The question Josh asked, again, was: is there any way to make sure they end up on different hosts? There are controls for that. The cluster admin can set policies for how the deployment of containers happens — it's actually a set of rules. You can have something as simple as anti-affinity: always pick a node that doesn't already have one of these on it. So when the scheduler goes through all the nodes to decide where to place a pod — this is the scheduling part we talked about, which Docker doesn't handle by itself — it says: there are three replicas and I've got four nodes; this one's already got the master, this one already has a slave, so where do I put the next one? One of the nodes that don't have it yet. That's a simple anti-affinity rule, but you can also say things like: I want them all in the same data center, or I want them all in different geographic data centers — an anti-affinity rule on data centers. You can also say: give me a node that's not on the same host and has the least CPU in use and the most memory free. It'll go through the nodes and apply rules like that.
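For reference, the scaling step shown on screen is equivalent to something like this from the command line; the replication controller name is an assumption:

```sh
# Scale the read replicas up to three.
oc scale rc pg-slave-rc --replicas=3

# Watch the new pods; they sit in ContainerCreating until the node has pulled
# the Docker image, then Postgres starts up inside them.
oc get pods -w
```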
So it should be up now, right? Did we watch it do it? Yeah. So now I'm on another one of the new ones. Here I am again. Can you see that in the back, once I get to it? Yep. They're really talking to each other. Here's the master again. Copy and paste again — that would be pretty neat. Can you see it now? Copy and paste not working? Is it Control-Shift here too? No. Let me try one more time. Nope. Okay, you get to watch me write SQL on the slide. Thank you, pair programming. Love it. And I don't even have to sit next to you — that's the best part. There. Done. So I inserted it into the master. Now this is the part where Jeff gets nervous. Okay. There. So that's all set up, live.

Okay, that's enough attention for him; don't reward him for his insubordination. So if I go back to this again and make it smaller — I may not make it small enough. One more. There we go. One of the things I really want to point out here is that I would never be able to do this if I was trying to build it from scratch. I might be able to do it after five days' worth of yak shaving, and I'm sure I would set it up wrong, and I would probably give up in the end, because I'm not going to spend five days... Do you know what I mean when I say yak shaving? This is a Linux conference. Really? Okay, well, I'll spell it out for everybody since we have time. Yak shaving is the idea that you want to check your mail at the end of your hundred-yard driveway, and it's winter in Manitoba, so it's freezing outside. So you think, I'd better put a sweater on before I go get my mail. Oh, I don't have a sweater — okay, I've got to get a sweater. I can't go to the store, so I've got to knit a sweater. Do I have any yarn to make a sweater? No, I guess I'd better go spin some yarn. Do I have any wool? No, I don't have any wool, so I'd better go get some — okay, I'd better go outside and shave my yak. And so you spend the whole day shaving your yak and you still haven't gotten your mail. Right? Which is a typical problem for developers. I really just want to write an app that has a database tier and a web tier, and I want the database tier to be replicated. Okay, well, spend five days setting up a Linux VM, then start playing around with installing Postgres, then play around with how you're going to replicate it... and five days later you're frustrated, and your boss comes to you and says, where's that project I wanted? And you say, well, I'm still monkeying around with Postgres. And that's when they say, use Microsoft SQL Server, because I don't want you spending all this time monkeying around. And then you're sad, and then you quit your job. Or you become an alcoholic. Yeah.

Okay — when you created the other replica, did it copy the data from the first replica or from the master? So the question was, when it created the other replica, did it copy the data from the master or from the first replica? No, this is typical master-replica replication: the new replica comes online, checks in with the master, asks where it should start, and then it proceeds to read from the master and make itself like the master. Is that correct? That was just a fact-check from the guy who asked. I like that. Yes. We're not going to get into that other, inferior database that begins with "My" and ends with "SQL." We're not even going to mention it... It's a great project.
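The check being performed here amounts to roughly the following; the pod names, database, and table are made-up placeholders:

```sh
# Write a row on the master...
oc exec pg-master-pod -- psql -U postgres -c \
  "INSERT INTO demo (msg) VALUES ('hello from the master');"

# ...and a moment later read it back from one of the replicas.
oc exec pg-slave-pod-1 -- psql -U postgres -c "SELECT * FROM demo;"
```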
I've been a PostgreSQL user for 10 years now, and I understood why people used MySQL, because it was really easy to set up. This is not a Red Hat pitch at all; this is just me personally. I just never saw a reason to use it. I wanted transactions, and I wanted stored procedures, and I wanted triggers, and I wasn't willing to give that up, and it was never so much slower that I needed to give it up. I'm not at Google scale, or whatever scale everybody thinks they need to be at.

Yeah, do you have a question? In this scenario? Bad things, right? There's a network disconnection — nothing like that ever happens in the real world. So the question was, what happens if the master fails, or the master gets disconnected? Jeff? Yeah, right now there's no automatic failover happening for you. You're going to have to get in, as a DBA, and trigger a failover. Wait, but the master would re-spin itself back up. Yeah, that's true. So if OpenShift can see that it's down — and there are actually lots of different states of "down," right — if the replication controller says there's supposed to be one of these and it can't see it at all, it will spin up a new one. We also have liveness checks and readiness checks. I don't know if you've defined one, but OpenShift actually has the ability to say: I know the container is running, but is it actually working? You can define a probe that the cluster will keep hitting, asking, is it working? If it's not working, it'll spin up a new one and bring down the old one. But I don't know what happens when you bring up a new master with no data in it. What I assume will happen — because this is the problem with using an emptyDir volume — is that there will be no data; if we were using some other volume type for the data directory, it could come up and recover, but here... Sorry? Yeah, and then you wipe out all your data. So this is just not a production setup for anything that actually matters.

Hold on, wait one second. I'll be talking on Sunday about that. So on Sunday, Josh will be talking about automatic failover with containers? Oh, with containers — Jeff should attend. I hope I can give a better talk next time. I couldn't answer that part.

One thing — that's a great question, best question of the day — is that it depends on how the master fails. If the node, the physical server you've deployed it to, let's just say goes up in flames, then you need to have made sure that your slaves, your replicas, are running on separate physical hosts, and you're going to have to make a decision. Can I restore that server? If so, when you start Kubernetes back up, it'll try to restart the master and things will come back. But if the server it's on is completely toast, then you probably want to promote one of the slaves to become the new master. You're going to have to do that manually, currently, with the way these containers are constructed today. So you as the DBA would have to say: I want that one to become the new master.
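The liveness and readiness probes being described could be wired up roughly like this — a sketch only, since the demo doesn't have one configured; the controller and container names are assumptions:

```sh
# Fragment to merge into the slave pod template (e.g. via `oc edit rc pg-slave-rc`).
# pg_isready exits non-zero when Postgres isn't accepting connections, so a failing
# readiness probe pulls the pod out of the service and a failing liveness probe
# gets the pod restarted.
cat > probe-fragment.yaml <<'EOF'
containers:
- name: postgres
  readinessProbe:
    exec:
      command: ["pg_isready", "-U", "postgres"]
    initialDelaySeconds: 15
    timeoutSeconds: 3
  livenessProbe:
    exec:
      command: ["pg_isready", "-U", "postgres"]
    initialDelaySeconds: 60
    timeoutSeconds: 5
EOF
```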
Otherwise you're going to have to go to a backup, create a new node somewhere else on another server, and call that one the master. But when you do, you're going to have to reconfigure this cluster fairly manually at this point. I'll show you another project soon that does this for you to a certain degree. But that's a great question, and it's one of the downsides to using the emptyDir volume type, because it sits on that physical server, and that emptyDir volume is managed and provisioned by OpenShift on that server. The upside is high performance: you're directly accessing local disk. If I make the decision to deploy all of this on, say, NFS volumes, then all of my Postgres performance deals with the latency of that network storage. So that's the architectural decision you need to make when you deploy this. Am I deploying for ultimate I/O performance? If so, I'm probably going to use emptyDir and deal with losing a server — here are my procedures to reinitialize this cluster somewhere else. Or, if performance is acceptable using whatever network volume type Kubernetes supports — Amazon or Google Cloud storage are both options — then you'd probably go there, because now you have a lot more resiliency. That's a great question. When I was writing these containers, I couldn't answer that question architecturally for everybody, because everybody in this room has a different problem they're trying to solve. But I did want to make sure that I could make the emptyDir volumes work, because I know a lot of people are going to say, I've got to have ultimate I/O performance. So as tempting as it is to just say, always use NFS and not have to deal with that problem, a lot of the thinking, at least in this project and another one I'll briefly introduce, is that it's using emptyDir for a reason: to give you the highest I/O bandwidth. And it's not a simple trade-off either.

The other thing Jeff was talking about — figuring out which of the others becomes the master: as soon as that pod dies, OpenShift and Kubernetes, because the replication controller says there should be one of these running, will spin up a new master right away. And I'm assuming the replicas will try to reconnect to that new master that just came up, which will say, I have no data, make yourself like me — and you'll wipe out all your data. So I'm not showing the ultimate production scenario right here; I'm trying to show something that most of you have probably never seen before, and some of the advantages of this kind of architecture. More work would have to go in, and you could engage the Crunchy people and the OpenShift people to figure out a better way to do that.

Wait, there was a question in the back, and then I'll get to you. No, the pod will have a completely different name, but what stays the same is the service name. That's why the replicas are actually reading through the service rather than connecting directly to the pod. So it will work, because when you talk to the service — and because you put the service in front — it doesn't care what the actual pod name is behind it. Does that make sense? How do you make that work? Yeah, everything uses the service name, and the service name is the entry point through which everything can find everything else.
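A minimal sketch of the two objects in play — a stable service in front of the master, and an emptyDir data volume in the pod template. The names here are illustrative, not taken from the demo:

```sh
# The service name and IP stay fixed no matter which pod currently matches the selector.
cat > pg-master-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: pg-master
spec:
  selector:
    name: pg-master-rc        # whichever pod carries this label gets the traffic
  ports:
  - port: 5432
EOF
oc create -f pg-master-service.yaml

# In the pod template, emptyDir gives fast local disk that disappears with the pod:
#   volumes:
#   - name: pgdata
#     emptyDir: {}
```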
So a pod, in this case — whenever I create the replica, I'm passing in the master service name. And if you look inside the pod — I'm going to show them right now — Docker will always give you a service-name-to-IP translation... actually, no, it doesn't; Kubernetes does. So Kubernetes and OpenShift insert it — is it going to be postgres, or is it going to be master? Well, just cat out /etc/hosts and you'll see something interesting there too. So here's a good example: I'm in one of the replicas, and I probably have to make this bigger. Come on. Can you see it? He said the slave is probably using that environment variable to get the IP address of the master, rather than hard-coding it. Right? Am I right? It's using PG_MASTER_HOST as the connection host. Yeah, two down — yeah, it's using that. And you're wondering, okay, how does that get resolved? pgmaster. So cat out /etc/hosts — I nearly had a heart attack on that one; I wasn't seeing what I was expecting to see there. But anyway, Kubernetes in this case — I keep forgetting what it's doing for me versus Docker — that service name maps to the IP address, and that IP address is actually getting resolved by Kubernetes; I'm just referring to it as the connection host name. If I had gone more into the whole OpenShift thing back on that earlier slide: one of the hard parts about setting up Kubernetes is that it doesn't include a networking layer. OpenShift brings the whole software-defined network, and it runs its own internal DNS to do that. So it's doing it all out of the box.

So in the scenario where the master dies, the slaves aren't going to re-resolve or reconnect, right? No — but the IP of the service stays consistent, right? Remember what I said: that IP address for the service never, ever changes unless you delete it and recreate it. Service IPs never change, which is why you're talking to the service, not directly to the master. All of your traffic goes to the service; under the hood, OpenShift and Kubernetes know to reroute it to the pod that is actually the master. The IP for the service never changes — that's the benefit of this framework: your clients just don't have to care about pods coming and going. In this case, yes, you end up wiping out the data, as we said.

Steve mentioned it, but the service for the slaves is actually a proxy, so it implements a simple round robin. When your app connects to that service and makes a request, it just gets round-robined across however many slaves you have. That's the other point — which is a good point.
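You can poke at that resolution yourself from inside a replica; the environment variable and pod names are assumptions for illustration:

```sh
# How a replica finds the master: an injected env var naming the master service,
# which the cluster DNS / service proxy resolves to a stable service IP.
oc exec pg-slave-pod-1 -- env | grep -i master        # e.g. PG_MASTER_HOST=pg-master
oc exec pg-slave-pod-1 -- cat /etc/hosts              # only the pod's own entries
oc exec pg-slave-pod-1 -- getent hosts pg-master      # resolves to the service IP
```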
So this is the last part of the demo I want to talk about, because I want Jeff to show off his stuff as well. Go smaller. Thank you. This is for the app developers, or even for the DBAs who talk to their developers — which is a few of you. What is the other nice part about this architecture? Remember when we talked about the web tier talking to this? I promised you an answer to a question. Sorry — the nice part will have to wait; we're going to miss the nice part now. Oh, go ahead. This is a whole Kubernetes thing.

So the question was: if I promote one of the replicas to master, how do I get it to respond behind the service that is actually the master service? Kubernetes uses labels for everything. If you look at this service right here — I'll click on the service — and we look at the selector, the selector for that service says name equals pg-master-rc. So the service knows what to route to based on that selector: anything whose label says name equals pg-master-rc, the service will start routing to that pod. So all you would have to do — and you can do this in the version I'm showing here — is oc edit, and you can edit labels. If you go back and look at your pods here, these three pods, if I click on them, let's see if it shows me the labels... nope, they must be under annotations... where are the labels? Oh, there — there are the labels, sorry; it's the new web console. The labels are all in blue at the top. So this one has the name pg-slave-rc. All you have to do is edit that label to pg-master-rc, and automatically that service will start routing traffic to it. You can manually edit this and do it. Without restarting the container, without doing anything else, you can shift traffic around just by changing the label. So that's one of the other great benefits of this kind of architecture. Does everybody understand that?

Or, let's say you have a misbehaving slave. It's working, but it's throwing a bunch of errors, and you go to the terminal and see all these errors. OpenShift doesn't want to kill it, and you don't want to delete it, because you want to look at what went wrong. You can just change the label from pg-slave-rc to pg-slave-rc-busted, and what will happen is the replication controller will say, I don't have three of those, I only have two, so I'm going to spin up one more — and the service will say, I don't want to route to this one because it doesn't have the right name — but it'll still be up and running, not dead. So you can go do all the diagnostics, and if you fix it, you can just change the label back again if you want.

Okay, so the cool thing — right, I was showing the cool thing, was that what I was doing? It's not actually even a thing to show; it naturally happens. When you're writing your application, it's very easy now: if your application is doing a write, the only service it talks to is pg-master-rc, right? I didn't have to do any weird configuration. I don't have to tell my programmers anything different beyond: this is your write service and this is your read service, and reads should happen off pg-slave-rc. For me as a developer there's nothing new to think about: I read from the replica service and I write to the master service, and I don't have to worry about anything changing under the hood, or how many replicas are up, or anything like that. I know pgpool can take care of a lot of that stuff, but here, even if you choose not to use it, it's there; I don't have to care how many replicas are behind it, and I don't have to make my master do any reads at all. So my master can handle writes exclusively. Which is the preferred architecture, right? Yeah.
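The label tricks described above are one-liners on the command line; the label key and values are the ones implied by the selector, and the pod names are placeholders:

```sh
# Promote by relabeling: the master service routes to whatever pod matches its selector.
oc label pod pg-slave-pod-1 name=pg-master-rc --overwrite

# Quarantine a misbehaving slave without killing it: give it a label nothing selects;
# the replication controller replaces it, and the pod stays up for diagnostics.
oc label pod pg-slave-pod-2 name=pg-slave-rc-busted --overwrite
```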
So just out of the box, we get that preferred architecture, and we make it easy for the programmer to work with it. Why did you wave me off? I mean, you were on a roll — you were covering everything I was going to talk about anyway.

Now seriously, I'll just mention a few more things. I'm about two weeks out from having a pgpool container that will work with this, by the way — pulling it from this other project called Crunchy Postgres Manager. So probably in about a two-week time frame you'll have a pgpool container that goes along with this. The idea there is that you would deploy it into this environment; it'll know who the master service and the slave service are, and it'll configure pgpool for you to do smart load balancing, so your apps at that point would have one entry point instead of two. That's going to be pretty useful for people. Also backup and restore capabilities — people will ask about that, and that's probably another one where I'm pulling from another Docker project, again the CPM project. That's a simple backup and restore where you can run a backup against this and it'll create a network volume containing the backup. I have to use a network volume in that case because you want your backups to be long-lived, and you can tolerate any slowness of network storage there. The restore works by firing up this Crunchy Postgres container; it looks for some restore flags and for which restore archive you want to build your new container upon, pulls that archive down to the local emptyDir storage, and starts up Postgres using it. Those are another couple of containers you'll see on this GitHub project pretty soon. And those are just what I'd call basic features to get started. Allowing for a simple failover is definitely something I've been thinking about how to do — maybe exposing a trigger-file location via a Docker volume so you could trigger it that way, and maybe some utilities to help manage some of this state.

Some other points: this runs on CentOS and RHEL currently — it's up to you which one you want to use; I supply Dockerfiles for both. It uses the RPMs from the PGDG repo currently; that's where it gets its Postgres bits. Crunchy has our own repo as well, and we're looking at maybe offering that version too, for a more secure or supported build of Postgres, so you could run these containers using that in, say, an enterprise scenario where you needed support. Another thing: I put in PostGIS and pgRouting by default, so the size of this container is pretty hefty by Docker image standards. The reason is there's a whole bunch of stuff in there that I just said, let me throw it in, and if people need it, great, it's there; if not, they can pare it out by chopping it out of the Dockerfile if they want to shrink the image. But I've not found anywhere that the difference between a 300 MB and a 500 MB image really matters for anything I've noticed.

What else was there? I could probably talk for an hour just on the Postgres UID and GID configuration settings in OpenShift. The bottom line is that Postgres needs and requires to run as the postgres user. Out of the box, OpenShift generates random UIDs. That's a problem: if you just start this container up, it will die.
Postgres will die, saying, I need the postgres UID to run under; I can't run under some dynamically generated number. So there are different strategies for that. The random UID is a security measure in OpenShift, but there are configurations in OpenShift that let you change that behavior and say, no, OpenShift, run these containers as any user. You'll see in the Dockerfile I wrote where I specify the postgres user to run this under. You'll also notice in the Dockerfile that Postgres is the last thing that runs, and it runs in the foreground, so it hangs there — and the reason for that is that if it ever dies, Kubernetes will know the entire pod has died and try to perform a restart. That's just another thing you'll notice if you look in detail.

Quick question? Not in this example. What I do is start up a bash script, called start.sh, and it eventually makes its way down — so if you look, PID 1 is actually that bash script. Some Postgres examples will not work that way at all, but I'm still using a bash script in this example to set up some of these configuration options, and then the very last thing, the hanging process, is essentially Postgres, which keeps that script alive so it won't exit.

I don't have time today, but there's a whole other project here — there are a couple of parallel tracks at Crunchy in terms of Docker work. This has certainly been a big one; we wanted to make a way for people to run Postgres in an OpenShift container world. There's a whole other project I did called CPM, or Crunchy Postgres Manager: a web UI, Docker only, so it does its own level of orchestration, and it's for people who want to implement a purely on-premise Postgres-as-a-service. It uses Docker Swarm and Docker volume plugins and things like that to do roughly what you saw today, except through a point-and-click web UI. And there are certain things I can do in that environment, because I control the entire host topology, that I can't do in OpenShift. So I encourage you to look at that project as well. In that one, for instance, I can do a backup and restore with point and click, I can do restores from things like pg_basebackup, I can run pgBadger on any container, I can start up a predefined profile of clustering, and there's built-in Postgres metrics collection using Prometheus. So it includes a whole lot of enterprise functionality as well. We have a slide on it with the link to the Git repo.

We're in a time of container explosion, and here's my gift to the Postgres maintainers right now, if you don't know this already: some of the ways Postgres does things — backups, for instance — are not so good anymore in the world we live in with containers. It would be nice if we started to think about making Postgres friendlier to containers, and stopped assuming everybody stands it up the way they used to. I had a big argument — not with the actual person, but about extensions: please stop requiring root or the postgres user to install an extension. That is not the world we live in anymore; in the cloud and container world, I don't need to be root or postgres to install an extension and still be secure from an administrative standpoint. The world is changing, and some of the ways Postgres works need to change along with it. And I get to say that now because we got this far: Kubernetes and OpenShift are here for containers, and it's a good time for you to go ahead with containers for Postgres.
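Two of the details Jeff described earlier — pinning the user and keeping Postgres in the foreground — look roughly like this. The project name is a placeholder, and relaxing the security context constraint versus keeping the random-UID default is a deployment choice for the cluster admin:

```sh
# 1. The image declares its user in the Dockerfile:
#      USER postgres
#    OpenShift's default is to run containers under a random project UID; a cluster
#    admin can relax that for a project's default service account, for example:
oc adm policy add-scc-to-user anyuid -z default -n my-project   # admin-only

# 2. The last line of start.sh runs Postgres in the foreground, so the pod dies
#    (and gets restarted by the replication controller) if Postgres exits:
#      postgres -D "$PGDATA"
```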
Upstream first: the stuff you're seeing is upstream, both Jeff's work and Red Hat's work. It's all done in open source first, and you have access to it right now to play with. Reach out to me on Twitter or at my redhat.com address. You can't talk to Jeff, because he refuses to use any electronic media. And that's it — thanks, everybody.

You're probably checked out already — this is about how many were at the last one, though, right? That's true. Can everybody hear me okay? Good. We're going to get started on time here in just a couple of minutes, because we have a lot of material to cover. Told you so — going to get a crowd for this topic; well, maybe both. We'll get started; we have a lot of material to cover here. This is actually a condensed version of a tutorial that I've given — the full thing is three hours — so we're going to go through it a little quickly, but feel free to ask questions. I'm going to go ahead and get started now so that we can get a little more time in, because nobody cares about the front-end stuff anyway.

I'm Stephen Frost. I'm with Crunchy Data Solutions — Crunchy Data Inc., however you want to look at it. I'm a committer and major contributor: I implemented row-level security in Postgres 9.5, which was just recently released; I did column-level privileges; I implemented the role system back in 8.1. So I've made a lot of different contributions to Postgres over time. We're going to jump right into it.

The first big thing is terminology. When you're working with Postgres it's not quite the same as every other system, but it's important to understand that terminology so we have a baseline. Yes, sir — the slides will be published; if you look for the Postgres Open 2015 slides, they're there, and they'll also be posted on the wiki for this conference once I'm done.

So, with Postgres you'll come across terms like a cluster or an instance. This is essentially one running Postgres system. We view that as a single postmaster listening on a single IP address and a single port — of course we can listen on multiple IP addresses, but the idea is that it's one thing answering the calls that come in. It's also one set of data files; those data files can be spread out using tablespaces, which we'll talk about a little later. The other big thing is that it's one write-ahead log — I'll go over what the write-ahead log is later. The general operations you can do at the cluster-wide level are start, stop, and of course initialization. That's also the level at which all of your file-level backups are done: whenever you're doing a file-system backup, you're backing up an entire instance or cluster. There are other tools that let you back up something smaller than that, but they're all logical — logical backups. Streaming replication: because there's one write-ahead log, streaming replication also happens at the cluster level. And then there are a number of objects defined at that level: users, roles, tablespaces, and databases. Those are the kinds of objects that exist at the cluster level in Postgres.

As for what an actual database is in Postgres land: a database lives inside of a cluster, and you can have more than one database inside a cluster. There are a few specific permissions you're allowed to grant at the database level, which include CONNECT; CREATE, which is
about the ability to create schemas, which we'll talk about next; and the ability to create temporary objects, via the TEMPORARY privilege.

Schemas are namespaces inside of databases. This is another layer of the onion we're working through: on the outside you have the instance or cluster, inside that you have databases, and inside databases you have multiple schemas. Schema-level permissions include CREATE and USAGE. CREATE means you can create objects — what I'll call principal objects — inside the schema; USAGE gives you the ability to use the schema. Think of it like a directory: whether you're allowed to add things into the directory or step into the directory — those are what's analogous to CREATE and USAGE when you're talking about schemas.

Tablespaces are locations for other data files for Postgres. If you've got five different mount points on your system, you could have five tablespaces that correspond to those mount points, or maybe four additional tablespaces along with the default tablespaces that exist in the system to begin with. They are cluster-level, and what that means is that they can hold contents from multiple databases: you can have one tablespace, and tables or indexes from different databases on that system can use it. Tablespaces cannot be shared between instances or clusters, however, nor can they be moved between instances or clusters; they are associated with a particular instance.

Now, I told you I was going to tell you about the write-ahead log. This is an extremely important part of any database that provides write consistency and guaranteed writes. The write-ahead log is important because that's where every write to the database goes first. Whenever you do an INSERT or a DELETE or anything that modifies anything whatsoever in the database, we write it into this write-ahead log first. Then, when you go to actually do a COMMIT, we make sure those changes have been committed to disk — that means they've been fsynced all the way out to disk — and only once that happens do we acknowledge the commit and tell the application that the changes are committed, are on disk, and will not be lost. All of the changes to the write-ahead log are CRC'd, naturally. The changes to the actual data files — the tablespaces and the other files in the system — generally happen in the background, which is what we want: we accept the changes through the write-ahead log, and then they get written out to the individual files over a period of time, during what's called a checkpoint, which I'll talk about next.

One thing to watch with a write-ahead log is that there can be contention there, because you have all of these different processes trying to write through the write-ahead log. That's why Postgres supports the ability to put the write-ahead log in another location; a lot of times, especially if you're on spinning rust, you want to put it on a dedicated set of disks. That way you get as much performance as you can out of it.
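One common way to act on that advice on a 9.x system, where the write-ahead log lives in $PGDATA/pg_xlog — the paths are illustrative, and this must be done with the server stopped:

```sh
# Move the WAL onto its own disk and leave a symlink behind.
mv "$PGDATA/pg_xlog" /mnt/fastdisk/pg_xlog
ln -s /mnt/fastdisk/pg_xlog "$PGDATA/pg_xlog"
chown -R postgres:postgres /mnt/fastdisk/pg_xlog
# Or set it up from the beginning: initdb --xlogdir=/mnt/fastdisk/pg_xlog ...
```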
Another interesting thing about the write-ahead log is that there are two different types of writes we do into it: what are called full-page writes, and incremental changes. A full-page write in Postgres is 8 KB by default, and it happens the first time we write out a page to the write-ahead log after a checkpoint (which I'll talk about next): there's a full-page write, and then any subsequent changes to that same page during that checkpoint cycle are incremental changes. In 9.5, which just finally came out, we can compress those full-page writes as well, so your write-ahead logs will get smaller — or at least there's now an option to have them compressed.

Just to hit on it again: what happens after a crash? Because Postgres has written everything into this write-ahead log first, when the database system gets killed or goes down for whatever reason and then starts back up, it looks for the last point in the write-ahead log where we had a checkpoint. A checkpoint is when we have actually written everything that was pending in the write-ahead log out to the heap files and fsynced it all; when that's done, we make a mark inside the write-ahead log saying, everything is synced up to this point. Anything after that checkpoint is what we replay when the database system starts back up. So on startup we read through the write-ahead log and write all of those changes out to the heap files, because we can't trust that they made it there — we hadn't fsynced them ourselves; we only told the kernel to write the data out, and we don't know that it finished.

A checkpoint by default happens in Postgres every five minutes. What's happening is that it goes through all of the dirty pages we have, pushes them out to disk, and fsyncs them, and we actually try to spread those fsyncs out over that five-minute period. A checkpoint can also be forced for a couple of reasons. One reason a checkpoint might be forced is that we've run out of room in the write-ahead-log area we've been given. There's an option, checkpoint_segments, in Postgres that I'll talk about a little later, but the idea is that we don't want to write out as much write-ahead log as there is disk available, to the point where we fill the disk — because if the disk fills up on your xlog volume, we can't accept any more transactions, since we can't write to the write-ahead log first, which is what we have to do. So there's a configuration option that says, we're going to work with a subset of the disk available, and once we run out of that space — if we haven't hit that five-minute mark and been able to recycle some of the xlogs that were already written — we will forcibly start a checkpoint, which can be very expensive: we basically start fsyncing everything out to disk immediately so we can reclaim write-ahead-log space and start accepting transactions again.

So again, if you have questions, please don't be afraid to stop me, but we do have a lot to get through. Alright.
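A quick way to see whether checkpoints are being forced rather than happening on the timer, assuming you can run psql against the instance:

```sh
# checkpoints_timed = checkpoints triggered by checkpoint_timeout
# checkpoints_req   = forced checkpoints (e.g. the WAL space ran out)
# A large checkpoints_req count suggests checkpoint_segments / max_wal_size is too small.
psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"
```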
The Postgres development group provides a set of packages, both for Red Hat — in terms of RPMs — and for Debian-based systems. Those are really the principal ones. There are also the Windows packages; I'm not going to talk much about those, because we're at SCALE, just saying. Both sets of packages from the community allow you to have multiple major versions of Postgres installed concurrently, and allow for smooth major-version upgrades using the pg_upgrade tool, which is really quite handy. On all of the systems I run today, this is what I use — everything from PGDG — because those are the packages that are up to date, well maintained, supported through the community mailing lists, and their releases are updated in coordination with the Postgres team: whenever a security fix or anything else is coming out, we coordinate it with all the different packages, for Debian and all the Debian-based systems, all the Red Hat-based systems, and the Windows builds.

On a Debian or Ubuntu-based system, it's apt.postgresql.org. You add that to your sources list — you can use lsb_release if you want — and it's pretty straightforward: apt-get, apt-get, apt-get, done, and we've got the version installed. This was with 9.4, but of course 9.5 is out now, and 9.5 should work fine too; it will pull in the latest version.

In terms of how the configuration layout works when you've done this: all the config files for a given instance — a given cluster, as I was saying — live under an x.y directory. That's major-dot-minor... well, in Postgres land, x.y is a major version: 9.3 is a major version, 9.4 is a major version, 9.5 is a major version. Inside that directory under /etc/postgresql/x.y — because with Debian you're able to have multiple different instances installed as well — there's an instance name. In this case it's "main"; that's the default. So whenever you do this apt-get install, you get an instance of whatever version — 9.4 in my example — and that instance is named main. All the data files end up under /var/lib/postgresql, then the major version, then the instance name. Binaries: the binaries are common to all of the instances — there's only one set of binaries per major version — and those live in /usr/lib/postgresql/x.y/bin. And then there are a bunch of wrapper scripts. So if you ever go look at psql for whatever reason and you're confused why, on a Debian-based system, it looks like a Perl script — it's because it is. It's actually a wrapper for psql, because this whole notion of multiple instances installed on one system is not something the Postgres utilities themselves directly support. The Debian maintainers — which I think is fantastic — have come up with a number of wrapper scripts that let you have multiple clusters on the same box and work with them very easily.

The logs are pretty natural. Note that there's a difference between what I'll call the normal-operation Postgres logs and the startup logs: the startup log is just the bootstrapping of getting Postgres running and has its own log file. On a Debian-based system, all of those logs go into /var/log/postgresql — it's a little different on Red Hat; I'll talk about that in a minute. And there's one init script that starts everything on a Debian-based system. For us... I don't... yeah, let's not go there right now.
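The apt-based install described above boils down to a few commands; this is the standard PGDG recipe for the 9.4 era:

```sh
# Add the PGDG repository and key, then install; this creates and starts the 9.4/main cluster.
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" \
  > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install postgresql-9.4
```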
So, with these clusters: on Debian, those different instances are called clusters, and these tools all have "cluster" in their names. pg_lsclusters gives you a listing of all the clusters that exist on your system, and that can be across different major versions as well as multiple instances inside a given major version. pg_ctlcluster is what you use for controlling Postgres — on a system like Red Hat, where it's not using these wrapper scripts, the command is called pg_ctl, and I'll talk about that in a minute — but on Debian-based systems you want to use pg_ctlcluster, and that's what you use to start, stop, and signal the database; generally you signal it to, say, reload the configuration files. And then there's a --cluster option — this is one of those cases that's kind of weird — that specifies which cluster to work on, on a Debian-based system. Here you can see I've got --cluster 9.4/main; the way you specify it is always x.y, a slash, and then the instance name, for basically all of the tools. If you don't specify it, there's a default that's picked up and used, which is generally whatever is running on port 5432, the default Postgres port. And here you can see that pg_lsclusters also tells you the data directory and the log file. There are a bunch of configuration options available for these Debian-based clusters; I'll talk about that in a minute.

Alright, so now for a Red Hat-based installation. On Red Hat you're using yum.postgresql.org, on whatever RPM-based system you want. When you install it, it will not go ahead and initialize a cluster for you. You can run multiple major versions in parallel — it's a little ugly, in my opinion, but it does work. This is how you can do a group install that pulls in all the dependencies you want, and then — sorry — on a Red Hat system you don't get the database initialized for you, so you do the initdb yourself. That's what you do after you get the packages installed: you do your own initdb, which creates the cluster — creates the instance — and then you can use the ctl script to start it up.

On a Red Hat system, the data files and whatnot all live in /var/lib/pgsql, so it's not quite as convenient to have multiple clusters, and you don't get the same configuration layout that you get on a Debian-based system. All of the configuration files that I'll talk about in a minute live inside the data directory on a Red Hat-based system — which personally I don't like, although that's the way Postgres operates when you install it from source as well. Binaries go into /usr/pgsql-x.y/bin, which I find bizarre, and the logs are in /var/lib instead of /var/log. There are actually two locations: a startup log in the major-version directory, and the regular operational logs in data/pg_log on a Red Hat-based environment. I will fully admit I'm not a Red Hat guy, so take all of this with a grain of salt — I went through it and did it, and that's where I found everything; just be aware of it. You also need an independent init script on a Red Hat-based system for each of the different major versions you want to run.
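Side by side, the two workflows just described look like this; the Red Hat setup command shown is the RHEL/CentOS 7 form and differs on older releases:

```sh
# Debian/Ubuntu: the postgresql-common wrapper tools.
pg_lsclusters                               # list clusters across all major versions
sudo pg_ctlcluster 9.4 main reload          # signal 9.4/main to reload its config
psql --cluster 9.4/main -c 'SELECT version();'

# Red Hat/CentOS with the PGDG RPMs: you initialize and start the cluster yourself.
sudo yum install postgresql94-server
sudo /usr/pgsql-9.4/bin/postgresql94-setup initdb
sudo systemctl enable postgresql-9.4
sudo systemctl start postgresql-9.4
```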
Alright, so when it comes to configuring Postgres, there are really four major configuration files. There's the main configuration, postgresql.conf — that's where you configure things like how much memory Postgres uses, what port it listens on, all of that kind of information. Then you get into the question of how do I connect to the database, how do I get users connected, what are they allowed to do? For that you use pg_hba.conf and pg_ident.conf. pg_hba.conf lets you control what authentication method is used for a user to connect; pg_ident.conf is for mapping users — you may have multiple system users, so imagine, on a Unix box, users that all map to one Postgres user; you can specify a mapping like that using pg_ident.conf. And then there's pg_log — that's not a config file; the point I was getting at is that inside the data directory there's a pg_log directory on Red Hat systems. On Debian-based systems, all these configuration files are in that /etc/postgresql directory like I was talking about. One thing to be really careful about: don't modify the other stuff in the data directory. There are a lot of things in there that are really important — like your data — and you don't want that destroyed, so be careful whenever you're working inside the Postgres data directory.

As I mentioned, there are some Debian-specific configuration files; these only exist on Debian, and they're for working with these clusters. There's a start.conf that lets you say whether you want the cluster started automatically, disabled, or started manually. There's pg_ctl.conf, which just says what options to pass to pg_ctl — generally you don't need it. Same with the environment file: the environment configuration file on Debian I don't generally change, but it's there if you need to set some environment variable before starting the database. There's also cluster configuration information inside the postgresql-common directory. On a Debian-based system there's a postgresql-common package that's installed along with all the Postgres packages, and it provides this machinery for setting up clusters. Inside the postgresql-common directory you can have different settings for how you want clusters to be created, and you can also control which cluster is the default for certain users — different users can have different default clusters, so they don't have to specify the cluster all the time, and users can also set that themselves through environment variables that the wrapper scripts respect. And then there's pg_upgradecluster.d, which isn't used very much, but it can be populated by extensions to handle upgrades across major versions of Postgres.

On Red Hat systems there isn't as much in terms of Red Hat-specific config files, simply because Red Hat doesn't have all of those cluster helper scripts; mainly it's just the init scripts, which you generally don't need to modify much. Thankfully, some of the things you used to have to modify have been moved out of there — you don't have to edit the init script to specify the port, for example, if you want to use a different port on a Red Hat-based system.

Alright, so I'm going to talk about configuration.
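On a Debian-based box you can see all of that in place; 9.4/main is just the example cluster here:

```sh
ls /etc/postgresql/9.4/main/
#   environment  pg_ctl.conf  pg_hba.conf  pg_ident.conf  postgresql.conf  start.conf
ls /etc/postgresql-common/
#   createcluster.conf  user_clusters  pg_upgradecluster.d  ...
```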
These are configuration items inside of postgresql.conf — kind of what you might want to do when you're first installing a system. This is for 9.4 and earlier; 9.5 is a little different in this area, and unfortunately this deck hasn't been updated to account for that, so just be aware of it and read the release notes if you're looking at 9.5; they cover the differences between the releases.

The big one here: if you want to allow other people to access the system — on a Debian-based system it starts up listening only on localhost — you'll have to change listen_addresses to '*' to allow people to connect from remote hosts. So of course, on a Debian-based system it starts up with a secure default.

checkpoint_segments — this is what I was talking about with the write-ahead log. In 9.5 it's replaced by min_wal_size and max_wal_size; that's one of the big differences — there is no more checkpoint_segments. What it controls is how much of that write-ahead-log space Postgres is allowed to use, so it affects whether or not you're having to do checkpoints more frequently than you should be.

checkpoint_completion_target: the default is 0.5, which means that if we have a five-minute checkpoint timeout, we try to complete the checkpoint — writing all of those changes out to the heap files, to the data directory — in 2.5 minutes. My experience is that there's no real point to being that aggressive, so you might as well set checkpoint_completion_target to 0.9, which says: across five minutes, take about four and a half minutes to complete the checkpoint. In my two cents we ought to get rid of this option, but for now, at least set it to 0.9.

effective_cache_size is something that is never actually allocated. What it's used for is to give Postgres some idea of how likely it is that the pages from a particular relation — a particular table or index — are in memory. If it's really likely they're in memory, great, we're probably going to use an index-based scan; if it's less likely, we may move over to a sequential scan instead. Generally I would set this to about half of RAM, but you can actually look at how big your Linux file-system cache is, and if the box is only being used for Postgres, just set it to whatever that is.

max_wal_senders is set to zero by default because it does take up a small amount of resources. I generally bump it up to 3, because you have to restart the system in order to change it, so I find it's better to just go ahead and do it up front. That's what allows you to do things like pg_basebackup, or to have a follower — a replication setup.
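Pulled together, that initial tuning pass might look like the following in postgresql.conf; the specific values (segment count, cache size) are illustrative and should be sized to your hardware:

```sh
# Settings discussed above, 9.4-era names (on 9.5, replace checkpoint_segments
# with min_wal_size / max_wal_size).
sudo tee -a /etc/postgresql/9.4/main/postgresql.conf <<'EOF'
listen_addresses = '*'               # default only listens on localhost
checkpoint_segments = 32             # how much WAL may accumulate between checkpoints
checkpoint_completion_target = 0.9   # spread checkpoint writes over ~90% of the interval
effective_cache_size = '8GB'         # roughly half of RAM, or the size of the FS cache
max_wal_senders = 3                  # allows pg_basebackup / streaming replicas
EOF
```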
Alright, logging. The default logging in Postgres sucks, so this is what I do as the real basics of an initial setup. Log every connection to the database and every disconnection — those are pretty obvious. log_lock_waits is less obvious but it's ridiculously helpful whenever the system is stuck between two processes due to a lock — it could be a table lock, it could be a row-level lock. After a few seconds we run what's called the deadlock detector, which asks whether those processes that have been stuck for a while are actually deadlocked; assuming it's not a deadlock, which is generally what you're hoping for, Postgres will then — if you have this enabled — log "hey, I've been sitting here waiting on a lock for a minute, just FYI." That can be indicative of somebody who started a transaction, acquired some locks, and is just sitting there not doing anything — those are the people whose chairs you want to go kick and say, get out of your transaction so processes can move forward again. log_min_duration_statement is important because what you're specifying is: if a query has run longer than this amount of time, log it, so I can go review it and figure out why it took so long. 100 milliseconds is a really long time, especially when you're trying to load web pages — you really want databases to be more responsive than that. log_temp_files is really handy because there are operations in Postgres that end up using a lot of disk space, like a sort: if there's an on-disk sort happening and you have log_temp_files set to 0, then any file we create for temporary purposes — an on-disk sort, for example — gets logged. That can be very indicative of why a particular query was slow; if Postgres had to sort two gigs of data, that's going to take a while. And log_autovacuum_min_duration set to 0 basically says log every single thing that autovacuum does, which I find very helpful as well — if you see some kind of performance impact and maybe it's autovacuum kicking in, this logs which tables autovacuum has been vacuuming and what it's been doing. There's also a really important setting called log_line_prefix, which you would think would default to something reasonable, but it really doesn't. This is what I use — the slides will be posted and I'm not going to go through all of it since we have a lot to cover — but it collects all the important information. One of the newer fields is application_name, and that's one you may really want to look at: if you're already running Postgres you may not have application_name logged, and it's helpful — all the Postgres applications will set it for you, you can also set it yourself whenever you're connecting through libpq, and then that information shows up in the log as well as in some of the informational views inside Postgres.
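As a concrete sketch of that logging setup — these particular values, and this log_line_prefix string, are just plausible examples of the kind of thing described above, not the exact ones from the slides:

    log_connections = on
    log_disconnections = on
    log_lock_waits = on                  # complain when a process has been stuck behind a lock
    log_min_duration_statement = 100     # in milliseconds; log any query slower than this
    log_temp_files = 0                   # log every temp file (e.g. on-disk sorts), whatever its size
    log_autovacuum_min_duration = 0      # log everything autovacuum does
    log_line_prefix = '%m [%p] %u@%d app=%a '   # timestamp, pid, user@database, application_name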
Alright, so here's pg_hba.conf in a nutshell: everything is read top to bottom. You have the different connection types on the left — local means a local Unix socket — then you can specify a database (or all databases), the user, and then the method; if it's over the network you also have an address field, and then there are a bunch of options you can set. Addresses can be either v4 or v6, and they can use CIDR notation if you want. There's a special method called reject which, if you hit it while reading through the file, means Postgres just kicks you right back out — that lets you put cut-outs inside your pg_hba.conf. As for the methods listed there, these are the ones supported by Postgres. Peer is what I mean when I talk about Unix sockets: we just ask the kernel who this person connecting is, we get back the Unix username, and that's who we allow you to authenticate as — and you can map that to different Postgres users with that pg_ident.conf file like I was talking about. gss is Kerberos; it's also SSPI on Windows, so if you're setting up a pg_hba.conf on Windows you want sspi, but on a Unix box — which hopefully everybody is — you use gss. It works really well with MIT or Heimdal Kerberos; I've played with both and been happy with them. You can also integrate it with Active Directory, because Active Directory is actually Kerberos underneath: you can put a Linux box on your domain, or you can run an independent realm with a trust relationship between your MIT or Heimdal realm and the Windows realm, and that lets you do single sign-on with Postgres, which is really, really handy. There's also certificate-based authentication, using SSL client-side certificates; when you do that, what you map in pg_ident.conf as the system username is the certificate's common name, and you map it to whichever Postgres username that user should be allowed to log in as. One thing that's interesting — which, to be honest, I've never done myself, but it works just fine — is that with gss and Kerberos the system name is user@realm, so it would be something like frost@MIT.edu; you can have that as the Postgres username inside the database if you want, you don't have to map it to a simpler name. I typically do map it, but I've seen people keep the full principal, and it works well in large environments with a lot of people across a lot of realms. Then there are methods that work just fine and are probably most of what you're using — I don't like them as much, I much prefer the stronger authentication methods I just covered. md5 is your stock username-and-password authentication; it's probably what most people are using, it works, it's not what I'd recommend, but it does work. PAM is complicated, and the reason is that Postgres doesn't run as root, so you have to set up something like saslauthd if you want Postgres to authenticate users via PAM against your password files, /etc/passwd and /etc/shadow. And with md5 — with any of these password-based methods, really — you want to be using SSL; I'd say you generally want to be using SSL anyway, but that's just me. We also support RADIUS, for anybody who's got some old-school stuff still doing RADIUS authentication — Postgres has support for that now. And then there's the traditional password method, where you feed the password over to the server, the server checks it against its password store, and you're in.
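Here's a small pg_hba.conf sketch showing the shape of those entries — the networks, map name, and layout are made up for illustration:

    # TYPE    DATABASE  USER  ADDRESS          METHOD
    local     all       all                    peer
    hostssl   all       all   10.0.0.0/24      gss     map=krb
    hostssl   all       all   10.0.0.0/24      md5
    host      all       all   192.168.1.0/24   reject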
Now, these are the methods I don't really like. LDAP gets a bad rap from me simply because if you're running LDAP, 99% of the time you're actually running Active Directory, and you should just be using Kerberos with Active Directory instead of LDAP authentication. There are some environments where you really are running OpenLDAP and genuinely use LDAP for authentication, but the other problem is that, again, you're passing the user's credentials through the system: with LDAP or password or md5, the user's password gets exposed to the server — it has to be. With Kerberos and GSSAPI that doesn't happen, which is one of the really nice things about it, and the same goes for SSL client certificates. ident just shouldn't be used, and trust really shouldn't be used either — they're not secure in any way, shape, or form, in my opinion. I know some people have used trust in the past because they think it's faster than md5; maybe it is, but I wouldn't recommend it at all, because it literally bypasses authentication and potentially allows you to log in as any user. So just be aware of that: if you have trust configured, go change it.

Alright, pg_ident.conf. Here's what I was talking about before: you have a map name — and you can have multiple different mappings depending on the authentication method you're using — then what's called the system (or auth) user, and then your Postgres user. So in this example, joe would be a joe account on the Unix system, for peer-based authentication with the peer map, and that Unix user can log in as bob. Realize that with this configuration joe can't log in as joe; if you want joe to be able to log in as joe, you have to add another row that allows it, or you can use a regular expression. Here's a regular expression for the Kerberos mappings: whatever the regex captures is what we allow on the right-hand side, so if I'm sfrost@snowman.net, as shown here, the \1 with this regex will be sfrost — it lets any user at snowman.net log in as their short name on the system. This particular mapping also allows me to log in as the postgres user. And here, for a certificate mapping, this would be the common name inside the SSL certificate I'm using to log in, and it allows me to log in as the sfrost Postgres user. You hook a map up by specifying map=whatever-map-name-you-want on the method line inside pg_hba.conf.
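A small pg_ident.conf sketch of the mappings just described — the user and realm names mirror the examples from the slides, and the exact regex is my own illustration:

    # MAPNAME   SYSTEM-USERNAME           PG-USERNAME
    peermap     joe                       bob
    krb         /^(.*)@snowman\.net$      \1
    krb         sfrost@snowman.net        postgres
    certmap     sfrost                    sfrost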
Alright, next up: running Postgres. A big question is always "is Postgres up?", and there are a couple of utilities for that. You can just use the service command to check status, and it'll list the multiple instances and, for each one, whether it's online or not. There's also pg_isready, and on Debian it's supported by the wrapper scripts so you can pass it the --cluster option; pg_isready is available on the RPM-based systems as well, but there it doesn't accept --cluster. And of course you can always just connect with psql, which is very handy.

So, using psql. If you're working with Postgres a lot — at least if you're a command-line guy like I am — there are a number of commands that psql takes once you're inside of it, inside the psql shell. Anything that starts with a backslash is a command that's interpreted and handled by the psql binary itself; anything else just gets sent to the server, and whatever the server sends back, psql displays. The big ones: \? is really helpful to get a full listing of the available commands, and \h is ridiculously helpful — what we do with Postgres is take the syntax for everything the server accepts, all the SQL, out of the documentation and include it with psql, so you can do \h SELECT or \h INSERT and get the full syntax for that command. Very, very helpful. Ctrl-D or \q exits. Any query that returns data gets displayed, and \x toggles what's called expanded display, where you get a separate line for every column in the result set — this is what \x output looks like. TABLE is a really handy command if you're not familiar with it: it's the same thing as SELECT * FROM, it also accepts LIMIT, and it's just shorter and simpler. pg_stat_activity is a view that tells you which users are logged into the database right now and what they're doing; here I've got expanded display on, so each line is a column and each record starts with a record header. This is all the information you can access through psql about the users that are connected: which user is connected, which application they're using if it's one that sets that information, the client address — there's none here because I connected over a local Unix socket — when their process started, whether they're waiting on a lock, the state (whether they're running something right now, idle, or idle with no transaction), and then query, which is either the current query they're running or the last one they ran. If you want to see what databases exist, \l gives you the list of databases. I'm not going to go through the whole slew of psql commands, but that's what \l shows you — pretty straightforward.

Yes — when you run psql you go into a shell, a psql shell, and that's where you issue backslash commands and run your selects and queries and whatnot. If you want to send a command to psql from bash, -c is the option you pass: on a bash command line you can do psql -c "SELECT * FROM whatever", and there's a whole bunch of options for formatting, table alignment, headers and footers, so you can strip everything out and have a select run underneath bash that returns just the data element you want — which is a lot of what people are doing — and then you can shove that into a bash variable if you want; works just fine. Yes, \x is a toggle: once you've turned it on it stays that way until you turn it off. And on \? — I'm not sure which command you're looking for, but if you're not sure what a command does, do \h by itself to get a list of all the top-level commands Postgres supports, and once you have that you can go to the Postgres documentation and look it up.
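As a tiny sketch of driving psql from bash the way I just described — the query and variable name are only examples:

    # -t strips headers/footers, -A turns off alignment, -c runs a single command
    conn_count=$(psql -t -A -c "SELECT count(*) FROM pg_stat_activity")
    echo "connections: $conn_count"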
postgres documentation and look it up templates I'm going to cover this really quickly because we're already at 40 minutes and I've got a bunch of slides left so templates are something that you can create and then if you do a create database using that template we just basically copy all the files from that template database into your new database so if you're creating lots of databases for whatever reason and you have like a common set of stuff you want to be in all of those databases create a template database fill it with that stuff and then whenever you create a new database just specify with template the create database command and then that will be pulled over creating users is very straightforward it's create user on the command line or you can do create space user inside of psql itself so this is inside of a psql shell and then you can set a password and whatnot pretty straightforward user privileges a lot of different user privileges that are really important super user is the big one don't give it out to just anybody lots of different stuff you can access the database in ways that you really shouldn't be allowed to so be very cautious with that one create role is more powerful than just being able to create roles because it allows you to modify roles as well so that's one that you don't want to just give out to anybody login is by default if you do create user you'll get the login privilege if you do create role it's like a group so you won't have the login bit so if you've done a create role and you don't know why you can't login is that new user you created you don't have the login rights you need to go change that on the system and you can do alter user to change that information and if you're curious but how to use alter user you do backslash age alter user and then you'll get that information I'm not going to hit on the rest of this stuff but there's also replication for changing for what users are allowed to connect into one pgbase backup or be as a slave system pull data down to be a hot stand by or read only replica all right roles so roles are kind of complicated in postgres roles are roles users are roles groups are roles everything is a role inside of postgres so when you were thinking about how you want to do group membership you know use create role set up your groups that way set up your users with create user that way that login bit gets set the right way you want to add someone to a group that's using the grant command so you say grant Joe to or sorry grant you know whatever the group name is so grant admins to Joe right and that allows Joe to now have whatever the rights of the admin role have inherits really really important what it allows you to do is that it allows you to automatically inherit the rights of that role so for example with the admin role in fact I'm going to go over here so with the admin role you can set up you can set it up as no inherit and that means that when you connect in you don't automatically get all of those privileges instead you have to actually do a set role to it so this is how I like to set things up for kind of a pseudo like mentality note that it doesn't ask you for a password there's no way to get postgres to ask you for a password on a set role right now I think that may change one day but don't get your hopes up so you create this role admin with no inherit grant postgres to admin and then create this user Joe and grant admin to Joe now Joe can log in he's a regular user doesn't have any particular special rights but he can 
do a set role to postgres when he's ready to so that's how that works alright grant and revoke so this is going to cover what I recommend changing on postgres so so by default postgres comes with what's called a public schema that anyone can create objects inside of I don't generally recommend that because that schema is in everybody's search path so it can just get obnoxious by default so I would I always revoke create on that schema whenever I set up a new system and always have per schema or per user schemas instead so whenever I create a user I'm going to create a schema for them also and then they'll have rights on that schema and they can do whatever they want with that and then if I have something I want to create to have available to everybody on the system I'll go put that inside of the public schema these are the different ways in different options for privileges on a system depending on what type of object it is there's different options that are available and you can go all the way down to individual columns and you're talking about select insert update rights functions are important to think about functions with whenever you do a create function by default everybody's allowed to execute that function assuming that they can see the schema that the function is inside of and then you can also have what are called security definer functions those are like functions with a set UID bit so whenever you call that function the function is going to change rights that it's running as over to whoever owns it and whoever the owner of that is and then that's who the rest of the function is going to run as so be aware of that that's how security definer functions work inside of Postgres so I covered a little bit of this already default permissions for execute for functions everything else is basically secure by default you create a table nobody else has rights to that table until you grant them access to it unless you set up default privileges on the schema so you can set up a default privilege for either a role for a schema for a role inside of a schema and that allows you to set it up so that to create a new table it will automatically have some set of grants that are allowed to it you can also use grant on all so if you want to grant some access to all the tables in the schema you can use that command if you're wondering how big your database is this is pretty straight forward it's pd database size is a function you can call pass the database name and then pd size pretty just makes it look nice you can look at the size of individual tables using pd total relation size note that that particular command will actually include the size of indexes and the heap and the toast tables and everything so that will give you the total size that can, that could cross over different table spaces of course so both of these will go across any table spaces that are involved and collect up all that information for you if you look at the size of the individual table like you don't want to look at how big the indexes are associated with it you can just use pd relation size and that'll give you just the size of that relation and not include the indexes so here's a handy little command if you want to look at what's the size of all the tables inside of a particular schema how much disk space is this user using up this is a way you can do that assuming you have poor user schemas creating a table space is kind of tricky, there's a bunch of options you have to set up and things you have to get right some of the big 
Creating a tablespace is kind of tricky; there are a bunch of options and things you have to get right. Some of the big ones: you have to have created the directory already, it must be empty, permissions have to be 700 on it, and you have to specify the full path to the directory when you create the tablespace. Do not ever create tablespaces inside your data directory — don't mess with the data directory at all; Postgres feels like it has control and dominion over everything in there, so create tablespaces somewhere else. And since I have to bring it up too often anyway: I also strongly recommend you don't use a mount point directly — create a directory underneath the mount point and use that as your tablespace. In my opinion that's just best practice, and I think there are good reasons for it. If you want information about your tablespaces, that's \db at the psql prompt, and it gives you the location; the default ones don't show location information because they live in the Postgres data directory — which is /var/lib/postgresql/9.4/main on Debian — and you can use pg_tablespace_size to get the size, as Postgres sees it, of all the objects inside a tablespace. To drop a tablespace it has to be empty first; there are handy commands that let you move sets of objects from one tablespace to another, but you may have to connect to multiple different databases to move (or drop) all of the objects, and once everything inside the tablespace has been moved out or dropped, then you can drop the tablespace itself.

Alright, file-based backups. pg_basebackup with WAL streaming is really, really handy for doing simple backups. It does require that max_wal_senders setting I talked about before, it's a one-time run, and it does the whole database — there are no incrementals available, which isn't great — and you have to have the write-ahead logs that were created during the base backup in order to start the system back up. So when you run pg_basebackup there's an option to stream, or otherwise pull down, all of the xlogs; make sure you use it — I recommend the streaming option. JD did a talk yesterday, for those who weren't here (sorry), about backing up the database and using pg_dump; I'll get to that in a minute. One other important thing is that a base backup includes all the data files and all the indexes and everything, so when you do a restore you don't have to recreate indexes — you do when you're using something else, like pg_dump. The other main approach is logical backups, which is where you use pg_dump and pg_dumpall. Those end up being essentially text-based dumps; they don't include any indexes, so the entire system has to be repopulated and all the indexes recreated when you do a restore, which is kind of painful. Backups don't work unless you restore them, so — just best practice — make sure you're testing your restores, actually doing them and making sure they work, and consider multiple different options for how you'd do a restore. A restore with pg_basebackup is pretty straightforward: pg_basebackup creates the tarball for you, all you've got to do is extract it, all the data should be included and all the WAL files should be there as well — and that gives you all of the data as of the backup; any data created after the backup will be gone unless you saved the write-ahead log and can replay it.
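For reference, a pg_basebackup invocation along the lines described above might look like this — the target directory is a placeholder, and -X stream is the streaming WAL option being recommended:

    # plain-format base backup into a new directory, streaming the WAL generated during the run
    pg_basebackup -D /backups/base-$(date +%Y%m%d) -X stream -P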
You could talk for an hour just about backups with Postgres; if you have questions about it, feel free to ask me. pgBackRest is a really nice backup utility — it's on GitHub under pgmasters — it's for file-based backups, it handles all of the craziness around write-ahead logs for you, and it supports point-in-time recovery. It's parallel, and you can do differential and incremental backups. Really nice utility; if you're looking at a large system I strongly recommend looking at pgBackRest as your backup solution. These are the settings you need to set to enable it — you can find all of that information on the pgBackRest web page as well, so I'm not going to cover too much of it. This is how you configure pgBackRest: there's an /etc/pgbackrest.conf, there's a repo path, which is basically the path where all your backups and all your write-ahead log go, and then you need to tell it where the database lives. You can have multiple different clusters supported inside one configuration file; that's what main is here — just like 9.4/main, that's what main is inside pgBackRest (there's a rough sketch of that config at the end of this paragraph). There's an info command with pgBackRest which tells you information about the last backup, and there's also a JSON version of it that gives you lots more data you can extract, parse, and stick into a Nagios check if you want.

And speaking of Nagios — monitoring is obviously a really important aspect of database systems. check_postgres.pl is a great monitoring tool for Postgres: it integrates with Nagios, you can get it to dump out statistical information and have it graphed with pretty graphs through Nagios or whatever you're using, and it also supports custom queries, which are really nice. These are the ones I typically use as a minimum recommended set of checks to be running with check_postgres. As for monitoring log files: the same person who wrote check_postgres also wrote tail_n_mail. You do have Postgres log files, and it's nice to be able to monitor them and make sure you're tracking any errors; I like to use tail_n_mail for that — it's pretty straightforward and works quite well. You can also reconfigure Postgres to use what's called the CSV log, which dumps the information from the Postgres log into CSV files that you can load back into the database, and there are lots of other options for doing your own rotation if you don't want to use logrotate.
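Here's the kind of minimal /etc/pgbackrest.conf I had in mind above — a sketch for the pgBackRest releases of that era; the paths are placeholders, and newer versions spell some of these options differently:

    [global]
    repo-path=/var/lib/pgbackrest         # where backups and archived WAL end up

    [main]                                # one stanza per cluster, named like 9.4/main
    db-path=/var/lib/postgresql/9.4/main  # the cluster's data directory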
Alright, configuring and tuning Postgres — I never seem to have enough time for this section. shared_buffers is this massive pool of memory that we allocate at startup; it's basically Postgres's own cache, and it lets us avoid going back and forth to the kernel whenever we need data off disk or out of the kernel's cache. It's really tricky to tune. If your working set is less than the amount of memory in the box, I generally recommend setting shared_buffers high enough that your entire working set — or your entire database, if it's small — fits in it, and that works out really well. If it's larger than that, I find it can actually be helpful to make shared_buffers significantly smaller, down to maybe only a gig. That means we go out to the kernel a lot more, but it also means less double caching: whenever we have our own cache and the kernel has its cache, there's a risk of caching the same pages twice, which just wastes memory that could be better used for running queries and doing other things. work_mem is the setting we use to decide how much memory a query is allowed to use — it's more complicated than that, but just realize that if you raise work_mem, queries are allowed to use more memory for things like building hash tables or sorting, so it can be really, really helpful. The default used to be one meg; it's been bumped up a little since — I believe it's four now — which is better than one but still pretty freaking small if you ask me, so I tend to raise it a fair bit. The only downside is that if you have lots and lots of connections, you run the risk of running the system out of memory. Realize, though, that Postgres is actually very good at handling memory errors — I've got five minutes left — so it's not the end of the world if Postgres uses up all the memory in the system: it gets a memory error and cleans everything up very nicely, so it's not a huge problem. Don't run out of room for pg_xlog, though. maintenance_work_mem is what we use for creating indexes, so bump it up whenever you're building indexes — but realize we will use all of that memory, so be careful in that regard; it can make index builds go much, much faster, because we basically have to sort everything. I've talked about effective_cache_size a fair bit already, so I'm not going to cover it again. Autovacuum — sorry, this slide really washes out on the screen — the defaults are generally too low, in my experience, for a high-transaction-rate system. Usually what you want to do with autovacuum is make it more aggressive, not less: turn it up so it runs more frequently, because those more frequent runs finish in less time and clean things up in a more timely fashion. Some ways to do that: decrease the cost delay, or just turn the costing model off entirely, and increase the number of workers — definitely, on a very busy system, increasing the number of workers lets it work in parallel across more tables. Managing connections: you can bump max_connections up — I've had folks take it as high as 500 — but that's really too much in my opinion. Where you really want to be is with the number of connections actively working in the database roughly on par with the number of CPUs in the system; generally speaking that's the most efficient way to run a database server. One way to get there is PgBouncer: all your applications connect to PgBouncer, and PgBouncer manages the connections to the database, basically multiplexing on top of them. It's a very effective tool; pgpool also works. And watch for idle — and especially "idle in transaction" — processes. Those are bad: if something is idle in transaction, there are things we can't move forward with, like autovacuum vacuuming away dead tuples. Almost done, folks.
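A quick way to spot those stuck sessions, using the pg_stat_activity view shown earlier (these column names are the post-9.2 ones):

    -- sessions sitting idle inside an open transaction, oldest first
    SELECT pid, usename, application_name, xact_start, state
      FROM pg_stat_activity
     WHERE state = 'idle in transaction'
     ORDER BY xact_start;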
Managing locks. Generally speaking, if you have a lot of objects in your system and you're using a pg_dump-based solution, pg_dump has to connect and lock every object it's going to dump, and that can get expensive — so a lot of times I'll bump max_locks_per_transaction up some, so there are enough locks available for a pg_dump-based backup. A heavyweight lock is required for every object accessed by a given process while it's connected — not just pg_dump but your applications as well. Generally your applications aren't touching that many different tables at a time, so it's not that big a deal, but be aware that we do take out a lock. I say "heavyweight" and that's a terminology thing: heavyweight in this sense means it's a full lock-manager lock, not that it actually blocks anything — ACCESS SHARE locks don't block anything except things like dropping the table. Checkpoints — I talked about this a little earlier, but make sure your checkpoints are happening due to time. If you have that log_checkpoints option turned on like I was talking about, every time we do a checkpoint it'll say why it started and when it finished, and give you a bunch of stats, like how many write-ahead log segments there were. The big thing is to make sure it's always starting based on time: if it's starting based on xlog, you're writing more write-ahead log than your system is configured to be able to write out to disk in time, and that basically means bump up checkpoint_segments. Generally speaking, people raise checkpoint_timeout because a larger timeout reduces the number of full-page writes and does some other nice things in that regard, but the higher it is, the more write-ahead log you may have pending when your database crashes and has to restart, which means the restart takes longer, because it has to replay everything since the last checkpoint. So if your database crashes with the timeout set to 5 minutes, it shouldn't take more than about 5 minutes to restart; if it's set to 30 minutes, it could take as much as half an hour. pgBadger is really helpful and handy for generating reports about queries — if you're curious what the query-time picture looks like against your database, I strongly recommend pgBadger; there are some configuration options and you can find it all on the web. Alright, I have five seconds for questions. I'll be at a bar somewhere, or you can find me after this if you have questions — I'll be happy to chat, thank you. The slides will be up; I'll put them on the wiki here in a few minutes.

Okay, seems good — there we go. I'm going to be talking today about EXPLAIN, which is PostgreSQL's primary tool for query planning and execution analysis. It is an extremely powerful tool; it is also somewhat unintuitive — hence the presentation. So first, let's talk a little about what EXPLAIN is. EXPLAIN tells you a number of things about your queries: it tells you what the query planner planned on doing, how it planned on executing the query, and then — depending on what mode you run it in — some statistics on what data it expected and what data it actually got, and it can often tell you which step in the query took the longest. But EXPLAIN doesn't do everything for you.
For example, if you're asking "why didn't it use that index?" — that's not something you can get out of the explain plan. Sometimes you can infer it from the plan, but you can't get it directly. It doesn't tell you how to rewrite your queries to make them faster, and it doesn't say anything about other factors that may make the query slow. You will sometimes find that the majority of your response time is outside the query entirely — one of the things I see all the time when I do analysis is that I look at the EXPLAIN ANALYZE in Postgres, but I also look at the response time as reported by the application, and you can have a query where EXPLAIN ANALYZE says it's executing in 50 milliseconds but the response time in the application is three and a half seconds, and that's actually down to transmission, network problems, or rendering in the application. That's not uncommon at all, and EXPLAIN won't tell you about it: it only shows you the time spent inside the database. In particular it won't show you things like data transmission time — if you request 150,000 rows from the database, that's going to take a significant amount of time to go across the wire, and again that does not show up in EXPLAIN. So it doesn't tell you everything, but it does tell you a lot.

Now, I did say that EXPLAIN is a little unintuitive to interpret, so in order to understand what it's showing you I need to explain a little about how the Postgres query planner works, because the way the EXPLAIN output appears is really tied to how the planner works. By the time the query planner sees your query, the parser has already parsed it and rendered it into a canonical form. The planner breaks that down into individual atomic units of work, which are called nodes, appropriately enough; it then comes up with every possible way it could execute each one of those nodes, and it chains those nodes together — which involves discarding some possible execution plans, because they can't be chained together — and then it takes those chains, estimates the cost of executing each one, and whichever has the lowest cost is the one it picks, and that's the one displayed to you. I'm told — this was before my time — that our query planner was originally written in Lisp, and if you look at the code it's very recursive and uses a lot of lists of things. But you really have to understand the node concept to see what EXPLAIN is showing you. The basic idea is that every node has two kinds of inputs. One input is one or more row sets — in the case of joins it's more than one row set, for other operations it's a single row set; a fixed set of data that might be a table or might be data coming from another query node. The other input is the parameters that define what the node is supposed to do, how it's supposed to execute its thing. What comes out of it is another row set. So, for example, if you have a sort node doing a sort on a column, what goes in is some scanned rows, some columns, and how you want to sort them — the sort expressions — and what comes out is a sorted row set; and that's the text EXPLAIN gives you. Now, I've also mentioned the term "cost" here several times.
Postgres has what's known in the database world as a cost-based optimizer: we try to estimate a cost — which is generally a sort of estimate of time, because that's what people care about most — for each plan, and pick the plan with the lowest cost, because all else being equal you want the query plan that gives the shortest response time. But it's important to understand that these cost units are completely relative; they are not meant to represent anything external to the query planner itself. As a matter of fact, costs between different queries don't even necessarily correspond: you can pretty much guarantee that a query with a cost of 10 million is going to take longer than a query with a cost of a thousand, but for two completely different queries — one costed at 1 million and one at 1.5 million — you can't make any assumption about which will actually execute faster. The only thing cost is supposed to be comparable within is the same query: the higher-cost plan should execute slower, and the cases where it doesn't are the cases where we want to tweak the query planner, the statistics, or other information so that the costs correspond better with reality. Now, that said, Postgres has this parameter called seq_page_cost — sequential page cost — that people do like to manipulate to try to make that cost number correspond more closely to, say, milliseconds of execution, but it's still an estimate and it's still not reliably comparable, so you can only get extremely approximate results that way; I regard it as a waste of time, because you're trying to make one thing represent another thing that it doesn't.

So let's actually look at some of this. Here we have a basic, very simple query with a very simple query plan — these examples all come out of my sort of fake public library database, libdata, which I use in several projects on my GitHub. We're looking up a patron by their indexed library card number, and Postgres can implement that in a really simple way: it looks at the primary key index and retrieves our one row. Now, one of the most important things — the very first time you jump on Postgres IRC and say "this query is slow" and post an explain plan, what's the first response you get? How many people have done this? The first response is: please run an EXPLAIN ANALYZE. Because here's the thing: EXPLAIN by itself shows you what the query planner planned to do. EXPLAIN ANALYZE shows you "I expected to retrieve 1,000 rows from this but I actually retrieved 10,000", "I expected this to cost X and this is how much time it actually took" — EXPLAIN ANALYZE actually shows you that. Now, it's important to know that EXPLAIN ANALYZE really does run the query, because without running it we can't find those things out. That's critically important if you're analyzing a write query of some kind — an UPDATE or INSERT or whatever: it will actually do the update or insert. Plain EXPLAIN will not; EXPLAIN ANALYZE will. So if you're going to do that, it's important to do it inside a transaction, so you can roll it back.
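That pattern looks roughly like this — the table and column names here are placeholders in the style of the libdata examples, not the exact slide query:

    BEGIN;
    EXPLAIN ANALYZE
        UPDATE patrons SET street_address = '123 Main St'
         WHERE card_number = 12345;
    ROLLBACK;   -- the UPDATE really ran, so throw it away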
So here's an EXPLAIN ANALYZE, and you can see the extra information we get. We still have our index scan and our cost information — I'm going to go over this in detail in a minute — and now we suddenly have these actual time figures and actual rows, and in more recent versions of Postgres (this is from 9.4) you get other information like how long it took to plan the query. One thing you'll find with very simple queries like this is that the planning time is actually greater than the execution time; that's not uncommon in queries retrieving one row via an index, where the entire execution is measured in tenths of a millisecond — right here it took us more than twice as long to plan the query as to execute it. So you get a lot of good information that way. Now, you pretty much always want to run EXPLAIN ANALYZE, because plain EXPLAIN only gives you what the query planner considers the most reasonable plan — and most of the time the planner is right. One of the curses, from a Postgres development standpoint — one of the big problems with the query planner is that it is right so often that it makes us very reluctant to mess with it, because if we start messing around with its internals, what we initially get is a lot of really bad plans until we retune it. So it's right most of the time, and for that reason just looking at the plain explain plan doesn't necessarily tell you much. The cases for plain EXPLAIN: the update case, if you can't wrap it in a transaction and roll it back because of locking or whatever; maybe the query is so slow it never finishes, in which case you can't run EXPLAIN ANALYZE at all; maybe you just want to check whether a particular index is going to be used, and the plain plan will tell you that; and maybe, if you've got partitioned data, you just want to check that the query prunes correctly so you only hit the partitions that match — again, a plain explain plan tells you that without actually running anything. The whole rest of the time, run EXPLAIN ANALYZE.

Now, there are some other options for EXPLAIN that are useful. One of the big ones, introduced in Postgres 9.2 I think, is BUFFERS. With BUFFERS it tells you how much data was retrieved from Postgres's dedicated cache versus how much had to come from the file system. Because Postgres sits on top of the file system, it can't tell you how much of that came out of the file system cache versus off disk — you have to infer that from how long it took — but at least seeing cached data versus read data gives you an idea of whether the query is slow because we're waiting to read stuff out of the file system. COSTS and TIMING are on by default — that's what you saw before — and you can turn them off. The one real reason to turn TIMING off is that on some platforms, such as certain Windows servers, checking the system clock is really expensive — so expensive that it affects the query execution time when you're doing EXPLAIN ANALYZE and gives you a really deceptive result; that's the reason to turn timing off. The other thing we can do is change the format: what I've been showing you so far is the default line-oriented text format for EXPLAIN and EXPLAIN ANALYZE, but we also offer output in YAML, XML, and JSON.
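The option syntax looks like this — the query itself is just a stand-in:

    EXPLAIN (ANALYZE, BUFFERS, FORMAT YAML)
        SELECT * FROM patrons WHERE card_number = 12345;

    -- or, on a platform where clock reads are expensive:
    EXPLAIN (ANALYZE, TIMING OFF)
        SELECT * FROM patrons WHERE card_number = 12345;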
Actually, when I'm doing real work I use the YAML format a lot, because it breaks each piece out into a separate key/value field and I find that much easier to read, plus I can feed it into scripting tools and look for certain patterns. And then there's VERBOSE — let me actually show you a few of these. This is an example of BUFFERS, and notice that when you want to add these extra parameters you use the extended format: EXPLAIN, open parenthesis, parameter and its on/off value. That's the extended format we've been adding for a lot of Postgres commands, because we decided a while ago that we wanted to stop extending the SQL parser syntax all the time — that was becoming a real problem — so instead you put these in parentheses; it's probably a good idea to get used to doing it that way even though you don't technically need it with ANALYZE and VERBOSE. So here I'm selecting from people, and I'm doing a JSON search — this is a JSON field, so I'm doing a JSON query looking up certain key values — and here you can see what we get from BUFFERS. We're doing a bitmap heap scan on people, and this is actually a two-part thing: the first part is the bitmap index scan, which means we read the index into memory and transform it into a bitmap so it can be manipulated and matched up with other query criteria, and it says "Buffers" here: we hit one data page already in memory and had to read 39 of them, which is fine — less than a megabyte of data, not really an issue. But then we go up to the bitmap heap scan and we're hitting 18,000 pages in memory and reading 70,000 from the file system, so if this query were slow, that would be a pretty good hint that it's pulling data off disk and that's why. Now, here's a cut-off example of the YAML format — since it's vertical it scrolls way down — and you can see that if you're looking for specific statistics it's a lot easier to read; it's a little harder to see the whole query because you have to do a lot of scrolling, but it's a lot easier to read the parameters because they're broken out into separate fields instead of one massive long line that wraps. I'm not using the YAML format for the slides, though, because it's vertical and just runs off the slide. And then VERBOSE doesn't actually give you a lot of extra information — it's possible in the future we'll add extra things that display in verbose mode — but right now what it mostly does is this: remember I said there are row sets going into the node and row sets coming out of the node? It gives you all the column names for those row sets going in and out. Where that's really useful is when somebody hands you a query that lacks table aliases and you're asking "where did this f_name field come from, what table is that from?" — it'll tell you, and you can trace that particular column through the various levels of the query to see which table it originally came from, so you can see where it's missing an index, for example, as well. Obviously it is verbose, so be prepared for lots of output, but it's useful that way.
So here we can see all of the output columns from each level of the query, and one of the things I'd point out here is that at this level of the query we're actually fetching all of the columns of the fairly wide patrons table even though we're only going to display four of them — couldn't we add a filter criterion and avoid manipulating some of that data? That sort of thing gets particularly critical when you have these giant tables with 600 columns: any time somebody does SELECT *, things get really slow. So that's one of the things you usefully get out of VERBOSE. Next we're going to talk about reading EXPLAIN in detail — before I do, does anybody have any questions about the options? I'm going to show you what each different clause means in just a minute, but any questions about the options for EXPLAIN before we get there? No? Good, okay, cool. So the first thing to understand about EXPLAIN — and this is the hardest thing to grok when you're first looking at it — is that it displays as an inverted tree, and what I mean by that is that the thing that's executed first is the one that's indented innermost. In this particular query, which has a join in it, this bitmap index scan on loan_patrons — we're joining the patrons table against the loan_patrons table — is what got executed first, for certain definitions of "first." Here's the important thing: when Postgres is waiting on I/O, it will go attempt other query nodes it can execute, so you will actually have overlapping execution between nodes. No, we don't have full parallel query in Postgres yet — that's coming in 9.6 — so it's not really parallel; it's more of an asynchronous request model: when we're waiting on other systems we go on to other stuff. But anyway, this node is executed first; second, we have a heap scan on loans and an index scan on patrons, and because there's I/O going on there, those two will be executed more or less concurrently — it's context switching on the processor, but more or less concurrent — and when those are done, it all gets passed up to the join, the nested loop join right here, which gets executed last. So we have first, second, third. That makes it hard to read, because first of all you have to understand that the innermost thing executes first, and the innermost thing may not be at the bottom — sometimes there's other stuff at the bottom, and you'll see some hairier queries as examples — and the things that line up at the same indentation are executed more or less concurrently. So now let's look at what's actually in those lines of EXPLAIN. Here's one line, an index scan on patrons_pkey — the patrons primary key. The first part tells us what operation this node performs: it's an Index Scan node. Those first couple of words tell us the node type, and the rest is a text phrase giving the operational parameters for that node — for an index scan: which index are we using (the patrons_pkey index), and which relation are we scanning (patrons).
Now, depending on what kind of node it is, that text string is structured completely differently, because the information a join node needs is completely different from what an index scan node needs. The second thing on the line is cost information, and the mistake people always make is to ask "why are there two numbers, is this a range, does that mean the cost is somewhere between these two?" What we're actually showing is: the first number is the estimated startup cost — what it costs to get this node going at all; in the case of an index scan, the cost to get the first row from the index — and the second number is the estimated total cost of executing the node. It's the second one you're generally concerned about. The next thing we estimate is how many rows we expect to return, and then the estimated width — and we'll talk about those more in a second. So here we're expecting a startup cost of 0.28, a final cost of 2.5 — and again, these are arbitrary units — estimated to return one row, and that row is estimated to be approximately 24 bytes wide. The width becomes important when it's a really big number; otherwise you can pretty much ignore it, but if you're returning a whole lot of rows, or the rows are really wide, then just the sheer memory manipulation of moving those rows around takes a significant amount of time. The other important thing to know about these costs is that they accumulate upwards in the query plan. Here's our first query node, and it actually has a startup cost of 0 — which probably means it's a straight-up table scan, which has no startup cost — and a final cost of 1.5, and that accumulates up into the other nodes. And notice that the startup cost on the next node is equal to the final cost on this one — that's not a coincidence: what Postgres is telling us is that it can't start that step until this step is done, because it needs the data from this one. Whereas this other node doesn't need that data, hence its lower startup cost. And this one up here has an even higher cost, but notice it's potentially starting before any of the others are finished — you'll often see that pattern with a sort: as the rows come in from the other node, if processor time is available, Postgres will start sorting them. And then at the top is the final estimated cost for the whole query.
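To put that anatomy on one line, here's the node from the slide annotated with the figures quoted above — the exact spacing is mine:

    Index Scan using patrons_pkey on patrons  (cost=0.28..2.50 rows=1 width=24)
                                                     |     |       |       └ estimated bytes per row
                                                     |     |       └ estimated rows returned
                                                     |     └ estimated total cost (arbitrary units)
                                                     └ estimated startup cost (to the first row)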
Now, if you do an EXPLAIN ANALYZE you also get an "actual" line, and it has some other stuff on it. This time, instead of a cost in arbitrary units, we have a time in milliseconds — this one is very fast, 0.015 milliseconds. In the time figures what we're generally showing, for the row set coming out of the node, is the time to return the first row versus the time to return the last row, which is similar to startup cost but not exactly the same. Then there's how many rows you actually returned, and then you've got this loops thing. The reason loops matter is that there are certain things Postgres will do — such as nested loop joins and CTE scans, and we'll define those in a minute — where it may actually execute that same node multiple times in a loop. So if you read this line, what you'd think is that executing this node took a total of 0.015 milliseconds — but on this line we actually executed that node 1,258 times, and the time shown is the average for one execution of the loop, so the actual amount of time taken by the node is 0.015 times 1,258. That's one of the things that trips people up. Likewise, the number of rows is the average number of rows returned per execution of the loop. The loops thing trips people up all the time: they look at the plan and go, "wait, why does this node only start two seconds after this other node finished? That node finished after 1 millisecond and this one only starts after 2 seconds — what's going on?" Well, that node was a loop: it actually executed 10,000 times, and that took 2 seconds. The other thing you'll get on certain kinds of nodes — scans, joins, and so on — is conditions. For example, on an index scan you'll often get the filter conditions applied on the index, as in "we're looking for this particular card number." That gives you extra information about what's going on, and it's particularly useful if you're asking, for example, why an index wasn't used: look at the filter conditions, see whether they correspond to any index you have, and if the answer is no, you already know how to fix that query. So, I've thrown around the names of a lot of — actually, let me stop and take questions briefly, since we've gone over the little parts. Any questions on the individual clauses before we go on to node types? That's a column name right there, and that's a value I supplied — yep. Other questions? The width is the number of bytes in one row, yes. The question was, if there are 10 rows, do we only show the per-row width — and the answer is yes. Okay, pretty good.

So let's go over some node types. There are a whole bunch of node types in Postgres, and this is a slightly incomplete list — particularly if you're using Postgres 9.5 — because one of the things we do in Postgres is add node types all the time, and certain extensions add their own node types too: the query planner is designed to be pluggable, so as long as you meet the node API for the planner, people can add their own node types in an extension, and they do. So you may see things that aren't on this list, plus there are some that are new in 9.5 that aren't here, but these are the ones you'll see most of the time, because they're the oldest and most common. First of all, Seq Scan is our obscure term for a full table scan — "seq" is short for sequential, and it means we're doing a sequential read from disk; that is the cryptic Postgres term for a full table scan. A seq scan is something you want to see on a very small table and don't want to see on a really large one. Index Scan means scan the index and look up individual tuples, in the normal index-tree-traversal way. And then there's Index Only Scan — which gets at one of the confusing things about Postgres. How many people were in Quinn's talk yesterday morning?
Now, the other thing that you'll get in certain kinds of nodes, such as scans or joins, is that you will get some conditions. For example, in an index scan you'll often get filter conditions on the index, as in: hey, we're looking for this particular card number in the index. So that gives you some extra information about what's going on — particularly useful if, for example, you're asking, hey, why didn't we use an index for this? Well, look at the filter conditions and see whether they correspond to any index you have; if the answer is no, then you already know how to fix that query. So, I've thrown around the names of a lot of — well, actually, let me stop and take questions real briefly, since we've gone over the little parts. Do any of you have questions on the individual little clauses before we go on to node types? That's a column name — that is a column name right here, and that is a value that I supplied. Yep. Other questions? The width is the number of bytes in one row, yes. Yes, the question is whether, if there are 10 rows, we only show it per row, and the answer is yes. Oh — pretty good, actually. So, let's go over some node types. You've got a whole bunch of different node types in Postgres, and this is a slightly incomplete list, particularly if you're using Postgres 9.5, because one of the things we do in Postgres is add node types all the time. Certain Postgres extensions will add their own node types — the query planner is designed to be pluggable, and as long as you meet the node API for the query planner, people can add their own node types in an extension, and they do — so you may see things that aren't in this list, plus there were some added new in 9.5 that are not in this list, and other stuff. But these are the ones you'll see most of the time, because they're the oldest and most common ones. Right — most common: first of all, Seq Scan is our obscure term for a full table scan. "Seq" is short for sequential, and it means we're doing a sequential read from disk; that is the cryptic Postgres term for a full table scan. A Seq Scan is something you want to see on a very small table and don't want to see on a really large table. Index Scan means: scan the index and look up individual tuples from the index, in the normal index-tree-traversal way. And Index Only Scan — so, one of the confusing things about Postgres: how many people were in Quinn's talk yesterday morning? One of the things Quinn went over is how Postgres does multi-version concurrency. Because rows are not deleted from Postgres immediately — they're marked as invalid for a certain range of transactions and then garbage-collected later — at any given time Postgres has to check what are called visibility rules to determine whether you can see certain rows or not, and that visibility information is stored in the base table. However, starting in Postgres 9.3, we said, hey, if all of the data in the table is really old, then we don't actually have to check visibility rules, and so we added Index Only Scan. Index Only Scan is really nice if you're getting sorted output from a really big table, because it means all we do is scan the index and we don't touch the base table at all. You won't see it as often as you would like to, but it's in there now. Bitmap Scan means: scan the index and then build a bitmap of which rows correspond to our criteria. One of the primary reasons to do that is: I'm going to scan a second index, or an index on another table, construct a bitmap, and then overlap the two bitmaps to see which rows correspond. This is how Postgres can combine two or more indexes on the same query — something certain other open source databases can't do. Most of these kinds of scans will contain a filter condition. Obviously, sometimes you do SELECT * on a small table and there's no filter condition, but most of the time you're selecting columns from a table with a WHERE clause, and that WHERE clause becomes what's known as a filter condition, showing what you're getting. Do keep in mind this is a per-node thing, so sometimes the filter condition will be something generated by the query planner or executor based on criteria passed elsewhere in the query and sort of bubbled up. So here's an example of a bitmap scan, and this is why I mentioned visibility rules: we actually have two steps here. We've got the Bitmap Index Scan — this is where we scanned the index, looked for all of the items in the pages of the index that corresponded to that particular library card number, and constructed a bitmap. Now, the problem is that we still have to check visibility rules, and in order to do that we have to do a Bitmap Heap Scan — and "heap" is, again, another cryptic term for the base table. So we do the Bitmap Heap Scan to double-check that all of these rows correspond to our visibility rules before we proceed to the next step in this particular operation. You will often find that the Bitmap Heap Scan takes longer than the Bitmap Index Scan; that's because the base table is much larger than the index, and it's not ordered. Yeah — right — yes. So what's happening is: the index does not have visibility information, so we're getting all of the rows in the index that have this card number, and then we're checking those rows, and only those rows, in the heap — in the table, that is. So, join types. Postgres supports lots of join types in order to have different strategies. Nested loop join: the nest loop is the join that's most intuitively understandable — for every row in table A, look up all the rows in table B and loop over them one at a time. This is the fastest join strategy when very small numbers of rows are involved, because there's almost no startup cost, and it is a terrible strategy when there are a lot of rows involved, because it is completely serial and there's no batching.
So if we're joining some big tables, Postgres is going to try to do some other things — like, for example, a hash join, where we actually make a hash map based on one table, shove stuff into hash buckets from the other table based on which bucket they fall in, and then sync them up. Whether or not a hash join is used is often dependent on how you have set the parameter work_mem, because that defines the largest hash map you're allowed to have as a query operation. So if you think Postgres should be using a hash join and it's not, one of the things you can try is bumping work_mem.
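As a minimal sketch of that experiment — the table names here are invented, and you'd normally try this only in your own session while troubleshooting:

-- Raise the per-operation memory limit for this session only, then re-plan.
SET work_mem = '256MB';
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id;
-- If the hash table now fits within work_mem, the planner may switch to a
-- hash join; RESET work_mem when you're done experimenting.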
A merge join says, hey, I'm going to want sorted output. You'll often see the query planner doing this when it knows you're going to be sorting afterwards and that sort corresponds to indexes you have on both relations. So: I have an index on ID number here and an index on ID number there, I'm going to be displaying them in ID number order, and they're both big tables — one of the things Postgres will do is say, hey, I'm just going to sort everything according to ID and match up everything that matches. You don't see it as often as the other types, because you really need certain query conditions for it to be the efficient plan — it has a huge startup cost — but if you're joining a big row set against another big row set, it's often the fastest plan. More joins, because like I said, we support lots of joins in Postgres. A lateral join is most often used against what are known as table expressions — that is, functions that return a table — where you want to take the results of a column from a scan on one table and pass that as an input to the function; what you do is what's called a lateral join. It's really funky syntax, and a lot of people can spend years in Postgres without using it, but when you do need it, it's really useful. And then, as a separate type, a semi-join: this is how we execute outer joins a lot of the time — hey, we're only going to join against the rows that match in this table and ignore the ones that don't, ignoring the null ones, right? And the opposite of that is the anti-join: when you write a NOT IN expression or a WHERE NOT EXISTS expression, we will often do an anti-join. So, some examples — again, this is joins. We have a hash join here, and one of the things I mentioned is the conditions: in the join you're going to find the matching conditions between the two tables, or relations, or synthetic row sets, or query clauses that you're joining. In this case we're matching the stlsourceid against the source.id in the other table, and the merge join is the same, right here — we're matching this against that. I pulled these off of explain.depesz.com, which I'll show you in a minute, and you can actually see some of the things that are happening — for example, you'll notice one of the reasons the merge join is being used here is that we have two giant, huge tables; we've got a row set of 30 million here — and you can kind of see how this filters out. But those are examples of some of these node types. So, more node types — these slides, by the way, are online if you want to review them later; I don't expect you to memorize all of these. Aggregates: one of the things we do in Postgres is aggregate — sum, total, average, median, standard deviation, all kinds of other things. So when you see an aggregate: GroupAggregate is our basic aggregate — again, the fastest way to do things on small row sets; just go ahead and add up the values, dedupe them, and that sort of thing. A HashAggregate says, okay, we've got a lot of data, so we're going to construct a hash table out of these values, put stuff into hash buckets, and group them that way — much faster if you're going to have a lot of rows, though if your aggregation column is indexed you may get a different plan. Window aggregates are used to support windowing clauses — how many people have used windowing clauses? Yeah, those of you who haven't probably should; they're awesome. They allow you to do in one query something that would have taken two or three. I'm not representing some of the new 9.4 aggregates in here — grouping sets, etc. — sorry, 9.5 aggregates — which you'll now see as new node types. So again, examples of that: here's our HashAggregate. We've got our usual execution time and cost, and then the criteria information we get here is the key on which we are grouping — these are the three columns, card no, last name, first name, that I am grouping by — and we get that as our group key, and that gives us our condition information. In the GroupAggregate we have the same thing. Other operations that you'll see expressed as nodes, although they don't correspond to the other things: occasionally you'll see Unique. It's a lot less common now that we have other, less expensive ways of returning a unique row set, but occasionally you'll see it as a way of deduplicating rows, and the primary things that produce it are SELECT DISTINCT and UNION — as opposed to UNION ALL. And by the way, that's a query tip for you: if you're doing a UNION query and you don't have a good reason to deduplicate, then use UNION ALL, because it is much less expensive to execute — the deduplication is expensive. Most of the time you won't see Unique, though, because Postgres will do something like a HashAggregate instead to implement this under the hood. A Sort should be obvious, right — we're going to have a sort. And Limit: if you're using LIMIT or OFFSET, the node you'll see is the Limit node.
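To go back to that UNION versus UNION ALL tip for a second, here's a minimal sketch with invented table names; the first form forces a deduplication step (typically a HashAggregate or Unique node), the second simply appends the two row sets:

-- Deduplicates the combined result; Postgres has to hash or sort everything.
SELECT card_no FROM current_loans
UNION
SELECT card_no FROM archived_loans;

-- No deduplication; the row sets are just concatenated, which is much cheaper.
SELECT card_no FROM current_loans
UNION ALL
SELECT card_no FROM archived_loans;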
Other operations: a CTE Scan, that's WITH clauses. If you have a WITH clause, and then another portion of the query that is querying that WITH clause, you'll see a CTE Scan node — that's our reference to the WITH clause. A Subquery Scan is the same, only for subqueries, and this is usually in cases where the results of the subquery can't be folded into the main query. We try to do that whenever possible, because the thing Postgres can do fastest, in terms of corresponding row sets, is some kind of join, so one of the things the query rewriter tries to do all the time is come up with a way to turn this into a join; if you're seeing a Subquery Scan, it's because Postgres couldn't figure out how to turn it into a join. Materialize is actually our most expensive sort of subquery or WITH clause, which says: hey, in order to execute this sub-clause I had to turn it into a fully constructed, complete row set in memory — which is expensive, and that's what Materialize means — so that can be a reason a query is slow. And then an Append: that's for things like UNION ALL and partitioning and other cases where we're appending row sets to each other vertically. So here's a CTE Scan example. We're going to have all kinds of things: the CTE Scan refers to the name of the WITH clause — "toe tab" here — and then there's a filter outside the CTE Scan, and in recent versions you'll get this tremendously useful "rows removed by filter". Because that rows-removed-by-filter count is actually the majority of the rows that were returned, you start asking, hey, is there some way I can push that criteria down inside the WITH clause so that I'm not doing all this expensive filtering?
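Here's a minimal sketch of that push-the-criteria-down idea — the table and column names are invented; in the second form the filter is applied while the WITH clause runs, instead of after it has already produced every row:

-- Filter applied outside the CTE: the WITH clause produces every row first.
WITH checkouts AS (
    SELECT card_no, checkout_date FROM loans
)
SELECT * FROM checkouts WHERE card_no = 49023;

-- The same filter pushed down inside the CTE: far fewer rows to carry around.
WITH checkouts AS (
    SELECT card_no, checkout_date FROM loans WHERE card_no = 49023
)
SELECT * FROM checkouts;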
So, having done all of that analysis — actually, I can take about one minute of questions on the node types before we start looking at why is my query slow. Any questions? Yeah — statistics: Postgres keeps statistics on... I think I talk about this in the "why is my query slow" part, actually, so let me get to that, and if not, I will revisit it. So, why is my query slow? One of the things you may have noticed is the background of my title slides here. That background is this wonderfully useful tool maintained by Hubert out of Poland, explain.depesz.com. It's an online website that lets you paste in your query plan text, and it will parse it out and attempt to find certain things about your query. There are common patterns for why it's slow, like a huge difference in the estimated number of rows returned versus the actual number of rows returned. It'll also tell you which steps took the most time, because one of the problems is — remember I said those costs and times cascade upwards? — it can take a fair amount of math to figure out how long a given step actually took independent of the other steps, and this will do that math for you. As you can see — whoops — exclusive versus inclusive times, right? So in this particular query: hey, this Bitmap Index Scan is where I'm spending most of my time, and explain.depesz.com has highlighted that for me. A tremendously useful tool. Do keep in mind that it is somebody's public web thing, so if you have data restrictions — you work at a HIPAA-compliant or PCI-compliant company, etc. — you might want to pull down the source code for it and run it internally instead. Or — he actually does have a button there to obfuscate all your column names. I don't know, I don't really do that with any of my SQL, so — maybe. Yep. There are some other tools you can use. The pgAdmin GUI desktop client will do a visual diagram of the query execution; I tend to find that it only works for very simple queries, because the diagram gets unwieldy very quickly. There's a more on-steroids version of that, originally developed by Red Hat for Red Hat Database and now maintained as part of the EnterpriseDB developer studio — I haven't actually seen it, because I've never used the EDB developer studio, but it's there. However, for a lot of people, their primary tool for helping interpret explain plans is the #postgresql channel on irc.freenode.net, where people come in and say, guys, this query is slow, and they post their explain plan, and then they get "please do an EXPLAIN ANALYZE", and they post their EXPLAIN ANALYZE, and then people are like, ah. So particularly when you're learning how to interpret explain plans, the IRC channel is really invaluable. So, here are some examples of particular query problems you might encounter and how you'd actually spot them in the explain plan. Here's one where we simply need an index that we don't have. You can see here we're doing a full table scan, and we're doing it on a pretty large table — 200,000 rows — so a full table scan is probably not the best strategy for this, and it's certainly not happening quickly. And we've got this filter: category equals 7. Now, there's actually a little bit more to this query — I just sampled it out; there were other columns being selected — but the other columns were not what you'd call selective: each value had thousands of rows, and Postgres will only use an index if it can pull a very small portion of the table, usually no more than a couple of percent; beyond that it's actually cheaper to do the sequential scan. So the answer for this particular query was to modify one of those indexes to include this category column, and all of a sudden the query got orders of magnitude faster.
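As a minimal sketch of that kind of fix — the table, column, and index names are invented, not the actual example from the slide:

-- The plan shows something like:  Seq Scan on items ... Filter: (category = 7)
-- If queries routinely filter on category together with another column,
-- extend (or add) an index that covers it:
CREATE INDEX items_status_category_idx ON items (status, category);
-- Then re-run EXPLAIN ANALYZE and check that the Seq Scan has become an
-- Index Scan or Bitmap Index Scan, and that the runtime actually dropped.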
Another example is a bad row estimate. Now, I've actually fiddled with the alignment here so that you can see this. So: an index scan and this sort of thing, and here's the cost, here's the estimated number of rows returned, here's the actual time, and here's the actual number of rows returned — and we're a couple of orders of magnitude off. Now, these are estimates. I generally say anywhere within a factor of 5 — that is, five times as many to one-fifth as many — generally doesn't change the query plan; sometimes it does, when you're really close between two different cost estimates, but generally it doesn't. Being two orders of magnitude off, though, will change the query plan, and in this particular case, if you're actually pulling 10 percent of the table, then an index scan is not the way to go — it's going to be slower than doing a sequential scan. And you can see where this estimate gets off. I'm trying to remember if I talk about statistics later — anyway, we'll find out. Now, another one you have here: I mentioned that loop thing, right? This is where a lot of people miss the calculation: we're doing a nested loop query up here, and we're corresponding to 10,000 rows, which means that on this index scan down here we're actually looping over it almost 11,000 times, and that's what's consuming a lot of our time — that's why the end time for this is what it is; most of the time is being spent in that loop. So you look for these things with very high loop counts; usually there's a better way to do that. Now, this one is complicated, and it's really annoying because there isn't a good, easy way to fix it. One of the things we implement in Postgres is this thing called abort-early plans, and the way an abort-early plan works is: hey, you've asked for 100 rows from this table, and I don't have a good index to use, or I'm joining against something else, or whatever, but the criteria you've specified are very broad, and therefore I think I can scan 1% of the table, get your 100 rows, and then quit. The problem is, if Postgres is wrong about how common your criteria are, or how well distributed they are — say your criteria are common but all those rows are grouped somewhere down towards the end of the table — then it can end up scanning the whole table. Where you can actually see this is in this really weird circumstance: look, the cost of this node is 1,000 and the cost of the node above it is only 2,600 — wait, I thought you said costs accumulate upwards, so why is this cost so low? Well, the answer is that Postgres is expecting to abort this join long before it finishes executing it — but if it's wrong, that can be a really slow query. And the reason it's doing that here is, if you look down a node, you see "Index Scan Backward": you have an ascending index and you've asked for everything greater than x, so we'll do an index scan backwards. And you see we've got this right here — 4.2 million rows on the index scan backward — so Postgres is figuring that those index criteria apply to a lot of the table. But it's wrong: they apply to a much smaller slice of the table, and you scan a lot more of the table to find the 100 rows you were looking for.
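Here's the general shape of a query that invites one of those abort-early plans — a minimal sketch with invented names, not the actual query from the slide:

SELECT *
FROM orders
WHERE status > 3
ORDER BY order_id DESC
LIMIT 100;
-- The planner may walk an index on order_id backwards, expecting to hit 100
-- rows matching status > 3 almost immediately. If those rows are rarer than
-- the statistics suggest, or clustered at the far end of the index, it can
-- end up reading most of the table before it finds them.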
I did say I would mention statistics. The way the query planner figures all this stuff out is that we have tables of statistics, kept in the Postgres system catalog, about what's in all of your tables, and those statistics are asynchronously updated. However, they are samples — number one, they're based on sampling the table, because we don't want to full-table-scan every table every time we update the statistics; that would be prohibitively expensive — and number two, we're then applying algorithmic rules to those samples. We have a list of the most common values, we have some histograms of ranges, we have some information on how unique the values in a column are, that sort of thing. But because all of those things are estimates, they can all be off, right? And if you apply an estimate to an estimate, you get an estimate with an even higher range of error than you had originally. So one of the things you can do is mess with some of this. Now, the brute-force way is to force certain plans in Postgres by turning off certain types of node. If you say SET enable_nestloop = off, you're preventing Postgres from executing a nested loop. That is fine — that is great — as a way to test different query plans: hey, if this were a hash join, would it have been faster? Well, let's force that by disabling nested loops — and then you can see, hey, no, a hash join was actually much, much slower; the query planner predicted right. So it's useful for testing. The mistake people make is that they change these things in postgresql.conf, because here's the thing: if you do that in postgresql.conf, you're not disabling nested loops in that one slow query, you're disabling nested loops for every query in the database, which means a bunch of queries that used to be fast will suddenly become slow. Don't do it — that's for troubleshooting queries only. Now, one of the more useful things you can do: I said the statistics were based on sampling, and you can increase both the default sample size and the sample size on specific columns. The default sample size is something like 100, which is very small if you're looking at a table that's got 50 million rows, and you can increase it up to 10,000. Anywhere over about 500 to 1,000, I really recommend doing it on a per-column basis, because if you're sampling, say, a large text column, that gets really large, and if you start bloating out the statistics table, it itself gets slower, at which point all of your query planning gets slower — you don't want that. Do it on the columns you frequently join on, for example. The other thing you can do is actually clobber the statistics in certain ways. n_distinct is the estimation of how unique a column is, and because there's no such thing as a perfect algorithm for estimating n_distinct from a small sample, sometimes Postgres can be way off, depending on how it took the sample. The thing is, it's not that hard for you to get a better estimate of n_distinct simply by doing a GROUP BY query one time — the n_distinct of tables that have been in use for a while doesn't tend to change much over time — so you can override it with a SET statement, and then run the EXPLAIN again to see if it changed the query plan. In this case, for our abort-early plan, it did change the query plan: by changing some of the statistics and saying, hey, this abort-early plan is bad because there are a lot more rows than you think — by increasing the sample size we changed the histogram, and Postgres suddenly realized that more rows would correspond to those criteria and therefore the abort-early plan was a bad idea — sorry, fewer rows would, and therefore the abort-early plan was a bad idea. So that's your sort of cycle, and those are your basics of EXPLAIN. We have about five minutes for questions. Yes — yes, you do: if you're doing any of these statistics things, you have to run an ANALYZE statement, or it doesn't take effect until the next analyze — so either you run ANALYZE or you wait for autoanalyze to get around to it. But no — so the thing is, say you've got a B-tree index, right, a normal index on last name, or a B-tree index on date added, and you want to see everybody whose date added is after a certain date. Then it's going to do a reverse index scan, because that's a descending search but the index is in ascending order. I'm not entirely clear on why reverse index scans are slower than forward index scans — it has something to do with the data structures we use — but they are a little bit slower. So if you are searching on greater-than on that date column all the time, you probably want to remove that ascending index and replace it with a descending index; it's a little bit faster.
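Going back to those statistics knobs for a second, here's a minimal sketch of all three — the table and column names are invented, and as noted above you need an ANALYZE before the sampling changes take effect:

-- 1. Plan testing only, in your own session; never put this in postgresql.conf.
SET enable_nestloop = off;

-- 2. Raise the sample size for one frequently joined column, then re-analyze.
ALTER TABLE loans ALTER COLUMN card_no SET STATISTICS 1000;
ANALYZE loans;

-- 3. Override the n_distinct estimate with a value you measured yourself
--    (for example via SELECT count(DISTINCT card_no) FROM loans), then re-analyze.
ALTER TABLE loans ALTER COLUMN card_no SET (n_distinct = 250000);
ANALYZE loans;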
More questions? The question was whether or not you could force it to use a particular index, and the answer is no — we haven't implemented that in Postgres, for a variety of design reasons. You can manipulate a bunch of other things that will end up causing it to use that index, and there are certain things you can do with indexes — Postgres supports the concept of partial indexes, for example, where you create an index with a WHERE clause on it. Yeah — it's generally better in that case to actually look at why it's not using that index in the first place. Often, if the index really would be the fastest way to do it and it's not being used, there's a good reason why — like your criteria don't actually match the index as well as you think. For example, one of the problems you'll have with indexes on text columns is mismatched collations: you're using a different collation in your client than the database, or than the column was populated with, and therefore we can't use the index on the text column. Or you're using, say, a LIKE or an unanchored text search — you're using a LIKE criterion and the index on the text column wasn't built for it. There's an operator class called text_pattern_ops that you can apply to an index on a text column that makes it support LIKE; otherwise it doesn't, unless you're in the C locale. There are a bunch of little things like that which will prevent Postgres from using a particular index even when you think the index is applicable — and sometimes the index really is slower. In that case it would be nice to be able to force it to use the index just as an experiment, but we don't currently have a way to do that. So, I wrote a tool a while ago — I'm trying to remember where it is on GitHub — that looks through all of your tables and then, based on an estimate of how many rows are in each table, takes all the columns you've bothered to index (deciding that those are your important columns, because why would you index them otherwise) and increases the statistics to a value based on a coefficient of that. I did that quite a while ago, though, and I don't remember where it is — ping me later on and I'll find it. Keep in mind that increasing the sample size in Postgres is not always beneficial, and it's not really size-based. That is, in a lot of cases, where you want bigger samples it's not because the table is bigger, but because it's more asymmetrically skewed: if your data is completely randomly distributed, then you can go with a sample size of 10, right? It's when it's not randomly distributed that you need larger sample sizes. So, in cases of highly skewed tables — for example, if you have a column that represents activity status and you're corresponding that against a date, well, obviously the active rows are going to be mostly the ones with the latest dates, but Postgres won't know that instinctively — that's where having higher sample sizes helps. There have been various proposals to make the automated sampling correspond to a percentage of the table, but none of those have been incorporated into mainstream Postgres, though that would actually make more sense from my perspective. Yeah — one of the things to understand is that until Postgres 9.6 we will not have any correlation stats between columns, so Postgres assumes that values in two columns are entirely randomly matched, which is a real problem when they are entirely non-randomly matched — like, say, month and year, or zip code and state. Postgres will estimate way low on the number of rows returned from that.
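Here are those two index tricks as a minimal sketch — the names are invented, and whether either one actually helps depends entirely on your queries and your locale settings:

-- A partial index: only the rows matching the WHERE clause get indexed.
CREATE INDEX loans_open_card_no_idx ON loans (card_no) WHERE returned = false;

-- text_pattern_ops lets a B-tree index support anchored LIKE 'prefix%' searches
-- even when the database is not using the C locale.
CREATE INDEX patrons_last_name_like_idx ON patrons (last_name text_pattern_ops);
SELECT * FROM patrons WHERE last_name LIKE 'Ber%';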
We are getting column correlation stats in 9.6, for some value of that, and that situation will improve a lot. But for right now, if you're saying, hey, I put criteria on two independent columns and Postgres estimated really low — orders of magnitude low — it's usually because those two columns are actually highly correlated, and Postgres has no way to estimate that. Sometimes the answer can be making a multi-column index, which Postgres will evaluate according to different stats, and sometimes there isn't really a good answer for that. So, I'll take one more question and then we have to quit. One more question — okay. Awesome, thank you very much.

Go ahead and get started. I'll be talking about all the big data science stuff — Postgres has features and capabilities that maybe some of you haven't heard about. I'll talk about developer features, administration, and performance; there's a community aspect to this as well, I think; and then finally I have a case study from one of our clients that shows how you can put all these different features together. So, developers: there are a number of things Postgres has that you can use to make your job of developing an application, or developing predictive analytics models, easier, the most obvious of which is the JSON data type. This was added in Postgres version 9.2, and there have been features implemented around JSON in every release since then; 9.5 gains the ability to go in and very easily modify parts of a JSON document without having to go through a bunch of extra work to split things out and bolt the document back together. There are two different JSON data types that Postgres supports. The plain json data type basically takes your JSON and stores it as text — it keeps it in the raw form, and the only thing it really does is validate that it is valid JSON. That's nice if you expect to be ingesting JSON and then not really doing very much with it: it's faster to pull data into that data type, because it doesn't have to parse it, it just does a syntactic check. The second type is jsonb, or binary JSON. With that data type, when you provide a JSON document it actually goes and parses it out and stores the JSON in a compact binary format, so your original formatting and things of that nature are gone, but the nice thing is that access to that data is then much faster. The other thing you can do with jsonb is index intelligently on it. There are two different indexing methods provided: one only operates at the very top level of a JSON document, but it knows about every single key; the other operates at the path level, so it knows what paths are in your JSON. As I said, there's very rich support for this — as of 9.5 there are 12 operators and 23 built-in functions — and, importantly, there's also very strong interoperability between the JSON data types and other data types. What I mean by that is you can do things like take a Postgres row out of a table and say, hey, take this row of data and turn it into a JSON document — it's a single function call, very easy, and whatever your input data is, you get a valid JSON document out. And this is just one example of the kind of interoperability that you have with JSON.
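A minimal sketch of those jsonb pieces — the table and data here are invented; the GIN index uses the path-oriented operator class, and row_to_json is the single-call row conversion just mentioned:

CREATE TABLE events (id bigserial PRIMARY KEY, body jsonb);

-- GIN index on the jsonb column; speeds up containment (@>) queries.
CREATE INDEX events_body_idx ON events USING gin (body jsonb_path_ops);

SELECT * FROM events WHERE body @> '{"type": "loan", "branch": "main"}';

-- Turn an ordinary table row into a JSON document with one function call.
SELECT row_to_json(p) FROM patrons p WHERE p.id = 42;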
The next thing that Postgres features is the ability to connect to pretty much everything that you could possibly want to connect to. Foreign data wrappers are the foundation for this; those were added in 9.1, and there are now just short of 100 foreign data wrappers — in fact, there are so many foreign data wrappers that there is a foreign data wrapper that reads the list of foreign data wrappers for you. There are 15 foreign data wrappers targeting NoSQL technologies, across 10 different NoSQL databases, and there are 7 targeting big data systems like Hadoop and HDFS. Now, if you need to talk to something that doesn't have a foreign data wrapper provided for it: first of all, it is not that difficult to produce a new foreign data wrapper. There are two projects — one is called Multicorn, and the other one I read about yesterday and can't remember the name of — but basically Multicorn lets you write a brand-new foreign data wrapper in Python, and the other one is the same idea: it makes it very easy to develop a foreign data wrapper in Rails — sorry, in Ruby. And failing all of that, you can create your own foreign data wrapper. The next capability you have is support for multiple procedural languages, and from the standpoint of interoperating with other data systems, this really gives you the world, because you can write code to run natively inside a Postgres database in Python, in Ruby, in Java. So whatever data you need to get at, there's a procedural language that will let you get at it, assuming there isn't already a foreign data wrapper you can use. And again, because Postgres is so extremely extensible, if you have a language that's not supported — for example, I looked to see whether the Go language had a PL and I didn't see one — that might be an interesting project: adding support for Go in Postgres. Finally, on the developer side, Postgres data types are themselves extensible. Perhaps the simplest version of that is a domain, which is really a data type that you put some additional checking on. You can also create a composite data type: for example, Postgres does not have a data type for complex numbers like Python does, but if you need to store complex numbers in Postgres, you can easily create a complex data type that just has your real and your imaginary components as two fields of a single data type. You operate on that as a single data type, so from the standpoint of the data it's one single number, but that single number has two components. You can also create brand-new data types. Just last week I decided, hey, let's see what it would look like to take a NumPy ndarray — a Python data type — and port that into Postgres, and once I figured it out — really, most of the difficulty was figuring out some compilation issues between Postgres and Python — implementing that data type was something like 10 or 20 lines of C code and 10 lines of SQL code, and here's a brand-new data type you can now make use of. In addition to the type extensibility, there was a feature added in, I think, 9.4 or 9.5 called transforms, and these allow you to control how specific Postgres data types are relayed to the different procedural-language functions. So, for example, for the ndarray data type I created a representation function — those of you familiar with Python will recognize the repr format — and this is a function that says, hey, I return text, I'm a Python-language function. The important thing is the TRANSFORM FOR TYPE ndarray part: that tells Postgres, hey, there's a special thing I want you to do for this data type, so when you call this function in Postgres and hand it an ndarray value, by the time Python sees it, it's an ndarray. So all the Python function has to do is say, oh, well, here I'm just going to return the representation, and that's what you get out.
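Here's a minimal sketch of the domain and composite-type ideas — this is my own version of the complex-number example, with invented names, not necessarily what was on the slide:

-- A domain: an existing type with some additional checking layered on top.
CREATE DOMAIN positive_int AS integer CHECK (VALUE > 0);

-- A composite type: a single value with two components.
CREATE TYPE complex AS (r double precision, i double precision);

CREATE TABLE measurements (id bigserial PRIMARY KEY, amplitude complex);
INSERT INTO measurements (amplitude) VALUES (ROW(1.5, -0.25)::complex);
SELECT (amplitude).r, (amplitude).i FROM measurements;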
So, all these developer features — ultimately what they mean is that if you're working with complex analytical tasks, the last thing you want to have to do is go learn a whole bunch of new tool chain and figure out a whole bunch of extra stuff. You just want to be able to do your job, get it done, and move on, and these developer features make that very easy. And any place where you see something missing and go, oh man, I wish Postgres did this — the extensibility capabilities make it easy for you to add the thing you don't have. And by the way, if anybody has any questions during the talk, please just raise your hand; I don't want anybody to wait until the end. So, next up is administration. If you're going to maintain a big data environment, administration becomes a really key facet of this, and there are a number of things Postgres has that, honestly, I think are actually superior to a lot of the big data options out there that are more traditionally used. A big one of those is backup. I think this is something a lot of the big data tools just wave away — oh, the data set's so big, how can you back up 10 terabytes of data? Well, data is your business. Most companies, if they lose their entire data set, just lock the doors, because you're done. So I think dealing with backups needs to be a foreground thought, something you think about up front, and Postgres has done that. There are two major ways to do backups in Postgres. Point-in-time recovery is really an extension of the same kind of crash-recovery system that Postgres uses to deal with a server going down: it uses the write-ahead log, takes that write-ahead log, copies it to a separate place, and that, in conjunction with just a plain old filesystem-level copy of your database, constitutes a backup. The really strong capability that point-in-time recovery gives you is that when you need to do a recovery, you can say, okay, recover to this exact transaction. Because if you look at traffic on the mailing list, when people are asking backup questions, I'd say probably half the time it's not because they lost their entire database, it's not because a filesystem just went kaboom and everything's gone — it's because somebody made a mistake and said, oh, this table with test data, DROP TABLE test_data... oh wait, that wasn't test data, that was production data. And now you have this database that is all still there, and it's up and running, but you can't get to this very important data. So this point-in-time recovery capability makes it much easier to deal with those kinds of situations, because you can recover the database to the exact transaction before the mistake was made.
The other backup option that you have is pgdump from a big data standpoint probably not what you're going to want to do it has gotten significantly faster it is now possible to run both a dump and a restore in parallel really the reason that I put it on the talk is depending on the criticality of your data the thing that I really like about pgdump is it's basically impossible to screw it up point in time recovery there's different ways to configure it there's different tools you can use to configure it it's very binary binary level you can't do things like take a point in time recovery on say an intel architecture and move it to a different architecture which isn't as big of an issue today but whereas pgdump it's a sequel level dump of the database so when you're thinking about disaster recovery I would encourage you to at least ask yourself the question hey should we be doing periodic pgdumps as a last ditch recovery safety net replication is replication with big data is interesting because a lot of times you're dealing with these very large data sets that you don't necessarily think oh I want multiple copies of it but then again a lot of times you're dealing with these data sets and tools that run in a sharded environment and as part of the sharding they keep more than one copy of the data so postgres you can do the same thing there's multiple different options that you can use to achieve that and that gives you a lot more flexibility probably the easiest thing to set up is just simple streaming replication so this builds on top of the same code base and the same concepts that are used for point in time backups but it extends it so that now it's a point in time backup that's happening continuously and at the same time that it's happening you have a server that's restoring that backup and that you can run queries against so you can stand up these replicas they're very easy to stand up you also have the capability for these to be truly synchronous replicas so what that means is if you have synchronous replication enabled and on your master you do a bunch of work in a transaction you go to say commit that commit is not going to return to you until the commit record is recorded by the replica as well so when that commit comes back to you you know that that data is stored not only on the master but also on your replica and in 9.5 you can actually have multiple synchronous replicas one of the other nice things about the synchronous replication is that you can control this on a transaction by transaction basis you can have a transaction in fact not only that you can have a transaction where you say look I just want the transaction to come back ASAP I don't care if the data is on disk so if you have tasks that need to deal with ingesting data at a very high rate this is something very powerful because you can do the ingestion get the data sent over to Postgres and get an immediate return from Postgres let your code keep going gain more throughput without having to resort to actual you know parallel processing or something of that nature or something asynchronous the other interesting thing that you can do with streaming replication is you can have the exact opposite of a synchronous replica you can have a replica where you say I want you to intentionally be four hours behind whatever is happening on the master this is very useful from a disaster recovery standpoint because again if we go back to the scenario of somebody goes and they drop a really important table and now you have to recover you 
don't have to worry about, oh, we have to go get the backups, we have to stand up a recovery server, we have to start processing the backup — no, you have a server that's been doing all of that. Now all you have to do is tell that server, hey, something bad just happened, so I need you to recover to this point, and go do that; as soon as it's done with that recovery it will come up, and then you can pull the data off of it. Postgres also has logical replication support with the pglogical extension. This was just released — not even a month ago, I think. It's very similar in nature to trigger-based replication, but the big difference is that pglogical uses the logical replication framework first added in, I believe, 9.4, which is based on the transaction log. So this is logical, quote-unquote statement-level, trigger-style replication that doesn't use triggers, so it's extremely fast — comparable in performance, in fact, to streaming replication. So why would you want to use it? One reason is that with streaming replication you have to replicate an entire Postgres instance — all the databases, all the data, all the tables, the whole nine yards — you have no choice. With logical replication you can replicate only what you need. So if you have certain tables — a subset of tables that are hit heavily, that you want to do some predictive modeling on, things of that nature — you can take just those parts of your data, break them out to a separate server, and not worry about the other parts of the database that aren't needed to support that workload. The other thing you can do with logical replication is use it as a mechanism for a Postgres version upgrade where you can actually test out the new version: you run queries against it, and if it looks good and you're happy with it, you switch your production environment over to it and reverse the replication, so you still have your old version of Postgres up and running with all the current data — if the new version doesn't work out in production, you can switch back. The in-place version upgrade that's more typically used, pg_upgrade, doesn't have that capability, and even if you use the method of upgrading that copies all the data, once you've cut your production traffic over, your old version is out of date — you don't have the same information. From a multi-master standpoint, there is now BDR, bi-directional replication. This builds on top of the logical replication framework; it's very similar, but instead of being truly ACID, as pglogical is, it uses eventual consistency, and it supports wide-area replication. There are a number of people running this with data centers spread across the planet, and BDR is very happy dealing with the kind of latencies you get on those kinds of internet links. The one downside to BDR is that, right now, for BDR to function there are a few things it needs within Postgres itself that are not in Postgres yet, so it currently requires a slightly modified version of Postgres — it's not 100% true community Postgres. Finally, there are other replication schemes out there for Postgres — Slony and Londiste are probably the two most popular, and there are a few others — but at this point in time I would really tend to recommend that you don't use those unless you have a very specific need; pglogical or streaming replication will most likely solve all the needs that you have
better than either one of these would so clearly when we're talking analytic workloads things of that nature performance becomes very very critical and this is something that the Postgres community really puts a lot of emphasis on there is a tremendous amount of work that happens to to constantly improve Postgres performance and I think that one of the really important things to understand here is that you know big data there's all this focus on ooh web scale we have to scale horizontally and for a large enough data set that's true but there are a lot of data sets that Postgres is very happy to run on a single vertical server and you don't need the extra overhead the extra expense of colo space for 10, 15, 20 extra servers running around when you can get a single large server that will do the same thing I've certainly run I personally run databases that peaked at over 6000 transactions a second and that was on a 4 terabyte OLTP database um running on I think it was like a $20,000 server and that was on IO hardware that had problems as well so despite some IO issues Postgres was able to scale very well to that and that was version 9.2 the version number here matters because as I said the community puts a very heavy emphasis on continually improving the performance capabilities of Postgres it wasn't that long ago where if you wanted to score if you wanted to scale Postgres past 4 cores it would ramp up to like 4 maybe 6 cores and then you had this downward angle well now it'll very happily do 80 cores um there are significant improvements that are made in every major release and typically the number that I tend to go with as a ballpark is every major version of Postgres improves performance by about 20% obviously that's very heavily dependent on your workload some people won't see it tremendous you know they may see 5 or 10% getting a problem area and they'll do a version upgrade and go oh holy cow it's twice as fast so why does this matter for big data well data keeps getting bigger it wasn't very long ago when a terabyte database I mean that was a big deal if you had a terabyte now a petabyte is a big deal you know so that's gone up by a factor of a thousand and what allows you to manage that kind of growth is your technology platform basically has to grow faster than your data is and Postgres performance improvements play a significant portion of that because if you're getting 20% year over year from Postgres that means you don't have to look at your own software stack and go man where are we going to get the next 20% from it's a very good comment because a lot of the NoSQL solutions are very very space inefficient and it's kind of easy to dismiss that with oh disk is cheap and yes disk is cheap but what's not cheap is the I.O Alvaro the creator of ToroDB gave a talk a couple hours ago in the other room and he's got numbers that showed comparing Mongo to Toro so it's sorry let me take a step back ToroDB is the MongoDB wire protocol implemented on top of a relational database specifically Postgres though theoretically you could run any relational database and the testing that they have done they see that storing the same data in Postgres versus storing it in Mongo it's anywhere from 68% of the size to 38% of the size so when you get down to 38% of the size that means that two thirds of the data that was in Mongo was wasted space basically and that makes a big deal from a performance standpoint so no that's a very good point it's certainly relevant so finally coming out in 9.6 after I don't 
know how many years of us wanting it, Postgres is getting parallel query execution. This is a tremendous amount of work that's been done by a handful of people. Right now it supports parallel scans; I know there's parallel join execution; I don't know offhand whether there are parallel index reads — I don't know if anybody here does — and it does support, or they're working on support for, parallel aggregate operations. One of the things people have a tendency to focus on with big data is horizontal scaling, and admittedly this is an area where Postgres is not as strong, but there actually are options here. First of all is Postgres-XL. This is a fork of Postgres that allows you to run distributed transactions on multiple instances in a fully ACID manner; from a user-interface standpoint it basically looks like one Postgres instance that just happens to be spread across a number of different servers. There are some other variations of this — in fact Postgres-XL grew out of something called Postgres-XC, and L comes after C, so you want the newer one — there's also Postgres-R, and there's also Greenplum, which was very recently open sourced. Postgres-XL seems to have the biggest weight behind it right now, which is why I mention it specifically. The other option you've got for sharding is a project called pg_shard. With pg_shard — I was going to put examples in and I completely spaced out and didn't do it — basically, if you want to take a table and spread it across multiple servers, you run one function to say, hey, this table, I want you to shard it on this field, and then you run a second command to say, okay, now that this is sharded, go actually spread that data across the multiple clusters that you have. Once you run those two commands, you write queries against that table and they just get spread to all the different data nodes — there's nothing else you really have to worry about. So, very easy to use, very flexible. The only downside is that, in spreading the data out, you don't necessarily keep full ACID semantics: when you're doing updates it actually requires that you include the partition column in the WHERE clause of the update, so updates that would span multiple nodes really aren't going to be ACID. But as long as you're okay with that, it's a really good option for getting horizontal scalability should you need it. Postgres also has support for column stores. For certain workloads, certain data-locality patterns, a column-oriented storage technique can potentially be a lot faster than doing the equivalent in a row-based storage mechanism. The cstore_fdw foreign data wrapper is, I would say, the most up-to-date version of this. It was created by Citus Data — it's basically one of the parts of their commercial technology that they've open sourced for the community to use — and it is implemented as a foreign data wrapper, so that's the interface into it. There is also an in-memory column store; there was a presentation on it at PGCon last year, and when I looked it up there is current activity in GitHub for it — there were commits back in, I think, October — but frankly I haven't seen much publicity on it, so I've included it mostly for the sake of completeness: if cstore_fdw doesn't do what you need, you might look at the in-memory column store. And there is also a foreign data wrapper to MonetDB, which is itself a column-store database.
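Since the two pg_shard commands came up without an example, here's a rough sketch of what they looked like, as best I recall from the pg_shard README — the table name is invented, and you should double-check the project documentation, since pg_shard was later folded into Citus:

CREATE EXTENSION pg_shard;

CREATE TABLE page_views (page_id bigint, viewed_at timestamptz, visitor_id bigint);

-- Step 1: declare the table distributed, sharded on a particular column.
SELECT master_create_distributed_table('page_views', 'page_id');

-- Step 2: actually create the shards and spread them across the worker nodes
-- (here: 16 shards, 2 copies of each).
SELECT master_create_worker_shards('page_views', 16, 2);

-- After that, ordinary queries against page_views are routed to the shards.
SELECT count(*) FROM page_views WHERE page_id = 1234;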
Finally, on the performance side of things, Postgres has support for GPU utilization. For certain workloads this can provide just tremendous speed improvements, because GPUs are built around the idea of taking a single operation, or a handful of small operations, and applying them to an enormous set of data as fast as you possibly can — that's how graphics processing works, and certain activities in a database lend themselves to that very, very well. If you're interested in this, I'd encourage you to go take a look at the wiki page; it has some performance graphs on it, and admittedly the graph is from an ideal, best-case scenario, but the speed improvement is orders of magnitude — a factor of 100 or more. So if you're pushing the envelope performance-wise, this can be a great thing, and the nice thing is that it works natively: once you load the extension it just goes and does its thing, so if it sees a query it thinks would be a good candidate, it's just going to go do it — you don't have to sit there and hand-code anything. Finally, another aspect of this that I think comes into play is that there is a really large, very extensive community that has built itself around Postgres, and here are two examples that come to mind that I think are relevant to big data — not anything comprehensive. PostgREST is a RESTful API for Postgres, so if you want to treat Postgres kind of as if it were a NoSQL database and provide a nice, easy web interface to it, PostgREST will do that for you. And, as I mentioned, ToroDB is the Mongo wire protocol implemented on top of Postgres. So, to put all this stuff together — I've talked about a whole bunch of different features, just trying to get you thinking about the different things that go into predictive analytics and data science and some of this big data stuff — let's look at what happens when you actually put it all together. In a traditional analytics environment there are a number of different steps that get performed to go through the work of building a predictive model, vetting it out, and deciding, okay, is this thing working or not — and really, this is pretty painful. The old-school, traditional way to do it is: first you do a bulk export/import of your production data into an analytics tool like SAS. When you're dealing with a four-terabyte database, doing that becomes painful, and then having to keep it up to date, and having to do it every time you need to spin a new version of the model — it kind of stinks. Once you've got the data in the tool, you can use the tool to start identifying correlations and patterns in your data, things to be looking at, and use that information to start creating your models — and at this point you're basically creating your model by hand; it's almost as if you're writing the model as math formulas on a piece of paper. Once you've got an idea of, okay, here's what I think is going to work, you'll do some amount of validation of the models based on the data you've loaded into your analytics tool, but ultimately you're probably going to have to take the model and run a regression back against your raw data set — basically trying to simulate, okay, if this model had been in production when this event happened, what would it have said, would it have made a better decision than the old model did? There are different ways you can do that; the way I've seen it done is that you take the model and you write it in pure SQL, and that can end up
becoming kind of messy and difficult because even if the model itself isn't terribly sophisticated if you're dealing with a model that's looking at 50, 60, 70 different variables you've got a hand code well how do I pool each one of these variables out how do I present it and then you have to run the calculation as well and you're also worried about trying to do this in a way that's going to perform reasonably well so that you can run this test and it doesn't take three weeks to run so you go through some iterations of that and you use the information you gain to go tweak the model, change the parameters in it things of that nature finally you say okay I think we have a model I think we're good to go here Mr. Production Team Mr. Application Developer here's the model go code it and now you wait and six weeks goes by and eight weeks goes by and finally your model makes it out into production and now three to nine months later you can finally see whether all this work that you put into this model is actually producing good work if it's producing a significant improvement or not and I've talked to people that do a lot more of the consulting in the predictive analytics space and this is not uncommon this is a very prevalent thing that you see fundamentally because your data scientists and your analysts use a different set of tools and have a different set of needs and have a different way of thinking than your application developers do your application is frequently written in something like Rails well your modelers may be using Python they may be using R they may be using something else that you wouldn't really want to try and write your web application in this is the right tool for the analytics job this is the right tool for the application job and we have this big gap in between and that causes problems so when we started working with one of our clients they needed to be able to do this stuff and they've seen this kind of thing before so they were already very much in tune to this idea of look we need to make this this analytic workflow this has got to be made easier because it's just we can't be spending three, six, nine months on spinning models so if you take a step back and you think about creating a predictive model and especially executing the predictive model in a production environment there's kind of two things that go into this the first thing that you need is data and that sounds kind of obvious and silly but I think it's important to kind of segregate that out and separate it out because no matter what the data portion of executing your model it's a data problem whether your data is stored in a relational database whether it's stored in Mongo whether it's stored in React wherever it's stored the problem of how do I gather the data together that I need to make this prediction it's a data problem so the key facet there is solve your data problem using your data tool so in this case they are running on Postgres our methodology for pulling the data together is it's all SQL we have a view that goes and pulls the stuff together and then there's just a simple function that you call and say hey here's the person's identifier give me the data that I need to run this model and the function returns a Postgres complex data type that is every single piece of data that is needed by the model some of that is very simple scalar data some of it is arrays that that are actually arrays of complex data types themselves so there's an actual nesting that's happening here where you have this 
You have this multi-layer document, essentially, that's being handed to the calculation function. But in order to ensure robustness, and to make sure there are no surprises and we know exactly what this data is and what format it's in, we're not using something as malleable as, say, JSON; it's done as an actual Postgres data type, so there's no question about what is being handed back and forth. So the data gathering is done in SQL. The calculation function, on the other hand, and again we're keeping these two things separated very deliberately, is different. You want to run the calculation in R? Great, fine. You want to run it in Python? Okay, not a problem. You want to run it in Perl? Yes, we can do that. It doesn't matter; it'll run in whatever procedural language you want, and as long as Postgres supports that procedural language, you're good to go.

And the nice thing is that these ideas have been wrapped up in a framework: a set of database functions and tables built around this notion of here's the data stuff, here's the calculation stuff. I'll sketch what that shape can look like in a moment. The framework does things like this: when you go to set up the next version of a model, you say, okay, here's version 3 of the fraud model, here's the function that gathers the data, and that gets put into the framework, along with the source code of that function; here's the function that runs the calculation, and its source code goes into the framework too. So now we have a record that says, here's the exact version. And every time we do a run of the model, we say, here's the model we're running, we capture all the input data, and we store it as part of that individual execution. We then hand it to the calculation function; it does its thing, it produces another Postgres compound data type, and we store that result as well. Now we have a record of absolutely everything that went into executing this model.

Now, out of curiosity, how many of you are doing big data? How many of you are doing data science? Okay, good, about a quarter. A lot of you, if you're doing something like weather modeling, are probably thinking: why are you going to all this trouble? Well, Kirio Capital is a lending institution, and there are a lot of regulations wrapped around lending institutions. So when the auditor shows up at the front door one day, they want to be able to answer the auditor's questions. Part of this is that the auditor can go in, pull individual loans or pull declined loans, and ask: why did you decline this person a loan? Did you decline it because they're a woman, or did you decline it because their credit stinks? The first one is bad; the second one is okay. So this framework was developed specifically to support those kinds of ideas, so that they can now explicitly say: here's everything that went into this model decision. For those of you in a heavily regulated industry, those of you that want that traceability, that's something that might be very valuable. For those of you that don't care so much about the traceability, a lot of this is still valuable, I think: this notion of separating the data gathering from the calculation, but even more importantly, utilizing Postgres as the thing that binds it all together.
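Here's the rough sketch I promised of what that framework and a calculation function could look like. This is my illustration of the idea, not the actual framework; all the names are made up, model_input is the composite type from the earlier sketch, and PL/Python requires CREATE EXTENSION plpython3u on the server.

    -- Hypothetical result type for what the model hands back.
    CREATE TYPE model_result AS (score numeric, decision boolean);

    -- Framework tables: one row per model version, one row per individual run.
    CREATE TABLE model_version (
        model_name    text,
        version       integer,
        gather_source text,              -- source code of the data-gathering function
        calc_source   text,              -- source code of the calculation function
        PRIMARY KEY (model_name, version)
    );

    CREATE TABLE model_run (
        run_id      bigserial PRIMARY KEY,
        model_name  text,
        version     integer,
        run_at      timestamptz DEFAULT now(),
        input_data  model_input,         -- everything the model was shown
        result_data model_result         -- everything the model said back
    );

    -- A calculation function in PL/Python; it takes and returns composite types.
    CREATE FUNCTION calc_fraud_v3(input model_input)
    RETURNS model_result
    LANGUAGE plpython3u AS $$
        from math import exp
        # "input" arrives as a Python dict keyed by the composite type's fields.
        score = 1.0 / (1.0 + exp(-(-2.5 + 0.8 * float(input["debt_to_income"]))))
        return {"score": score, "decision": score > 0.7}
    $$;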
And it makes things easy, because since all this is done in Postgres, we can do things like spin up a snapshot of production, go mess around with the models, whatever. I don't care if you screw the whole thing up, as long as you don't leak any sensitive data; you can go test on it. Oh, you think you've got something workable and you want to do a regression? Okay, fine, let's run the regression on a replica, say. Or we could even potentially run the regression in production itself and just not use it for decisioning, and the nice thing is we would be able to prove that nothing is decisioning on it, because there are no records that indicate it. I could easily put a trigger in place that says: if you try to create a record, deny it. So there's all this flexibility that comes about, because instead of the analytics team over here and the application development team over there, I'm standing in the middle as the data architect saying, come on, guys, let's pull this together.

So finally, one thing I do want to mention. Obviously I'm a big Postgres fan, if you haven't picked up on that, but there is no such thing as a magic bullet. There was a tremendous amount of hype around NoSQL when those tools started coming out, and a lot of people said this was going to solve everything. And if you look over the past year or so, all of a sudden there are all these projects springing up around taking a NoSQL technology and putting some form of SQL back on top of it, because, as it turns out, SQL is a pretty good language for dealing with data. Maybe you can't do all of your big data analytics and data science in Postgres. That's fine; there are certainly workloads where Postgres is not the right tool for the job. What I do think Postgres has to offer is an incredible amount of connectivity and incredible flexibility through its extensibility and the capabilities it already has built in. So for the parts that don't make sense to put into Postgres, fine, put them in the technology that makes sense, but then use Postgres as kind of your hub, as the thing that pulls all this stuff together and makes it easier to do your job. And of course, if you do have parts of your system where you really do care about immutable data quality, where you need to make certain that when you say this thing is committed, it's committed, whether for regulatory reasons or because you're dealing with something like a shopping cart: if somebody buys something from you, I tend to think it's pretty important that when they click submit and you say commit, that record is stored, and that when you show them the confirmation page saying yes, your brand new Widget 5000 is on its way, it really is.

So, questions? Yes? So, because Postgres has this support for multiple different procedural languages, you can run pretty much whatever you want to run in the database. Now, I will say that if you've got something that's going to burn through 80 cores like there's no tomorrow, maybe you don't want to run that on the database server itself. But because you've got this advanced language support, maybe what you do in the database is make a call into a procedural function that itself calls out to a more distributed environment, and then the data comes back in. On its surface that sounds kind of weird, because why wouldn't you just have the application do it? The reason I would personally do it in the database is that you're doing your data gathering, you run the calculation, you get the results back, and you probably need to store the results in the database too, so really this is all still a data problem. It's not until you've captured the data, run the calculation, and stored the results that you go back to the application and say, hey, here's the number you were looking for.
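Just to illustrate that gather, calculate, store, hand-back flow, here is a rough sketch of a single entry point built from the hypothetical pieces in the earlier sketches; again, this is my illustration, not the client's actual code.

    -- Hypothetical wrapper: gather -> calculate -> record the run -> return the score.
    CREATE FUNCTION run_fraud_v3(p_person_id bigint)
    RETURNS numeric
    LANGUAGE plpgsql AS $$
    DECLARE
        v_input  model_input;
        v_result model_result;
    BEGIN
        v_input  := gather_model_input(p_person_id);   -- the data problem, in SQL
        v_result := calc_fraud_v3(v_input);            -- the calculation, in PL/Python
        INSERT INTO model_run (model_name, version, input_data, result_data)
        VALUES ('fraud', 3, v_input, v_result);        -- full traceability of the run
        RETURN v_result.score;
    END;
    $$;

    -- The application then just asks the database for the number it needs:
    -- SELECT run_fraud_v3(12345);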
Postgres does not, and I don't think it should, because it's got Python, just as an example, and I know there are some nice neural net capabilities in Python. You can pull those classes in; all you have to do is install the class library on the database server, and you can run it right there, native within Postgres. Again, this is where you run into the tradeoff: if what you're going to do is computationally intensive, you may not want to run it directly on the database server. The question then becomes: what does the data component look like? How much data are you dealing with for a model run? If you're doing something that's, say, finance-based, something that's not handling enormous amounts of data, I think it's reasonable to say, hey, we've got a Postgres database, we pull the data together, and then we fire it off to an external process. And there are parallel capabilities within Postgres, and clustering capabilities. On the other hand, if you're trying to do weather modeling and pull in every temperature sensor on the planet for the past week, okay, Postgres is maybe not the right answer. So it depends on what you're trying to do. Really, what I want to do with this talk is just get people thinking, because right now there's this mindset of: well, it's a relational database, you can't mix big data and relational. It can mix.

Yes? I do not know off the top of my head; honestly, I don't know. The thing is, when you talk about foreign data wrapper sophistication, at the base level all a foreign data wrapper is doing is passing data back and forth. It's doing protocol translation, basically turning something like a select statement into whatever you need on the other side. Where the sophistication comes in is when you're trying to do things like: I'm going to join three tables locally, two tables over in this foreign data source, and three tables over in that other data source, so what's the efficient way to plan that and push it all out? That level of sophistication, I suspect, is not in the Hadoop foreign data wrapper, because that level of sophistication is still being worked on in the Postgres foreign data wrapper that lets you talk to another Postgres database. So I don't think that's there, but I suspect that for a lot of these tools it's probably not a really germane question, because a lot of times they don't, you know, I don't think Hadoop even has the concept of a join. Maybe Hadoop does, but for a lot of the NoSQL tools, what's a join? You're pulling an individual document or an individual key. So, I think there were some questions over here? No? Anyone? Awesome. Thanks, guys.