So, please welcome Magnus Hagander. He is president of PostgreSQL Europe, and he will give a talk about Postgres replication. So give him one hand. Thank you. I guess you are the crowd who learned that you have to get to all the rooms very early at FOSDEM, right? So you're going to miss the next talks, because you're in here. So, that one is a little bit too loud; I'll speak a little bit softer. First, a quick couple of words about myself. As I said, my name is Magnus Hagander. I work for a company called Redpill Linpro, which is an open source services business in the Scandinavian area. I'm based out of Stockholm in Sweden myself. I do a lot of Postgres work there: consulting, training, support, you know, typical open source business. Within the Postgres team, I am a member of the core team. I'm one of the committers on the project, and as mentioned, I am currently the president of PostgreSQL Europe. Any of you who are members of PostgreSQL Europe, don't forget that we're about to run elections for next year, so you can vote everybody else off the board. You can't actually vote me off the board; you have to wait one year, and then you can vote me off the board. Anyway, we are here to talk about replication. So, does anyone remember this? Does anyone remember the fact that my clicker stopped working? Ah, there we go. Does anyone remember this? Who was around back in these days? And then you had this, right? And some of us who were sitting there were asking: what about these clusters that I have that are replicating with Postgres, and it just works? Postgres for a long time didn't have built-in replication, right? We had external replication tools. Some of them were easy to use. Some of them were, as I like to put it, actively user-unfriendly, but they worked. So let's just say it wasn't really the case back then that Postgres didn't have replication. It just wasn't very obvious. And it obviously isn't the case now.
And even now, it's not that Postgres has replication, singular. Just like then, there are many choices. There are many more choices now than there used to be. Back then, everything used to be an external replication system that you put into your Postgres. Now, there are external and internal options. We still have to choose between them, right? You have to pick which one to use. Or, in fact, you can pick more than one, depending on exactly what your requirements are. We still have some that are hard to use. We have some that are very, very easy to use. Now, when we do replication, we can do it at many, many different layers. Usually to solve different problems, sometimes to shift blame to somebody else. There are many different reasons to do it. Everything starting at the hardware (hardware is also software, right, or at least the things that are sold to us as if they are hardware, which is software that somebody else wrote and called firmware), all the way up to doing the replication completely in your application and just running across separate databases. So, I like to lay it out like this: we have application-level replication. We have what I call app-in-database-level replication, which is basically trigger-based; it's basically an application that runs as triggers inside of your database. You can write it yourself or you can use something else. Back in the days when this whole "Postgres doesn't have replication" thing was the truth, this is basically what Postgres had, in the form of four or five different external projects that would do it. Now, we can do database logical replication. We can do database-managed physical replication. We can do operating system replication, which at this point is basically things like DRBD or similar solutions that run transparently in the operating system. And then, of course, at the very end, we can do hardware-based replication, which today basically means a SAN, right? So, let's start.
Let's go through this stack from the bottom and see what options we have in the Postgres world today. At the very bottom, the lowest thing that we can do here is SAN-based replication. This is basically your hardware taking care of the replication. It is typically block-level replication, handled by whatever firmware is running in your SAN. It is transparent to the operating system, and that's the whole point of it. In the OS, you get a block device, and you have the same block device somewhere else, and it's all just magic. And as Postgres runs on top of file systems, well, you mount your file system on this block device and you automatically have a replicated solution. Well, usually you can't mount both at the same time, but you can mount it at, for example, a different physical location and fail over your database there. This is very common today. In the scenarios that I work with our customers in, like larger enterprises, everybody has this one way or another, especially in the world of virtual machines. And I'm sure most of you have, at least at one point, used a virtual machine, right? You probably know what this is. And most enterprises use nothing else, right? There are all these massive VMware cluster things. So it's a pretty common thing. It's a pretty simple thing. You can do anything from just single-rack replication to multi-site synchronous replication. And the good thing about these is that these solutions are sold as guaranteed to never fail, right? Because there are multiple sites and they're following all the best practices and things. And, of course, things that never fail are perfect, aren't they? Yeah. Most people I know who relied on this for their database replication have had big problems with it, simply because there is no such thing as something that never fails. It's only a matter of when it fails.
And whether everything fails at once or only some pieces at any given time, right? It does work as long as it works. But when it fails, it tends to fail in a very spectacular way, because it takes everything down. And what I unfortunately see a lot of people building is systems where their database was not the only thing built entirely on the assumption that this storage can never fail. I've seen scenarios where people lost some six, seven thousand virtual machines at once. And suddenly the fact that they also lost their five Postgres databases was maybe not the biggest problem in the world. So this works, to a level, but again, it's the whole thing: if you design everything you build around one thing that cannot possibly fail, it gets really painful when it fails. And it really is a when. There is no such thing as things that don't fail. It's also not very Postgres-specific, so we're not going to spend a lot of time on that one. We're going to move up the stack instead and look at the operating system level. At this level, DRBD is probably the most common one, at least if you're in the Linux world. There are some other options around. And we're basically doing the same thing as the replicated SAN, right? Except we're doing it in the form of a driver in your operating system, instead of a driver in somebody else's operating system, which is really what your SAN is, right? It's still just software. It works and fails in sort of the same way as your typical SANs. It usually has a lot less performance, but if you're only worried about hardware failure of your hard drives, it's not a bad choice for a system where you don't need high performance. But, I mean, it replicates your disks over TCP, and it replicates them at a block level, and it doesn't even know what data it's replicating. That obviously can get pretty inefficient when it comes to performance on the replication side.
And again, it's not very Postgres-specific. So we're going to skip past that one as well and see what we have further up the stack in Postgres. Now we're getting to the actual Postgres side, right? Database-managed physical replication. This is a transaction log, or WAL-based, replication solution. In Postgres, we have had this since Postgres 8.3, which is quite a while ago. People were still saying that we didn't have replication in 8.3, and in fairness, it wasn't excellent. It was file-based, so basically the unit of replication was files of 16 megabytes worth of transaction log, which is kind of not very real-time. But it worked pretty well, and this is the foundation that the core replication options we have in Postgres today are still building on. In Postgres 9.0, this was turned into streaming replication, meaning that instead of replicating blocks of 16 megabytes, we would now replicate basically blocks of one byte: replicate transaction log as soon as it's created. As of Postgres 9.1, support was added for synchronous replication, meaning that you can be guaranteed that when you commit a transaction, it is in at least two places. The system will block and wait until the commit has reached the standby node, or standby nodes, before the transaction is released. And the way this works in Postgres, which I find very useful, is that you can mix synchronous and asynchronous, and you've been able to do that since version 9.1. You can say that some of your transactions are very important: they need to be synchronously replicated, you need the guarantee that they exist in multiple places before your application proceeds to the next step, whereas some other things are not that important and can run asynchronously. And you can run this in the same system. In your transaction, you just run SET synchronous_commit = off and you get an asynchronous one, or SET synchronous_commit = on and you get a synchronous one.
Depending on your needs, one of them would be the default and the other one would be the one that you change into. We added support for quorum commit in Postgres version 10, which is the ability to basically say: well, I have ten servers, and I want my transactions to be on five of them before I release the transaction. Or, you know, I have five servers and I want them to be on three. Numbers other than two is basically quorum commit. 9.6 actually added the ability to have more than one, but they had to be in order, and the quorum commit in Postgres 10 is the one you really want if what you're looking for is more than two nodes' worth of synchronous replication. At that point it will just look at all your servers and basically count how many have received your synchronous transaction, and as soon as enough of them, compared to your configuration, have received it, it's released and the transaction will continue. And there are still things being added to this. The thing that's common across all of these is that they're all transaction log based. Basically, whenever Postgres does anything, every transaction is first written to the transaction log, the WAL, and then it's written to the rest of the disk. This is primarily used for crash recovery, but what we do here is, instead of just using it for that, we ship all these changes to the standby nodes, and they basically run in crash recovery mode all the time: they just keep recovering as the data shows up. The transaction log, looking at it in the simplest form, is like sending binary diffs of your database, saying: hey, change this piece of this disk block over here in this way, and we just rerun that on the other side. So that runs pretty fast. The common thing, again: wal_level = replica is what you need. This is all you need to set in your postgresql.conf, basically.
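As a sketch of the per-transaction mixing described above (the table and column names here are invented for illustration):

```sql
-- Critical write: block at COMMIT until a synchronous standby has it
BEGIN;
SET LOCAL synchronous_commit = on;
INSERT INTO payments (account_id, amount) VALUES (42, 100.00);
COMMIT;

-- Unimportant write: COMMIT returns without waiting for any standby
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO pageviews (url) VALUES ('/index.html');
COMMIT;
```

SET LOCAL scopes the setting to the current transaction, so the session default stays whatever you configured.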
You need to add a little bit of information into the transaction log to make it able to perform this style of replication. As of Postgres version 10, this is also the default, so you can run all of this out of the box on Postgres version 10. On previous versions you have to enable it, and you have to configure the maximum number of connections that can log in and do replication (max_wal_senders) as well. The other thing that you want to set is the synchronous mode option. Again, you can set it on a per-transaction basis, and you can set it to off, local, on, or remote_apply. If you set it to off, that actually means that your transaction will also become locally asynchronous. So if your primary crashes and just reboots or something, you may actually lose your very last transactions; that is running in completely asynchronous mode. It will still guarantee consistency of your database, it will guarantee everything else, except that you may lose, by default, about a quarter of a second worth of transactions, worst case. Obviously that is tunable, but something in that area. If you set it to local, you get your traditional asynchronous replication: data is written synchronously to the local disk, but it is sent asynchronously over to your standbys, which can be one or 20 or 100. If you set it to on, then Postgres will replicate the data, and when you commit (all coordination happens at the commit level), the transaction will block until the data has been sent and written to disk on the standby. In synchronous mode, we wait for one standby; if you have ten standbys, we'll wait for the fastest one of them. Finally, you can also set a mode called remote_apply, where we also wait for the database on the standby node to have completed the crash recovery of this transaction, so that it's actually visible inside of that database. Because on these standby nodes, you can still open transactions.
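The four levels just described, roughly from fastest to most guaranteed, can be summarized like this (a sketch of the synchronous_commit settings, annotated):

```sql
SET synchronous_commit = off;          -- don't even wait for the local WAL flush
SET synchronous_commit = local;        -- wait for local disk only; standbys get it asynchronously
SET synchronous_commit = on;           -- also wait until a synchronous standby has flushed it to disk
SET synchronous_commit = remote_apply; -- also wait until the standby has applied it and made it visible
```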
The standby nodes are read-only, but you can open transactions. You can get MVCC snapshots across all your tables; you're running on old data and such on the standby. So we use them for many different things. Obviously, the further down the list you go, the slower things will be, but the more guarantees you have in the event of a failover of your replication system. We have three different methods to do synchronous replication. The simplest way is to set a parameter called synchronous_standby_names, and you list the names of your standbys; in this case, I have just s1, s2 and s3. As soon as any one of these has acknowledged that a commit has reached it, your transaction will be released and the application will keep moving on the master node. This we've had since Postgres 9.1. Then there's FIRST, which is also what you get if you just list the names like that: you say FIRST and then, in parentheses, the number and the names, FIRST 2 (s1, s2, s3). You wait for s1 and s2 to finish, assuming that they're online. If s2 goes offline, then it'll wait for s1 and s3, but it'll always wait for the first ones in the list. And then there's the quorum commit that we added in Postgres 10, where you set your synchronous_standby_names to ANY 2, in this case, and it'll just pick whichever two of those are the fastest ones at the moment. So this way you can say, you're guaranteed, in this case with ANY 2, that your transaction is on the master and on two of the standbys. So if your master dies, you still have your data in two places, fully guaranteed at all points. Again, the common thing about this streaming replication: I would say for most people who want to do replication in Postgres today, this is what you do. It's very easy to set up. It's pretty hard to get wrong. Personally, I like that in solutions. And in particular, once it's working, it doesn't just silently stop working. It's very easy to monitor. There's one thing you have to monitor.
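The three configurations just described look roughly like this in postgresql.conf (s1, s2, s3 are placeholder standby names):

```ini
# Since 9.1: wait for the highest-priority connected standby in the list
synchronous_standby_names = 's1, s2, s3'

# Since 9.6: wait for the first two connected standbys, in list order
synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'

# Since 10, quorum commit: wait for any two, whichever answer fastest
synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
```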
You monitor how far behind it is. There is no way that you can lose individual tables or anything; it's either all or nothing. It's very efficient, with very, very low overhead. And of course, it's built in. It's nice: we don't have to go download a separate package or install it from whatever repository we're using or things like that. Getting streaming replication working in a modern Postgres version is really easy, particularly, again, if we do this in Postgres version 10. Obviously, you have your master node up and running already, and you want to set up your standby node. It's basically two commands. First, you run the command pg_basebackup, which will take a backup of your master node. So basically, we take the backup from the master to the standby, and then we provision from there. We give it a directory, wherever we want it, /var/lib/pgsql or wherever we like to have it. We give it a hostname, pointing it to the primary; it could be an IP address or a hostname. We give it a user it has to log in as. For streaming replication, it needs to either be a superuser, or it can be any user that has the special replication privilege, which is normally the recommended one. So use a separate user that has the privilege to do replication. Then we add the little switch, -R, which says this is not actually going to be a backup, this is going to be a replica, and the system will automatically provision a configuration file that sets it up as a replica. We give it what is known as a replication slot, which basically registers the standby node with the master, so that the master doesn't delete data that the standby node still needs if it falls behind, you know, if you lose the network or reboot the standby or something like that. And then you can add the little -P thing at the end there, which just gives you a progress report if your database is big.
If this thing runs for a couple of hours, it's nice to see how far along it is. And once you've done that, you just start Postgres on the standby node and you're done. If you are using Postgres prior to version 10, you also add -X stream to this, so that your replica doesn't fall so far behind before it gets started that it never actually starts. This can otherwise be an issue if you're running it on a busy database. This is not necessary in Postgres version 10, because it's been changed to be the default, but on previous versions add -X stream, and that's pretty much it. It basically sets it up to start replicating before it finishes the backup; it starts replicating immediately, so that it doesn't fall so far behind that it might not be able to catch up. There are a few limitations to this one, obviously. It's architecture and compile flag dependent. So you can't freely mix different compilations of Postgres or different platforms, say SPARC and Intel, or Windows and Linux; some combinations kind of work, but it's not really supported. It's whole-cluster only, which is the thing that bites most people. If you have 10 databases, you have to replicate all of them; the entire cluster, all of them, is being replicated. The standby node is completely read-only. You can't write anything on it. You can't even create temporary tables on it. Basically, the standby node does not have its own transaction log, so any operation within the system that generates transaction log will just fail. And it's primary-to-standby only; you can only replicate in one direction. But this is usually not a problem if the problem you're trying to solve is availability. If you're replicating to gain availability in your system, then you're probably going to replicate between machines that are roughly equal. They're going to be running the same thing: if you're running Red Hat on your primary, you're probably going to be running Red Hat on your standby, for consistency and things like that.
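Put together, the standby provisioning just described comes down to something like this (the paths, hostname, user, and slot name below are examples, not from the talk):

```shell
# Take a base backup from the primary and write it out configured
# as a replica (-R), using a replication slot so the primary keeps
# the WAL this standby still needs if it falls behind.
pg_basebackup \
    -D /var/lib/pgsql/data \
    -h primary.example.com \
    -U replicator \
    -R \
    -S standby1 --create-slot \
    -P

# On versions before 10, also pass: -X stream

# Then just start Postgres on the standby and it begins replicating.
pg_ctl -D /var/lib/pgsql/data start
```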
You're going to be running on the same platforms, but there are other cases where it can be a problem, of course. Streaming replication in Postgres does not come with built-in cluster management. So no manual failover, no automatic failover; it just provides the infrastructure. You can build a failover solution on top of it; all the plumbing is there, all the APIs are there, all the monitoring is there, but it's not shipped in there. And it doesn't support failback, well, it doesn't support easy failback. Particularly if you're trying to build these things yourself, there are some very complicated cases, because since the replication coordination happens at commit level, a master might be ahead of a standby even if they're synchronous, because there might be uncommitted data, and we can't control when we fail; that's kind of the whole point, otherwise it wouldn't really be a failure. So it's really easy to get started, but if you want to build really high availability on top of it, it can be a bit harder to maintain. Of course, that's when we can turn to the outside support projects: there are cluster management solutions for Postgres, they're just not built in. There are a couple of good ones that I would say are the ones you should evaluate today. There is a management tool called Patroni; it's designed to be automatic from the ground up. There are some others that were designed initially to be manual and then sort of upgraded.
Patroni is designed for automatic failover management, automatic pretty much everything. It can plug into multiple different consensus stores: it can use etcd, it can use ZooKeeper, it can use Consul. I believe the latest version, which I think is out now, can actually run without an external one as well. And it has nice plugins; for example, it ships with templates for integrating with HAProxy, so you point your application at an HAProxy, and Patroni will control your Postgres nodes, it will control your HAProxy, and everything just works. Until it doesn't, of course; that's true of every solution as well. And by now, I'd say it's a very well tested solution. The other big option that we have in the Postgres world today is a tool called repmgr. This is one of those that was originally designed to handle replication clusters manually, but it has later learned to do automatic failover and things as well. I've found that traditionally it's easier to use than Patroni when it comes to dealing with manual failovers, because Patroni wasn't really designed to do that. We could spend a whole other talk just discussing whether automatic failover is actually a good or bad thing; that will depend on your situation, and we're not going to do that now, but repmgr is capable of doing both of them. It doesn't handle connection management in itself; it has plugins where you can have it run external commands to reconfigure things. You can still use it with, like, HAProxy and just have HAProxy poll your database to make sure that it works, but it can take a little bit more integration for those things. The advantage there, of course, is that you gain a little bit more flexibility. And finally, I'll point out there is a tool called PAF, the Postgres Automatic Failover; it's resource agents and configurations and plugins if you're using Pacemaker. One of the strong points of that one is, if you're using Pacemaker and Corosync to
manage other things in your organization, other types of HA clusters, then the configuration is obviously very similar; it's something that your operations people will be comfortable with. And the traditional way to set it up is: well, you set up a virtual IP, and the Pacemaker/Corosync combination will move the virtual IP as necessary to whichever node happens to be the master node. And again, well, it's called Postgres Automatic Failover, so its focus is of course automatic failover, though it can do manual as well. So if you need to build a high availability cluster, these are all very good options. There are some other options out there as well, but today these are sort of the main options that you will find throughout the Postgres world. So if we keep moving up the stack, database logical replication is the next level. This is the one that many of you probably know is new in Postgres version 10. We actually added a lot of the infrastructure already in Postgres 9.4, with something called logical decoding, and then it's been built on that up until Postgres version 10, which is the first version that released with actual built-in replication sitting on top of it. Now, this still acts on the write-ahead log, right? It still sits in the transaction log, which means it's still very low overhead. It's a little bit more overhead than your physical replication, but not much from a performance perspective; it can be a lot more overhead from a management perspective. But since it's in the transaction log, we just change wal_level: instead of saying wal_level = replica, we move to wal_level = logical, and now we have logical replication. It's almost that simple; then you need to set up a few things. Now, the way that logical replication works is it reconstructs the changes on a per-row basis. So it's not one of those solutions that actually replicates your SQL statements and runs them again on the standby; but instead of just sending the binary differences, it sends the logical
differences, saying: well, I replace the row that's identified by this with this new content, and it does it on an individual row basis. It's still fully transactional, and that's one of the good reasons it uses the WAL: that's where you have your commits, and it can trigger things directly on the commits. Now, being on a row basis instead of being binary across the whole cluster means that you can do table-level partial replication. You don't have to replicate everything anymore; you can say, I just want to replicate this one table over here. And you can also pretty easily do what I like to call table-level bi-directional replication. It's still easy if you say: this table is going to replicate from node 1 to node 2, and this other table is going to replicate from node 2 to node 1. Like, we might have a master node for each customer, depending on location, and we just replicate them in two different directions, and then we have a view on top of it, or it can be partitions or something like that, which the application works against. There are many scenarios where just being able to do table-level bi-directional replication can be really, really useful. And of course, logical replication is slightly more complex to set up than physical streaming replication, but it is still very simple. Just as an example: on your master node, well, which will then be the master node for this particular table, you create the table. Every table has to be identified; in this case, if your table has a primary key, it will replicate based on the primary key. Then you create this thing called a publication, as in CREATE PUBLICATION testpub FOR TABLE. It can be multiple tables; you can also say FOR ALL TABLES, and then it will dynamically add new tables and things like that. A publication is the unit of replication, so if you put 10 tables into a publication, they will replicate as one unit, and they will sort of move over
together. And these are things you need to consider: for example, if you have tables with foreign keys between them, you can't replicate the changes to just one of them. Then it's one of those funny things that might work most of the time, but not all of the time, and you don't want that. So those are things you need to consider; these are the cases where logical replication can become a little bit more complicated. Then, on the standby node, you create the same table, maybe using something like pg_dump in schema-only mode, and then you create a subscription, which is the other side of this: CREATE SUBSCRIPTION testsub CONNECTION, and this is connecting to the database where we created the publication, and then we just say PUBLICATION testpub, and this testpub here is, of course, the one that we created over there. And as soon as we create this subscription, replication of this table will start, and it will happen on a row-by-row basis: as soon as I insert a row, it will get replicated over to the other node. Now, logical replication comes, of course, with its own set of drawbacks. It doesn't currently do anything other than data replication; it doesn't do schema replication. You have to remember to do ALTER TABLE on all the nodes, and you have to do it in the right order and things like that; bad things can happen otherwise. In particular, it doesn't replicate sequences. So if you have your table with a primary key based off a sequence, or a generated identity column or something like that, then the values will of course replicate, but the sequence will still be stuck at one. So if you use this for something like high availability, and you fail over, then all your sequences are going to be wrong, which is kind of like, oops. So it's not very good for high availability replication. Now, there's no reason that we couldn't add things like schema replication and sequence replication; on a fundamental level, we just don't have them today. But
that means that today this is really good for data distribution, and it's not very good for high availability; it depends on the problem you're solving. We also lack something that is called failover slots, or at least that's what one of the guys working on them called them. The general idea being: if you're replicating from a master with both logical and physical replication, and then you fail over to your physical standby, your logical replication will not follow. So there are still things that need to be done when it comes to database-level logical replication, but the built-in one is still very good for a lot of use cases, of course; it's just not complete. Now, if you want to build further on logical replication in the database, there is an extension for Postgres called pglogical. This is basically an external version of the same thing; it's where it started. It's been merged piece by piece into what is now the logical replication in Postgres version 10, and more pieces of it will be merged into future versions of Postgres. But it has more capabilities today. It also means that it's less integrated and a little bit more tricky to work with: you don't have these nice commands, instead you call functions for everything, and there are a few things that you have to do in the right order, otherwise, you know, dragons show up. But it does support sequence replication, for example. It does support row-based filtering, so you can say: I don't want to replicate the whole table, I only want to replicate rows that match this. You can do column-based filtering, saying I only want to replicate these columns and not the other ones. It supports merging and conflict resolution, so you can have multiple nodes replicating into the same table in a destination, for example, or you can have local changes to a table that's being replicated into, and you can define what's going to happen when there is a conflict. Because if you write from more than one place, there will eventually be a conflict.
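To collect the built-in publication/subscription setup from a few minutes ago in one place (the table, publication name, and connection string here are illustrative):

```sql
-- On the publishing node (requires wal_level = logical):
CREATE TABLE customers (id int PRIMARY KEY, name text);
CREATE PUBLICATION testpub FOR TABLE customers;

-- On the subscribing node: create the same table first
-- (for example from a pg_dump --schema-only dump), then subscribe:
CREATE TABLE customers (id int PRIMARY KEY, name text);
CREATE SUBSCRIPTION testsub
    CONNECTION 'host=pubnode dbname=appdb user=replicator'
    PUBLICATION testpub;
-- From this point, rows written on the publisher flow to the subscriber.
```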
You may think that your application prevented that, but it didn't; it only did in testing. And there are a number of other things that pglogical adds on top of this. pglogical also supports Postgres 9.4, not just Postgres 10, and using pglogical you can accomplish what's basically, well, there's no such thing as zero, but close-to-zero downtime upgrades using logical replication. Because physical replication is tied to specific Postgres versions and platforms and compile flags and things, but logical replication isn't. Using pglogical, you can replicate from Postgres 9.4 to Postgres 10, for example. Now, you will be able to do these mostly-zero-downtime upgrade things using the Postgres built-in replication as well, but only from version 10, and we haven't released version 11 yet, so you'll have to wait a while for that one. Now, if we continue further up the stack, we have the app-in-database level, right? This is the old trigger-based replication systems, and the big ones of those are still out there, right? Anyone ever deployed Slony? Did you like it? Okay, yeah, there's always one. Slony is a really capable system, but it's not very easy to work with, and unlike the other ones, it is actually really easy to break. And when you do, you're in for a lot of pain. But it does things; even today, even with pglogical, even with all the built-in stuff, there are things that Slony can do that they can't. You can set up really, really strange things in your application with Slony. We have Bucardo and Londiste, which are also trigger-based systems that are basically simpler to use than Slony. I think that's their main selling argument, really: they're much easier to use than Slony, and they still are. They're still good choices for cases where the other solutions don't apply, but they're not really your primary target. For a long time, they were the only ones we had. There's a lot more overhead: every time you do an insert, it's copied somewhere, and this is
all happening in triggers running in user space in the database, and then you pull it out and you delete it. These tools all do all sorts of weird things to make sure the overhead is as little as possible, but it's still going to be much, much worse than for the logical or physical built-in solutions. But there are complicated scenarios where you really need to do it.

Such as, you know, who in here likes multimaster? Everybody wants it, right? Everybody really needs multimaster? That's my experience: everyone needs it until they start looking into what they really need, or in particular until they start realizing what it means. Because what most people want when they say they want multimaster is transparent multimaster: they want their application to remain completely unchanged and things to just magically work. Unfortunately, we don't have a lot of magic. There are things in play here, like the speed of light and annoying things like that, that we haven't figured out how to deal with yet. If you're doing a database in the form of your traditional relational database, you cannot do transparent multimaster. But there are people who sell it. Have you ever met, for example, a salesman who wants to sell you a solution based on Oracle RAC, saying it's transparent multimaster, it will scale everywhere, and it will work perfectly? Anyone ever met a guy who tried to sell them that? I know I have. Did you try to deploy it? Did it work? It doesn't work transparently. Don't get me wrong: RAC, with a RAC-adapted application, can do some very good things, but if you just take your regular application, very bad things can happen. The same goes for all solutions, so completely transparent multimaster just doesn't exist if you want the consistency of your traditional relational database system.

There are some options within Postgres still. There is a toolkit, or a fork actually, of Postgres called BDR, which has been developed a lot together with the pglogical stuff, and it does support
geographically distributed multimaster. But, let's put it like this: if you manage to actually run your application completely unmodified against it, then you probably didn't need multimaster, right? That goes for most solutions. But if you engineer your application to work together with a solution like BDR, then yes, it can solve those problems in a lot of cases; probably not in all of them, but in a lot of those cases. It is a fork of Postgres, not an extension, an actual fork of Postgres, which I think is based on 9.4 or 9.5 at this point; they're working their way towards newer versions. The plan, last I heard, is that the next major version of BDR should hopefully just be an extension on top of Postgres. It's not there now, but we'll hopefully get there.

As for all these trigger-based solutions: Bucardo supports doing multimaster. The big problem is that all of the trigger-based ones are asynchronous, and the big problem with asynchronous multimaster is, of course, that you will eventually get conflicts, and you need to decide what's going to happen when you get a conflict: who's going to win? You can define very complex policies for dealing with that in a tool like Bucardo, but you have to actually do it; otherwise, once you run into a conflict, everything just stops, and that's probably not what you intended. But my general recommendation, when you're looking at something like multimaster, is to first look about five more times to see if you really need multimaster. Because when we're talking about these things that were easy to break, this will be easy to break in most scenarios. It is a much harder problem to solve than most people realize, if we want to maintain the consistency part that really is our relational database.

And of course we can build replication all the way up in the application itself, and just connect to five different databases and run things in five
different databases and commit them in all of them, and everything will be fine. But actually doing transaction control at this level can become very complicated. If what you're looking for is consistent replication, which is probably why you're using a relational database in the first place, then trying to build that in your application is probably not a very good idea. You're setting yourself up for having to maintain something that will become really, really complicated over time.

So, what do we take from this? There are obviously more options than this around, but there are a couple of main problems that we want to solve with replication today. If what you're looking for is solving the problem of high availability, then you should really just be using Postgres' built-in streaming replication: easy to set up, hard to break. These are very good properties when we're talking about building for high availability, because if it breaks, it will always break at the wrong moment; we know that. And use the fact that you can mix synchronous and asynchronous replication in the same cluster, or even between the same nodes, because by doing that in an intelligent way and in a controlled way, you can get most of the important benefits of synchronous replication with almost none of the overhead, by simply designating the important things versus the slightly less important things. Because even on the slightly less important things, when we're talking about losing data in asynchronous mode, we're still talking about losing a lot less than one second's worth of transactions, which in a lot of applications really isn't that much, but it's very much more than zero. We've broken the Boolean of zero and said it is actually okay to lose a few transactions. But if we mix those, we can get very good performance out of it. If you want to build that up, it's just going to be manually maintained replication, and the built-in tools do a pretty good
job. If you want to go a step beyond that, you need to look at things like automatic failover. Don't build your own solution for automatic failover: look primarily at Patroni and repmgr, and compare how they work to what matches your organization. They're both fine tools; they work in slightly different scenarios. I would recommend that you evaluate them both. Look at that path if you are already using, or for other things are going to be using, things like Pacemaker and Corosync; if you're not doing that, then it's an extra piece of infrastructure that you're bringing in, and you're probably better off looking at these tools to deal with it. But if you're building high availability, don't use logical replication, don't use pglogical, and certainly don't use the trigger-based ones. Who's actually running Bucardo or Londiste actively in their organization? Okay, that's fair. Are you looking at replacing them, or are you happy with them? Anyone using them and not planning to replace them? Is anyone using anything that they're not planning to replace? Well, officially yes, because at some point your project has to officially be finished.

The other use case that we might need to cover is read-query offloading. If all we really need to do is offload pure read queries, then we can still use physical streaming replication, because our standby nodes are available; our standby nodes are read-only. You can still get multi-table and multi-statement transactional snapshots on your standby nodes; they just have to be completely read-only. Which goes back to the previous one: yes, you might want to look at management tools and things like that. This setup is really easy and really stable; it is a good choice in many cases.

Another one is data distribution, which is different in that now we don't just need to run simple read queries on our data. We might need to replicate different data to different places; we might need to generate temp tables, generate reporting tables and things like that. Well, if you're using Postgres 10,
use logical replication. If you're not using Postgres 10, you should upgrade; you should use Postgres 10. But if not, look at pglogical if you're using Postgres 9.4 or newer. But also look at pglogical if the built-in one isn't good enough for you; if you need partial replication, for example, replicating only some columns or only some rows, then look at pglogical. And yes, if you're using Slony, you should probably look at upgrading your way out of Slony, unless you are one of the people who are in one of these use cases that really only Slony solves; then you're going to keep using Slony, just be careful, because it's easy to break. But these are all good ways of dealing with your data going out in different ways.

Well, what if you need both? What if you need both high availability and read offloading? Well, do both: just put up another node. It's a very common scenario. Say you use synchronous streaming replication between a high availability, try again, a high-availability cluster of, say, three nodes, and then you also use asynchronous replication to another node, or two other nodes, where you run the read queries, for example. You can freely mix and match these different types of replication; all the streaming ones support that. You have cascading replication, so you replicate from node A to node B, and then from node B to nodes C, D and E. So you can replicate into read clusters, or into a distribution node that replicates into many read clusters, to conserve network bandwidth, and all sorts of things like that. So in many cases, if you move beyond the simplest needs, you're going to want to mix different ones of these, but these should all be sort of the base points of where you want to go for those things.

And that's all I had. Do we have a couple of minutes for questions? Yes, we have three minutes for questions. Who wants to be the first one?
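The mixed setup just described, a synchronous high-availability core with asynchronous read replicas hanging off it, boils down to a small amount of configuration. This is a hedged sketch only: the node names, hostnames, and user are made up, and the exact parameters and file layout vary between Postgres versions (Postgres 10 era shown).

```
# postgresql.conf on the primary (node A) -- illustrative values only
wal_level = replica              # emit enough WAL for streaming standbys
max_wal_senders = 10             # allow several standbys to connect

# One of the standbys named nodeb/nodec must confirm each commit
# synchronously; any other connected standbys (the read replicas)
# remain asynchronous automatically.
synchronous_standby_names = '1 (nodeb, nodec)'

# recovery.conf on each standby; application_name is what
# synchronous_standby_names matches against.
standby_mode = 'on'
primary_conninfo = 'host=node-a.example.com user=replicator application_name=nodeb'
# A cascading read replica simply points primary_conninfo at node B instead.
```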
Yes, you get to run right to the middle and then throw the... I think, yeah, I think it might not have been turned on for you. So the question, I think, is how much latency is added by turning on synchronous replication, and whether the application ends up blocking over time. It really depends on the distance between your nodes. If you're replicating between two nodes that are close, in the same data center or so, then there's a little bit of latency, but it's very, very low. Obviously, if you're replicating across the Atlantic, well, then you have to go there and back. I've seen a lot of scenarios where you're replicating between two data centers that are in the same city, and usually the latency overhead is low enough that you barely notice it. Now, if you have an application that's just transactions, yes, you will notice the latency shift there, of course, but most of the time you have multiple concurrent things, and things like that.

Do you have a question? Okay, that's a bummer. Does anyone else have a question? Hello? Anyone else? The one right there. So the question is about read replicas: can you create separate indexes on the read replica? If you have binary streaming replication, then the answer is no, you can't; you are, unfortunately, correct in that you cannot do that. Now, if you're using logical replication, then you can put separate indexes on the receiving side, but with physical replication you cannot: it's completely read-only.

Anyone else? In that case, everyone asks a question. The one in the middle: on the replication side of pgpool, my opinion is, don't use it. Now, if you put pgpool in front of a streaming replication cluster, for example to have it handle, like... that's a different story; that's a usable thing. But it also has the ability to do statement-level replication, and in pgpool my recommendation really is, don't use it. The rest of pgpool is much better.
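To make that last distinction concrete: with Postgres 10's built-in logical replication, the subscriber is an ordinary writable database, so you can add extra indexes there, which a physical streaming standby cannot do. A hedged sketch, with made-up table, publication, and connection names:

```sql
-- On the publishing (source) server:
CREATE PUBLICATION reporting_pub FOR TABLE orders, customers;

-- On the subscribing (target) server, which already has matching
-- table definitions:
CREATE SUBSCRIPTION reporting_sub
    CONNECTION 'host=node-a.example.com dbname=app user=replicator'
    PUBLICATION reporting_pub;

-- Unlike a physical standby, the subscriber is writable, so extra
-- indexes for reporting queries can be created on the receiving side:
CREATE INDEX orders_created_idx ON orders (created_at);
```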