Welcome to Logical Replication Lessons Learned for the Data Warehouse. I'm joined by our speaker Pat Wright, a DBA who runs Utah Geek Events, and he's here to discuss using logical replication in a data warehouse system. My name is Lindsay Hooper. I'm one of the Postgres Conference organizers, and I'll be your moderator for this webinar. A little bit about your speaker: Pat has been a DBA for the last 15-plus years, and he's been working in the big data space with Hadoop and Elasticsearch for the past three years. He's been working on Postgres for the last two years, and he runs the local Postgres meetup group in his area. Pat is a former SQL Server MVP, PASS Board of Directors member, and organizer of several user groups, and he has a night job running Utah Geek Events, a 501(c)(3) which helps run events for the community. I'm going to hand it off to Pat now. You can take it away.

Thank you, Lindsay. I appreciate that great introduction. I'll just start out with: as Lindsay mentioned, I'm Pat Wright. I've been working with databases for a long time, I like to volunteer a whole lot, and in my spare time I do a lot of photography. As she mentioned, we absolutely are looking for questions directly in the chat; just drop them in there, I have the window open right here. There's also a great Discord channel out there for Postgres now, and I've linked it right here in my slide. If you'd rather send me a direct message in the Discord channel, I'm fine with that too; I'll watch it closely so you can hit me up there. All of the stuff I'm going to talk about today is on GitHub at the link right down below. It's basically the scripts I'm going to use today plus the Docker Compose information, so just go out there and you'll see all the scripts and everything else I'm going to talk about today.

All right, let's get started. What we're going to do today is, first, talk about what different types of replication are out there and what you can do with different relational databases. Much of this relates directly to Postgres, but some of it applies to other relational databases as well. I like to make sure you're informed about what exists beyond Postgres, because you might be working with other database systems in the future, and that context gives you a better understanding of what Postgres is doing. This first part is all discussion, but once again, if you have any questions on any of these slides as I go through, please ask them in the chat. After we finish the replication overview, I'm going to bring up a demo, show you how to actually set up replication, and then cover some of the things we've run into with replication over the last year. My day job, as Lindsay mentioned, involves a couple of different hats: I'm a database administrator and database engineer for Podium, Incorporated.
That's what I do all day in and day out, and that's where a lot of this replication work happens. I also run my own consulting practice, where I help different companies with replication and related problems, and then at night I run events, help with Postgres user groups, and present from time to time. All of these are things we do on a day-to-day basis.

All right, so let's talk about replication and what you should be concerned with. My slides are going all over the place. Let's start with the most common setup we see every day: the master-standby situation. You have a master server out there and a standby server, and the replication between them is usually either asynchronous or synchronous. Asynchronous means that when your app writes a transaction to the master, the master sends it over to the standby in what we call fire-and-forget fashion: the standby may have it yet, it may not, but it's going to get it within some amount of time. We call that gap replication lag; whether the standby has a given transaction yet is a question of how far it lags. In a synchronous setup, when the app sends a transaction to the master, the master must also send it to the standby, wait for an acknowledgement, and only then report success back up to the app. So it's going to be a little slower, but you're guaranteed that both servers are in sync. That's what makes it safe to fail over, and that's what gives you the ability to do HA and things like that. I put down here that one of the cons of a synchronous setup is that you now have two points of failure, since a commit depends on two servers. That's not totally true, because you can also add more standbys and say that as long as any one of them acknowledges the transaction, you're okay; you can still fail over to that one. So it doesn't have to leave you with a single extra point of failure, but it is something to consider. Whenever you're in a master-standby setup, which is one of the more common things I see, and you need HA, make sure you're synchronous, because then you can safely fail over to the other box; if you're asynchronous, you might not be able to. Where does asynchronous make sense? I use it personally for our read servers. We'll have a master server out there and then several followers that are essentially read servers: all we do is run SELECT queries against them, and they're asynchronous because we're okay with them lagging a little, maybe a minute or two. So we use asynchronous replicas to get horizontal scaling of reads. That's a standard master-standby relationship and one way to use it. Again, let me know if you have questions as I go through each of these slides.
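For reference, here's roughly what that choice looks like in postgresql.conf on the master — a minimal sketch, assuming two standbys whose names (set via application_name in their connection info) are hypothetical:

```
# postgresql.conf on the master -- minimal sketch, hypothetical standby names
synchronous_commit = on
# Commit waits for ANY one of the listed standbys to acknowledge
# (the ANY quorum syntax is available from Postgres 10 on):
synchronous_standby_names = 'ANY 1 (standby1, standby2)'
# Leave synchronous_standby_names empty for plain asynchronous streaming,
# e.g. for read-only follower servers where a little lag is acceptable.
```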
So let's now talk about a multi-master situation. This is something Postgres doesn't offer out of the box. I believe there are some extensions out there that give you something like it, and somebody can correct me in the chat if I'm wrong, but I'll warn you about this pattern. Multi-master means that when the app writes a transaction, it gets written to both places. Either the app does what's called a dual write and writes to both databases itself, or the app writes to one database and the database merges the change to the other side, and then any app can hit any of the master systems. These setups are more difficult and more confusing because you have to have conflict resolution. The system has to know: am I writing to this database or that one, and who wins if they both write the same thing? If app number one, up here on the left-hand side, says "I want to update his name to Pat," and app number two says "I want to update his name to Bob," who wins, Bob or Pat? That conflict resolution is something you have to work on a lot, and your app has to understand it. It's something to keep in mind; you see it in a lot of the NoSQL databases and some other systems. Again, I don't see it much in Postgres, but I wanted to bring it up because I want you to know what other relational databases do with replication.

Okay, next let's talk about a slower, different use case: log shipping, meaning copying the WAL files. Basically, you have a master server, and on that master you have a bunch of write-ahead logs being generated. That's standard; everybody has WAL going today. You copy those WAL files to another server and replay them there, and that gives you a duplicate copy of your server, because everything is replayed on the other side. One of the good parts of this is offsite copying: if you want to take your database and put it somewhere else, like a data center across the world, this is a great way to do it. You can copy those files, literally put them on a drive and ship them, which I've actually done in the past, or simply push them up to an FTP site and pull them down onto the other server. All of that is possible with this approach, and it's inherently delayed, which can be a good thing, because that delay is great for disaster recovery. If something critical happens to your master database, somebody truncates a table, somebody drops a table, which never happens, right? No developer has ever truncated or dropped a table. But if those things happen, the bad change can be held back: you can say "no, I'm not going to ship that log, I'm going to stop right there," and you still have a good copy of the data on the other side. So it gives you a way to handle offsite copies and disaster recovery.
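A rough sketch of that log-shipping setup follows; the archive path and the delay value are hypothetical, and in Postgres 10/11 the standby-side settings live in recovery.conf (they moved into postgresql.conf in 12):

```
# postgresql.conf on the master -- sketch of classic WAL archiving
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# recovery.conf on the delayed copy (Postgres 10/11):
# standby_mode = 'on'
# restore_command = 'cp /mnt/wal_archive/%f %p'
# recovery_min_apply_delay = '4h'   # deliberate delay for disaster recovery
```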
Now, I wouldn't use this for any heavy replication scenario or anything that has to be fast. This is not a real-time setup; the timeframe is usually hours, sometimes even days. So it's not something you'd want for a production system except for disaster recovery.

Okay, now let's talk about actual logical replication, which came out in Postgres 10. And just to be clear, everything I'm going to show today is running on Postgres 10. I haven't moved to 11 or 12, and I may bring up a few things where I believe the issue is resolved in 11 or 12 but where I haven't personally validated or tested that. Again, if you know something, jump into the chat and say "hey, I think this is fixed in 12." I would love to hear that right now. What I know is that these things work in 10. Logical replication really comes down to being able to copy specific objects from the database. In 10, they created the idea of publishers and subscribers: you publish a list of tables or items, and someone else subscribes to them. This kind of object-level replication has been done in a lot of different relational databases. The nice thing here is that you control what is actually replicated. In the master-standby situation we had earlier, the entire database has to be replicated, the whole thing. With logical replication, you only replicate certain tables. If you have a database with, say, 1,000 tables but you only need 10 of them replicated, logical replication is a really good choice, because you don't have to replicate all 1,000. So keep that in mind: you want to create your publications, and we're going to show that in just a second, in a specific way so they match whatever you're trying to do. You can change destinations, and you can also have multiple subscribers to one publication. I'm going to talk about data warehouses, but if you had several data warehouses or several data lakes and one master server with all the data, you could have five or six subscribers to that one publication; you don't have to create a separate publication for every single one. I'll talk about the performance aspect and the config once we get into the demo, but for now, just know that you can attach multiple subscribers to one publisher. Another really interesting thing about subscribers is that you can change the tables on the subscriber side. Now, I'm putting big air quotes around this, and you can't see me, but trust me, I am. You can change the tables in certain ways, and we're going to show that. The master still owns the master copy of the table, meaning that if you add a column on the master, and I'm going to show this, you will break the replication; but if you add a column on the subscriber, you will not. So you can alter the subscriber's tables: add columns, add different indexes, change things about the subscriber up to a point, and we'll talk in a minute about how much you can actually change. Mainly it's adding columns and different indexes. What this means is that you can have a completely different reporting layout for those tables.
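To make that concrete, here's a minimal sketch of subscriber-only changes; the table and column names are hypothetical:

```sql
-- On the subscriber only -- the publisher's copy stays untouched.
-- Extra columns are fine as long as all the replicated columns still exist:
ALTER TABLE public.orders ADD COLUMN loaded_at timestamptz DEFAULT now();

-- Reporting-only indexes you would never put on the busy transactional master:
CREATE INDEX orders_customer_created_idx
    ON public.orders (customer_id, created_at);
```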
So let's say you have a highly normalized transactional system on your master and you want a denormalized data warehouse on your subscriber. That's where this really makes a difference: you can change those tables and those structures. Not too far, again, you can only add columns and indexes, but at least you can build a different pattern for querying the data differently. The indexes really help, because you don't want reporting indexes on your transactional system, but you do want them on your logically replicated system.

Okay, so let's talk more about logical replication. No questions so far; I think everybody's still sleeping, or just getting ready for lunch, it's almost lunchtime here for me, so I totally understand. But if you've got questions, please drop them in the chat. A lot of what's on this slide I've already mentioned, but I wanted to call it out and show this picture so everybody gets a good visual idea of it too. And full disclosure, I give huge credit to Severalnines here. The blog post I've linked down at the bottom is the one you should be reading and looking at if you're getting started with replication. They did a great write-up on all of these things; I used it heavily and I can't say enough good things about them. So if you get a chance, go out to severalnines.com and look for their replication deep-dive blog post, and it'll give you everything you need to get started. Okay, so: logical replication. You can work on a per-table basis with different indexes and schemas, as we've talked about. You can also limit replication by DML operation. Say you create a table out there and all you need to know is whenever somebody deletes something from it: you can create a publication that replicates just the deletes. Now, I know what you're saying: we could do that with a trigger or something else too. But maybe it's easier to do it with logical replication. Maybe it's a better idea to send that data off to the warehouse and say "here are all the deletes for this table," and write them down to a table there. That gives you those deleted rows, not only as a backup but as a way to see them. So there's a lot of power in limiting which operations you publish. I'll be honest, all of mine just do all inserts, updates and deletes, because I'm usually copying the entire table, but I could see a huge use case for publishing just deletes.
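A sketch of that delete-only publication; the table name is hypothetical, and note that in Postgres 10 the publish option covers insert/update/delete (truncate joined the list in 11):

```sql
-- A publication that only forwards DELETEs to subscribers.
CREATE PUBLICATION audit_deletes
    FOR TABLE public.important_rows
    WITH (publish = 'delete');
```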
The other thing you've got to worry about with logical replication is how much it reads the WAL. And this is where I hope someone will jump into the chat, or maybe I'll bring it up in Discord later: in version 10, when you published a table or a set of items, the decoding didn't look at the WAL just for that specific object. It had to read the entire WAL stream; it wasn't optimized to pick out one object. I believe in 12 they've corrected that, or are starting to, so it only does the work for the objects you're actually publishing. But keep in mind: if you have a database and you put, say, 20 publications on it across a hundred tables, each one of those 20 publications is going to read your WAL, and that can impact the performance of your transactional system. So you do need to be cautious about how many publications you create on your source systems. Again, this is something to test on version 12 to validate whether it still behaves this way.

And then the other big thing you run into with logical replication is the setup and the initial snapshots. I'm going to walk you through how I do a snapshot in a second, but I can tell you right now that I created a custom procedure to copy my own data for my snapshots. Many of the tables I work with are 80 to 100 gigs, some are 200 gigs, some have hundreds of millions of rows. For me, when I set copy_data = true on the subscription, the copy just didn't move over well; it didn't finish in a timeframe that worked for me and my system. So I wrote a simple little script that copies the table over, dumps it down to either CSV or binary, loads it on the other system, and then creates the subscription that connects the two. That was my way of dealing with it. There's nothing wrong with using the built-in copy_data snapshot; just keep in mind that with really, really large tables, you're going to see that copy run in the background for quite some time, however long it takes to move the data. And honestly, Postgres may be doing almost exactly the same thing I'm doing. I just did it myself because it gave me more control over when and how the copy happens, which made better sense for my system.
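The shape of that manual approach, not my exact script, looks roughly like this; the host, file path, and names are hypothetical, and in a live system you'd have to coordinate the copy with slot creation so changes made in between aren't lost:

```sql
-- 1. On the source, dump the table (psql's \copy streams it client-side):
--      \copy public.big_table TO '/tmp/big_table.csv' WITH (FORMAT csv)
-- 2. On the destination, create the table, then load the file:
--      \copy public.big_table FROM '/tmp/big_table.csv' WITH (FORMAT csv)
-- 3. Attach the subscription, skipping the built-in snapshot:
CREATE SUBSCRIPTION big_table_sub
    CONNECTION 'host=master dbname=app passfile=/var/lib/postgresql/.pgpass'
    PUBLICATION big_table_pub
    WITH (copy_data = false);
```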
All right, so here are some of the times you actually want to use logical replication. We've already touched on a lot of these, but I want to be clear about why this matters to you and what the value here is. Logical replication is a really great way to consolidate database systems. For example, I have about five core Postgres servers. The product we built at Podium generates messaging, review invites, all sorts of data about different things, but those pieces don't live together: they're all separate microservices in different places, each with its own database server. That makes reporting very hard, and our reporting teams need to consolidate: this conversation was tied to this review invite, this item was tied to this person, these organizations were tied to these users. We need all of that together. Logical replication became the way to bring it together, because we could publish from all five core databases and subscribe from one data warehouse system, so we'd have all the data in one place. Now, I know what you're saying: you can also do this with ETL processes, or there are systems out there that will do it for you, or you can write Python or other things. Yes, those are all possibilities. But for us, this was a great way to get up and running quickly just by using logical replication. And it worked: we got logical replication from all five systems set up, feeding a data warehouse that was answering questions, in just a few short weeks. That's a lot faster than writing your own custom ETL.

Now, a lot of people will also say: wait a minute, if you're doing a data warehouse, ETL, data lakes, you need to cleanse the data, change it, worry about whether a field should really say what it says. Garbage in, garbage out; you've got to worry about that. Yes, and logical replication does not answer any of those questions. It's not there for that. It simply copies the data from one place to another. That works in our system, but if you have to do major, complicated transformations, logical replication is not the answer for those pieces. For those, you can look at foreign data wrappers, your own ETL system, Kafka, or something else, or you can use logical replication to land the data in a different database and then run your process on that. A typical data warehouse flow is: get the data out of its normalized state, denormalize it, get it into a staging area, and run your ETL off of that. Logical replication can build that staging system for you: here you go, here's all your data in one place, now run whatever you want against it, SQL statements, Python, whatever. So that's one of the big use cases for logical: consolidating databases down onto one server so everything is in one place. If you have lots of databases and lots of different pieces, this is a great way to do it.

There are limitations to doing that, of course. You cannot have two tables with the same name: if one system has a users table and another system also has a users table, you cannot put both on the same destination database. It will not work. You'll have to put them in different databases or different servers, or rename them, or something like that. We actually ran into this with several tables; we had microservices with tables named the same as something else, and we had to do something custom for those. We did not use logical replication for them. The other big thing I found really interesting, and again this may be specific to 10 and may have changed in 12, is schemas. In our case, the tables lived in the public schema; they didn't have a separate schema. If they're in the public schema on the source, they have to be in the public schema on the other side. It would not allow me to change which schema they landed in; the fully qualified name had to stay the same. I found that very interesting, and I tried and tried to break it, but it would not let me say "I'd like you in the data warehouse schema, not the public schema."
So I'm not sure exactly why it's hard-coded that way, but right now it is, and that may have been updated in 12 as well. Okay, taking a quick water break since you can't see me. Okay.

Other reasons to use logical replication: upgrade and migration paths. There was just a great blog post on this, and I want to say it was posted in the Discord channel under educational content, about this exact idea: you can now logically replicate the database over to a new server that's on version 12 instead of 10. If you want to upgrade from, say, 10 to 12 or 10 to 11, you can logically replicate all the data over and then cut over to that new database. You couldn't really do that before with a master and a follower, because the follower had to run the same version as the master. With logical replication, the destination can be a different version. So you can have a 12 server out there, test it, run things against it, and cut over to it. That's a great way to solve the problem of zero-downtime upgrades in the future: logically replicate, then cut yourself over. Definitely go check out the Discord channel; I believe the post is in educational content. I'd look it up right now, but I want to stay on my slides, so if somebody wants to find it and drop it in the chat, great.

You can also use logical replication to give a different set of users and schemas their own copy. Say you have a master system that you don't want your reporting analysts looking at directly: you don't want them in there touching the transactional data. You give them a copy this way, say "here's your place to look," and grant permissions on that side, so it's all separated out. And then of course there's the different performance footprint I mentioned earlier: you can change indexes and the like, which lets you do something genuinely different with how the system is used.

I'll mention these limits down here so you're aware. All tables that you replicate have to have a primary key or unique key. It's just got to be there; it's not going to work if it doesn't. I might have a little demo on that in a second, but we'll see if I have a chance to show it. And logical replication doesn't replicate schema, sequences, TRUNCATE statements, or blobs. The TRUNCATE one I'm probably going to demonstrate in a second, because I'm using pgbench to put some load on my database in the background, and I found that the TRUNCATE statement pgbench runs doesn't get passed over, so my counts were getting off. I was trying to show this nice little demo where I validate the counts, and all of a sudden the counts were different, and I thought: how are the counts different? This works all the time. What I realized is that there's a TRUNCATE inside pgbench, and when it truncates, the counts go out of sync. So keep this in mind: if you're using heavy TRUNCATE statements, you might have to do something differently, because they're not tracked.
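Coming back to the primary-key point for a second: a published table that takes updates or deletes needs a replica identity, which defaults to the primary key. A hedged sketch of the alternatives, with hypothetical names:

```sql
-- If there's no primary key but there is a suitable unique index,
-- it can stand in as the replica identity:
ALTER TABLE public.events REPLICA IDENTITY USING INDEX events_event_id_key;

-- Or, as a heavyweight last resort, log the entire old row for every
-- update/delete (costly in WAL volume):
ALTER TABLE public.events REPLICA IDENTITY FULL;
```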
Blobs, large objects, big images, stuff like that, also don't replicate over. Again, I believe in 12 a change was made to start including those, or maybe it's still just on the roadmap; that's something to validate and find out. Jessica mentioned she linked the logical-replication-for-upgrades message in Discord. Thank you, Jessica, for adding that.

Okay, so let's go ahead and demo now. Here's what I'm going to go through: the setup, creating a publication, copying data, all of those little steps, and then a few of the pitfalls we ran into, so everybody's on the same page and you get the right information. Full disclosure: I'm not going to build anything complicated. My environment here is a simple Docker setup with a Postgres master server and a Postgres standby server, and as you can see right here in the logs, they're both running. There's nothing complicated about this system. In the GitHub repository I posted at the very beginning, I included this Docker Compose file; you'll see it's about 20 lines, about as simple as you can get. There's an environment block for the database and one for the database standby, and all you have to do is put in some sort of password there. I'll be honest, I used "password," because I'm going to destroy this entire thing after the presentation, so it doesn't really matter. Don't ever use "password" in your production systems, but this is a demo, and it's fine. Now I'm going to start up pgbench, just to get a little load going so the master has something to do and is pushing some data through while we work.

Now let's look at the servers; I'm going to connect to both of them. Over here is the master server, and on another screen you don't see, I'm pulling up my notes, because I'm just going to copy and paste from them. I know a lot of people like to do live coding; I'm not one of those. So on my master server, as you can see, I have a standard, everyday set of pgbench tables. This is what pgbench creates; I initialized it earlier, and there's nothing special about it. I'm going to start by creating a publication. This is really the first thing you do for logical replication: you create a publication on your master server. And keep in mind, there's not a lot of harm in just creating the publication. A lot of people will tell you "I don't want it to fill up my WAL" and all that; we'll talk in a second about when the WAL can fill up, but merely creating a publication doesn't hurt anything, okay? It's just a publication that's out there. No one has subscribed to it; effectively no one even knows it's there until somebody subscribes. The syntax is very simple: CREATE PUBLICATION, give it a name, I called this one bench, then FOR TABLE and the table. If you have multiple tables you want in the publication, you just list them after FOR TABLE, separated with commas, as shown below.
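Here's that statement as I run it in the demo, plus the comma-separated variant (the second table is just for illustration):

```sql
-- On the master:
CREATE PUBLICATION bench FOR TABLE public.pgbench_history;

-- Multiple tables are just a comma-separated list:
-- CREATE PUBLICATION bench FOR TABLE public.pgbench_history,
--                                    public.pgbench_accounts;
```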
If you want to alter a publication that already exists, you can say ALTER PUBLICATION ... ADD TABLE to add another table to it. These things are really easy to do, and if you're looking for the syntax, check out either the Severalnines article or the Postgres docs directly; these are really simple commands.

Okay, so I've created my publication on the master. Now I need to go down to my standby, and my standby does not have the table. You have to create the table yourself right now: as I mentioned earlier, logical replication does not push schema over. The snapshot can copy data, but it does not copy schema. So I have to create the table on this other side. I've already got the DDL ready to keep this simple, and again, it's in the GitHub repo. I'm creating the exact same table we have on the master; they're basically identical. Yes, I used the simple DDL statement that I stole from the master to create it.

Okay, then we create the subscription. As you can see, again, it's very simple: CREATE SUBSCRIPTION, then the name of your subscription, I called it bench_history, then your connection string. Once again, please do not put the password in this connection string. I'm doing it here for demo purposes; you should not. On all my production systems, we use a .pgpass password file: I set the passfile parameter in the connection string with a path to the password file, and inside it is a password used only for that connection. That way I know those credentials are only used for my replication pieces. I strongly suggest you do something like that, with users that exist only for replication, and use some sort of password file so your password isn't in plain text here, because when you look at the subscription information later on, you will be able to see that password. So: you give it the connection information, then say which publication on that server you want to connect to, PUBLICATION bench, that's what I called it over here with CREATE PUBLICATION. And then right here I say WITH (copy_data = true), which means: pull all the existing data over from the master server and put it in the table for me. This is the step we mostly don't use; we say copy_data = false, copy the data ourselves, and create the subscription afterwards. Again, that was for large tables, and it worked very well for us. If you want to go into depth on that and discuss it more, hit me up on Discord and I'll tell you all about it. We also have code written to help manage this, and I'm happy to share it; it's just not ready for a GitHub repository. Okay, so the subscription is created, and then there's this little statement you need to keep in mind: "created replication slot bench_history on publisher."
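Here's the statement from the demo, sketched with a passfile instead of an inline password; the host, user, and passfile path are hypothetical stand-ins for my real setup:

```sql
-- On the standby, after creating the table there by hand:
CREATE SUBSCRIPTION bench_history
    CONNECTION 'host=database port=5432 dbname=postgres user=replicator passfile=/var/lib/postgresql/.pgpass'
    PUBLICATION bench
    WITH (copy_data = true);
-- Postgres answers with something like:
--   NOTICE:  created replication slot "bench_history" on publisher
```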
What that means is that we've now opened a slot on the publisher that tracks what's going on. This slot is the thing that says: okay, I know where my subscriber is in the stream and what it's doing. Now here is where I will type in a demo, which is always a bad idea, but you know... see, I told you I'd get it wrong. This is why you don't type in demos. There it is; I forgot my "s." Pluralizing pg_replication_slots, right? So as you can see here, and I'll make this a little bigger so you can see it down at the bottom, it has now created a slot, with a slot type of logical, and it says that it's active. These are the things you should really check from time to time: it's active, and it has an active PID because it's doing something. It also tells you what LSN it's working on and where the last confirmed LSN is. I'll show you another view for monitoring this in a second, but it's good to know this is here. If that slot goes inactive, that's when you can run out of space. If the slot goes inactive because something happened to the subscriber and it's no longer consuming data, that's where you run into problems, and that's where you need to know how to remove the slot so you can keep your production systems running safely. I'll talk more about that in a second.

Okay, so just like that, really quickly, we now have a bench history table on both sides. Didn't I say I wasn't going to type any more in demos? There we go. So we've got a count here and a count here. Obviously they're a little off, because there's a second or two of delay, but if I turn this on and just watch for a second, you'll see they stay pretty close. They're copying the data over, and I don't have the fastest computer in the world, but if I come over here to pgbench and tell it to stop, then come back over here, they sync up and they're 100% happy: yep, I'm synced up, no problems. So that's my point; you can see that they stay completely in sync.

Okay, now let's break it. Let's do something bad, like I talked about earlier where the TRUNCATE statement doesn't get passed over. pgbench specifically truncates its history table, and I set this demo up on purpose so a TRUNCATE would come across. So when I do this and start pgbench up again, then go back over here: that one went to zero, and this one did not, because TRUNCATE statements don't pass over. So now they are essentially out of sync. But keep in mind, Postgres doesn't think they're out of sync at all. It thinks they're 100% happy, because it doesn't know it never sent the TRUNCATE statement. So sometimes you need to run counts like this and validate your data in different ways, from your publisher to your subscriber, because if somebody comes along and truncates, the two sides can be wildly different and you may not know about it. It's a good idea to do validations on your own, beyond just asking "is my replication working?" Yes, it's working, but your data can still be wrong, because something came through that replication doesn't know how to pick up.
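Two checks worth scripting, using the demo's table; the count comparison is the part the replication status can't do for you:

```sql
-- On the master: is the slot there, and is anything consuming it?
SELECT slot_name, slot_type, active, active_pid
FROM pg_replication_slots;

-- Cheap sanity check -- run on both sides and compare, because an
-- untracked TRUNCATE upstream leaves the subscriber quietly out of sync:
SELECT count(*) FROM public.pgbench_history;
```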
All right, so now let's talk about the other place we ran into a lot of nightmares. Now that we've got this set up, because again, it really is that simple, and there's not much more setup to know, I'm going to walk through the config parameters in just a second, but the other things you want to worry about are all the little nightmare situations you get into over time. Probably the biggest one is schema changes. If you're like me, you're a DBA supporting maybe 20 or 30 developers or engineers, or maybe you've only got five, and they love to make changes to the system. That doesn't mean they tell you about those changes. How many people actually know every change going through their system? If you do, I really relish your job, and if you're hiring, let me know; I'll come talk to you. So if they change a table on the master, say they add a column, say they change something about the structure, it's going to break replication, and you won't necessarily know about it except through alerts in your log.

So let's go ahead and change this table; I'll do a very simple ADD COLUMN, nothing major. Once I push that over, I'll come over here to my database log, and it'll say... oh wait, there we go, data's gone through; I was looking for it and couldn't read properly. Right there on your standby, it says: "logical replication target relation public.pgbench_history is missing some replicated columns." That little error message tells you something was changed on your master and not on your subscriber. Now, this is where you start to be concerned about WAL and everything else, because if we go back to the replication slot, it's no longer active, and that's not good: it means the WAL on the master is currently filling up. Another way to see this is pg_stat_replication: the row is gone. There was one there earlier, I just didn't show it to you, and now it's gone. In other words, it's saying: hey, I'm not even active, I'm not doing anything. Meanwhile, over on my master, as statements and transactions come in, the WAL keeps growing, because the master is keeping track of everything on the assumption that the subscriber will come back and catch up. So this is where your WAL fills up and your master starts having problems, that sort of thing.

How do we fix it? Obviously, you come over to the standby and add the missing column back in, and once you do, over here it starts to say "catchup." That's a nice little status: it means I fell behind for a while, but I'm catching up, and what you normally see is that it moves from catchup to streaming. Again, I have a very small demo here with very few transactions; if you're talking about gigs of data or hundreds of millions of transactions, catching up can take much longer. But that's exactly what it's doing. So now, back over here, the data is flowing through again, the "missing replicated columns" error went away, it stopped complaining, and everything's good; it's back to normal.
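The fix, sketched out; the column definition is hypothetical, since it's whatever the developers added upstream:

```sql
-- On the subscriber: add the same column that appeared on the master,
-- which is what the "missing some replicated columns" error is asking for.
ALTER TABLE public.pgbench_history ADD COLUMN note text;

-- Back on the master: watch the slot's consumer move from 'catchup'
-- to 'streaming'.
SELECT application_name, state, replay_lsn FROM pg_stat_replication;
```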
That's by far one of the most common things I see. Our servers and systems get deployed sometimes two to three times a week, sometimes two to three times a day. The deploys don't always add columns, but they change things a lot, and for me to try to track all of those changes by hand just doesn't work. What we did in our system is watch our logs: we have ElastAlert set up to look for that "missing replicated columns" statement and publish it into Slack, and then we can just go add whatever column is needed through a little procedure. It's pretty quick to resolve, but keep in mind: if you don't know about it and nothing is out there looking for it, it can very much hurt you, because whenever that slot shows active = false, or the row is missing, or the replication slot has no active PID, you're filling up your WAL. So keep an eye on that.

Now let's talk about what you actually need to set inside the config. The one you have to set above everything else is wal_level = logical on your master. With anything below logical, this won't work: it'll let you create the publication, but it won't let anyone create a subscription to it, and it'll tell you wal_level has to be logical. So that one is critical no matter what. The other parameters, and I included these in the GitHub repo, are max_wal_senders, max_replication_slots, max_worker_processes, max_logical_replication_workers, and max_sync_workers_per_subscription. How you set max_wal_senders and max_replication_slots really just depends on how many publications and subscriptions you're running. I have several servers with only one or two publications, but I have some with more. You want to increase max_replication_slots, because every subscription consumes a slot on the publisher; I think the default config only allows a handful, so if you're going to have more subscribers, you need more replication slots. Similarly, if you have a lot of publications and subscriptions, increase max_wal_senders; it helps significantly. On the subscriber side, you also want to raise max_sync_workers_per_subscription if you have lots and lots of tables. If you have one big publication with 500 tables in it, you definitely want more sync workers per subscription, so more of those threads can move the data over in parallel. Now, please keep in mind that these settings impact your database systems as well, so you need to test this at load and find out what your good operating numbers are: how many subscriptions you have, how busy the server is, all of those things you should look into. A sketch of the relevant settings follows.
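Here's a sketch of those settings; the numbers are illustrative starting points, not recommendations:

```
# postgresql.conf -- starting points only; tune under real load.
# On the publisher:
wal_level = logical            # required before anyone can subscribe
max_wal_senders = 10           # one per walsender, streaming or logical
max_replication_slots = 10     # every subscription needs a slot here

# On the subscriber:
max_logical_replication_workers = 8    # apply workers + table sync workers
max_worker_processes = 16              # must be large enough to hold them
max_sync_workers_per_subscription = 4  # parallel initial table copies
```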
Okay, I'm just about done; I've got a few more things, and I'll wrap up here in a minute. I haven't gotten a ton of questions, so if you're sitting on one, please send it over, but let's talk a little more about monitoring. These monitoring statements are already in the GitHub repository, but I'm going to paste them in here too, so you can see some of the simple, generic checks. If you want to know whether you have a publication, there's how you see a publication; really simple. If you want to know what tables are in the publication, same thing. Same with subscriptions, if you want to know what you're subscribing to. The one thing I'll warn you about on the subscription side is to make sure you're connected to the right database. I tend to connect to the postgres database, run this, and never see anything; make sure you're in the database you put the subscription on, or you won't see it. The other thing is that this is where that connection info shows up, and as I said, it's in plain text. So don't do this, okay? Use a password file; don't leave the password sitting in the connection string. And similarly, if you want to see what tables a subscription covers, you can see... or no. Did I type that wrong? Let me look really quick; I thought I had that. Maybe the tables don't show up there. Yeah, see, I need to double-check my statements; I'll update that one in the GitHub repo. Too much copy-and-pasting late last night; that's what happens when you do too much late-night work.

Okay, and up here I already showed you pg_replication_slots on the master; that's where you see the replication slot. You always want to know it's active, and it's even worth setting up an alert around "is active false, or is there no active PID?" Where we've actually seen this bite us is with Kafka: we have Kafka out there, and some of the connectors use the Debezium connector, and sometimes that connector dies. It literally crashes and can't do anything, and it leaves these slots open. So we watch for that: is the slot active or not? If it's not, we have to destroy the slot, because we know it's going to start filling up our WAL. You have to be very wary of what these slots are doing so you don't fill things up; it's a good idea to keep an active eye on this whenever you can. pg_stat_replication is similar, but a row only shows up there when something is actually connected and working; if the logical replication slot exists but isn't active, nothing shows in pg_stat_replication at all. So it's a better idea to look at pg_replication_slots and check the active flag. Okay, and then one other good monitoring query we like: we look at how many gigabytes behind a slot's consumer is, and you can change the units; all we did was divide the byte difference down by 1024 a few times.
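Here are those monitoring statements collected in one place, ending with the lag query; the division is what turns the byte difference into gigabytes:

```sql
-- What's published, and which tables are in it?
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;

-- What am I subscribed to? (Run this in the database that owns the
-- subscription -- it won't show up from the postgres database.)
SELECT subname, subenabled, subconninfo, subpublications
FROM pg_subscription;

-- How far behind is each slot's consumer, in gigabytes?
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
           / 1024 / 1024 / 1024 AS gb_behind
FROM pg_replication_slots;
```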
So you could take one of those divisions away and it would be megabytes behind, or whatever else you want. We have some really large workloads going on, so sometimes we really are gigabytes behind; this is something we look at to see how far behind we are. I'm not 100% sure why it shows as behind right now, because it's really not; there's always a minor delay, which I guess is why it reads 0.09. Any time this is a high, high number, you want to be looking at it. We usually use it when we have to do a really large update, a change to our tables, or some sort of backfill: we go look at the gigabytes-behind number to see how far back we are.

"You've given some great resources for Postgres-to-Postgres replication. Do you have any resources that you'd recommend for Postgres to others, such as Elasticsearch?" So, Mike Storey, thank you for your question, and I'm going to say there is an awesome extension called ZomboDB. Okay, let me phrase that carefully: ZomboDB connects your text search from Postgres into Elasticsearch. If you're trying to push data directly into Elasticsearch, there's no extension known to me right now. The way we do it, because we are using Elasticsearch in our system, is Kafka Connect plus KSQL running some streaming processes, and then we push into Elasticsearch ourselves through our own connectors. But I'd strongly suggest checking out ZomboDB anyway if you're playing with Elasticsearch and Postgres, because if you can keep the data in Postgres and you just want Elasticsearch-like capabilities in Postgres, ZomboDB is your best bet. You can hit the author up on the Discord channel, or just go check out ZomboDB. Hopefully, Mike, that answers your question: as far as I'm aware, there's no direct connection from Postgres to Elasticsearch; we use Kafka and push into Elasticsearch through basically Elixir and KSQL in a custom process.

All right, looks like I've got about 10 minutes left; let me check the time really quick. So let's go over cleanup quickly. I want to show these things so that if you run into these situations, you know what to do. Say that for some reason you need to drop the replication slot on the publisher: drop replication... see, is it going to be logical? No. All right, somebody help me out; this is the one I always forget. Physical? No. pg_drop_replication_slot, let's try this, see if I get it right: bench_history. See, you forget these things when you don't type them in regularly. Why did that go all the way down there? Ah, there it is. You can see how often I drop replication slots, right? Okay, so say you come in here on the master and you can't drop it; the slot is still in use. Now, why would I want to do this anyway? This is bad; it's good that it stopped me. But suppose this subscriber was connected and the master actually died, just shut down for whatever reason.
If that happens, over here on the subscriber you won't be able to cleanly disconnect the subscription: it'll be upset and say "I can't reach this," but it also can't get rid of the subscription. The way to deal with that is ALTER SUBSCRIPTION: you disable the subscription, then set its slot name to NONE, and that allows it to detach without needing any connection to the master, to the publisher, at all. Because again, there will be times when your master, your publisher, is completely gone, or something is wrong with it, and you still have to get the subscription disconnected; this is how you do it. Now, when I do that, over here it goes away from pg_stat_replication; again, it's not in use, see, it's gone. And if I look at pg_replication_slots, the slot is still there, but it's not active; I've shut the subscription off. Once again, keep in mind that the WAL can keep growing while that slot sits there, but if we're saying the master is gone, this is how you get out of the situation when you still need to drop the subscription. Once you've done those little steps, you can simply run DROP SUBSCRIPTION bench_history, and now it's completely removed from the subscriber. The slot on the publisher will still be there, though, and you'll still have that problem; but since it's no longer active, now I can drop the replication slot. If you ever find your master database growing exponentially in disk size and you have publications and subscriptions out there, go check pg_replication_slots. If there's a slot that's not active, false with no active PID, dropping it will probably clear up your disk space right there. It's a very common thing we see: something disconnected or died, and that orphaned slot is why the disk is filling. Keep in mind, though, that if something did disconnect, the subscriber's data and the master's data are no longer in sync, and you're going to have to re-snapshot. Be very careful about that: if you get into a situation where you have to drop replication slots, you're probably re-snapshotting data, because any changes during that window are lost. And if you have millions of rows or gigs and gigs of data, that can be a big snapshot. We try to avoid re-snapshotting as much as we possibly can.
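Putting that recovery sequence in one place, using the demo's names; run the last step on the publisher only if it actually comes back:

```sql
-- On the subscriber, when the publisher is unreachable and a plain
-- DROP SUBSCRIPTION can't connect to clean up its slot:
ALTER SUBSCRIPTION bench_history DISABLE;
ALTER SUBSCRIPTION bench_history SET (slot_name = NONE);
DROP SUBSCRIPTION bench_history;

-- Then, on the publisher, free the orphaned slot so it stops pinning WAL:
SELECT pg_drop_replication_slot('bench_history');
```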
Okay, let's go back to the slides: setup and create, we went through all of those steps, and I don't think I missed anything; if I did, please let me know. So let's talk about the summary and the key items from today so we can wrap up. Logical replication is very good for data warehouse and reporting needs, in my opinion. It gives you a lot of features for doing very specific things: if you're out there with a lot of tables, five or six hundred tables in your database, it gives you the capability to be very selective about what you replicate. The pitfalls: if you have a lot of changes going through your system, column additions, varchar changes, column data type changes, all of those things, it's going to be harder on you. It's just that simple. That doesn't mean it's impossible, or that you can't build process around it; just know it's there and that you're going to have to be part of the process that makes sure those changes are applied on both sides. And then, of course, there's snapshotting and reloading for large tables. If you have 500-gig-plus or one-terabyte tables out there, really, really large tables, be wary of how long that snapshot and that reload are going to take. In a lot of cases it's fast, but in a WAN situation, with one server in one location and another very far away, you might have a very long load time, and it might be very difficult. And any time you have to re-snapshot, you should always think it through, because a re-snapshot moves all the data from one place to another, and both your master system and your subscribing system will be doing a lot of work in the background while it happens. Keep those things in mind whenever you're doing it.

Okay, I'll leave this slide up for any last-minute Q&A. One thing I will mention as well: please leave me feedback. Go right to this little bit.ly link here, PG-PW-feedback; it's a simple little Google form. I try to get better at presenting whenever I can, and I love getting feedback about my sessions; please be blunt about it. I even give little tips at the top on how to leave great feedback. Also, go check out the GitHub link that has all these scripts and everything in it; you're welcome to use the scripts however you want. No rocket science here, no special things, but if anything in there helps you, great, and if you see an error in the scripts, let me know. I'm going to make a few more changes to them in just a minute. And if you connect with me on LinkedIn, please actually tell me "hey, I watched your presentation, I loved it," because otherwise I don't usually connect with people on LinkedIn.

That's all from me for today. Thank you very much for your time; I really appreciate it, and if you have any other last-minute questions, feel free to drop them in the chat. There was one last quick question: "Is there something like multi-client to master sync? If client one is synced to master and then adds a column to master, and days later client two connects and syncs?" What you're describing there, William, is very much merge replication of different types; you'd have to get into a master-master situation to handle that. Feel free to email me and we can talk more about it, but yeah, that's definitely a master-master scenario. Thank you, Lindsay.