Welcome to today's Postgres Conference webinar, What's Next in Logical Replication? We're joined by Amit Kapila, who will discuss the basic architecture of logical replication in Postgres, and then cover the various ways in which it can be helpful to users, including new enhancements and improvements related to logical replication that are currently being discussed in the PostgreSQL community for future releases. My name's Lindsay Hooper. I'm one of the Postgres Conference organizers and I'll be your moderator for this webinar. A little bit about your speaker: Amit is a committer and major developer in PostgreSQL. Over the course of a 20-year career, he's developed deep expertise in not only Postgres, but also Oracle and in-memory databases. He works at Fujitsu as a Senior Director. Welcome, and with that, I'll hand it off. Take it away.

Hello, everyone. I am Amit Kapila. I work with the PostgreSQL open source community as a Postgres committer. I work on the development of major features, and I participate as an author, reviewer, and committer in the community. And I work as a Senior Director at Fujitsu. Today, I am here to present to you about logical replication, that is, what you can expect next in logical replication in PostgreSQL. The basic agenda of this presentation is that I will first start with logical replication basics. Then I will go into some use cases for the same. Then I would like to highlight the key features developed for PostgreSQL 14, which is due for release in September or so. And then I will talk about what could be there in PostgreSQL 15 and beyond. One thing to note here is that PostgreSQL 14 is not yet released, and the community may still revert or change a feature before it is released. And PostgreSQL 15 development has only just started. So all of this is based on what I could see being proposed in the community at this stage, but there is no promise about what will actually be done for PostgreSQL 15.
To start with, logical replication is a method to replicate data selectively, unlike physical replication, where the data of the entire cluster is copied. It works on a publish and subscribe model: one node publishes its data, and other nodes can subscribe to it and receive that data. The node on which a publication is defined is usually referred to as the publisher, and the node where subscriptions are defined is usually called the subscriber (or subscribers). We can also build a cascaded replication system, where the data pulled by subscribers from the publisher can be republished and fetched by other subscribers. In general, the subscriber database behaves like any other PostgreSQL instance and can define its own set of publications. Now let me try to define what a publication and a subscription are. A publication is a set of changes generated from a table or a group of tables. In the future it could be a group of objects, like sequences, or even a whole schema. So basically, a publication is a way to generate the set of changes for a defined group of objects. Each object or table can be added to multiple publications if required. We can also define which operations we need the data for: insert, update, delete, truncate. We can either take the data for all operations, or we can define which operations' data we need on the subscribers. A subscription can subscribe to multiple publications, and it initiates the connection to the publication node. One thing to note is that we don't have DDL replication at this stage, which means schema definitions are not replicated, and users need to manually create the schema on each of the subscriber nodes. So the table we are talking about must match on both sides, using a fully qualified name, to allow replication.
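As a concrete illustration of filtering by operation, the `publish` option of `CREATE PUBLICATION` lets you pick which operations are replicated; the table and publication names here are just examples:

```sql
-- Publish all operations (insert, update, delete, truncate) for two tables
CREATE PUBLICATION all_ops_pub FOR TABLE users, departments;

-- Publish only inserts and updates for one table
CREATE PUBLICATION ins_upd_pub FOR TABLE users
    WITH (publish = 'insert, update');

-- The same table may belong to several publications at once
CREATE PUBLICATION audit_pub FOR TABLE users
    WITH (publish = 'delete, truncate');
```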
Here, fully qualified name means that the schema in which the table is defined should match on both sides, and the columns should also match on both sides. How the basic replication works is that it starts by copying an initial snapshot of the data from the publisher. Once the initial data is copied, the changes on the publisher are sent to the subscriber as they occur, in real time. Basically, once we get the initial data that was present on the publisher before replication started, we then get the data change by change, based on each operation. Now, as I have told you, the subscriber is an independent node. Applying the data on the subscriber can generate a conflict if it violates any constraint, like a primary key, a not-null constraint, or a check constraint. It is quite possible that while we are replicating a row, somebody on the subscriber has already inserted that row, in which case it can violate some constraint. As of now, a conflict will produce an error and the replication will be stopped, so such an error needs to be resolved manually at this stage. In the future, we might want to build a system where such errors could be resolved automatically as well. The setup of logical replication is really very easy. The only parameter you need to change is wal_level = logical in the postgresql.conf file; then start the server, connect to a database, and create a publication for the required tables. A simple example is the CREATE PUBLICATION command with a name and the tables for which we want replication to happen. Then on the subscriber, no separate setting is required. You just connect to a database and create the subscription for the required publication with a simple command. This will allow the replication of the users and departments table data to the subscriber node.
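Putting those steps together, a minimal end-to-end setup might look like this; the connection string, user, and table definitions are illustrative:

```sql
-- In postgresql.conf on the publisher, then restart the server:
--   wal_level = logical

-- On the publisher: create the tables and a publication
CREATE TABLE users (id int PRIMARY KEY, name text);
CREATE TABLE departments (id int PRIMARY KEY, name text);
CREATE PUBLICATION mypub FOR TABLE users, departments;

-- On the subscriber: the same tables must already exist,
-- since schema (DDL) is not replicated
CREATE TABLE users (id int PRIMARY KEY, name text);
CREATE TABLE departments (id int PRIMARY KEY, name text);
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher.example.com dbname=postgres user=repuser'
    PUBLICATION mypub;
```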
Here I have given the example of one publication and one subscription, but there could be multiple publications, as I have said, and one subscription can subscribe to all of them, which means all the tables or objects published by those publications will be replicated to the subscriber. Now I will briefly mention the use cases for logical replication. It allows us to send incremental changes in a single database, or a subset of a database, to subscribers as they occur. We can also consolidate multiple databases into a single one, which could be used for analytical purposes, like an OLAP database. We can replicate between major versions of PostgreSQL, and we can give access to the replicated data to different groups of users. One could also imagine building a read-only replica or a multi-master database on top of logical replication. Next, I will talk about key features in PostgreSQL 14, the release we are expecting in September or so. One of the major features we have developed for PostgreSQL 14 is allowing the streaming of in-progress transactions. Prior to this, for large transactions there was always a big apply lag, as we only replicate transactions at commit time. If the transaction is very large, we don't even start sending the data to the subscriber until we encounter the commit, which introduces a very big apply lag. Now, because we start streaming the data to the subscriber after a certain memory threshold, this allows us to send the incremental data of a transaction to the subscriber. It also helps in another way: earlier, the data of large transactions was spilled to disk even when we didn't need it. For example, the transaction could contain the data of multiple tables, and one of those tables is not even published.
In that case all of that data will be discarded, but because we have no way to find that out up front, the data gets accumulated and we need to spill it to disk. So this feature saves a lot of the disk I/O required for such transactions. The performance improvement is proportional to the amount of data in such transactions; we have seen performance improve by two or more times due to this feature for large transactions. This feature also adds various APIs to the output plugin interface, so that not only the built-in logical replication but also people who have written their own logical decoding output plugin can stream in-progress transactions. The main APIs are stream start and stream stop: each set of changes is sent between a start and a stop. Say there are 100 changes in total and we decide to send them in chunks of 10; then for each chunk of 10 changes there will be a stream start and a stream stop, so that the subscriber can recognize the boundaries and act accordingly on those messages or changes. Similarly, for commit and abort we send stream commit and stream abort, and for each change and message the stream change, stream message, and stream truncate APIs are introduced. So that is mostly it for this feature, the streaming of large transactions. Apart from this, we have made a performance improvement in logical decoding: if a transaction contains a lot of DDLs, we improved the CPU usage and decoding performance of such transactions. It was observed during our tests that decoding a transaction containing the truncation of a table with 1000 partitions finishes in about one second, whereas before this work it used to take four to five minutes. So you can see how big a difference this creates. The basic idea was that instead of processing the invalidations on each change, we process them at the command level.
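On the built-in logical replication side, streaming of in-progress transactions is exposed as a subscription option; a short sketch, with illustrative subscription and connection names:

```sql
-- Enable streaming of large in-progress transactions at creation time.
-- The memory threshold at which a transaction is streamed (or spilled)
-- is governed by the logical_decoding_work_mem parameter.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher.example.com dbname=postgres'
    PUBLICATION mypub
    WITH (streaming = on);

-- Or toggle it on an existing subscription
ALTER SUBSCRIPTION mysub SET (streaming = on);
```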
This reduces the overall number of invalidations that get processed, which helped us improve the performance. Next, while introducing these features, we also came across the requirement to let users see the performance of logical decoding. Right now there is no way for a user to know how many transactions are streamed or how many are getting spilled, so it is sometimes difficult for the user to see why decoding is taking such a long time. To allow users to monitor logical decoding, we have introduced a new view, pg_stat_replication_slots. This contains one row per logical replication slot, showing the statistics about its usage. There are two types of statistics here: spill statistics and stream statistics. (I think I forgot to capture it on the slide, but we also record the total bytes that are sent.) The spill stats tell us about the amount of data that gets spilled to disk while we are decoding. Data gets spilled because we have a memory limit: if the changes cross that limit, we spill them to disk. The stream stats tell us, when the changes exceed that limit, how much data we have streamed to the subscriber. So this is the view with which users can really monitor their logical decoding performance. Then the other key improvement we have made is in the initial table synchronization. If you remember, I said at the beginning of the presentation that the first step of logical replication is to initially synchronize the tables. The initial table synchronization involves copying the initial snapshot of the table, and then the table is brought up to a synchronized state with the main apply worker by applying individual changes. Previously, all of this work was done in a single transaction using a temporary replication slot, and that mechanism has a few drawbacks.
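The new view can be queried like any other statistics view; for example, using the column names the PostgreSQL 14 view provides:

```sql
-- Inspect spill and stream statistics per logical replication slot
SELECT slot_name,
       spill_txns, spill_count, spill_bytes,     -- data spilled to disk
       stream_txns, stream_count, stream_bytes,  -- data streamed mid-transaction
       total_txns, total_bytes                   -- overall decoded volume
FROM pg_stat_replication_slots;
```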
For example, for the whole duration of the copy and sync in the table sync phase, we hold back the WAL on the publisher until it is complete. Any error during the sync phase rolls back the entire copy, which is really very painful for large copies. And there is a risk of exceeding the CID (command ID) limit, because multiple transactions get merged into a single transaction during this phase. We have solved all these drawbacks by allowing multiple transactions in the table sync phase and by using permanent slots and a replication origin to track the progress of table sync. This work helped us improve the table synchronization mechanism. As such, users might not see a user-visible difference, but users who have large copies or large table synchronizations should see these drawbacks removed. So that is mainly it for this improvement. Next, we have also introduced a way to transfer the data in binary format among logical replication nodes. Users need to provide an option while creating a subscription, or they can alter it afterwards, so that the data from the publisher is sent in binary format. By default, this option is false and the data is sent in text mode. Even when this option is enabled, only data types that have binary send and receive functions will be transferred in binary. This mode is generally faster, but it has a slight drawback: when we are doing cross-version replication, if the subscriber lacks a binary receive function for a data type, the data transfer will fail, which users need to be aware of. Next, an important thing: in this release, we are also going to allow the decoding of prepared transactions. This also, in a sense, reduces the apply lag. For example, earlier the transaction data was only sent at commit time, and there could be a lot of time between prepare and commit.
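The binary transfer mode is likewise just a subscription option; a sketch with illustrative names:

```sql
-- Request binary transfer at subscription creation
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher.example.com dbname=postgres'
    PUBLICATION mypub
    WITH (binary = true);

-- Or enable it later; note that only data types with binary
-- send/receive functions are actually transferred in binary
ALTER SUBSCRIPTION mysub SET (binary = true);
```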
So users won't get the data until commit, but with this feature they can receive the data at prepare time. This will also allow plugins to decode the transaction at prepare time and route it to another node if required. I will cover in later slides how this feature could be used to build a better sharding system, or a better multi-master system where read-only queries could be scaled. Some of the miscellaneous features we have added in PostgreSQL 14 related to logical replication: we have added new syntax to add or drop a particular publication from a subscription. Earlier, we didn't have a way to add or drop a single publication; users needed to give the whole list of publications they wanted to keep as part of the subscription while changing it. Then we also enhanced our internal pgoutput plugin to emit logical messages. We have provided a new 'messages' option to the pgoutput plugin, which allows decoding the messages generated via pg_logical_emit_message(). This is useful for pgoutput users who use it for change data capture. So this is mostly the set of features you will see as part of the PostgreSQL 14 release for logical replication, provided none of this gets changed or reverted by the community for some reason. But this appears to be a very good list of features, which I think will help many users of logical replication in PostgreSQL 14. Next, I will talk about some of the features we are discussing in the community for PostgreSQL 15, which will come next year; but there is, as such, no guarantee from the community side that all of these will be released in PostgreSQL 15. This is based on my observation of what is being discussed in the community. First, we will allow the in-built logical replication solution to stream prepared transactions. As I mentioned a few slides back, we have allowed the decoding of prepared transactions in PostgreSQL 14.
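The new publication-management syntax and the logical message facility can be sketched like this; the subscription, publication, and prefix names are illustrative:

```sql
-- Previously you had to restate the full list:
--   ALTER SUBSCRIPTION mysub SET PUBLICATION pub1, pub2, pub3;
-- With the PostgreSQL 14 syntax, a single publication can be
-- added or dropped directly:
ALTER SUBSCRIPTION mysub ADD PUBLICATION pub4;
ALTER SUBSCRIPTION mysub DROP PUBLICATION pub1;

-- Emit a transactional logical message on the publisher; with the
-- pgoutput 'messages' option it can now be decoded by CDC consumers
SELECT pg_logical_emit_message(true, 'my_prefix', 'payload for CDC');
```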
In 15, we expect to see those transactions being streamed to the subscriber. This will really help us build conflict-free logical replication, because if any conflict happens on the subscriber while applying, the user gets a chance to roll back the transaction on the primary. This will let us support two-phase commit via logical replication, which I think is a very big win, aside from just getting the benefit of reduced apply lag. Now, as I mentioned earlier, another use case one can imagine with this is building a solution for scaling out reads. Because of two-phase commit, we can ensure that the subscribers have all the data committed on the master. This means one can design a system where different nodes are the owners of certain sets of tables, and we can always get the data of those tables reliably from the different nodes. Then we can have one external process that routes the reads accordingly; writes can always happen on the master, or publisher, node. The next big feature we are discussing is allowing the publication of schemas and sequences. As of now, if users want to set up a publication for all the tables in a schema, they need to specify them one by one. But with the new "for all tables in schema" option, users can simply give the schema name and get replication for all the tables in that schema. Similarly for sequences. And the beauty of this syntax is that users will be able to mention schemas as well as individual tables outside those schemas, so various objects could be specified in a single publication. Allowing a schema to be published will help users logically replicate all the tables in a particular schema, and similarly, once we allow the replication of sequences, this will reduce the manual intervention by users who today need to keep sequences in sync on the subscriber.
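The schema-publication syntax under discussion, and therefore still subject to change before any release, looks roughly like this; the schema and table names are illustrative:

```sql
-- Proposed: publish every table in one schema, plus an extra
-- table from another schema, all in a single publication
CREATE PUBLICATION sales_pub
    FOR ALL TABLES IN SCHEMA sales,
    TABLE public.audit_log;
```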
So I think this will again be a big step if we are able to achieve it for PostgreSQL 15. The next thing, which we have been discussing for quite some time now, is allowing logical replication from a physical standby. As of now, if the publisher node goes down, it is not easy to continue getting the data that was on the publisher node. If we allow replication from a physical standby, users can switch over to the physical standby and replication will continue to work. This will allow existing subscribers to connect to the physical standby for existing publications and get the data. This feature will also provide a way to continue logical replication after the standby is promoted to master, which is not possible as of now; or rather, users need to manually do multiple things to achieve it. The other feature we are discussing is what we call row filtering for logical replication. You can imagine that for any table, while specifying the publication, users could specify a WHERE clause so that only the matching rows are sent to the subscriber. This will again help us avoid sending data that is not required by a subscriber; in other words, it will help us replicate data even more selectively. I think this will also help in sharding the data across various nodes: one can imagine one table being sharded to different nodes using this scheme. We are also discussing column filtering for logical replication. One can imagine a syntax where a particular table's columns could be listed, and we would send only those columns' data. This will help when the target table doesn't have the same columns as the source table, and also in cases where the source table has some sensitive information that we don't want to replicate to the subscriber. The last, but not the least, feature we are discussing will in some way help with conflict resolution on the subscriber.
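The row and column filtering being discussed might look roughly like this; again the exact syntax is still under review in the community, and the table and column names are illustrative:

```sql
-- Proposed row filter: replicate only European customers
CREATE PUBLICATION eu_pub
    FOR TABLE customers WHERE (region = 'EU');

-- Proposed column filter: replicate only non-sensitive columns
CREATE PUBLICATION safe_pub
    FOR TABLE customers (id, name, region);
```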
As I have mentioned a couple of times during this presentation, while applying the changes on the subscriber there is always a chance of a constraint violation, and that will stop the replication. As of now, users can resolve the conflict manually, either by removing the conflicting data or by skipping the transaction that caused the conflict via pg_replication_origin_advance(). But neither of these options is easy for users to follow. So the new option we are trying to provide is to let users specify the transaction ID of the transaction they want to skip on the subscriber. This option needs to be used carefully; otherwise users can cause replica inconsistency. Additionally, we are planning to show the errored transaction's information via a stats view, which will help users easily identify conflicting transactions. So thanks; you can contact me anytime at amit.kapila16@gmail.com. That is all I have for today.