Hi and welcome to Pgpool-II Performance and Best Practices. My name is Lindsay Hooper and I'm one of the conference organizers. I'm here with Muhammad Usama, who's a database architect and Postgres consultant at HighGo Software, and he's also a Pgpool-II core committer. Usama has been involved with database development since 2006. He's a core committer for the open source middleware project Pgpool-II, which we're here to talk about, and he's played a pivotal role in driving and enhancing the product. Prior to coming to open source development, Usama was doing software design and development with a focus on systems-level embedded development.

Thank you Lindsay, and good morning or good evening depending on which time zone you are in. So for the next three hours I'll be talking about Pgpool-II, how to set it up, and the best practices. Okay, this is me; Lindsay has already given a nice detailed introduction. I'm Muhammad Usama, and I'm a senior database architect with HighGo. Prior to that I was associated with EnterpriseDB, I've been working with Postgres for the last 12 years, and I'm an active member and core committer of the Pgpool community. I'm the author of some of the exciting Pgpool features, and the most notable of them all is the watchdog for Pgpool-II, which we'll learn about as we go through the training; that is the high availability component of Pgpool-II. Okay, enough about me, let's get on with the actual content that matters.

So this is the agenda for the two-day training. Basically the training consists of four sections. First I'll give you a brief overview of Pgpool-II: what it does, its architecture, its value proposition. Next we'll move on to a simple example walkthrough of setting up a basic Pgpool-II installation. By then I'm hoping we'll have an idea of how to get started with Pgpool-II and set up a basic cluster with two Postgres backends. After that we'll try to build on that example and add a new standby Postgres backend to the cluster without disturbing the service. For all these examples I've added small how-to videos in the slides, in the hope of making things easier to understand. Initially I wanted to include some interactive exercises, but then I realized we might run out of time doing so and would not be able to cover everything, so let's see if the videos do any good.

Right, in the later part of the training, after the example walkthrough, I'll discuss the main features of Pgpool-II, which include load balancing, connection pooling, authentication, security, the watchdog, and the management interface of Pgpool-II. For each of these features I'll try to cover the configuration, the best practices, and their usability. And finally, if we still have time, I'll try to touch on some of the more exciting features and pgpoolAdmin. I hope you'll enjoy the training and get something useful out of it. We'll have a question and answer session, as Lindsay told you, at the end of the training, but you can also stop me or just type a question in the chat window at any time and I'll try to answer it as best as I can.

Okay, so Pgpool-II: let's start with what it is. Well, Pgpool-II has been around for more than a decade now, and I'm sure most of you already have a good idea of what Pgpool-II is and what it does, so I'll try to be quick and go through this slide.
So it's a cluster management tool dedicated to Postgres, and it supports a wide range of Postgres versions: from 7.4, which was the current version when Pgpool-II was created, up to the latest version, which is 12. It also supports Postgres-derived products like Amazon Aurora and Redshift, and proprietary flavors like EnterpriseDB's EPAS. It has the same license as Postgres, so you can just pull the source code, make your own changes, modify it, market it, whatever you want to do with it. It is used for managing a Postgres cluster and has functionality like automatic failover, which we'll learn about, and you can also attach a new server, or a repaired standby server after re-syncing it, using online recovery or manual methods; again, we'll discuss this. Other than that, it provides connection pooling, query cache, the watchdog (which is the high availability component), and load balancing.

Okay, so from the architecture point of view, Pgpool-II is a proxy or middleware server: it sits between the Postgres instances and the client applications, and it understands the Postgres frontend/backend protocol. That means applications connecting to Postgres through Pgpool-II see no difference and require no additional changes or modifications to work with Pgpool-II. Also, since Postgres, because of its license and features, has been adopted by many companies to create their own modifications, as we discussed in the previous slide, Pgpool-II works with pretty much all flavors of Postgres, like AWS Aurora, EnterpriseDB's EPAS, Redshift, and so many others.

Okay, so talking about architecture: just like Postgres, Pgpool-II also uses a multi-process architecture, and each Pgpool-II child process handles one client connection at a time. But unlike Postgres, the life cycle of the child process that handles the client connection does not end with the client disconnection. Pgpool-II spawns the configured number of child processes at startup time, and each child process keeps waiting for a client connection. As you know, in Postgres the postmaster main process listens for client connections, and as soon as a new connection is received, a new child process is spawned (we call it a backend) and the connection is handed to that backend, which serves it until the client disconnects; that backend dies with the client connection. So this is the difference: both are multi-process architectures, but Pgpool-II spawns all its child processes at startup and each child process waits for connections, while in Postgres the processes are spawned as they are needed.

This is a very important architectural aspect of Pgpool-II that needs to be considered when we are creating a Pgpool-II cluster, because it affects the total number of concurrent client connections Pgpool-II can handle. Similarly, it affects the number of pooled Postgres backend connections; we'll definitely discuss this in detail when we discuss the connection pooling of Pgpool-II. Other than the child processes which handle the client connections, Pgpool-II also spawns some other processes for internal tasks like health checking, watchdog, replication check, and life check; again, things we'll discuss in detail.
Okay, so this is the process structure diagram; I'll just quickly go through it. The green rectangle in the upper middle is the main process, which is responsible for spawning all the other processes. It also performs tasks like monitoring the children (if any child goes down it respawns it), delivering signals, and, most important of all, performing the failover and failback operations on the backend nodes. Then the pink processes on the left are the child processes we just talked about; they handle the client connections, they are the actual workhorses. And all the blue ones on the right and in the middle can be categorized as worker processes; they are used for different kinds of tasks. Some of them perform the health checking of the backend nodes, and each backend's health check is performed by a separate process; that means if you have three backend nodes configured for health checking, Pgpool-II will spawn three health check processes. This is a recent change in Pgpool-II: previously the health checking was done by the Pgpool-II main process, but then we did not have the luxury of fine-tuning the health checking for each backend node. Other than the health check processes, Pgpool-II spawns the PCP process, which listens for PCP client connections. The PCP process also spawns its own child processes: whenever a new PCP command arrives, a separate process is spawned by the PCP server process, the command is handled by that process, and when it finishes, the PCP child process dies with it. Other than that, there are the replication check and watchdog processes as well. So this is just an overall view of what the multi-process architecture of Pgpool-II looks like from the top.

All right, so this is how you can get up and running with Pgpool-II. The current version of Pgpool-II is 4.1.x; the third digit points to the point releases. We release one major version every year, normally with each new release of Postgres, and there are many point releases in between. The source code can be fetched from the git repository hosted on the PostgreSQL git server. Then you can see that I've listed some links for the documentation and pgpoolAdmin. pgpoolAdmin is a GUI-based management console for Pgpool-II and can be used to perform tasks like online recovery and configuration, but to be fair, the management console hasn't gotten much attention lately and you may find it not very robust or attractive. Just recently the community has started working on the next generation of pgpoolAdmin, and hopefully in a year's time we'll have a much better GUI, so stay tuned for that. Finally, there's a link to the task list wiki page: if someone is interested in working on some feature for Pgpool-II, or wants an idea of what the community is currently working on, they can just go to that link; we list everything we are working on on that wiki page.

Okay, so these are the links for the mailing lists, the bug reporting system, and the yum repository. Similar to Postgres, Pgpool-II has general and hackers lists, where general is more for user-level discussion and hackers is for sending patches, discussing new features, and more code-related stuff, just in case you need to send something to the community.
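As a quick illustration of fetching the source mentioned above, a checkout and build might look roughly like this; the repository URL is my best recollection of the location on the PostgreSQL git server, so treat it as an assumption and check the Pgpool-II website for the canonical one:

```
# Clone the Pgpool-II source (repository location assumed; verify on the website)
git clone https://git.postgresql.org/git/pgpool2.git
cd pgpool2

# A standard autotools build, as with most Postgres-ecosystem projects
./configure --prefix=/usr/local/pgpool-II
make
sudo make install
```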
Okay, so now coming to the actual training. I think most of us are already familiar with Docker and the Docker environment. In short, Docker is a platform-as-a-service product that uses OS-level virtualization to deliver software in packages called containers. As you know, Docker containers are lightweight and can be scripted, so they are the best choice for creating and testing a cluster with multiple nodes. For that reason I've created Docker-based examples for this training and placed them on my GitHub account, and all the example walkthroughs we'll do in the training refer to that same Docker environment.

So I've created three examples, all hosted on my GitHub account, which I've given the link to. The first one is pgpool_2_pg_node_setup: this example sets up a Pgpool-II node with two Postgres backends, and you can use it to test adding a new standby node to the cluster without disturbing the service. Similarly, pgpool_3_pg_node_setup is the Pgpool-II setup with three backend nodes. And the last one, pgpool_2_watchdog, sets up Pgpool-II with the watchdog; we'll learn about this as we go, but when the watchdog is involved you install multiple Pgpool-II nodes, so this example uses three Pgpool-II nodes and two Postgres backend nodes, five systems working together in total, and you can try it out to test the different features of the watchdog.

So if you want to try the examples, you need Docker Engine and Docker Compose installed on your system; both are part of the standard Docker Desktop installer, which can be downloaded from Docker's website. For running each example I've also included a README, placed along with the examples, but it's very simple: you just go into the example directory, run docker-compose build and then docker-compose up, and you'll have a working system with everything set up.

Just a little note on the example environment: all the examples are based on CentOS 6, but of course you can change that to the Linux flavor of your liking with very little modification. For the Postgres instances the examples deploy PostgreSQL 11, and they use the latest available Pgpool-II version, which is Pgpool-II 4.1. So this is all about the resources for the training.
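Just to make that concrete, here is roughly what running one of the examples looks like; the GitHub account and exact repository and directory names are placeholders, since I only described them approximately above:

```
# Grab the examples (account and repository names are illustrative)
git clone https://github.com/<my-account>/<pgpool-docker-examples>.git
cd <pgpool-docker-examples>/pgpool_2_pg_node_setup

# Build the container images and bring the whole cluster up
docker-compose build
docker-compose up
```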
So, coming to the clustering modes and the scope of the training. As you may already know, Pgpool-II can be configured in three different modes. The first one is master slave mode. In this mode Pgpool-II expects that the Postgres backend nodes configured with Pgpool-II have active replication working between them, and that one of the nodes is a master, or primary, while the rest are connected to that primary (or master, whatever you like to call it) as standby or slave nodes. Personally I would have liked this parameter to be named primary standby mode rather than master slave mode, but that's what it was called before I even knew Pgpool-II, so we have to live with master slave mode for now. Anyhow, in master slave mode Pgpool-II expects that the primary Postgres node can serve both read and write statements, while the standby nodes are actual hot standbys and can serve read queries.

The master slave mode has two sub-modes, which I've listed there: stream and slony. This tells Pgpool-II what kind of replication is configured between the Postgres servers. Most of the time we'll deal with streaming replication mode, which means the Postgres servers are using the built-in streaming replication. Slony was used for replication before Postgres added its own replication, and since Pgpool-II is quite old it still supports that mode; maybe there are people who still use Slony for replication, I don't know.

Okay, the second mode is replication mode, where we basically do not expect the configured Postgres backend servers to be using any kind of replication; Pgpool-II tries to do the replication by itself when you set it to replication mode. As the name suggests, in this mode Pgpool-II does statement-based replication, and it does this by sending the write statements to all configured backends and load balancing the read queries. It was quite popular back in the days when there was no standard replication solution for Postgres, but I don't think it's very much in use these days.

And the last one is raw mode; in this mode you just use Pgpool-II as a connection pooler and proxy server between the client and Postgres, and it does not expect that the servers connected to it are replicated or anything. This training and all the examples only focus on master slave mode with streaming replication, which is the most popular and most useful clustering mode with Pgpool-II currently. So we'll just focus on that first one, and maybe, time permitting, I'll give an overview of the other modes.
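For reference, here is roughly what selecting that mode looks like in pgpool.conf on the 4.1 series we're using; in these versions the mode is split across a few booleans, and the values here are illustrative, not a drop-in configuration:

```
# pgpool.conf - clustering mode for a streaming-replication cluster (Pgpool-II 4.1)
master_slave_mode     = on        # primary/standby cluster with external replication
master_slave_sub_mode = 'stream'  # built-in streaming replication ('slony' for Slony)
replication_mode      = off       # Pgpool-II's statement-based replication stays off
```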
Okay, right, before we get on with the actual configuration and setup, or discuss the various functions of Pgpool-II, I want to talk a little bit about the value proposition of Pgpool-II: why and when do we need to use it? I think that's a valid question. So basically Pgpool-II, as the name suggests, was originally built for connection pooling, and over the years a lot has gone into it; after the addition of the load balancing module it is used more as a load balancer than as a pooler, so maybe we'll change the name to pg_loadbalancer or something, I don't know. We all know that connection pooling gives a performance benefit for a lot of workloads, especially the kind of workload where each connection does a very small amount of work and then disconnects, and then new connections come in; for that kind of workload, pooling gives you good performance.

Another good use case of Pgpool-II is that it can add an extra layer of security to your database system. We can hide the database engine from the public network by placing Pgpool-II in the DMZ (demilitarized zone, as we call it), letting Pgpool-II interact with the external traffic while the Postgres server stays hidden in our local network. So these are two good uses of Pgpool-II, but the most useful of all is the load balancer.

As they say, data is the new gold. No matter the use case, small or big, every database system installed in the real world requires redundancy and fault tolerance; relying on a single system and hoping nothing will ever go wrong just doesn't work. So in short, depending on the organization's disaster recovery strategy, we all deploy multiple replicated databases, and in the Postgres world, more often than not, we use streaming replication to have a real-time copy ready to switch over to, to guard against failures. Streaming replication in a primary-standby setup is pretty much the only feasible way currently available to build high availability, since, as you know, no good, production-ready, robust multi-master system exists today for Postgres; so we all use master-slave or primary-standby kinds of configurations. What this means, practically, is that when we deploy a primary-standby replicated system we end up utilizing 50% or less of the resources available to us, as the standby just sits there waiting for something bad to happen to the primary so that it can take over.

This is where Pgpool-II comes in really handy: it can take the read load off the primary database server and shift it to the idle standby, and this not only gives a performance boost but also helps in reducing the cost and system requirements for the servers. For example, if a particular workload requirement for a database system was to serve 1000 concurrent clients, and let's say that requires x amount of performance out of the system, then distributing the load would mean the same number of concurrent clients would only need x/2, or maybe x/1.5, amount of performance from each system; so you can install lower-cost systems, just saying. So having a load balancer is kind of a must-have, and it saves cost and gives performance.

Then another question comes in: if we talk about load balancing, we can find many other solutions currently on the market, but what makes Pgpool-II the most attractive and obvious choice is that it does seamless read/write splitting. That means you don't need to implement or change anything on the application side to use the load balancing capability; all you need to do is plug Pgpool-II in between client and server and you get a working load balancer. Other solutions either require modifications to the SQL statements or use separate read and write connections to perform load balancing. So the Pgpool-II load balancer is a complete solution which doesn't require any extra work, you just need to plug it in; I think this is the best use case of Pgpool-II.

Okay, so first things first, let's start with an overview of the Pgpool-II configuration parameters, how we can interact with them, and stuff like that. This is just trivial stuff, so I'll try to go over it as quickly as possible. The Pgpool-II configuration is inspired by the Postgres configuration system, so pgpool.conf follows the same syntax and format as postgresql.conf. It has four kinds of parameters: boolean, numeric, string, and enumerated; you are all familiar with each one of them. In general there are three categories of configuration parameters: those where changing the value requires a restart, those that can be changed by reloading the configuration, and a third kind that can be changed within the current session; we'll discuss how we can change parameters in the current session.

Okay, and there's a little difference between Postgres and Pgpool-II configuration parameters: because of the nature of Pgpool-II, we need to configure similar parameters for different nodes.
For example, we need to configure multiple Postgres backend nodes with Pgpool-II. For that purpose, Pgpool-II uses a node id, or node number, suffix on the parameter name. For example, when you want to configure the backend hostname for two Postgres servers, the first one is configured with backend_hostname0 (the suffix zero means we are configuring node zero), and the second one is done using backend_hostname1. The suffix starts from zero, cannot be a negative number, and for most parameters can go up to 255. This is a bit different from Postgres, which doesn't follow this convention, but for some requirements in Pgpool-II we have to do it this way.

Okay, so this is again kind of trivial stuff, but how do we change or interact with the configuration parameters? The first way is that you can just write them in the pgpool.conf file, which is similar to postgresql.conf. It is located at etc/pgpool.conf by default, but you can also specify it using the -f startup parameter and place it anywhere you like. A hash at the start of a line marks that particular line as commented out, and Pgpool-II will not read that configuration parameter.

Then there is something Pgpool-II has added: two new commands, similar to the SHOW and SET commands of Postgres. "PGPOOL SHOW" followed by a configuration parameter name gives you the value of that Pgpool-II configuration parameter; this is similar to the SHOW command of Postgres, you just need to add PGPOOL before the SHOW. You can also show a configuration parameter group, with "PGPOOL SHOW" followed by the group name; a group consists of related configuration parameters. For example, if you write PGPOOL SHOW backend, it will give you all the backend-related configuration you have done in the Pgpool-II configuration file. And, just like SHOW ALL, you have the PGPOOL SHOW ALL command.

Again, if you want to change some parameter's value on the go, you can use the PGPOOL SET command, which is similar to the Postgres SET command; it sets the value of a session-local configuration parameter for the current session, and the changed value gets reset to the default once the user's session ends. You can see the syntax of the command on the slide. There are also PGPOOL RESET <parameter> and PGPOOL RESET ALL, which reset a parameter to its default value, that is, whatever is set in the pgpool.conf file.
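As a quick illustration of those commands, here is roughly what such a session through Pgpool-II might look like from psql; the specific parameter names and values chosen are illustrative:

```
-- Connected to Pgpool-II with psql:

-- Show one Pgpool-II parameter (like Postgres SHOW, prefixed with PGPOOL)
PGPOOL SHOW health_check_period;

-- Show a whole group of related parameters, as described above
PGPOOL SHOW backend;

-- Show everything
PGPOOL SHOW ALL;

-- Change a session-local parameter for the current session only
PGPOOL SET client_min_messages TO debug5;

-- Put it back to whatever pgpool.conf says
PGPOOL RESET client_min_messages;
```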
All right, so let's start with configuring our first Pgpool-II setup, with one primary and one standby node. This is what we want to create: one Pgpool-II node connected to a primary and a standby Postgres server, with streaming replication working between the primary and the standby. So let's try to create the setup shown in the diagram. Again, as I've told you, all the example walkthroughs are done on the Docker setup, and you can also try them out using the Docker environment I've uploaded to the GitHub page.

In the first step we need to set up a primary and a standby Postgres server. You have a primary Postgres server (we call it pg_master; pg_slave is the standby server), and the first thing you need to do is create a replication slot on the master. You can create the replication slot using SELECT pg_create_physical_replication_slot(), and you have to give it a slot name; we use pg_standby_1 as the slot name. I hope you can see the video at the bottom of the slide; I have recorded a video of these same steps.

Okay, once you have the replication slot set up, the next thing you need to do is take a base backup onto the standby server, so that we have a copy of the Postgres data that is replicated. For that we'll use pg_basebackup to create a standby, or slave, server on the second machine, which is pg_slave. Once you're done, just go on and start the Postgres service on pg_slave; you can do that with the service command, because we are using CentOS 6 and installed Postgres from the RPMs, so just running the service start command will start Postgres.

The next thing we need to do is configure pgpool.conf, and the first thing there is to configure the backend hostname for both master and standby. You can write an IP address here, but I've already configured the hostnames in the Docker environment, so we'll just use the hostnames: to backend_hostname0 we'll assign pg_master, and for backend_hostname1 we'll use pg_slave. This completes the backend configuration, because we are using the default port 5432; if your server is running on a different port you can change the port number as well, but for now we'll just set the hostnames and we are good. Other than that, set load_balance_mode to on and master_slave_mode to on; these are the two configuration parameters we need to change. The load balance mode is not all that important, everything will work fine without it, only the load balancing will not work, but let's turn it on right now.

Another thing we need to do here is enable and configure health checking; we may need it later on when we test the failover and failback scenarios. For that we set health_check_user (we have created a pgpool user in the database, or you can create any new user and put that username here), and then you need to put in the health check password, so that the Pgpool-II health check can connect to the server using that username and password. Here I've written an AES password, and I'll explain later what an AES password is and why we are doing that. To enable the health check, we change health_check_period to 10; you can change it to any value, and setting it to zero disables it. For now we'll set 10 seconds for the health check period and 10 seconds for the health check timeout. The other thing we need to do is set the sr_check parameters; sr stands for streaming replication. Setting sr_check_period to zero disables the streaming replication check; otherwise, setting it to 10 means Pgpool-II will check for streaming replication lag every 10 seconds.
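Putting those steps together, a minimal sketch of the whole setup might look like this. The hostnames, slot name, and password placeholder follow the example environment described above, while the replication user, data directory, and exact service name are assumptions for a CentOS 6 / PostgreSQL 11 RPM install:

```
# On pg_master: create a physical replication slot for the standby
psql -U postgres -c "SELECT pg_create_physical_replication_slot('pg_standby_1');"

# On pg_slave: take a base backup from the primary, then start Postgres
# (plus a recovery.conf pointing at the primary and the slot, as usual for PG 11)
pg_basebackup -h pg_master -U replication_user -D /var/lib/pgsql/11/data -X stream
service postgresql-11 start

# Relevant pgpool.conf settings (values from the walkthrough)
# backend_hostname0    = 'pg_master'
# backend_hostname1    = 'pg_slave'
# load_balance_mode    = on
# master_slave_mode    = on
# health_check_period  = 10        # seconds between health checks; 0 disables
# health_check_timeout = 10
# health_check_user    = 'pgpool'
# health_check_password = 'AES...'  # AES-encrypted; explained below
# sr_check_period      = 10         # streaming replication check interval
```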
Similarly, just like the health check database, we need to set the streaming replication check database; we use the postgres database for that purpose. And sr_check_user and sr_check_password: again, these are the username and password of a user created on the Postgres side, and again we are using an AES password, which I'll explain shortly. That's all we need to do; if you try it out, everything should be working. Our Pgpool-II setup with one primary and one standby node is now ready to use, with load balancing actively working.

So let's discuss some important notes on the configuration we have done so far. Passwords: this is a very important thing. If you go through the Pgpool-II configuration file, you'll see that Pgpool-II requires four database users to perform its internal tasks: the health check user, the sr check user (the streaming replication check user), the recovery user, and the wd lifecheck user. You might not need all of them all the time, but these are the four users Pgpool-II needs for these specific tasks. We can create one database user in the Postgres backend and use that user for each of these tasks, just configuring the same user for each parameter; but for auditing and security requirements we may need to use a different user for each separate task. That depends on what kind of requirements you have.

The thing is, setting a plain text password in pgpool.conf is a security risk; we all know that. If anyone gets hold of the configuration file, they know the database user and the database password, and configuration files are not all that well protected compared to other things. So it's not recommended to use plain text passwords. Since version 4.0, Pgpool-II allows you to use MD5 or AES encrypted passwords in pgpool.conf; before version 4.0 there was no way around it, you had to write the actual database password in the configuration file, and that was a security risk. So I decided to work on that, and I implemented this particular feature in 4.0; now you can use MD5 or AES encrypted passwords.

The pg_md5 utility that comes with Pgpool-II can be used to create MD5 passwords, and similarly there is the pg_enc utility, added in Pgpool-II 4.0 along with this feature, that can be used to create encrypted passwords; it encrypts the password using AES 256-bit encryption. When typing a password into pgpool.conf, an MD5 password must be prefixed with MD5, and similarly an AES password must be prefixed with AES; that's how Pgpool-II knows what kind of encryption the password is using.

Here is simply how we use the pg_enc utility to create an encrypted password: you just execute pg_enc and type in the database password. As we know, it's AES encryption, and for encryption we need a key; without a key we cannot encrypt. For that purpose, the pg_enc utility, and also Pgpool-II itself, expect that a key file is present at /home/postgres/.pgpoolkey; this is the default location of the key file.
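A small sketch of that flow; the key contents and password here are placeholders, and the exact option spelling is from memory, so check pg_enc --help on your install:

```
# Put the AES encryption key in the default key file (postgres user's home)
echo 'my-secret-encryption-key' > /home/postgres/.pgpoolkey
chmod 600 /home/postgres/.pgpoolkey

# Encrypt a database password; pg_enc prints the encrypted string
pg_enc -k /home/postgres/.pgpoolkey 'database-password'

# Copy the output into pgpool.conf with the AES prefix, e.g.
# health_check_password = 'AESWJFyurhDwdtLG1NCh2zFZQ=='   # illustrative value
```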
Whatever file is present there, Pgpool-II reads it and uses its contents as the key to encrypt the passwords. And of course you can override the path used for the key file, or provide the key using an environment variable; we'll discuss that in detail later. But this is the point: pg_enc uses the key file to encrypt the passwords, and the same key file is required by Pgpool-II when it has to decrypt a password in order to connect to the database. So just execute pg_enc, type in the password, and you'll get the encrypted password; copy and paste that into the pgpool.conf file and you're good to go. This is the recommended way of handling passwords in pgpool.conf.

Okay, so now we have created a single-node Pgpool-II instance connected to two backend nodes. Let's use that same setup and try to add a second standby, a third Postgres node, without interrupting the service, step by step. The previous example walkthrough is what we have now, and this is what we'll turn it into: previously we had only one standby and one primary node; now we'll use the same setup and add another standby to it. Pgpool-II does provide a way to add a new standby node without interrupting the service, and there are basically two ways to do it: one, you can manually configure the standby and attach it to Pgpool-II, and the other way is online recovery. We'll look into both of them one by one.

So first, manually attaching. Again, for the example walkthrough we'll use the same Docker environment and add another, third Postgres backend node. This is what we need to do to manually attach a new standby node. As we did in the previous example, the first thing is to create a replication slot on the master, or primary, database; we use the same pg_create_physical_replication_slot command and give the slot a name, pg_standby_2. The second step is to create an online backup, which can be done using pg_basebackup; you can see in the video that I'm performing that, I've recorded the steps. Once you have the backup ready, just go on and start the Postgres service.

So at that point we have a second standby node (the third node) connected to the master node, but that node is not yet configured in Pgpool-II, so it cannot be used by Pgpool-II for load balancing and such. After doing this, just verify that the master node has picked up the new standby; we can check using the pg_stat_replication view, and you can see in the video that our standby two is up and running.

So now let's add it to the Pgpool-II configuration. We don't need to stop or do anything to Pgpool-II; just go on and edit pgpool.conf, and as we did in the previous step, add the second standby node. We do that by adding the configuration for backend_hostname2, and we'll write pgpool_node, because that is the name of the third system we are using; write that in the hostname configuration and leave everything else as it is, since the port is the default port number, 5432. Just go on and save pgpool.conf.
Now reload the Pgpool-II configuration. To reload the configuration we use "service pgpool reload", which makes Pgpool-II reload the configuration file; doing so will not cause any connection disruption or anything. So we have just done that, and now let's connect to Pgpool-II to see whether the third node has been picked up by Pgpool-II or not. For that we'll use the "show pool_nodes" command, and you can see that the third node is configured, but its status is showing as down. This means that although the node is already configured in Pgpool-II, it is not yet ready to be used by Pgpool-II.

So what we need to do is attach that newly added standby to Pgpool-II, and we can do that using the pcp_attach_node command. This is a utility that comes with Pgpool-II; you just need to execute it and provide the node id. Since we added backend_hostname2, you have to put that same node id in the pcp_attach_node command, and you're ready to go. Let's log in again and see: you can see that the node is up and running, it can now be used for load balancing and everything, and we have added a new node without interrupting the service.
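A condensed sketch of that manual attach flow; the PCP port, PCP user, and connection details are illustrative defaults rather than anything fixed by the walkthrough:

```
# 1. pgpool.conf - add the new backend (node id 2), then save
# backend_hostname2 = 'pgpool_node'
# backend_port2     = 5432

# 2. Reload Pgpool-II; existing sessions are untouched
service pgpool reload

# 3. Check cluster state through Pgpool-II; node 2 shows status 'down'
psql -h localhost -p 9999 -U postgres -c "show pool_nodes;"

# 4. Attach node 2 via the PCP interface (9898 is the usual PCP port)
pcp_attach_node -h localhost -p 9898 -U pcpadmin -n 2

# 5. Verify: node 2 should now be 'up' and participating in load balancing
psql -h localhost -p 9999 -U postgres -c "show pool_nodes;"
```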
Okay, so that was the manual method, where we did everything by hand and then attached the node to Pgpool-II using the pcp_attach_node command. Now let's try to do the same using online recovery. Basically, online recovery performs the same steps we just did manually, but it automates them. One of the benefits of online recovery is that you don't need physical access to the systems; you can do everything using the Pgpool-II client commands, rather than actually intervening on the master node or the standby node.

So let's try online recovery. We'll again use the same setup we created in the first example, with one primary and one standby node. For online recovery to work, Pgpool-II comes with the pgpool_recovery Postgres extension; we need to install that extension on the master server. Another requirement is passwordless SSH between the master and the new server (the new server being where we want to create the second standby node). Then we need to create two scripts: the recovery_1st_stage script, which should be placed in the master node's data directory, and the remote start script, which again needs to be present in the master node's data directory. This is done once: whenever we set up a Pgpool-II cluster we create the scripts and place them there, so that whenever we need to add a new node the scripts are already on the master node, and the same scripts are used no matter how many nodes we want to recover.

Okay, so let's discuss the recovery_1st_stage script. It is executed by Pgpool-II, and I've written on the slide the exact SELECT command which Pgpool-II uses to execute the script on the master, or primary, server. The pgpool_recovery function is added by the pgpool_recovery extension; when we install the extension we get this function in the database. It gets passed the recovery_1st_stage script (the script we placed in the data directory), and the other parameters are: the recovery hostname, which is the host we want to recover the standby node on; the recovery data directory, where the data of the standby will reside; the master backend port; the node id, whichever node id we want to recover (in this example we use node id 2, because we already have node id 0, the primary, and node id 1, the first standby, so it will recover standby two, node id 2); and the recovery backend port, the port number to be used by the recovered node.

So what do we do in the recovery_1st_stage script? Basically, the responsibility of the recovery_1st_stage script is to create the replication slot on the master Postgres node (which we did manually in the previous example using the pg_create_physical_replication_slot command), and then to perform the base backup (again, we did that manually previously, but now it's this script's responsibility). The third thing it needs to do is set up an appropriate postgresql.conf file: for example, if we want to use a different port number, we can specify that in pgpool.conf, and the recovery script then needs to update it in the postgresql.conf file. Finally, it needs to set up recovery.conf so that the standby points to the correct primary node.

This is just a snippet of the example recovery_1st_stage script we normally use. You can see that first we use a SELECT pg_create_physical_replication_slot command, which creates the slot on the master server, and then we SSH to the destination host that will run standby two and use pg_basebackup to take the online backup. I have included the example scripts in the Docker environment, and they are also present on the Pgpool-II website; normally you don't need to change much in the recovery_1st_stage script, so you can just download it from the official Pgpool-II site and use it after changing a name here or maybe the SSH key paths, and you're ready to go.

The second script required for online recovery to work is pgpool_remote_start. This script gets executed after the recovery first stage command, and again it is executed through the pgpool_recovery extension, which we need installed on the master. I've written on the slide the exact SELECT command which executes it: SELECT pgpool_remote_start() with the recovery hostname and the recovery data directory. What it does is simply start the server on the recovered standby node, and normally we just use pg_ctl in the script; the sample script executes one single command, pg_ctl start, or whatever way you need to start the Postgres server (you could use the service script or something like that). So these are the two scripts we need.
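To make the two scripts concrete, here is a heavily trimmed sketch of what they typically contain. This is modeled on the sample scripts described above, with the argument order, hostnames, paths, and slot name as assumptions rather than the exact scripts shipped with Pgpool-II:

```
#!/bin/bash
# recovery_1st_stage (placed in the primary's data directory)
# Arguments passed in by the pgpool_recovery call (order assumed):
PRIMARY_DATA=$1; DEST_HOST=$2; DEST_DATA=$3; PORT=$4

# 1. Create a replication slot for the new standby on the primary
psql -p "$PORT" -d postgres \
     -c "SELECT pg_create_physical_replication_slot('pg_standby_2');"

# 2. Take the base backup directly onto the destination host over passwordless SSH
ssh postgres@"$DEST_HOST" \
    "pg_basebackup -h $(hostname) -p $PORT -D $DEST_DATA -X stream"

# 3. The real script would also fix up postgresql.conf and write recovery.conf here
```

And the remote start script is little more than one pg_ctl invocation:

```
#!/bin/bash
# pgpool_remote_start (also in the primary's data directory)
DEST_HOST=$1; DEST_DATA=$2

# Start Postgres on the freshly recovered standby
ssh postgres@"$DEST_HOST" "pg_ctl -w -D $DEST_DATA start"
```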
Okay, now let's configure pgpool.conf for online recovery. For the backend node information we do the same thing as before: we set the backend hostname, backend_hostname2 = pgpool_node, the server host where we want to recover the standby, and we set the data directory, backend_data_directory2, which is where the data of the new standby will live. Then we configure the recovery user, as we've already discussed: we provide the recovery username, and then recovery_password, the password for that user; we can use AES-encrypted or MD5-encrypted passwords here (we could also use a plain text password, but that is not recommended). Once we are done with that, we need to fill in recovery_1st_stage_command, which is just the name of the script file, without any path: since our script is named recovery_1st_stage, put that name in the configuration parameter and you're good to go.

This is an example of doing exactly that. First we create the extension, CREATE EXTENSION pgpool_recovery. Then we edit the Pgpool-II configuration file: you can see in the example we are adding the backend information, backend_hostname2, and we set the weight and the data directory; in this example I'm using an IP instead of a hostname, and you can use either. Once we are done editing the configuration file, the next thing is to reload the Pgpool-II configuration, and once that is done, just verify that Pgpool-II has picked up the new node: you can see the third node is added, but it is still showing as down, because we haven't executed the recovery command yet. Now just go on and run pcp_recovery_node, put in the node id, and use the username and password for the PCP interface. Okay, the pcp_recovery_node command was successful, so we are good; run show pool_nodes again and you can see the third node is added, up and running, and attached to Pgpool-II.

So now if you want to recover another node, just go and edit the Pgpool-II configuration file; you don't need to take any backup, you don't need to log into the systems, you just execute the pcp_recovery_node command and you'll have another standby attached and up and running in no time. This is how we can use online recovery and pcp_attach_node to add a new standby node to an existing setup without interrupting the service; while we were doing all that, users may have been connected to Pgpool-II, and Pgpool-II kept serving them without any disruption. So this is a good way to add a new node, or a recovered node, to the cluster.
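Condensed, the online recovery flow described above looks roughly like this; the credentials, paths, and the template1 target for the extension are illustrative assumptions:

```
# One-time: install the recovery extension on the primary
psql -h pg_master -U postgres -d template1 -c "CREATE EXTENSION pgpool_recovery;"

# pgpool.conf additions for the node to recover and the recovery settings
# backend_hostname2          = 'pgpool_node'
# backend_port2              = 5432
# backend_weight2            = 1
# backend_data_directory2    = '/var/lib/pgsql/11/data'
# recovery_user              = 'pgpool'
# recovery_password          = 'AES...'
# recovery_1st_stage_command = 'recovery_1st_stage'

service pgpool reload

# Kick off recovery of node 2; Pgpool-II runs the scripts and attaches the node
pcp_recovery_node -h localhost -p 9898 -U pcpadmin -n 2

psql -h localhost -p 9999 -U postgres -c "show pool_nodes;"   # node 2 now 'up'
```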
Next, let's discuss backend health checking and failover. The backend health check, as the name suggests, is a process which periodically connects to the configured Postgres backends to detect errors on the servers or the network. When an error is detected, Pgpool-II performs either a failover or a degeneration (just detaching the node), depending on the backend configuration; we'll discuss when degeneration is performed and when failover is performed.

These are the configuration parameters we can use to tune the health checking. The first one is health_check_timeout, which configures the timeout, in seconds, for connecting to the backend Postgres. This is similar to the TCP/IP connection timeout, but since we want to detect errors quickly, health_check_timeout sets the time after which the TCP connect call gives up. Then health_check_period is the interval between health checks; setting it to zero disables the health check entirely, which is not recommended, and we'll discuss why. Then health_check_max_retries: this parameter sets, if a health check fails, how many retries should be performed before marking the node as failed, or down. It is again an integer value, and setting it to zero means the very first health check failure marks the node as down and Pgpool-II goes on to perform the failover. Then health_check_retry_delay configures the number of seconds between successive health check retries.

Those are the timing parameters we just discussed; next come the connection parameters. The first is health_check_database: you can set this if you want the health check process to connect to a particular database. Normally what we do is create a database, let's name it pgpool2, and put no data into it, just in case, so that even if somebody gets access to that database it holds no real data. This parameter sets the database name which the health check will try to connect to; leaving it empty makes Pgpool-II try the postgres database first, and in case of failure it tries the template1 database, and if that fails too, it gives up and reports a failure. Then health_check_user: this user must exist on the Postgres server side. And health_check_password is the password for that user; you should use an encrypted password here.

Okay, so all the health check parameters we just described (health_check_timeout, health_check_period, health_check_max_retries, and so on) can be used with or without the node id suffix. If you use a parameter without the suffix, like health_check_timeout = 10, that is a global setting for the health check timeout, and every backend node's health check uses that same timeout. But if you use a suffix, for example health_check_timeout1, it sets the timeout value only for backend node one; the rest of the nodes keep using the global value, and the suffixed value overrides the global value of that particular parameter for that node. By doing this you can use a different configuration for different backend nodes: for example, if we want 10 seconds between retries for backend node one and five seconds between retries for backend node two, because of the network situation or the vicinity of the nodes, we can do that by putting the node id suffix on that particular health check parameter name. So that's pretty much about configuring the health check.
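For example, a per-node override on top of global health check settings might look like this in pgpool.conf; the values are illustrative:

```
# pgpool.conf - global health check settings, applied to every backend
health_check_period      = 10         # seconds between checks; 0 disables
health_check_timeout     = 10         # give up connecting after 10 seconds
health_check_max_retries = 3          # retries before declaring the node down
health_check_retry_delay = 10         # seconds between retries
health_check_database    = 'pgpool2'  # empty = try postgres, then template1
health_check_user        = 'pgpool'
health_check_password    = 'AES...'   # encrypted with pg_enc

# Per-node override: backend 2 sits closer by, so retry more eagerly there
health_check_retry_delay2 = 5
```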
Next, let's come to the automatic failover part. Automatic failover is the mechanism in Pgpool-II to detect problematic nodes and reconfigure the standby nodes to follow the new primary. For example, if your system is working with one primary node and two standby nodes and your primary node goes down, the automatic failover mechanism of Pgpool-II kicks in and detaches the problematic node. If that node happens to be the primary, it removes the primary node from the system and reconfigures things: the first thing it does is select one of the standby nodes and make it the new primary node, and then it reconfigures all the remaining standby nodes so that they follow the newly promoted primary.

The failover can be triggered in three ways. The first is when the health check detects a node failure: the health check we discussed previously, if it detects a problem, informs Pgpool-II that it is not able to connect to a particular node, and Pgpool-II goes on and starts the failover procedure for the failed node. The second is when a read or write failure happens on a connection to Postgres; this is configurable, there is a parameter in pgpool.conf called failover_on_backend_error, and only if you set it to on will a read/write failure on a Postgres backend cause a failover. Otherwise the error is ignored, in the sense that the connection on which the error occurred is terminated but Pgpool-II does not perform a failover. Personally I do not recommend using this parameter, and we'll discuss why, but if you want a failover to happen any time reading or writing on the backend fails, you can do that. And the third way is by a remote Pgpool-II node: if you are using the watchdog and everything is working fine on your side, but the other Pgpool-II nodes connected through the watchdog see a problem connecting to any of the nodes, they can inform your node, and then your node will also go on and perform the failover; again, we'll discuss this further in detail.

This slide just shows how the failover works. The diagram shows the setup we have: one Pgpool-II node, with one primary and two standby nodes connected to it, and streaming replication working between them. Let's suppose something bad happens to our primary and we are not able to reach it anymore. What happens is that one of the standbys gets promoted to be the new primary: you can see that the bottom node, which was a standby in the previous diagram, is now selected as the primary; Pgpool-II makes that third node the new primary and makes the second standby, the middle one, follow the new primary node. This is all done by the automatic failover mechanism.

Similarly, if the primary node keeps working but a standby node fails, then no promotion happens; the failover mechanism still kicks in, but it does not alter the system configuration, it just removes the problematic standby node from the cluster, and your system keeps working without disruption, now with two nodes where previously we had three. So a standby node failure is handled seamlessly.
Okay, so these are some related configuration parameters. First of all, the backend node flags: these can be set for each backend node, as backend_flag with the node id suffix. If you set it to ALLOW_TO_FAILOVER, that means that if that node fails, or the health check on that node fails, Pgpool-II performs the failover. You can also set it to DISALLOW_TO_FAILOVER: in that case, even on failure Pgpool-II will not perform the failover; it will keep trying to connect to the problematic node, and until that node comes back alive the service is effectively stopped, in the sense that you cannot usefully connect through Pgpool-II. That option is there for historical reasons and I don't think it is very useful; most of the time we'll be dealing with the ALLOW_TO_FAILOVER backend flag.

The other thing is failover_on_backend_error; as I said, I don't recommend using it. When it is set to on, Pgpool-II considers a read or write error on a Postgres backend connection to be a backend node failure and triggers a failover; when it is set to off, Pgpool-II only reports an error and disconnects the session on a read/write error. The reason we don't want to use this is that the parameter is too aggressive: even a small glitch, the smallest of problems, can trigger a failover of the node, and failover is ultimately a very expensive process; we don't want to be doing failovers on small glitches. Health checking is the better way of detecting errors and triggering the failover.

Then there's another related configuration parameter called detach_false_primary. This is a pretty new one, I think it was added in 4.0; before that we didn't have it. What it does is automatically remove a false primary backend from the cluster. There can be situations where you end up with multiple primary nodes because of some misconfiguration or other problems, and you don't want a split-brain on the backend side, you don't want two different versions of the data; so this is very useful for detecting a false primary and removing it from the cluster. Again, it only works in streaming replication mode, and the detection process stops when Pgpool-II doesn't find any primary: if you don't have any primary node, there is no point looking for a false primary, because at that point we don't even have a real one. In the case of multiple primary nodes and no standby nodes in the cluster, Pgpool-II will remove all the primary nodes, because it doesn't make sense to have primary nodes with no standby nodes attached to them. And in the case of multiple primaries and multiple standby nodes, Pgpool-II will find the primary node that the standby nodes are connected to; for that it uses the pg_stat_wal_receiver view to figure out which primary actually has the standby nodes connected to it, and then removes the other one.

The check for detaching false primaries is performed every sr_check_period (sr stands for streaming replication); we piggyback this check on the replication delay check, so if you disable that, you will effectively also disable false primary detection.
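Those three knobs together might look like this in pgpool.conf; this is a sketch reflecting the recommendations above, not a drop-in configuration:

```
# pgpool.conf - failover behavior knobs
backend_flag0 = 'ALLOW_TO_FAILOVER'   # the usual choice for every node
backend_flag1 = 'ALLOW_TO_FAILOVER'
# backend_flag1 = 'DISALLOW_TO_FAILOVER'  # would freeze service until node 1 returns

failover_on_backend_error = off   # too aggressive; let health checks drive failover

detach_false_primary = on   # streaming replication mode only
sr_check_period      = 10   # the false-primary check piggybacks on this
```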
This is the flow chart of when a false primary gets detached. If you look at the bottom left corner: we have two primary nodes, primary one and primary two, where primary one has two standby nodes and primary two has none. This is the easy case: Pgpool-II will remove primary two from the system and detach it. If you have only two standbys and no primary, then Pgpool-II has nothing to do. If you have one primary, then everything is good. But if you have one primary connected to one standby and the other primary connected to another standby, two identical primary-standby systems, this is the case where Pgpool-II cannot decide which one is the good primary and which one is the bad, or evil, one; so it just gives a warning message and does nothing, and you have to resolve it manually.

Okay, so next is another related configuration parameter which affects automatic failover: auto_failback. Again, this is a new thing in Pgpool-II, we don't have it in the older versions. What it does is enable automatic re-attaching of failed standby nodes. As we discussed, in case of a standby node failure the problematic standby node is simply removed from the cluster without disrupting the system, because we have a primary that is still working and the other standbys keep working. Again, this only works in streaming replication mode, and it is very helpful on lossy networks where a standby node gets detached because of some network glitch. How does Pgpool-II detect that the standby node is healthy? It uses pg_stat_replication on the primary node to verify that the detached standby node is healthy and streaming replication is working on it; if Pgpool-II finds that streaming replication is working perfectly and the standby was somehow detached from our system, it just goes on and re-attaches that standby automatically.

Again, the check is performed every sr_check_period, and if you disable the streaming replication check, this functionality will not be available. So the sr check does three things: it checks the streaming replication delay, it performs auto failback, and it performs false primary detection; the same mechanism is used for three main purposes.

With auto_failback_interval you can configure the interval, in seconds, between executions of auto failback. For example, if you have a very lossy network and your standby node is coming and going every other second, you don't want to keep detaching and re-attaching the standby nodes; so you can configure the auto failback interval such that, once a failback has happened, the next auto failback won't execute until the specified amount of time has passed since the previous one. This is helpful in preventing frequent failover-then-failback cycles on unstable networks. I don't think we still have such unstable networks these days, but you can configure this particular parameter if you want to.
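In pgpool.conf that could look like the following sketch (values illustrative):

```
# pgpool.conf - automatic re-attach of healthy standbys (streaming replication mode)
auto_failback          = on
auto_failback_interval = 60   # wait at least a minute between failbacks;
                              # guards against attach/detach flapping on bad links

# auto_failback rides on the streaming replication check, so keep it enabled
sr_check_period   = 10
sr_check_user     = 'pgpool'
sr_check_password = 'AES...'
```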
Then there is search_primary_node_timeout, which configures the maximum amount of time in seconds to search for the primary node. Whenever you configure pgpool 2 in streaming replication mode, the first thing pgpool does at startup is look for the primary node, and if it cannot find one it keeps looking; search_primary_node_timeout sets the timeout, that is, after how many seconds to give up looking for the primary. Without a primary you can't do much with pgpool 2, because your write statements won't work. The other time search_primary_node_timeout kicks in is after a failover: for example, your original primary node failed and a failover was performed; after performing the failover, pgpool will again look for the primary node, and again this timeout kicks in, so if pgpool is not able to find the primary node within this amount of time it gives up looking. So this is an important configuration parameter.

After that we have failover_command, follow_master_command and failback_command. These three commands are basically the workhorse of the whole failover system. The first one, failover_command, configures the command to run when a postgres backend node goes down or gets detached. The second one, follow_master_command, configures the user command to run after a failover of the primary node; this is where we reconfigure the other standby nodes to point to the new primary node. And the last one, failback_command, is the command which gets run when a new standby node gets attached to the pgpool 2 cluster. We'll discuss each one. So this is just the description of failover_command: it configures a user command to run when a postgres backend node gets detached, and you can pass all the below-mentioned parameters to that command, like %d, %h and %p; these are just pieces of information you may want inside the failover script to do particular calculations.

Okay, so this is how the failover works, the mechanism of the failover. The first thing pgpool 2 does when the failover process starts is kill all the child processes, which means terminating all active user sessions, and this only happens in case of a primary node failure; if the failover process started because of a standby node failure, it will not kill all the child processes. So, talking about a primary node failure: if the primary node goes down, the first thing the pgpool 2 failover does is terminate all the active user sessions. Then, as the second step, it invokes the user-provided script configured in failover_command, and it is the responsibility of that user-provided command to promote one of the standbys to the new primary in case of a primary node failover. Once the failover_command script has executed, the next thing the failover does is execute the follow_master_command, which we'll discuss later. Once the failover command is finished, pgpool 2 will respawn the child processes and start accepting new connections. And as I told you, since version 3.6, in case of a standby node failure pgpool will only kill those sessions that were actively using the failed node; other user sessions remain uninterrupted.
Before 3.6 we used to blindly kill all the child processes whenever a failover kicked in, and that was not a good thing: even in case of a standby failure we had a system disruption, and every session needed to reconnect to pgpool 2. We improved this particular thing in 3.6.

So what do we do in failover.sh? What follows is the normal behaviour, and you can definitely change it; we have quite a lot of information available in failover.sh and we could deploy some other mechanism for choosing the primary, but this is what we do in the normal working. We pass the master node id as an argument, using %m, to the failover script. Now, this master node id is a little different from the primary node id. The primary node is the streaming replication primary; it can be backend zero, backend one, backend two or backend three, it doesn't matter: when we refer to the primary node, it is the node on the postgres side that is serving the other standby nodes. But when we talk about the master node id, it is the smallest node id amongst the alive nodes. For example, suppose we have a three node cluster, node zero is the primary node and somehow it fails; pgpool 2 will set the master node id to one, provided node one is alive, and if node one is also down it will set it to the next node id, which is two. But if all the nodes are down, the master node id is set to minus one. So the whole purpose of the master node id is to give the smallest id of a node that is alive in the system, and if the master node id comes out to be minus one, this means there is no alive postgres node currently available for pgpool 2 to work with. What we normally do is promote the node with the smallest node id to become the new primary. Of course we could deploy a different mechanism to identify which node should become the new primary, but currently what we do in failover.sh is just promote the standby node pointed to by the master node id, which might not always be the best choice, but it is the simplest thing to do; you can do anything you like in failover.sh, it's open. So what we do is: quickly bail out if the passed-in master node id is less than zero, because that means there are no alive nodes and we have nothing to do in failover.sh, so we just exit the script. But if the master node id is some valid number, next we check whether the failed node was a primary node or a standby node, and we can easily do that by comparing the failed node id that is passed into the failover script with the old primary node id. If the failed node id and the old primary node id are the same, this means the failed node was the primary node, and we need to promote one of the standbys to become the new primary, because the old primary is gone. In case of a standby node failure, on the other hand, we just drop the failed standby's replication slot from the primary, because we know that standby is no longer available, and we can do that with pg_drop_replication_slot.
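Put together, a minimal failover.sh along those lines might look like the following. This is only a sketch of what was just described; the argument order, the data directory path, the slot naming convention and the ssh access as the postgres user are all assumptions you would adapt to your own setup:

    #!/bin/bash
    # failover.sh -- minimal sketch; the argument order must match what
    # you pass in failover_command (assumed here: %d %P %m %H)
    FAILED_NODE_ID="$1"     # %d: id of the failed backend
    OLD_PRIMARY_ID="$2"     # %P: id of the old primary node
    NEW_MASTER_ID="$3"      # %m: smallest alive node id
    NEW_MASTER_HOST="$4"    # %H: host of that node

    # master node id < 0 means no backend is alive: nothing to do
    if [ "$NEW_MASTER_ID" -lt 0 ]; then
        echo "failover.sh: all nodes are down, skipping failover"
        exit 0
    fi

    if [ "$FAILED_NODE_ID" = "$OLD_PRIMARY_ID" ]; then
        # the primary failed: promote the node pointed to by the master node id
        ssh postgres@"$NEW_MASTER_HOST" "pg_ctl -D /var/lib/pgsql/data -w promote"
    else
        # a standby failed: drop its replication slot on the surviving primary
        # (assuming the primary is the lowest-numbered alive node, so %H points
        #  at it; the slot name 'slot_<node id>' is an assumed convention)
        psql -h "$NEW_MASTER_HOST" -U postgres \
             -c "SELECT pg_drop_replication_slot('slot_${FAILED_NODE_ID}');"
    fi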
So in case of a standby node failure we drop the replication slot from the primary, because we know that standby is not available, but in case of a primary node failure we use pg_ctl -w promote to promote a standby node to the new primary. This is all we need to do: either promote a standby to become the new primary node, or, in case of a standby node failure, just drop the associated replication slot from the master node. These are the only two things failover.sh needs to do. So yeah, I've just listed the basic example of failover.sh: if you look at the start, we just log that if the master node id is less than zero all nodes are down, so we skip the failover; the next thing we do is identify whether the failed node was a primary node or a standby node; in case of a standby node failure we just drop the replication slot by calling pg_drop_replication_slot, and otherwise we promote the node pointed to by the new master node id to become the new primary node. Once the failover is done, we will either have a new primary node, or we will have the failed standby node removed from the cluster.

The next thing is the follow_master_command. So, when the primary node goes down, what we did in failover.sh was promote one of the standbys to become the new primary, but there is one more step that needs to be done. For example, if we have a three node cluster with one primary and two standby nodes and the primary node goes down, failover.sh promoted standby one to become the new primary; now we need to reconfigure standby two so that it points to the newly promoted primary node and does not keep waiting for data from the old primary, which has already failed. This step is done in the follow_master_command. And just like failover.sh, you can pass plenty of information into the follow master script with options like %d, %h, %m and so on; I've written out what each parameter means, and you can look it up in the pgpool documentation to see which parameter to use if you want to pass a particular piece of information. So what do we do in follow_master.sh? The follow master command is used to recover the standbys from the new primary, as I just told you. The recommended practice is to first check whether the node is running, and stop it if it is, which we can do with pg_ctl; this is just to make sure that if we had a network problem and the failover was caused by some other issue, the old primary node really is stopped, so that's the first thing we do. Then we create a new replication slot on the new primary, using pg_create_physical_replication_slot. The next part is a bit trickier: now we have to make sure the standby node gets connected to the new primary node, and we can use the pg_rewind utility for doing that. And if pg_rewind somehow fails, what we normally do is delete the old data directory and use pg_basebackup to take a fresh backup from the new primary node, and then we attach that backup as a standby to the newly promoted primary node. These are the steps we normally perform in follow_master.sh.
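A follow_master.sh sketch along the lines of those steps could look like this; the paths, connection strings, slot naming and argument order are assumptions, and in a real setup you would start from the sample script on the pgpool-II site:

    #!/bin/bash
    # follow_master.sh -- minimal sketch, invoked once for every remaining
    # standby after a primary failover; argument order must match what you
    # pass in follow_master_command (assumed here: %d %h %H)
    NODE_ID="$1"            # %d: id of the standby being re-pointed
    STANDBY_HOST="$2"       # %h: its host
    NEW_PRIMARY_HOST="$3"   # %H: host of the newly promoted primary
    PGDATA=/var/lib/pgsql/data   # assumed data directory

    # 1. make sure the node is really stopped before touching its data
    ssh postgres@"$STANDBY_HOST" "pg_ctl -D $PGDATA -w -m fast stop" || true

    # 2. create a replication slot for this standby on the new primary
    psql -h "$NEW_PRIMARY_HOST" -U postgres \
         -c "SELECT pg_create_physical_replication_slot('slot_${NODE_ID}');"

    # 3. try to re-sync the node with pg_rewind ...
    if ! ssh postgres@"$STANDBY_HOST" "pg_rewind --target-pgdata=$PGDATA --source-server='host=$NEW_PRIMARY_HOST user=postgres dbname=postgres'"
    then
        # 4. ... and if that fails, wipe the data directory and take a
        #    fresh base backup from the new primary instead
        ssh postgres@"$STANDBY_HOST" "rm -rf $PGDATA && pg_basebackup -h $NEW_PRIMARY_HOST -U postgres -D $PGDATA -X stream -R"
    fi

    # 5. start the node again as a standby of the new primary
    ssh postgres@"$STANDBY_HOST" "pg_ctl -D $PGDATA -w start"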
You can find examples of failover.sh and follow_master.sh on the pgpool 2 site; I've pasted the link for that. These scripts are also part of the docker environment: I've made some small modifications and enhancements to the original scripts from the website, so you can also try the scripts that are part of the docker setup hosted on the github account. So next, let's configure failover in pgpool 2 step by step, because failover is a very core component of pgpool and I want to emphasise what we need to do. The first thing is to edit the failover.sh script, or create one, because by default it is not present; we just download the sample failover.sh from the official pgpool.net site and paste it in. Then edit, or create, the follow_master.sh script, and again paste in the example from the same site. Set the execute permission on both scripts, which you can do with chmod +x, and that's it. The next thing is to put the paths of these two scripts into the pgpool.conf file. For that we set the failover_command parameter, and for now we just pass every possible argument into the script: /etc/pgpool2/failover.sh followed by %d, %h and so on. We are passing each and every argument that can be passed to the script; we might not use every single one of them, but it's good to have it configured this way. Similarly we do the same for the follow master command: we point the follow_master_command configuration parameter at the follow master script and again pass in every possible argument. And yeah, we are ready to go.
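In pgpool.conf that ends up looking something like this (a sketch; the script path follows the talk's setup and the placeholder list follows the pgpool documentation):

    # run our scripts on failover, passing every available placeholder
    failover_command      = '/etc/pgpool2/failover.sh %d %h %p %D %m %H %M %P %r %R'
    follow_master_command = '/etc/pgpool2/follow_master.sh %d %h %p %D %m %H %M %P %r %R'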
So once we have done that, you can see in the video: let's try out whether the failover is working. First just connect to pgpool 2, do show pool_nodes, and check that we are connected to three nodes. Next we go on and stop the master node: for that we log into the docker container that is hosting our master postgres node and stop the service with service postgresql stop. Then hopefully the health check of pgpool 2 kicks in, detects that the primary node is not available at the moment, performs the failover, and picks one of the standby nodes to become the new primary node. If we look at the logs, we can see lines like child process with pid exited successfully, for the worker process and so on; this is the kind of log that appears when the failover happens, as it disconnects everything. And you can see the failover has picked up node number one, which was pg_slave, and made it the new primary node. So this is how the failover works: whenever the primary node goes down, the failover picks up the next node and makes it the master. That was it for the failover and its configuration.

The next thing is the watchdog, so let's see what the watchdog is. If you consider what we have done so far, we have installed multiple postgres nodes behind pgpool 2, and effectively we have not only distributed the load amongst the primary and the standbys, we have also achieved a kind of high availability: if our primary node goes down, one of the standbys becomes the new primary node, and we have very minimal disruption in the system. Normally the failover process takes a couple of seconds, maybe four or five depending on the data, so even in case of a primary node failure all you have is a few seconds of disruption, and then everything starts working again as if nothing happened. So the automatic failover mechanism has given us high availability on the database side. But if you consider the whole cluster, we need to think about what happens if our pgpool node itself goes down: we have three postgres nodes, and if one fails another takes over, but we have only a single pgpool node, and if that node goes down our service stops working. So effectively pgpool 2 is a single point of failure in this whole setup, and now we want to resolve that single point of failure. How do we do that? Basically, we install multiple pgpool 2 nodes. In this example, instead of a single pgpool 2 node we use two (we could use more, but for this example we use two), and these two nodes are connected through the watchdog, which is a built-in feature of pgpool 2 that we'll discuss in detail. Then, instead of using the hard-coded IP address of a pgpool 2 node, we use a virtual IP address to connect to pgpool 2; we'll discuss the virtual IP in detail as well. Okay, so now let's see what happens if the primary pgpool node goes down and becomes unavailable. The other pgpool nodes in the system will detect that failure and acquire the virtual IP onto themselves, so the clients will see no difference: a client was using the virtual IP to connect to pgpool node one, and once that node fails the same virtual IP is acquired by the other pgpool node. The service keeps working, only the current sessions get disconnected, the whole process takes less than a second, and you see no real disruption. All of this is done by the watchdog process of pgpool 2. So what is the watchdog? It is a set of core processes of pgpool 2 that handle failures of pgpool 2 nodes. What it does is life checking of the other pgpool services: all the pgpool nodes in the cluster (we call it a cluster when we have multiple pgpool 2 nodes) do mutual monitoring of each other to detect failures, and at startup, or in case of a failure, they hold a leader election to select the new master or active node.
The watchdog uses the bully algorithm to select the new active or master node; it performs the leader election, and there are plenty of parameters considered during the election about which node is the best candidate to become the new leader. So that is one thing the watchdog does. Another thing the watchdog does is virtual IP control: this is the mechanism by which we switch the virtual IP from the first system to the second one. It also ensures the same view of the postgres backends across all the nodes. We don't want, for example, pgpool one to have its own view of the postgres nodes, say postgres node one as primary and the other two as standbys, and then, when the virtual IP is shifted to the other pgpool node, for that node to have a different view of the system; it should see the same three nodes with the same settings and the same health checking mechanism, so that clients connecting to postgres through pgpool 2 do not feel or see any difference. Any intermediate setting changes that happen between a failure and the restart of the service should get replicated to all the other nodes; this again is the responsibility of the watchdog module, making sure all the pgpool nodes in the cluster see the same backend view. And finally, distributed failover management: whenever a failover happens, all the nodes should coordinate with each other about the failure of the node, and then coordinate to make sure they all select exactly the same node as the new primary. We don't want a situation where, when the failover happens, pgpool one selects node id one as the new primary while another pgpool node selects some other node, maybe node number two; that would cause a split brain on the database side. So it is the responsibility of the watchdog module to coordinate and perform the failover in such a way that every node agrees on the same failover and performs the same steps; we call this distributed failover management, and it is just one capability of the watchdog.

So if you look at the watchdog's logical components: one thing the watchdog does is life checking. The life checking module of the watchdog is responsible for monitoring the health of the other pgpool 2 nodes in the watchdog cluster, and we can do that life checking using two methods, one being query and one being heartbeat. In query mode, the watchdog sends an SQL query, normally SELECT 1, the smallest query possible, to the pgpool it needs to monitor; if the query succeeds it considers the node healthy, otherwise it marks the node as down. That is query-based health checking. The other is heartbeat, and heartbeat is a more advanced way: it has its own channel using UDP sockets, sends a periodic heartbeat signal over UDP, and receives the same from the other nodes. You can configure the heartbeat interval and related settings, and it detects a failure by the missing heartbeats from a particular node.
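As a pgpool.conf sketch of the two life-check modes (the values are illustrative):

    # heartbeat-based life checking over UDP (the mode recommended here)
    wd_lifecheck_method = 'heartbeat'
    wd_interval = 10                  # seconds between life checks
    wd_heartbeat_keepalive = 2        # seconds between heartbeat packets
    wd_heartbeat_deadtime = 30        # declare a node dead after this long

    # query mode instead sends a tiny SQL query, at the cost of one extra
    # pgpool child process and backend connection:
    # wd_lifecheck_method = 'query'
    # wd_lifecheck_query  = 'SELECT 1'
    # wd_lifecheck_user   = 'pgpool'    # assumed user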
So using heartbeat is the better way to do the health checking, because query mode consumes one additional backend connection plus one additional pgpool child process to process that particular query; it is more expensive. The better way is to use heartbeat, but of course you can use query mode if you don't want UDP packets flowing on your network. Another thing life checking can do is check upstream connectivity by pinging some known server. For example, suppose we are running a service which is available on the internet; our local network might be working fine, but we don't want the node to keep serving if it gets disconnected from the internet. So we can configure upstream server connectivity by configuring a ping to some known server, for example google.com: ping it every ten seconds, and if the ping is successful keep the service running, otherwise treat this node as disconnected. We can have this kind of requirement especially when the cluster spans multiple availability zones; for example, if we are running the cluster on Amazon with one pgpool on the west coast and another on the east coast, sometimes we want to use the upstream connectivity check just to make sure our whole data center is not isolated at the moment. The next thing is the leader election module, again part of the watchdog, which is responsible for selecting the best candidate to become the active master pgpool 2 node. The leader election happens at startup time and also whenever the existing active pgpool 2 node fails. It ensures the existence of the quorum and it guards against split-brain syndrome; I'll try to explain the quorum in the coming slides. The next thing is the delegate IP controller, which is responsible for acquiring and releasing the virtual IP on a pgpool 2 node depending on the current watchdog status of that node. For example, if a watchdog node was the master and for some reason it has to resign from its master status, the delegate IP controller will release the IP from that system, and then whichever node becomes the new active pgpool 2 node, the delegate IP controller on it will acquire the IP. From the look of it this seems a very simple thing, but we need to make sure we never have the same IP address acquired by multiple systems at any time, because that would make the network go haywire; so the delegate IP controller module makes sure we release and acquire the virtual IP safely. And again, the last thing is distributed failover management, which I've already explained: it synchronizes the backend node status among all the participating nodes and coordinates the consensus for backend node failures.
For example, if you have three pgpool nodes in a watchdog cluster and only one of the nodes sees the failure of a particular backend postgres node, then instead of doing a failover, even when its own health check mechanism is telling that pgpool the node is not reachable, in watchdog mode you can configure pgpool so that it does not perform the failover by itself: it must coordinate with the other nodes, and only if all the nodes, or a majority of the nodes, agree about the failure is the failover performed; otherwise the node is just put into quarantine, and we'll discuss quarantine nodes shortly.

So this is the concept of the watchdog. We install multiple pgpool 2 nodes; as a best practice it is recommended to use an odd number of nodes, with a minimum of three, and to make all nodes aware of each other: in pgpool.conf we configure all the watchdog nodes, so if the cluster has three nodes we need to put the settings of the other nodes into each node's pgpool.conf file, and we'll do that in just a little bit. The active master node acquires the delegate virtual IP address, and all the clients use that virtual IP to connect to the pgpool service. All the pgpool 2 nodes that are part of the watchdog cluster keep health checking each other, and as soon as a failure of the active master node is detected, the whole cluster goes into election mode, selects a new active pgpool 2 node, and the new active node acquires the virtual IP. As I told you, client applications keep connecting to the same IP address and hardly notice a problem.

So let's do a step-by-step walkthrough of configuring a pgpool 2 cluster with the watchdog. We'll create a three node watchdog cluster using three hosts, and we have given the respective hosts the host names pgpool_server1, pgpool_server2 and pgpool_server3. The first thing we need to do is enable the watchdog on all the servers. Normally what we do is create one pgpool.conf file, do the settings, and then replicate that configuration file onto all the nodes we want in the cluster; the following settings stay the same on all the nodes. The first setting is use_watchdog = on, which enables the watchdog. The second is to set the virtual IP on all three servers; the configuration parameter for the virtual IP is delegate_ip, and we can use whatever IP address we want (for this example I'm using 172.2.0.0.99). This is the IP address the client applications will use to connect to the service, so put the same delegate IP address on all the nodes. Then we have to configure three commands: if_up_cmd, if_down_cmd and arping_cmd. These are the three commands required to bring the virtual IP up and down. Normally on Linux we use the ip addr add and ip addr del commands to assign and remove the virtual IP, and depending on the system configuration you may need to specify the label or the device on which you want to bring up the virtual IP. These configurations normally stay the same on all the systems, but of course you can use different if_up and if_down commands if required. And the arping command is used after bringing the virtual IP up or down, just to announce on the network that this node has acquired or released the IP, so that every node updates its ARP cache accordingly.
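So the common part of pgpool.conf, identical on every node, comes out roughly like this (a sketch; the device name and the VIP are example values you would substitute):

    use_watchdog = on
    # the floating IP the clients connect to; same value on every node
    delegate_ip = '192.168.10.99'           # example VIP
    # bring the VIP up/down on this host ($_IP_$ is expanded by pgpool)
    if_up_cmd   = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
    if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
    # announce the new owner of the VIP so ARP caches get updated
    arping_cmd  = 'arping -U $_IP_$ -w 1'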
Okay, so the next things are if_cmd_path and arping_path. These need to be set as well; by default they point to the right locations in pgpool.conf, but if your flavor of Linux, or some security policy, has placed these binaries in a different directory, you can specify the command paths using these configuration parameters. Then you need to set wd_port; 9000 is the default watchdog port and you can change it to any value you want, but you need to make sure the wd_port of each system is configured properly on every other system. So everything up to this wd_port needs to be the same on all the nodes, whether it is a three, four or five node cluster.

Next we do the settings that are different on each node. First let's configure pgpool_server1. We set wd_hostname to pgpool_server1; this is the host name of the machine this particular pgpool 2 node will be running on. After that we need to set the IP addresses or host names of the other two nodes (remember we are using a three node cluster), so we set other_pgpool_hostname0 and other_pgpool_hostname1, and similarly, because we are using the heartbeat mechanism for health checking, we set heartbeat_destination0 and heartbeat_destination1, putting the names of the other two servers into slots zero and one: for slot zero we set pgpool_server2 and for slot one we set pgpool_server3. Similarly, on server two we set wd_hostname to pgpool_server2, and its slot zero points to pgpool_server1 while slot one points to pgpool_server3. So it is simply a configuration where each server needs the configuration of the other two servers, and for itself it uses wd_hostname; for example, pgpool_server3 uses wd_hostname = pgpool_server3 because it is running on the pgpool_server3 host, and then does the settings for server one and server two using other_pgpool_hostname0 and other_pgpool_hostname1. I hope you guys understand; basically I have created a docker environment with this exact configuration, so you can also download it from the github site and have a look at the configuration.
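For pgpool_server1, for example, the node-specific part of pgpool.conf would look roughly like this (a sketch using the talk's host names and the parameter naming of this pgpool generation; the ports are the usual defaults):

    # who am I
    wd_hostname = 'pgpool_server1'
    wd_port     = 9000
    # the other two watchdog nodes
    other_pgpool_hostname0 = 'pgpool_server2'
    other_pgpool_port0     = 9999
    other_wd_port0         = 9000
    other_pgpool_hostname1 = 'pgpool_server3'
    other_pgpool_port1     = 9999
    other_wd_port1         = 9000
    # where to send heartbeat packets
    heartbeat_destination0 = 'pgpool_server2'
    heartbeat_destination1 = 'pgpool_server3'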
So these are a few key concepts of the watchdog system. The first is the existence of the quorum: we say the quorum exists in the watchdog cluster if the number of alive watchdog nodes, that is pgpool 2 nodes, is greater than half the total number of configured nodes. For example, if the total number of configured watchdog nodes is five, then for the quorum to exist we must have at least three alive nodes, and in the case of the three node cluster we just created we need at least two alive nodes for the quorum to be complete. For an even number of nodes, though it is not recommended, if you somehow want to use an even number of watchdog nodes, we regard the quorum as existing when half of the nodes are alive: for example, if you deploy a two node cluster, then even one alive node means the quorum exists, and for a four node cluster a minimum of two nodes is required. But using an even number of watchdog nodes is prone to split-brain syndrome, so it is not recommended.

The next thing is quarantine backend nodes. Quarantine backend nodes are the same as detached nodes, except that when a node goes into the quarantine state no degeneration is performed on it: in case of a failed standby node the failover mechanism normally kicks in, but when a node goes into the quarantine state no failover is performed. And unlike failed nodes, pgpool 2 continues health checking quarantine nodes, and as soon as a quarantine node becomes reachable again pgpool 2 silently adds it back, removing it from the quarantine state and marking it alive and healthy. Another important part: if a primary database node goes into the quarantine state on the active or master watchdog node, then that master watchdog node resigns from being the master and lets some other pgpool 2 node become the new master, because the primary node is really important for the continuation of the service; without a primary node you cannot serve write statements. So as soon as the master watchdog node, the one that has the virtual IP acquired, sees that the primary node has become quarantined on its pgpool 2, it resigns from being master, and in the election that happens because of its resignation it downgrades itself so that it should not become the master again, letting some other node take over. This is a very important thing to know. And how does a node go into quarantine? It's simple: say you have consensus enabled in the watchdog for backend node failures, and one of the nodes thinks a backend node is not reachable, but that backend node is reachable by all the other pgpool 2 nodes. The pgpool 2 node on which that postgres node is marked as failed asks the other nodes to build a consensus on the failure, and if it does not get enough votes for the failure, it marks that node as quarantined, because the backend is not reachable from this node but the other pgpool 2 nodes are not agreeing on the failure; the node stays temporarily marked as quarantined until it becomes healthy or reachable again. It is a little twisty to explain, but this is how a node gets into quarantine, and whenever the primary node gets quarantined on the active watchdog node, that watchdog node resigns from being the active or master node.
So the next thing is how we can secure the watchdog communication. The wd_authkey parameter configures an authentication key which is used for all watchdog-related communication, because you don't want just any client to be able to send a signal to the watchdog and make it think something has happened. For example, without a wd_authkey I could create a client which sends an exact replica of the packet a node sends in case of a backend node failure. To guard against that kind of intrusion, you can configure wd_authkey, and then all watchdog communication carries that key: if a message arrives without the key, or with the wrong key, while wd_authkey is configured, that message is rejected by the watchdog. Effectively this wd_authkey should be the same on all the nodes, and the heartbeat packets also use the same wd_authkey if it is configured. So it is simply a shared key used to secure the watchdog communication.

Okay, next is upstream server connectivity. As I told you, you can configure trusted servers: you assign a comma separated list of servers to check upstream connectivity, and pgpool 2 will send a periodic ping to each server in the list. If none of the servers in the list respond, pgpool 2 will consider itself isolated from the network and commit suicide, meaning the pgpool node will shut itself down. This is for the particular kind of setup, as I told you, where pgpool 2 nodes are installed in different availability zones and we need to make sure the availability zone, or the whole data center, is still connected to the external world; in that kind of scenario we use the trusted_servers configuration. And ping_path is the path of the ping utility used by the trusted servers check; if the ping binary is in a different location on your system, you can use this configuration parameter to change it.
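A pgpool.conf sketch for both of these (the key and the host list are placeholders to substitute):

    # shared secret; must be identical on every watchdog node, and is
    # also applied to the heartbeat packets
    wd_authkey = 'change-me-shared-secret'

    # consider this node isolated (and shut it down) if none of these
    # hosts answer a periodic ping
    trusted_servers = 'gateway.example.com,8.8.8.8'
    ping_path = '/bin'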
Okay, now the behavior on escalation and de-escalation. Before discussing the behavior: escalation is the process by which a pgpool 2 watchdog node becomes master, coordinator, active, whatever you want to call it; whenever a pgpool 2 node becomes the master or active node, that process is called escalation, and whenever the master node resigns from being master, that process is called de-escalation in pgpool 2 terms. We can configure what should happen when escalation and de-escalation occur. The first thing is clear_memqcache_on_escalation: you can configure query caching on pgpool 2, where pgpool caches read queries and serves repeated queries from its own cache without going to the backend, and with this boolean parameter you can choose to clear that cache on escalation. Then there are wd_escalation_command and wd_de_escalation_command. These are optional scripts you can provide: the script in wd_escalation_command is executed before bringing up the virtual IP on the node that is becoming active, and this is quite useful. For example, you can send an email from there to notify the administrator that something has happened: a new watchdog node is becoming active, which means the previous one has gone down. Another example: suppose you are using some cloud environment, say Amazon's cloud, where no virtual IP kind of system exists and they have their own mechanism, the elastic IP, and instead of a virtual IP you want to use an elastic IP as the floating IP that gets transferred from one system to another, with all the clients using that elastic IP to connect to the service. What you can do is script the acquiring and releasing of the elastic IP using the wd_escalation_command and wd_de_escalation_command scripts. In that case you leave the virtual IP empty and instead write two scripts, one for escalation and one for de-escalation: in the escalation script you call whatever API Amazon's web services give you for acquiring a particular elastic IP, and in the de-escalation script you use the API your cloud provider provides for releasing it. These two scripts are not mandatory, and everything works fine if you are just using the delegate IP, but you can do lots of things with these two optional scripts; you can call them hooks if you like. So, the escalation command is executed before bringing up the virtual IP, and similarly the de-escalation command is executed before bringing down the virtual IP; you can use them for plenty of purposes.
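As a sketch of the elastic IP idea (the script names are hypothetical, and the AWS instance and allocation IDs are placeholders you would fill in for your environment):

    # in pgpool.conf: no local VIP, use the hook scripts instead
    delegate_ip = ''
    wd_escalation_command    = '/etc/pgpool2/escalation.sh'
    wd_de_escalation_command = '/etc/pgpool2/de-escalation.sh'

And then the escalation script itself might do something like:

    #!/bin/bash
    # /etc/pgpool2/escalation.sh -- re-point the elastic IP at this
    # instance when this pgpool node becomes active (IDs are placeholders)
    aws ec2 associate-address \
        --instance-id "i-0123456789abcdef0" \
        --allocation-id "eipalloc-0123456789abcdef0"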
Okay, the next thing is backend failover consensus; I've touched on this topic a little bit previously. As you know, failover is a very expensive procedure: it can cause the loss of running transactions and it disconnects current sessions for some time, so it is really very important to make sure we only do a failover when it is absolutely necessary. For that purpose, if you have configured pgpool 2 in a watchdog cluster, the pgpool 2 nodes can coordinate with each other to vet a backend node failure before proceeding to degeneration and failover. This is very helpful for guarding against false alarms and temporary network glitches, and it can save a lot of valuable data and time. Again, this is a fairly new feature in pgpool 2, I think it was first introduced in 4.0, and I authored it. Before that I used to get so many complaints from the community that a small network glitch or a small CPU spike caused a failover; after adding this backend failover consensus functionality there have literally been no more complaints like that, because now you have a way to guard against such glitches.

pgpool provides a few parameters for controlling this failover behavior. The first is failover_when_quorum_exists: only do the failover when the quorum is present in the cluster. For example, if you have configured the watchdog with three nodes and at a particular time only one watchdog node is alive, then no matter what happens, if you set failover_when_quorum_exists to on, the failover will not happen; the failed node will only go into the quarantine state, the system will wait for the quorum, and only when the quorum becomes available again can the failover be performed. The second is failover_require_consensus: when enabled, pgpool 2 will only perform failover degeneration when, first, the watchdog cluster holds the quorum, and second, it has received a minimum number of votes equal to the number of nodes required for the quorum. For example, for a five node cluster, only if we get at least three votes in favor of the failover will we perform it. So if we have a five node cluster and two nodes say yes, we see the failure of this backend, while the other three nodes say no, we are happy with how things are and we can still reach that particular backend, then the failover will not be performed: the problematic node will get quarantined on the two nodes that were in favor of the failover, and the remaining three nodes will see no difference. And if the quarantined node was the primary node and the failure was seen from the active pgpool 2 node, then that node will resign from being master, and one of the nodes that sees everything working fine will become the new master; this makes the whole thing very smooth. The last parameter that controls this is allow_multiple_failover_requests_from_node. If for some reason you want a node that sees the failure multiple times to be able to cast multiple votes for that failure: without this parameter enabled, in a five node cluster we need three votes in favor of the failover from three different nodes to proceed, but if you set allow_multiple_failover_requests_from_node, then provided the quorum is present, three votes in favor of failover even from the same node will let the failover be performed. You might not use this configuration very often, but it is there for scenarios where a backend is continuously unresponsive from one particular pgpool and you want to bring that node down and repair it or take some manual action. It is not a normal configuration parameter that you would set to on in every installation.
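Pulled together as a pgpool.conf sketch (illustrative values; the health check timing should match on every node, which the best-practices list below comes back to):

    # only degenerate a backend when the watchdog cluster holds quorum
    failover_when_quorum_exists = on
    # ... and only after enough nodes have voted for the failure
    failover_require_consensus = on
    # normally each node gets one vote per failure
    allow_multiple_failover_requests_from_node = off
    # consensus only works if every node health checks the backends,
    # with similar timings everywhere
    health_check_period = 10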
Then there is enable_consensus_with_half_votes. Basically this sets the algorithm for the computation of the majority rule, which is used for quorum and consensus calculation. If it is set to on, then votes from only half of the nodes are enough for completing the quorum or deciding the majority, and if you set it to off, half plus one votes are required. This only becomes effective when we have an even number of nodes, and again, it is not recommended to use an even number of nodes; the recommended way is to use an odd number, because then we can decide a clear majority every time.

So this is where I list the best practices we need to follow. The first one: use an odd number of pgpool 2 nodes, minimum three. The next: enable failover consensus together with backend health checking. This is very important: the failover consensus does not work properly if the backend health check is not enabled on all the nodes, because if you disable health checking on one node, it is possible that a genuine failure is not detected by that node, so it will not cast a vote in favor of the failover, and eventually the genuine failure will go unnoticed. So whenever you set failover consensus to on, always use backend health checking. Another important point: always use similar health checking configuration on all pgpool 2 nodes. We don't want one node using 10 seconds between health check retries and another using 100 seconds, because then again the consensus will not work: the consensus mechanism automatically fails after waiting for some time, and a node that doesn't cast a vote counts as a negative vote, so the whole algorithm expects every node to use a similar amount of time between health check intervals. Then wd_priority: wd_priority sets the node priority and makes a node more likely to become the active, master or coordinator watchdog node. We can set wd_priority differently on different nodes to influence which node should become the master, and, in case of the current master's failure, which node should be more likely to become the next master. A higher wd_priority value does not give a 100 percent guarantee that the node with the highest priority will get selected as master, because the algorithm which elects the master node also takes care of many other things, like the number of connected nodes, the number of active connections and the number of failed attempts, but a higher wd_priority gives a better chance of winning the election. So you can set this parameter to your liking: which system should become the master first, and in case of a failure, which one should follow. And similarly, I don't recommend using the upstream connectivity check unless it is absolutely necessary. I've seen so many people complaining that their pgpool 2 node shuts itself down because they used the wrong upstream connectivity settings. Everything works fine without upstream connectivity unless you have a scenario, as we discussed, where you want to make sure the whole data center or availability zone stays connected to another one; other than that, never use upstream connectivity, it causes more problems than it solves in normal circumstances. So this is kind of a best practice list for using the watchdog.
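Coming back to the wd_priority point, a tiny sketch of how you might tilt the election (the values are arbitrary; higher means more likely to win):

    # on the node you would prefer to win the leader election
    wd_priority = 3
    # on the second choice
    wd_priority = 2
    # on the least preferred node
    wd_priority = 1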
This is a typical deployment of a watchdog system: we have application servers which use the virtual IP to connect to pgpool 2; all the pgpool 2 watchdog processes connect to each other for information sharing and mutual monitoring; and all the pgpool 2 nodes that are part of the cluster connect to the same backends using the unique IP or host name of the backend servers. So this is just a simple visual representation of a typical deployment.

Okay, I think I have to speed things up because we are running out of time. Next is the PCP interface. What is PCP? PCP is the administrative interface of pgpool 2 for performing various management tasks; PCP stands for pgpool control protocol, and pgpool 2 exposes a socket interface on which you can monitor and perform various tasks. For configuring PCP you need a pcp.conf file, and by default pgpool expects the pcp.conf file to be in the same directory as the other configuration files, like pgpool.conf; you can override that config file path using the capital -F command line option at startup. Although from the name pcp.conf looks like a configuration file, it actually only contains the usernames and passwords for PCP connections: you put username:md5-password lines in the pcp.conf file. This has nothing to do with the database usernames and passwords, and these users don't need to exist on the postgres backend servers; this particular username and password is only used for PCP connections. You can use the pg_md5 utility, which comes with pgpool 2, to create the md5 password: just copy the hash it gives you into the pcp.conf file against the username, and you can then use that username and password to connect to PCP. So this is how we configure PCP: run pg_md5 with mypassword, or whatever password you want to require, and it gives you the md5 hash for that password; copy that hash, open the /etc/pcp.conf file and paste it after the username and the colon. In this example I'm using pcp_user as the username, so I paste mypassword's md5 hash in front of pcp_user, and we are good to go. There is some other configuration we need to do as well. pcp_listen_addresses: we can set it to '*', or to an empty string if we want to allow only Unix domain socket connections for PCP, and this, by the way, is the recommended way; normally we don't want to expose the PCP interface to the external world, so the normal way is to only allow Unix domain connections on the PCP interface, though of course you can allow TCP connections too. Then there is pcp_port, which is the TCP port number, and pcp_socket_dir, which is the directory where the Unix socket file will be placed; you can configure any socket directory, and then you need to give the same directory to the pcp utilities to connect to pgpool 2 over the PCP interface. These are just trivial things.
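As a shell sketch of that setup (the pcp.conf path depends on your installation; 'mypassword' is just the example password used here):

    # generate an md5 hash of the pcp password (pg_md5 prints it to stdout)
    pg_md5 mypassword
    # append the pcp user entry to pcp.conf in username:md5hash form
    echo "pcp_user:$(pg_md5 mypassword)" >> /etc/pcp.conf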
So once we have done those settings, you can just try a pcp_node_info command to make sure everything is working. Here I've used pcp_node_info with -U pcp_user, the username I put in the pcp.conf file, and 0, which means give me the node information of backend node number zero. It will print out all the information for that particular node, and in case of any problem it will print the error, so you can just try executing a pcp command to verify that PCP is set up properly in the pgpool configuration. Another thing is the PCP password file, and I think most of you are already familiar with using psql pass files: you can place a .pgpass file in the home directory, and psql uses that file for password-less connections to the database. This kind of thing is very important if you want to script something, and the PCP interface exposes the same mechanism: you can create a .pcppass file in the user's home directory, or if you want to place it in some other location you can export the PCPPASSFILE environment variable and specify the pass file path there. If you create that pass file, the pcp utilities will use it to get the password for a particular user and will not prompt for the password; this is very useful for scripting the pcp commands. Similar to the psql pass file, the pcppass file format is hostname:port:username:password. You can use a star in the first three fields, like *:*:*:password, which means for any host name, any port and any username use this particular password, and you can equally use it to specify the password for a specific user and a specific host name. The permissions must be 0600, disallowing any access to the world, otherwise the utilities will not recognize the pcppass file and will ignore it. So yeah, this is very important if you want to script the pcp utilities.
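A shell sketch of password-less scripting with PCP (host, port and user follow the examples here; -w tells the utility not to prompt for a password):

    # store the PCP password once ...
    echo '*:*:pcp_user:mypassword' > ~/.pcppass
    chmod 0600 ~/.pcppass
    # ... then pcp commands can run unattended, e.g. node info for backend 0
    pcp_node_info -h localhost -p 9898 -U pcp_user -w 0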
So, these are the utilities which come with pgpool; currently there are 11 of them, and maybe we'll have more in the future. I'll just quickly go through each one. pcp_node_info displays the information on the given backend node id, so if you don't want to connect with an SQL client you can just use pcp_node_info and you'll get the exact status of the node: host name, port number, status, load balance weight and so on; you can read the details in the documentation. The next one is pcp_attach_node, which is an important utility: you can attach a backend node with it, and I think in the first example we did use this utility for manually attaching a standby to pgpool. If a node was detached or failed because of some problem, or you want to add a new node, you configure its settings in pgpool.conf and eventually use pcp_attach_node, and that makes the node recognized as part of the backend node cluster in pgpool 2. The failback command gets executed before the actual attaching of the node is performed by pgpool 2. Similar to attach node there is pcp_detach_node, which detaches the given node from pgpool 2. You can also pass the --gracefully option, which makes pgpool 2 wait until all clients have disconnected from that particular node. Again this is a very helpful utility: if you want to do maintenance on a particular standby, for example, you can use it to detach that node from pgpool 2 so that it is no longer used for load balancing and so on; then you do whatever maintenance you need, and use pcp_attach_node to reattach the same node. And if you call pcp_detach_node on the primary node, that will cause a failover: if you detach the working primary node, pgpool will not ask any questions, it will just go on and detach it, and the cluster then has to select a new primary node somehow. So it is not recommended to detach the primary unless you really want to. Then there is pcp_promote_node. If everything is working fine and, for maintenance or some other reason, you want to take the old primary out and have some standby become the new primary, this is a manual way to do a failover, you could say: you execute pcp_promote_node with a node id, pgpool 2 executes a failover, that particular node becomes the new primary node, and the old primary gets connected as a standby. Again, this can be used for maintenance purposes, but be very careful when you use this command. Then there is pcp_recovery_node; we already did a test of recovery. This is for automatically attaching a new node to the cluster; we performed an online recovery test earlier in the example, so I think we have a pretty good idea. Then there are pcp_node_count and pcp_proc_count, which just give different kinds of information: pcp_node_count prints the number of backend nodes currently connected to pgpool 2, and pcp_proc_count displays the list of pgpool child processes with their ids, so you can see which child process is serving which clients, their status and so on. Then pcp_pool_status displays the configuration parameter values as defined: it effectively prints out the current effective pgpool.conf values, so if you want to see which values are actually in effect you can use pcp_pool_status and you'll get the values of all configuration parameters. Then there is pcp_proc_info, which gives information about all the child processes; this one is a useful utility, you can see the connected database name and connected user name for each process, the process start time, the connection creation time, the protocol version, the connection reuse counter and so on, quite useful information. Similarly pcp_watchdog_info displays the watchdog status of all the nodes: if you have a three node cluster it tells you which particular watchdog node is the current master, whether the virtual IP is assigned on it or not, the priority, and everything like that.
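A shell sketch of the maintenance workflow just described, assuming the .pcppass setup from before and that backend 2 is the standby under maintenance:

    # take standby node 2 out of rotation, waiting for clients to drain
    pcp_detach_node -h localhost -p 9898 -U pcp_user -w --gracefully 2
    # ... perform the maintenance on that server ...
    # then put it back into the pool
    pcp_attach_node -h localhost -p 9898 -U pcp_user -w 2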
Then there is pcp_stop_pgpool. As the name suggests, it just terminates the Pgpool-II process. It is another way of stopping Pgpool-II, and it is useful if you want to script it: if your script detects some condition and you want to stop Pgpool-II, you can use this pcp command to do that. It supports three modes, smart, fast, and immediate, which are similar to the Postgres shutdown modes: smart waits for the clients to disconnect before stopping Pgpool-II, while fast and immediate shut it down right away without waiting. The next topic is authentication and security, and I'll pick up my speed here because I think we are running out of time. This is the authentication model of Pgpool-II: the client connects to Pgpool-II and has to pass Pgpool-II's authentication, and then in turn Pgpool-II authenticates to Postgres on behalf of that client. So conceptually you have to pass two authentication steps. From version 4.0 onwards you can effectively use different authentication methods between the client and Pgpool-II and, in turn, between Pgpool-II and Postgres. This is a very useful thing: for example, you can use md5 authentication between the client and Pgpool-II while using some other password authentication method between Pgpool-II and Postgres. It can be handy in particular kinds of setups, say where you want SSL or certificate based authentication on one leg but still want a password on the other. That was not possible before version 4.0, but now you can mix authentication methods. These are the authentication methods supported by Pgpool-II: trust (which, as we all know, shouldn't really be called authentication, since you don't need a password to log in), md5, SCRAM, PAM, SSL certificate authentication, and plain password authentication. These are the same methods as supported by Postgres. And just like Postgres has pg_hba.conf, Pgpool-II has a similar kind of file called pool_hba.conf, and it serves the same purpose on the Pgpool-II side as pg_hba.conf does on the Postgres side: you can pin down the connection type, client IP range, host name, database name, and user name, and limit connections based on these parameters. If you want only a particular range of IPs to be allowed to connect through Pgpool-II, you can specify that in pool_hba.conf, and you can also specify the authentication method, forcing a particular method for particular IP ranges. So you can do much the same things as you can with pg_hba.conf when connecting to Postgres directly. The record format is almost an exact copy of the pg_hba.conf record format: you can use host names, addresses, IP ranges, and so on. There are, however, a few differences.
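Here is a minimal sketch of what pool_hba.conf entries can look like; the database, user, and address values are assumptions for illustration, and the format intentionally mirrors pg_hba.conf.

    # TYPE  DATABASE  USER     ADDRESS          METHOD
    host    all       all      127.0.0.1/32     trust
    host    mydb      appuser  192.168.1.0/24   scram-sha-256
    host    all       all      0.0.0.0/0        reject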
The first difference is that the samegroup keyword for the database field is not supported. In pg_hba.conf you can provide samegroup as part of the database name so that all databases belonging to the same group match the record, but Pgpool-II does not have visibility into the databases, which live on the Postgres side. What Pgpool-II does instead is match the database name from the startup packet against the record in the hba file, and only if the two match exactly does it allow the connection. It has no information about database groups, so samegroup cannot be used. Similarly, group names following a plus sign are not supported, because again that information is not available on the Pgpool-II side. Another thing to note is that it doesn't matter whether the database or the user actually exists in Postgres: if you provide a user name in pool_hba.conf and a client connects to Pgpool-II as that user, it will pass the pool_hba authentication step. Of course it will fail later, when it turns out that the same user or database does not exist on the backend, but for the purpose of passing pool_hba.conf the user or database does not need to be present on the Postgres side. These are the few differences from pg_hba.conf; otherwise the two files are pretty much the same and serve the same purpose. Now, some best practices for securing the database. The first idea is to treat Pgpool-II as an additional security layer in front of the database system: use pg_hba.conf to only allow connections from the Pgpool-II IP address or host name, so that there is no external access to the database system at all, and then use pool_hba.conf to control which users get access. That way, even if one of the systems is compromised, users still cannot connect to the database directly, so effectively we have two-layered security using pool_hba.conf and pg_hba.conf. This is the usual best practice: allow only the single Pgpool-II IP on the Postgres side and control the users from pool_hba.conf. Next, the pool password file. There are authentication methods that require a password to be present on the Pgpool-II side so that Pgpool-II can authenticate to the server; md5 and SCRAM authentication, for example, can only work if we place the password in the pool_passwd file. The file contains lines of user name, a colon, and the password, one line for each user we want to allow to use SCRAM or md5 authentication. You can store the passwords in several formats in pool_passwd: md5 format, AES encrypted format, or plain text format. Of course, if you store an md5-formatted password in pool_passwd, then only plain text and md5 authentication will work, whereas with AES encrypted passwords all authentication methods can work without any problem, because the encryption is local and Pgpool-II has the key to decrypt it. A note on using AES encrypted passwords in pool_passwd, which I touched on at the start: AES is symmetric encryption, so we need the same key to encrypt and decrypt, and that key has to be present on the system.
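As a sketch of that two-layer arrangement (all addresses here are assumptions for illustration): pg_hba.conf on the Postgres servers admits only the Pgpool-II host, while pool_hba.conf on the Pgpool-II host decides which end users get through.

    # pg_hba.conf on each Postgres server: only the Pgpool-II host may connect
    host    all     all       10.0.0.10/32     scram-sha-256

    # pool_hba.conf on the Pgpool-II host: per-user, per-network access control
    host    mydb    appuser   192.168.1.0/24   scram-sha-256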
By default, Pgpool-II looks for a file named .pgpoolkey in the user's home directory, and if that file is present and has the appropriate permissions, then whatever is written in it is used as the key to encrypt and decrypt; all the passwords in pool_passwd are encrypted and decrypted using that same key. If you don't want to store the key in a file, you can also provide the encryption key as a command line argument, using --enc-key or -K when starting Pgpool-II. But I think the best way is to use the key file, place it in some secure directory, and use the PGPOOLKEYFILE environment variable to point to that file. This is a very secure form of encryption: if you keep the key file secure, your passwords are always secure. Then there is the pg_enc utility that comes with Pgpool-II. You can use it to create AES encrypted passwords and store them in the pool_passwd file. You can provide the encryption key as a command line argument, or it can read the key from a file, or from the path in the PGPOOLKEYFILE environment variable; it can do pretty much everything as far as passwords are concerned. And if you give it the -m option, it will go ahead and update the pool_passwd file automatically, so you don't have to copy and paste anything: it updates the pool_passwd file that Pgpool-II will then use. Here is an example of pg_enc at work. I created a key file, changed its permissions to 0600, exported the PGPOOLKEYFILE environment variable to point to that file, and then ran pg_enc with a database name and user name and typed the password. You can see it used the key file under /home/postgres, and the encrypted password was stored in the pool_passwd file. So it's pretty easy to use the pg_enc utility to store encrypted passwords. Similar to pg_enc we have the pg_md5 utility, which does the same thing but stores the password in md5 format; I'll skip the details because we are already short of time. So, what is the pool_passwd file used for? It serves two purposes: it is used to authenticate clients with Pgpool-II and to authenticate sessions with Postgres. When a client comes in, Pgpool-II uses the user name and password from pool_passwd to authenticate that client, and then, if the authentication is successful and Pgpool-II needs to connect to Postgres as the same user, it uses the same pool_passwd file again. Whether the file is consulted depends on the type of authentication you are using: in some cases it is not needed at all, for example clear text password authentication does not require pool_passwd, but for SCRAM and other more involved authentication methods the pool_passwd file will be consulted. One important point: pool_passwd must contain the same passwords for the users as are set in the databases, otherwise the backend authentication won't
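Here is a minimal sketch of that workflow; the key file path, the key itself, and the user name are assumptions for illustration, and option spellings may differ slightly between Pgpool-II versions.

    # create the AES key file and lock down its permissions
    echo 'my-secret-key' > ~/.pgpoolkey
    chmod 0600 ~/.pgpoolkey
    export PGPOOLKEYFILE=~/.pgpoolkey

    # prompt for appuser's password, encrypt it, and update pool_passwd in place
    pg_enc -m -u appuser -p

    # pool_passwd now holds a line like (value shortened here):
    # appuser:AESbFoab5F2...==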
work. The pool_passwd file uses a password-type prefix in the password string to identify the kind of password: an md5 password has to be prefixed with md5, an AES encrypted password with AES, and a plain text password with TEXT. The prefix is how Pgpool-II identifies what kind of password is present in the file. Now for the configuration parameters related to authentication. enable_pool_hba enables pool_hba.conf; setting it off disables hba checking, which is pretty obvious from the name of the parameter. pool_passwd sets the location and name of the pool password file to be used; you can use a different name if you want. allow_clear_text_frontend_auth is a pretty dangerous one to have on, but of course you can use it: if you allow clear text frontend authentication and some user tries to connect to Pgpool-II, and Pgpool-II does not find an appropriate password for that particular user in pool_passwd (which it needs for the backend authentication), then, with this variable on, Pgpool-II will ask the client-side application to switch to clear text authentication. If the client agrees, Pgpool-II receives the password from that client in clear text (the whole communication channel could be SSL encrypted, but the password itself is sent in clear text format) and then uses that clear text password to authenticate to the backend. You might not need this at all, and the proper approach is to do a proper configuration in pool_passwd, but Pgpool-II does allow falling back on clear text frontend authentication when it doesn't find the appropriate password in pool_passwd. As you can guess, it is a dangerous one to use. Okay, load balancing. The idea is that write queries or write requests are sent to the primary server, while read queries or read requests are routed to the standby servers; this is what we know as load balancing, and you can switch it on with load_balance_mode = on, which is all you need to do. Load balancing works in replication mode and master-slave mode. In these modes, read statements inside a transaction block are load balanced until a write statement is issued in that particular transaction. Queries using system catalogs and temporary tables are always sent to the primary node, for reasons you can guess, and a SELECT statement containing a write function call does not get load balanced either, because we cannot execute a write function on a standby or hot standby, which only serves read transactions. SELECT statements inside a serializable transaction also do not get load balanced. I'll only try to touch on the important points here.
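Turning this on is a single line in pgpool.conf; a minimal sketch, with the parameter name as used in the releases this training covers:

    # send writes to the primary, distribute reads across standbys
    load_balance_mode = on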
One subtle case involves the extended query protocol. In extended query mode, as you know, a query is divided into multiple client messages; so suppose the Parse of a SELECT statement was sent to a standby node, and the client then issues a DML statement before executing the previously parsed statement. As per the Pgpool-II load balancer algorithm, whenever a write or DML is issued in a transaction, all subsequent reads must not be load balanced, because otherwise there is a chance of reading an older version of the data if the replication lag is on the higher side. So in that case Pgpool-II re-sends the previously routed Parse message to the primary. This is a behavioral thing to be aware of: in that case the same Parse message can end up being sent to multiple nodes. Then there are ways to force a particular query to the primary node: you can put the /*NO LOAD BALANCE*/ comment at the start of the query, and this will force the query to land on the primary server; of course, the behavior also depends on the allow_sql_comments setting. This slide shows the working of the /*NO LOAD BALANCE*/ comment. In the first query, circled in red, I used the comment and did a SELECT inet_server_addr() just to confirm which server executes the query, and you can see it was executed by the primary node, because we get the primary's IP address back. If we issue the same query without the comment, one of the standbys executes it. So you can force any query to land on the primary server by putting that comment at the start of the query; otherwise the internal logic of the Pgpool-II load balancer kicks in and decides which server each query is sent to. Then there is black_query_pattern_list. This is a configuration parameter in which you can put the queries you always want to be sent to the primary server, and you can also use regular expressions. I've added an example: if I want SELECTs on a table called nlb to always land on the primary server, I can put that SELECT statement into black_query_pattern_list, and that query will always be served by the primary; the slide just shows black_query_pattern_list in action. And then, how do we tell Pgpool-II about write functions, that is, which functions write and which are harmless to execute on a standby? For that we use the white_function_list and black_function_list parameters. These two parameters are mutually exclusive. black_function_list contains the functions that are write functions; you can use regular expressions, and all SELECT statements containing function calls present in black_function_list will always land on the primary server. Alternatively, you can use white_function_list to say that these functions are read-only, and every function not present in white_function_list is treated as a write function. Which of the two you use depends on the design of your database. I hope I'm making myself clear; I'm just trying to wrap things up. Again, you can specify a comma-separated list in either parameter and use regular expressions, and the best practice is to design your function naming for either the white_function_list or the black_function_list.
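To see the routing concretely, you can ask each backend for its own address; a small sketch, assuming a running Pgpool-II in front of one primary and at least one standby:

    -- forced to the primary node
    /*NO LOAD BALANCE*/ SELECT inet_server_addr();
    -- without the comment, a standby may serve the read
    SELECT inet_server_addr();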
What you can do is prefix your function names with get_ or set_. For example, to use white_function_list, prefix all your read-only functions with get_, and then in white_function_list you can just use the regular expression get_.*; this means any function whose name starts with get_ is a read-only function and can safely be load balanced, and every function without the get_ prefix is treated as a write function. The black_function_list approach is just the color inverse of the white_function_list we just discussed: the slide explains the same thing, except we use a naming scheme for the write functions, prefix all our write functions with set_, and put the regular expression in the black_function_list parameter. Next, how do we ensure we always read the latest data? Depending on the type of replication setup and the workload requirements, load balancing can be configured to ensure we always get the latest data. Pgpool-II provides two configuration parameters to tune the load balancing behavior depending on the state of the current session: the first is disable_load_balance_on_write and the second is delay_threshold, and we can use these two parameters to fine-tune the settings. disable_load_balance_on_write can have four possible values: off, transaction, trans_transaction, and always. When set to off, read queries are always load balanced, no matter whether you are inside a transaction or whatever state the transaction is in. As you know, this gives the best load balancing performance and is suited to synchronous replication, because in that case all the replicas are the same; with asynchronous replication there is a possibility of reading an older version of the data. When set to transaction, once a write query appears in an explicit transaction, subsequent read queries are not load balanced until the end of that transaction. This is the normal mode, so to speak: whenever a write appears in a transaction, all subsequent reads in it are also sent to the primary to make sure they see the latest data. You can also set it to trans_transaction, which is a pretty hard one: if a write statement appears in an explicit transaction, subsequent read queries are not load balanced until the current and subsequent transactions in your session end, so theoretically it widens the window during which load balancing is disabled. And when set to always, a write effectively disables load balancing until the current session ends. I don't know where always would be used, but we do have that particular setting. Then the delay_threshold: with it we can set the maximum tolerated replication delay, in bytes, on the standby servers. Setting delay_threshold to zero means do not check the delay at all, but you can set it to any value in bytes, and if the replication delay on a standby exceeds that many bytes at a particular time, load balancing to it stops until the delay comes back within the acceptable range. So this is how you can configure the
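A sketch of these knobs together in pgpool.conf, following the get_/set_ naming convention; the values are illustrative assumptions:

    # only get_* functions are considered read-only and safe to load balance
    white_function_list = 'get_.*'
    # stop load balancing reads after a write inside an explicit transaction
    disable_load_balance_on_write = 'transaction'
    # stop sending reads to a standby lagging more than ~10 MB behind
    delay_threshold = 10485760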
threshold. Once configured, it works automatically: whenever the threshold is exceeded, load balancing to that standby stops, and it resumes when the delay comes back within the acceptable range. We have already discussed the sr_check_database, sr_check_user, sr_check_password, and sr_check_period configuration parameters; these configure the database, user, and credentials used for reading the replication delay from the backends, so I won't repeat that. So basically Pgpool-II provides a variety of configuration options to fine-tune the routing of read queries and effectively distribute the load. The routing of read queries can also be configured on the basis of the connecting application name, the target database name, the backend node, and the load balancing weight. I'll quickly go through these parameters. With app_name_redirect_preference_list you can make Pgpool-II route connections from a specific application to a specific backend node: for example, you can have read queries from your psql clients served 50 percent from standby 1 and 30 percent from standby 2, by specifying the ratios. This is only for load balancing, of course: write statements are always served by the primary node, but you can set the ratio of read queries per application name. At the bottom of the slide is an example of app_name_redirect_preference_list: we used psql mapped to primary, myapp1 mapped to node id 1 with a ratio of 0.3, and myapp2 mapped to standby. The psql entry means route all read queries from psql to the primary; the myapp1 entry means route 30 percent of myapp1's read queries to node id 1; and the myapp2 entry means route all of myapp2's read queries to the standbys. You can use the generic names primary and standby (where standby covers all the standbys as one category), or you can use specific node ids. Similar to app_name_redirect_preference_list there is database_redirect_preference_list, with which we can make sure, say, that connections to database one are always served from a particular backend and connections to database two are served from the primary only. It uses the same format as the application preference list, but the database name is used for the comparison instead of the application name. Those two are fairly new parameters, introduced recently, whereas backend_weight is as old as Pgpool-II itself: you can set a backend_weight for each backend individually, for example backend_weight0 and backend_weight1, with any values you like. When Pgpool-II starts, it normalizes all the backend weights into ratios and then tries to route read queries based on those ratios. In terms of precedence, backend_weight has the lowest preference and the app name and database name redirect lists have the highest: if a rule in the app name or database name list is satisfied, the backend weights are not considered, but if no rule in either preference list applies to a particular connection, then the backend
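Here is a sketch of these routing parameters side by side in pgpool.conf; the application names, database names, and ratios are assumptions for illustration, and the parenthesized value is the load balance ratio syntax as I understand it in recent releases:

    # per-node read weights, normalized into ratios at startup (lowest precedence)
    backend_weight0 = 1
    backend_weight1 = 2
    # per-application overrides (highest precedence, together with the database list)
    app_name_redirect_preference_list = 'psql:primary,myapp1:1(0.3),myapp2:standby'
    # per-database overrides
    database_redirect_preference_list = 'reportdb:standby,salesdb:primary'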
weight ratios are used. Okay, then there is another new thing in Pgpool-II: session-level versus statement-level load balancing. Historically, Pgpool-II by default selects the load balancing node at the start of each session, and that node is used as the destination for all load-balanced queries for the lifetime of the session. This is how Pgpool-II has always worked, but recently we got statement-level load balancing: setting the statement_level_load_balance parameter to on makes Pgpool-II select the load balancing node for each query, as the query arrives. Here is the example: if we have a three-node cluster, one primary and two standbys, then whenever you start a new session the primary node is always part of it, and Pgpool-II selects one node from the available standbys to use as the load balancing node for that particular session. You can see the value true in the last column of the slide, meaning that particular node is used as the load balancing node for that session, and Pgpool-II will route all read queries of that session to it. With the new statement-level load balancing, rather than selecting a standby at session startup, Pgpool-II only fixes the primary node, and whenever a query arrives it selects one of the available standby nodes, depending on the current load ratios in the system, and sends that query there. So this is how you can change that behavior. Okay, and now one thing I get lots of questions about: can we actually gain performance with Pgpool-II as a load balancer? There are a few things to consider. We all know that Pgpool-II is a middleware and has its own processing and network overheads, and a direct connection to Postgres performs better for smaller workloads; but as the load increases, Pgpool-II starts outperforming the direct connection. It totally depends on the type of server and the workload. I performed a few benchmarks on some very small systems just to showcase where Pgpool-II is useful. The green line here is the TPS with Pgpool-II and the black one is the TPS with a direct connection. We used pgbench with a scale factor of 100 and ran the benchmark with 5, 15, 25, and 45 clients. With 5 clients the direct Postgres connection performed way better than Pgpool-II, and with 15 clients it was still better, but as we kept increasing the load, with 25 clients the two performed about the same, and with 45 clients Pgpool-II performed way better than the direct connection to Postgres. So effectively, if there is real load on the server, deploying Pgpool-II in front of it and using its load balancing will give better performance. I performed another test, this time with a 500 scale factor, and the tipping point came much earlier: with 5 connections both performed almost the same, but with 15 clients Pgpool-II almost doubled the performance, and for 25 and 45 clients the difference was huge.
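For reference, the shape of such a benchmark run is roughly as follows; the host names and ports are assumptions for illustration (9999 being Pgpool-II's usual listen port, 5432 Postgres's):

    # initialize pgbench tables at scale factor 100
    pgbench -i -s 100 -h pgpool-host -p 9999 bench
    # measure TPS through Pgpool-II, then directly against Postgres
    pgbench -c 45 -T 300 -h pgpool-host -p 9999 bench
    pgbench -c 45 -T 300 -h postgres-host -p 5432 bench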
So this is the conclusion: for smaller database sizes and smaller workloads we should not expect any performance gain from Pgpool-II; in fact it degrades the overall throughput, because of the extra hop between client and Postgres and the processing involved on the Pgpool-II side. The performance gain happens when the load on a single Postgres server becomes significant: Pgpool-II load balancing will give better performance than a single Postgres server whenever the I/O load on the single Postgres node becomes significant, or the processor on a single Postgres node reaches its maximum, or the database size becomes very large compared to the RAM on the Postgres server. These are the conditions under which you can expect Pgpool-II to perform better. I've also written a blog with a complete analysis of these benchmarks, and this is the link to it. The next topic is connection pooling; I'll just list the key points. Connection pooling, as you all know, works like this: the client connects to Pgpool-II, and Pgpool-II in turn connects to Postgres; when the client disconnects, rather than disconnecting the backend connection, Pgpool-II pools that connection, and a pooled connection is identified by the user name and database name pair (user:database). That is connection pooling in a nutshell. The important thing to note here is that, since Pgpool-II has a multi-process architecture, each child process of Pgpool-II maintains its own connection pool. So if you have 100 child processes, there are effectively 100 small connection pools inside Pgpool-II, and whichever child process gets the client connection uses its own pool; there is no global connection pool in Pgpool-II. The other thing to note is that the template0, template1, postgres, and regression databases do not get cached and do not get pooled. There are some parameters you can change to fine-tune the connection pooling, but I'll skip them here and you can read them from the slides. There is one point I do need to explain: serialize_accept. This enables serialization of incoming connections, and you might be familiar with the problem it addresses, the thundering herd problem. The thing is, if you use a large value for num_init_children (num_init_children being the parameter that tells Pgpool-II how many child processes to spawn at startup), for example if you want your Pgpool-II setup to handle 1000 concurrent connections, you set num_init_children to 1000, which means 1000 child processes are spawned at startup. That is a very large number and normally we don't do that, but it clarifies the point. Each of those child processes will be listening and waiting for a client connection, and whenever a new client connection comes in, all 1000 of them will try to grab it; only one will succeed, and the other 999 will fail and go back to listening again. What happens is that, with a large number of children, every single incoming connection causes on the order of 1000 context switches, which effectively degrades performance. For that purpose Pgpool-II has the configuration parameter serialize_accept: if you set it to on, then only one of those children at a time waits to accept a connection.
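A sketch of this in pgpool.conf for a large deployment; the numbers are illustrative assumptions:

    # spawn this many child processes at startup; also the max concurrent clients
    num_init_children = 1000
    # serialize accept() so a new connection wakes one child, not all of them
    serialize_accept = on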
Once a child process gets a connection, the next one takes its place: the children essentially form a queue, and the child at the head of the queue waits for the next connection. That is how it works. You should only set serialize_accept when using a large number of children: after adding this parameter we ran a few benchmarks, and it is known to slow down performance for smaller values of num_init_children. For 100 or 200 children you do not need serialize_accept, but for larger values you may want to set it to on. And the proper name of the problem it solves is the thundering herd problem. We are already at the three hours and fifteen minutes mark, so quickly: these are the SHOW commands, like SHOW pool_nodes, and this is the best practice for pool sizing. Each child process manages its own connection pool, so select the value of max_pool carefully; max_pool is the number of database/user-name pairs each child process can cache. For example, if you set max_pool to 2, each child process will cache up to 2 backend connections. And this is how you calculate max_connections for the Postgres backends: as I just mentioned, each Pgpool-II child process manages its own pool, so effectively max_pool multiplied by num_init_children is the number of backend connections we would need under maximum load, plus two more for the health check connections that Pgpool-II makes. So the rule is that num_init_children times max_pool, plus two, should be less than or equal to max_connections minus superuser_reserved_connections on the Postgres side; for example, with num_init_children = 100 and max_pool = 2 you would want at least 100 * 2 + 2 = 202 connections available on Postgres. That is the formula you can use to work out the max_connections value on the Postgres side. For log output, Pgpool-II uses parameters similar to the Postgres ones, such as log_error_verbosity and client_min_messages; it follows the same kind of scheme Postgres uses. I've also added, just in case, that you can enable the in-memory query cache on the Pgpool-II side, and there is the replication mode, which we already touched on a little. And this is the GUI tool for monitoring and configuration I mentioned; stay tuned, we are making a new and better version of the GUI tool and it will be available soon. So that's it from my side; if you have any questions, please go ahead. Okay, let me read out this question: 'We run Pgpool-II and use repmgr to manually do failover and switchover. Pgpool-II detects these events and properly recognizes the new primary, but all clients actually get reset and have to reconnect, and then things are okay again. Can we keep the Pgpool-II children alive in this situation and make the failover transparent to the clients?' Well, in the case of a primary node failure there isn't a way to do that as of now. There is one TODO item, which we are still struggling to fix, to preserve the frontend (client) connections and only reset the backend connections, but it is still on the to-do list, so unfortunately there is currently no way to preserve the client connections when the primary fails. We are working on it, though. 'Per user, per database': I'm sorry, I'm not getting this one; are you referring to the connection pooling?
Okay, the question is whether the connection pooling can be configured per user, per database. No, not really. Basically you can just set the max_pool variable, and Pgpool-II uses an LRU policy: it evicts the least recently used connection. The best practice here is this: normally application servers use one, two, or three users to connect to the databases, so you set max_pool to cover the number of database/user-name pairs actually in use. For example, if your applications use two database/user-name pairs, you just set max_pool to two, and if a third one comes in, the least recently used connection is evicted. There is no such thing as a pre-warmed cache. And to be frank, that would be a good thing to have; if you only want a connection pooler, then I think PgBouncer is a better choice when it comes to pure connection pooling performance, but if you want a load balancer along with connection pooling, that is where Pgpool-II comes in. Because of its many modules, Pgpool-II is a bit heavier than PgBouncer, so it has a little less functionality on the pure connection pooling side, but overall the most common use of Pgpool-II is as a load balancer. And yes, we do have an item on the list to add a pre-warmed cache or connection caching similar to PgBouncer. Okay guys, thank you very much. I hope you got something useful out of this, and you can shoot any questions you have at the Pgpool community; I'm pretty active there, and I've also included my email in the slide deck, so you're welcome to send any questions or suggestions. I hope to see you all soon, somewhere in the world, sometime. And thank you, Lindsay, for arranging this training session and everything.