 This is the last talk of the deaf room from Postgres this year at FOSTA, and this talk will be done by Patrick Francais. So please give him a warm hand. Hello everyone. I hope you're not waiting for the next talk, because maybe you will be a little disappointed. Who was already there for Magnus talk about replication? Oh, there will be some kind of redundancy, but that's not the same topic. So let's start. I'm Patrick Francais, and I work from Luxo Data Company. I'm PostgreSQL consultant and trainer since 2013. I have my first contact with PostgreSQL in 1999, but I discovered the community quite late, so I'm not very known. And I never stopped using PostgreSQL since I first started when I was a student. This is my first talk in English, so I will be more comfortable if you do not ask questions, please. But feel free to come and discuss with me after, or if you miss me, I've got a Twitter account. You can contact me this way. Maybe you will not want to follow me, because lots of friend things. Luxo Data is a company built on three essential pillars. The PostgreSQL database, of course, the data method and, of course, automation and cloud technologies. I mean, in these years, that's necessary. Let's go back to our subject. The subject is the many possible lives of a world file. And for the rest of the conference, I will become somewhere else. I will become something else. Soon, I will be on this file. What am I? What is my purpose? My purpose, some many database called me transaction log. That's not wrong, because transaction are recorded inside me. Other databases call me redo log. That's not false, because transaction recorded can help make redo. In PostgreSQL world, my name is right ahead log. So what am I? What is my purpose? My purpose is to record data changes as soon as possible. One of my purpose is to bring data consistency. The changes that are recorded are in binary special diff format. That is not humanly readable. One of my purpose is to help restore data. So if you have a problem and you have to make a recovery, I can help. And one of my possible purpose is to be the pillar of replication. Let's start with some theory. Maybe the boring part. We're in PostgreSQL world and PostgreSQL is an ACID database. ACID means atomicity, consistency, isolation and durability. Atomicity means all or nothing. Meaning that when you run a transaction, you don't want it to be inconsistent at some point. So for example, in a bank, if you want to send money, you don't want the bank to get your money and then the transaction stops and you have lost money. That's not acceptable. That's why you have atomicity. Consistency means going from one valid state to another valid state. Maybe some constraint can be invalid at some point during the transaction but at the end of the transaction, everything is valid. Isolation means many clients can run queries at the same time as if they were alone in the database almost. And durability, the part that is the most important for me, is that once the data are in the cluster, they will stay in the cluster. Just a little more about theory. Here we have a PostgreSQL server with memory represented and processes. And in the back we have data files and the workspace where all the wall are recorded. When a user wants to modify, wants to change data, he will issue a request to the database. If it's not already the case, the database will load the data from the data files into the memory and then it will make the data change. At that point, the memory is called dirty because it's not the same as what it is in the files. This change is written into the raw file. So maybe inside me. At that point, if something bad happens, the PostgreSQL server is able to redo the transaction, to rerun it because all the data are into the wall. So we can say to the user, OK, the transaction is committed. And later, the data will be recorded into the data files. That happens when you make a checkpoint or when a checkpoint is made automatically. So let's go back to my favorite subject, me. My vision of time. I have a special vision of time. Human time is made of hours, minutes, seconds, days. Well, that's interesting for humans. My time depends on activity. So it is made of transactions. And when I say soon, I will become a raw file, it can mean microseconds to years. To be more honest, it will be a more raw stick. It will mean maybe seconds to hours or days. I've got some expectation for my future life. I want to be a fair good worker, so I want to reach the end of file. For a human, that will mean something like reaching 100 years old, something like that. Other expectation, I want to travel. I want to see other things. I want to see the world. Well, other things. I don't want to be involved in any disaster, meaning recovery or promote things. That's where things happen in this case. I don't want to be part of it. But hey, we'll see. And I don't want to end up in cryo chamber. I think you call that cold storage or maybe tape band or DVD or brewery, something that look like that. But hey, I will have no choice in this. But you will see that all my goals may be achievable. There are some clues in what I will tell you. So let's talk about my life. Let's start from the beginning, the birth. My birth happened in many ways. I can be pre-allocated, that's a normal way, if there are not enough world files into the directory. Some more will be pre-allocated at some point. I can be recycled from an ancestor's envelope. For humans that would be very world, but that's life for world. That's a normal process. Or I can be allocated on demand if you have a high activity on your cluster. Maybe you won't have time to pre-allocate every wall. Just after my birth, I will start to work. I will start my job. What is my job? My job is to record events. So the PostgreSQL processes will write events inside me. What kind of events? Data changes. Well, that's what I explained in the theory part. Maybe replication events. If you have replication, there may be some things to record. The special event, the checkpoints. When the checkpoint happens and the data are written into the data files, the checkpoint is written into the world file, the checkpoint event. Writings inside me happen in append only mode. Except for metadata, of course. What about the work time, the work period? When will I start working? I will start working after the switch from the previous world file. And I will work until the switch to the next world file. We'll see later what is the switch. My work will then be a continuation of all those world work. So it's a teamwork. What about that switch? There is a normal switch that happens at the end of the file. When I reach the end of the file, a switch to the next world will happen. It can be a manual switch with a user function, but it will switch in PostgreSQL 10, of course. And special case can happen with PITR or Promote. And I will talk a little bit more later. What about cravings? I said I don't want to end up, if possible, in Crayer Chamber. What happens? At the end of the working period, when everything is okay, when the switch happens, I may be copied to another location. That's not mandatory. That depends on configuration. And this process is called archiving. How do you make archiving? You have to modify configuration and put something, an external command, in the archive command configuration parameter. Then you will have to enable it with configuration parameter archive mode, set to on. But be careful, when you archive, you may retain words longer than you expected. For example, if there is a lag or an error in the archive process. Where are these archives? It can be many possible destinations. It can be a local or remote file system. That's pretty current. It can be table or permanent storage, which you would call cold storage, and I call Crayer Chamber. And maybe, you would likely never be... Sorry. You would likely never to use me again. Because it will mean some problems. For example, a disaster, a cluster problem, and a recovery to do. An exception to this is non-stroming replication. What about my death? That's part of life. The death happened with the checkpointer's process. In that case, I may be deleted or recycled. So the checkpointer's process is like an angel of death. And it may be at the same time an angel of birth for another world after me. In some case, my death can be delayed. And I hope all of you would like to be the same. One thing I can die with manual action, which most of the time will mean an error. Or maybe a disaster recovery plan if you are lucky enough to have an experimented DBA. An interesting thing is I can be dead in one cluster and alive, or what you would call alive, maybe, in another cluster if you have replication. And at cryolinks, would it mean dead or alive? I cannot say everything so that humans can understand. I have my own life with special rules. When you're a cryochamber, maybe at some time you will need to be defrost. Oh, sorry, I would need to be defrost. That happened only when recovering. At that time, I may be copied from archives so it will be an inverted archiving process. And I will be fully read again to read you all the transactions inside me. Even if there are other ways. Oh, sorry, didn't read. So now let's talk about my identity. I talk a little about my life. Maybe you want to know a little more about me. Let's talk about my name. My name is made of three parts, eight digits each. The first part is the timeline ID. In my case, it is two, pated with zeros. The second part is the logical file ID. In my part, it's eight, pated with zeros. And the third part is BB, sorry, is a physical file ID. In my case, BB pated with zeros. The timeline ID starts at one. The logical file ID starts at zero. And the physical file ID goes from zero to FF. An exception to that is the first of all worlds, the great ancestor. There is an exception because it starts at one. What about my name? You can call me with a nickname 8BB. And don't confuse with that guy. His name is not that. 8BB will mean the BB's segment in the eighth logical file. All the IDs of my names are in hexadecimal. That's why I have written 0xBB. All my bytes, all my parts have an address. That's very useful if you want to precisely know where you're reading or writing. And there is a special notation, the LSN, the log sequence number, that is very used at least in replication. So this notation 8BB 3CB0D2 is my byte 3,977,426. I didn't say much about the first part, the timeline. Well, what about the timeline? Something horrible happened in the past. That's why the timeline ID is 2 and not 1. And that horrible thing was that a part of the world family was abandoned. I don't want to know much about that. But there are some details in the file 2.history, of course, made it with zeros. In Oracle World, the timeline is called the Incarnation. Let's go back to me. The version. I am tied to PostgreSQL version. That means that my internals may differ from one major version to another. For example, in PostgreSQL 9.5, there was a big change in the format. What about my size? My different size is 16 million, oh, sorry, no, humans. My different size is 16 megabytes. I'm divided into blocks by default 8 kilobyte each. I've got my full size when allocated. Let's talk about something, the world level. There are three different levels available. The parameter for this is world level inside the configuration. And it allows me different life opportunities. And as it can change over time, it might change over time, I can have different world levels in different times. Inside me, inside the same world, inside me. Let's see those three levels. First level is minimal. It permits you to make a recovery, but only from a crash. As crush are pretty well in PostgreSQL, you may think of that as immediate shutdown or unexpected stop. Maybe someone unplugged from the electric network or something like that. In real level, minimal, I will bring data to consistency to the cluster. And that's all. The life, my life would be very short in that case. So I hope the cluster configuration is not set to minimal. Second level is replica, which is the default in PostgreSQL 10. It permits more than minimal to make a carving and to make physical replication. So traveling for me. It permits also written queries on the standby. And of course, as it permits more things, there will be more information stored. But the size will always be the same. The third level is logical. It permits logical decoding. That is, for example, using a tool called PgSquiz. Or it permits logical replication that Magnus took a note about this. And of course, even more information are stored. Don't know more about me. Where do I live? I live in PgWall directory in PgData. Or in any directory that is similar as PgWall in PgData. So you know almost everything about me. That's enough to make a passport so I can travel. Well, what about my date of birth? Date of issue or exploration of this passport or my photo? Well, that's not a human passport, so I don't need that. I have to tell you something more about disaster management. So when you have a disaster, when you have a big problem, maybe at some point you will need to restart from a backup. So you restore that backup. Then you will play the work that has been made after that backup. Maybe at some point you will stop. And then you will go back in production. Let's talk about a special recovery. The automatic recovery. That's not exactly what I just said. The automatic recovery happens after a brutal stop and will run automatically. You will not need to restore a backup because the cluster was still consistent at the last checkpoint. PostgreSQL will then look up for the last checkpoint inside the walls and he will search from the last one to the old ones to find the checkpoint. Maybe inside me. When it finds it, it then will replay the transaction and redo the events to the end. And then you will get your PostgreSQL cluster running. What about a deliberate recovery? In a deliberate recovery, you humans will start from a physical backup. Then you will have to write a file called recovery.conf. In that file, you will put the instruction for the recovery. In particular, you will need to set a restore command to fetch the wall. The inverted command from archive command. If it's a copy command, you will have to set the copy command inverted. And then it does the same as automatic recovery. PostgreSQL will search from the checkpoint and redo the wall files. So, me at some point. And then the timeline will change. There is a special deliberate recovery, which is PITR. That is point in time recovery. That's deliberate recovery, but you will specify the end of recovery. This end of recovery can be a timestamp, a transaction ID, an LSN, or a main point. And then you will do the recovery. And you will set, you will maybe set an end of recovery action. That's a new thing in PostgreSQL 10. The recovery parameter for that is recovery target action, which by default is to pose. So you can check that everything is okay. That the recovery is okay. What about the timeline change? How does it happen? The last row of the recovery is copied inside the world directory to another name. The name of this copy will differ only from the timeline ID. And the content will differ from the recovery point. And of course, the old row after the recovery point in the old timeline will be abandoned. So my favorite part, traveling. Remember, I want to travel. I have to plan a trip for that. Also, I will go from an origin or source. It will be called a primary cluster, or provider cluster, or maybe a publisher cluster. I will have to set a destination, a standby cluster, or a subscriber. And of course, a transport method. Replication can happen via archives, or via streaming replication. The concept of replication is pretty simple. It's to have a continuously up-to-date clone of data. So to have that, you copy the data, and then you will play the transaction, all the events, all the change events. And with the best of the transactions, that's me. Let's zoom in the physical replication. To make this, you will have to make a duplication of the wall fire. So at some point, I will be copied to another cluster. This is done mostly with streaming replication. In streaming replication, cascading replication is allowed, so that a destination, a standby, can be, at the same time, provided from another standby. And the replication is that I will be sent gradually by pieces, octet by octet. Logical replication is not the same concept. In logical replication, I will be decoded on the publisher site. Then the information will be transformed to extract the wall changes and replicate them. And in that case, I will be sent to feed another wall in another cluster. So I will not travel, but that's okay, because my life will be useful for another wall. Let's talk about two processes, the world sender. The world sender is the process that gets the replication connections, and once replication protocol commands that are required, and then he will send the wall content to the target node. On the other side, there is the world receiver. The world receiver fetches the data, it permits redo events, so mostly data changes, and maybe he will send feedback if you use hot standby feedback derivative. There is a special receiver process, PJ Receive Roll, which is a command you can run. It is quite special because it collects and store the data, but will not permit redo. I mean, not immediately, because there is no PostgreSQL instance at that time. That's a known command. So PJ Receive Roll will make a stringed archive. You will have an archive, gradually up to date. And for the replication, there is the last concept, which are replication slots. A replication slot is a client-dicticated resource. In this resource, PJ Receive Roll will store the replication state of the replication for this client. And it will forbid the relation of any wall until it is fully replicated. So that's one case where there will be a delay in the desk process. So that's it. It's time for me to get to work. I've got just information. I'm not about to begin my life. I'm not about to start my work. So is my whole life, which you have seen can be quite complex, can have many possible. And I hope I will enjoy my life. And I hope you will enjoy your life too. And thank you for your attention. Anybody with some questions? That's the last talk. So thank you for attending. And see you next year here at FOSTA again at Postgres Dev Room. We also have a booth. That is, the booth is up and building K on level two. So you have to go the steps up or have to use the other entry. And we have FOSTA and PGD next year again. So one day before FOSTA, there was a Friday with one track Postgres talks. So enjoy your time at FOSTA. Bye.