The title of my talk is ALTER DATABASE ADD SANITY. Another title is "syntax error", because there is no such command in Postgres, and I will be talking about preventing disasters. Now, Gabriele from 2ndQuadrant delivered a very nice talk on Barman and how you can recover from disasters using it; my talk will be about how you prevent disasters, because you don't want them to happen in the first place.

I work for Zalando. Zalando is a big company in Europe; we probably sell more shoes than any other company in Europe, or at least more than most of them. Our marketing department is very sensitive about the number of customers we have, so I will just say that we have millions of transactions, and Postgres is the system that supports most of them. We have a big technology stack, and really a lot of technologies are involved in making our customers happy, and in making us happy as well and able to sleep at night. All our components are interconnected, and somehow, unfortunately, they are interconnected through the database. This requires additional attention from us, because we want to make sure that no single change will break the database; the shop should run without any downtime. And our customers are really crazy: sometimes they shop at 2 AM, sometimes during the day, during the holidays, 24/7. So we really have to make sure that our database also runs 24/7. We don't have the luxury of downtime.

So I want to talk to you about how we change our databases. Valentin is here; he was talking about how Postgres is easy. But to make Postgres really easy, you have to think about your development process, and not only the software development, but the database development. I will first talk about changes to your database, about the possible problems one can encounter by changing the data in the database. I will also cover some tools that make this process easier, and I will talk about how we at Zalando use those and other open-source tools to make sure that our databases are always available.

So let's cover database changes. Usually there are three kinds. The first is model changes, which change the schema of your database. This is what you do when you develop your application: adding columns, adding tables, adding foreign keys all involve model changes. Applications are usually very sensitive to these changes, because if the application expects a certain table and the table is not there, the application just breaks. So you have to make sure that these changes are tracked. The second kind is no less important: data changes, usually data migrations. For instance, you may decide to populate a table after you added it, say with customers, or you may decide to update a table to correct some error. And the table may be 10 gigabytes, so it might take quite a lot of time. The third kind of change is changes to your database code, to database functions, if you use database functions. This also involves some issues, which I will cover.

So there are different types of changes. And this guy is an expert on changes, changes that work for you. But sometimes they don't work in production. So I want to talk briefly about some horror stories. These horror stories are based on real events, but all the table names and code names have been changed. One of them starts like this: you get an internal error.
And you see developers running to you, saying some loud things, and they are definitely not happy. So you start looking at the logs to see what caused this error. You get to the Postgres log and you see this query. It's just a SELECT. So you check that the table is there, but you see that the column it is trying to use is not. You go a little bit further up in the logs and you see that this column was dropped by a developer. Some developer decided it was a good idea, because the column was obsolete, and why would you keep it? We can save the space in the database. So they just went to the database with psql and dropped this column, and now we have an application that is not running. For sure, once we diagnose this problem, it may be easy to fix, or it may not, depending on the contents of this column. But what we want is not to fix such problems afterwards; we want to prevent them. And for this, we need to be able to track which statements are executed against the live database. Better yet, we want to test these statements on some staging system before running them directly on production.

I also want to cover one thing related to the logs. Unfortunately, the slide is a little bit too dark, but we have a tool that shows you the running processes on the database. It is similar to top, but specific to Postgres. It shows you some system statistics, and it also shows you which queries are currently being executed; basically the contents of pg_stat_activity at a given time, updated in real time. What we can see here is that an ALTER TABLE took a lock on the database and forced all other processes to wait on this lock. The developer thought this ALTER TABLE was very lightweight, because the table it tries to alter is almost empty; it is 8 kilobytes. So what can possibly go wrong? An ALTER TABLE on an 8-kilobyte table runs in an instant. But what actually happened is that someone was doing a SELECT from this table before. And this is how Postgres works: ALTER TABLE acquires a very heavyweight lock, the ACCESS EXCLUSIVE lock, which conflicts with every other lock mode, so everything queued up behind it.
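To make that concrete, here is roughly how you can see who is waiting behind whom; a rough diagnostic sketch against pg_locks and pg_stat_activity, covering relation-level locks only (tools like the one on the slide do a more thorough job):

    -- sessions waiting on a relation lock, and the sessions holding it
    SELECT waiting.pid  AS waiting_pid,
           holder.pid   AS holding_pid,
           act.query    AS waiting_query
    FROM pg_locks waiting
    JOIN pg_locks holder
      ON holder.granted
     AND waiting.relation = holder.relation
     AND waiting.pid <> holder.pid
    JOIN pg_stat_activity act ON act.pid = waiting.pid
    WHERE NOT waiting.granted;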
The other horror story involves sprocs, and it also shows one of our tools, the open-source tool called PGObserver. Yeah, it doesn't have a light mode, unfortunately, only a dark mode; we like darkness in our database team. What you can see is the running time for a sproc, a stored procedure, that was recently deployed. The red line separates the running time of the stored procedure before the deployment from the running time after it, and you can immediately see that the time after the deployment just skyrocketed; it jumped ten times or even more. And this resulted in a lot of timeouts. And the rest of the story: you don't want to be this guy. So the sprocs also have to be tracked, the sprocs have to be tested, and one has to be aware of which sprocs are running on your database.

So what I want to do is take some techniques that are usually used for the source code of the application and transfer these techniques to database development. These are, for instance, version control. How many of you have your application under version control? How many of you have your database code under version control? Okay, quite a lot of people, I can say. And unit testing? How many of you do unit testing for your database changes? Valentin, raise your hand. You do it. Okay.

Also, how many of you have staging environments for your databases? Yes, this is very good. Awesome. And the last thing is dedicated deployments for your database changes, so that you know at which point in time your database changes; it's not that your database is changing constantly because someone keeps dropping your columns and then your application stops working. So these are the things we will have tooling for.

And I want to mention that Postgres is really very good at structural changes and at supporting these kinds of tools. It's sane and great for safe database changes because it's fully ACID-compliant, meaning that once you add your data to the database, your data is there. Who actually knows what ACID stands for? No one? Okay: atomicity, consistency, isolation, very good. And the last one? Durability, exactly. So Postgres is durable: once you put the data there, the data stays there. This is very important, because you cannot trust a database that doesn't retain the data you put into it; that's not a database. The one thing that is very important is transactional DDL. It's amazingly important for testing, because you can actually run something like ALTER TABLE ... ADD COLUMN, roll back the transaction, and the column is not added; the database is in the same state as before the transaction. And you can write a lot of tests: adding a column, putting some data into the column, adding a foreign key, seeing whether it works or not. All of this is possible with transactional DDL, and Postgres is quite unique in this regard; there are a lot of commercial databases that don't support this feature.
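In a psql session that looks roughly like this; a minimal sketch, assuming a table named talk already exists:

    BEGIN;
    ALTER TABLE talk ADD COLUMN rating integer;
    SELECT rating FROM talk;   -- the new column is visible inside the transaction
    ROLLBACK;
    SELECT rating FROM talk;   -- fails: after the rollback the column was never added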
You can also write stored procedures in different languages, and this encourages you to use stored procedures instead of embedding your SQL inside your application code. And the last thing I will cover is the mechanism called the search_path, which allows you to select which function, which stored procedure, you run. All of this makes writing tools for structured deployment and change management in Postgres quite easy.

The first tool I want to cover is called Sqitch. It is written by David Wheeler from Portland, a Postgres hacker who usually goes by the name of Theory. He wrote quite a lot of amazing tools, and Sqitch is one that I learned and actually evaluated quite extensively. You use this tool to create atomic database changes, step by step; he describes this as agile database development. So you don't throw in the whole schema at once; you create it table by table, column by column, in small change sets. And it actually integrates with your version control system, for instance Git, so it has support for merging changes from different branches and resolving the conflicts that may occur during this merging. The tool also has an explicit change plan, which allows you to write changes and always know in which order these changes will be applied to the database. The process you use with this tool is a deploy, verify, revert cycle. You write the deployment scripts. You write the scripts to roll back your deployments in case something is wrong, or in case you just want to return your test environment to its initial state. And you also write the verify scripts, and this is a very interesting concept: you write scripts to test your changes, and I will show you how these scripts work.

One of the ideas here is that you first deploy your change sets on your testing environment, on your staging database, which might even run on your local laptop. Then you make sure that these changes actually work, and then you put them into production. So how does it work? It's very easy to convert your existing code to Sqitch. Here I just initialized a new Git repository, but it doesn't have to be Git. Then I just said: give me the engine postgres, give me the project name pgconf.us, and give me the URI, which allows you to have multiple projects and interdependencies between projects. It then creates some files: the configuration, which among other things allows you to specify the connection string for the database; the sqitch plan, which defines the order of execution of your change sets; and three directories that will contain the actual change sets.

So let's create our first change set here. I will add the application role, the role that will run my application. Sqitch creates three files for me and says that it added this change to the sqitch plan. This is what the change looks like. It's always inside a transaction, because if it's inside a transaction, the atomicity of the database guarantees that the change is either applied completely or not at all. So there is a transaction, and inside it a simple statement that just creates a role for me. And Sqitch also adds some comments at the beginning of this file saying which change it is. The verify script is also quite easy. What we have to verify is that this role exists. So we just check that Postgres has this role and that we have usage rights on it. If we run this statement and the role is not there, it will throw an error, and Sqitch will treat this error as a failed test and report that you cannot deploy your change because the test has failed. And this one has a ROLLBACK at the end, because we don't want the tests to affect our databases.
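Filled in, the three scripts for this change might look like this; a sketch with a hypothetical role name app_role, omitting the header comments that Sqitch generates:

    -- deploy/app_role.sql
    BEGIN;
    CREATE ROLE app_role NOLOGIN;
    COMMIT;

    -- verify/app_role.sql
    -- pg_has_role() raises an error if app_role does not exist,
    -- which Sqitch reports as a failed verify
    BEGIN;
    SELECT pg_catalog.pg_has_role('app_role', 'USAGE');
    ROLLBACK;

    -- revert/app_role.sql
    BEGIN;
    DROP ROLE app_role;
    COMMIT;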
And how do we deploy these things? Very easy. I created a test database, and I say deploy with a given database URI. This URI can actually be added to sqitch.conf, so we don't have to specify it all the time. Then it applies my change to the database and says, okay, this change is applied. I can add another change, and in the second change I say that it depends on the first one: I want to create a schema for my database, and to create the schema I want the role to be in place, because I will grant the schema to that role. So I do this with basically the same command, just with the -r flag that declares the dependency. And it adds the new files for the change; each change has its own files. It also adds a line to the sqitch plan. This is what the change with the dependency looks like: there is a new string in the comment that says it requires the previous change, the app role. And you won't be able to deploy this script if you didn't deploy the role first. Now I want to note that Sqitch does not try to resolve the dependencies for you. This is still your job. What it does is make sure that you don't deploy a change before its dependencies.

In the plan file you have all the dependencies: the exact change names, the timestamps, the author of these changes, and the comments that you specified when you created your changes. And you can also see that the app schema has a dependency, in brackets, on the app role. So you can take a look at the plan, or you can even edit this plan manually if you don't like how the changes are ordered, and Sqitch will then apply the changes in your desired order.

Sqitch also supports development in branches. So you might have a branch for each of your database changes, do something in this branch, and then merge this branch back to the master branch. The problem with the merge happens because the plan file is a single file in the project, and every change adds something to this plan file. So if two different changes in two different branches modify your plan file, then during the merge you'll get a conflict, and you'll get a conflict like this: one change added the talk table to our pgconf project, another change added the speaker table, and since both of them modified the plan file, the merge just failed and we have to do something to resolve it. But happily we have support for resolving such conflicts in Git. You can specify which merge driver Git uses, and you can say that you want a union merge; the union merge instructs Git to take the non-matching lines from both files and combine them together. So at the end, you get a plan file which contains both changes, and everything works. Everything is also logged by Sqitch, so you can always see who applied which change to which database, and you can use it for audit purposes.

So how does it track these changes? Sqitch relies on metadata in your database. There is a special schema called sqitch, and in this schema there are tables to track the changes, the tags, and the dependencies. This gives you a very easy release cycle: you test on staging, you tag it, you bundle it, and then you roll out your bundle on the production environment, and you get exactly the changes that you put on the testing environment, but running on production.

One interesting feature that I also want to cover is reworking. Usually when you create a new change in Sqitch you do it in a separate file, but sometimes that's not convenient. For instance, if you create a new version of your function and you only changed a couple of lines in this function, you don't want a second copy of your function in your Git tree. You actually want to leverage the tools you have in Git to see the history and to track your changes to the functions; you always want to see who changed what, and Git is very good at this. So there is a concept of in-place changes in Sqitch. Instead of writing new files for the change, the rework command copies the existing files, and the copies are named with the tag that comes between the changes. This tag marks the original change, and it leaves you free to edit the original file. So this is a little bit complicated, but let me show you an example.
Suppose I have a function called set_talk, which sets the information for this talk, and I want to rework this function. So I say sqitch rework, and what it does is keep the original change in the plan, but it renames the original file, and the new name of this file contains the tag that I set before. So this tag distinguishes the original file from the renamed one, and the new files, the ones that I will actually change, are those shown below: set_talk in deploy, revert, and verify. You are free to change them further, and they will implicitly depend on the previous version of these changes, the one marked with this tag. So if I do git status, you will see these untracked files; these are the files with the original changes that were added by Sqitch, and there are also changes in the existing scripts. What rework also does is put the script that deploys the old version in as the script that reverts the new changes, so that you can revert the new changes back to the structure that was there before, to the structure of version 1.0 of your file. So if you want to revert your reworked changes, Sqitch makes sure that you go back to the original version.

And this actually only works for some kinds of changes. It works very well for functions, because you can run CREATE OR REPLACE FUNCTION multiple times and it will always bring you the version of the function that you want. It probably won't work for things like adding a column in the table definition: if you add the column to the CREATE TABLE statement and try to apply it over the existing table in the database, it just won't work. So there are only certain types of changes that you can rework, and these changes should be idempotent: you should be able to apply them multiple times.
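The deploy script for such a function shows why: with CREATE OR REPLACE, applying it once or ten times gives the same result. A minimal sketch, with a made-up table and columns:

    -- deploy/set_talk.sql: safely repeatable, which is what rework relies on
    BEGIN;
    CREATE OR REPLACE FUNCTION set_talk(p_id integer, p_title text)
    RETURNS void AS $$
        UPDATE talk SET title = p_title WHERE id = p_id;
    $$ LANGUAGE sql;
    COMMIT;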
So this is Sqitch. It's quite a powerful tool. The concepts are: you have a deploy, verify, and revert cycle; you can instantly bundle your changes and put them in production; and you can also do in-place modifications for some of the changes.

The next tool I want to cover is a little bit simpler than Sqitch; actually, it's very simple. It's called Versioning, and for me, all the tools I've seen for managing database changes have very awkward names; this is probably the most confusing one. "What tool did you use?" "I use Versioning." "What, versioning? What kind of versioning?" Yeah, but this is the name of the tool. It's capitalized. It's written by depesz, the hacker from Poland, and the idea of this tool is that you isolate your changes in simple atomic transactions. So instead of making changes on development and finding differences between production and development, you just isolate your changes in atomic transactions, roll out your changes on development, and then roll out the same changes on production. The concept is very similar to what we have in Sqitch, but as I said, it's much, much simpler. It also has the notion of changes as atomic files containing transactions. It can track already-applied files, it allows you to write rollback changes, and it can also maintain dependencies for you, so you can always find out which change has to go before which one.

So here is what a patch looks like in this Versioning tool; it's very easy. You have a transaction, and inside this transaction you have to register the patch, and you have to supply a unique name for it; it should be unique across the database. Then you have the change itself, like an ALTER TABLE for instance, and then you commit. So once you do this, you can apply this patch to your database, and if you try to apply it a second time, the register_patch function will catch that you already have this patch in your database and will not allow you to do it. So every change can be applied only once and not more than once.

It also has the notion of rollbacks. You can write your rollback diff, and this rollback diff should cancel what you did in the forward diff. Like here, we drop the table, and instead of register_patch it calls the function named unregister_patch. This function removes the patch from the system table that tracks which patches are currently applied, so you can, for instance, apply this patch once again. The rollback patches are especially recommended for mission-critical applications. If you're not sure that your change will work with your software and you want to be able to quickly revert this change, then you just write the rollback patch in advance, and if something happens, you quickly apply it.

So it has the concept of dependencies, as I said. In register_patch you can supply the dependencies, and it will make sure that the dependencies are already in the database before it tries to apply your patch; and one patch can depend on multiple other patches. It also has a concept of conflicts. I don't show it here, but it works pretty much the same: you can say that a patch conflicts with some other one, and if that other one is applied to the database, then your patch will never be applied until the conflicting one is reverted. And that's it. It's a very simple tool, basically a couple of screens of PL/pgSQL. It's very easy to audit, very easy to see what it does, very easy to modify if you like. And it has only one table in the database, in a special schema, with the name of each patch, the timestamp when it was applied, the name of the person who applied it, and the requires and conflicts columns to track which patches are required by the given one and which patches conflict with it. So it's really very lightweight. You can start using it without much preparation: just roll out a single SQL file on your database and start writing your diffs as patches encoded in transaction files.
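A sketch of a forward patch and its rollback in this style; the table and patch names are invented, while the _v schema and its functions are the ones the tool installs:

    -- forward patch: add a speaker column
    BEGIN;
    SELECT _v.register_patch(
        '2015-12_talk_add_speaker',       -- unique patch name
        ARRAY['2015-10_create_talk'],     -- required patches
        NULL                              -- conflicting patches
    );
    ALTER TABLE talk ADD COLUMN speaker text;
    COMMIT;

    -- rollback patch: undo the change and unregister it,
    -- so the forward patch could be applied again
    BEGIN;
    SELECT _v.unregister_patch('2015-12_talk_add_speaker');
    ALTER TABLE talk DROP COLUMN speaker;
    COMMIT;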
So these are very similar, but different, tools; one is very simple, one is more elaborate. I also want to mention some other tools. The first is pgTAP, also written by Theory. This tool is designed to create unit tests inside your database. The tests that we wrote here were pretty crude; we used the functions supplied by Postgres to check, for instance, that the schema is there. But pgTAP is much more powerful. It lets you check, for instance, whether a table is there, whether the table has the structure that you expect it to have, whether it has the proper defaults, whether it has foreign keys, whether a function you defined has a certain string in its body, whether the function is volatile or security definer, and so on. Basically, almost every feature that Postgres has is covered by one of the functions in pgTAP. So it's very easy to write unit tests with pgTAP, and it's also very easy to combine Sqitch with pgTAP and write full-scale unit tests for your changes.
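To give you a taste, a pgTAP test might look like this; a sketch with schema and table names from the earlier example, assuming the pgtap extension is installed:

    BEGIN;
    SELECT plan(3);
    SELECT has_table('app'::name, 'talk'::name, 'table app.talk should exist');
    SELECT has_column('app', 'talk', 'title', 'talk should have a title column');
    SELECT col_not_null('app', 'talk', 'id', 'talk.id should be NOT NULL');
    SELECT * FROM finish();
    ROLLBACK;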
The other honorable mention is the schema evolution manager by Gilt Groupe. It's similar to the two tools I already described, but a little bit different. For instance, it doesn't have a notion of rollbacks, because its author argues that if you drop a column, adding the column back is not a proper rollback, because you have already lost the data. That may work for you or it may not. It's written in Ruby, so if you like Ruby, you may check it out. Another tool I want to mention only briefly is the fdiff tool. As I said, all these tools have very awkward names. This tool is designed to make sure that the modifications of your database functions stay in sync with what you have in your version control system. You get a snapshot of your functions in your database, and once you apply a new function, you can always check that the function in the database after you applied it is the same as the function in your version control system. So it's basically designed for environments where multiple people can modify your database concurrently, which may or may not be a very good idea; but if you have such an environment, you can check out this fdiff tool.

So this is the tooling, and now I want to show you the actual example of how we use some of these tools in production, and the process we ended up with, which we like a lot. This process consists of three parts. We have database diffs, basically the schema changes, for which we use Versioning. Then we have the schema-based API deployment. And we also have some tooling for data migrations, to make sure that long-running transactions won't kill our databases.

So first, how do we access the data? All our data access goes through stored procedures, and in the Java layer we use a special middle layer called the sproc wrapper. This sproc wrapper does all the hard work of translating Postgres types to Java types and back and forth, and it calls the proper function inside the database. From the Java standpoint, you only write a certain class, you add some annotations, and then your call is magically translated to the function inside the database. This thing is open source; if you go to Zalando's GitHub, you will find all these tools, the sproc wrapper included. So the Java layer talks to the data layer, the stored procedures. We usually don't allow access via plain SQL statements, so that we can control who actually accesses our databases and what operations are performed on them, and so that we can always review the functions that we execute before they go into production.

Our deployment procedures are very agile. We have a weekly deployment cycle for our code changes, and this also covers the database changes. And this means that every week we receive sometimes tens and up to hundreds of changes to our databases. So we really want to track these changes. We really want to make sure that we know what we put inside our database and what schema changes we make, so that it won't break production. Because, as I said, we have to maintain a 24/7 environment.

And we use Git to track our code, and we put the database files and the change sets in Git as well. We have the notion of release branches: a deployment always goes from the release branch, and this release branch is the branch that was reviewed and tested, and this review and testing includes the changes in the database. And as I said, we don't allow any downtime, and we don't have any downtime during our deployments.

So this is what our Git structure looks like. Our database files reside in a directory per database, and the data directory contains the full definitions for our schema: things like the complete CREATE TABLE statements, CREATE TYPE, or CREATE FUNCTION. We also have the notion of an API; these are the functions that interact directly with the application code in Java, and they go into a separate directory. And we also have the dbdiffs, the change sets that are produced with Versioning. These change sets are the patches that we apply on top of the existing database schema that is running on production. So if you want to add a column to a table, you write a change set that says ALTER TABLE ... ADD COLUMN, you put it into the dbdiffs, and then we have special tooling that can find out which dbdiffs have to be rolled out depending on the current week number; since we are from Germany, we use numbering for everything, including our releases, and our releases are marked with week numbers. So we find the proper week, we take the dbdiffs from there, and then we apply them first on staging and then on production.

I want to talk now about the database diffs, these change sets. We use the Versioning tool that I was talking about before. What is important: our development team is very big, I mean hundreds of people, more than 500 I think, and we actually want it to get twice as big as it is currently; but the number of database engineers stays small, so we have special tricks to cope with this volume of changes. For instance, here is the dbdiff that adds a new column with a NOT NULL constraint and a default value. If you do this in a single statement, ALTER TABLE ... ADD COLUMN ... NOT NULL DEFAULT something, it will lock your table for the duration of a complete rewrite, and for big tables that might take quite a long time; and during this time, as I was showing on the previous slides, everything that tries to access this table is blocked. So we cannot just use the normal ALTER TABLE form in our diffs. What we do instead is, inside the transaction, we only add the column without the NOT NULL, and we set a default value. This doesn't lead to a table rewrite; it is very fast, basically just a change in the system catalog. And all the heavyweight part that rewrites the table, the update of every row, we put outside of the transaction, and as a final stage, after we made sure that all the rows are updated, we set the column to NOT NULL. So at the end, the locks acquired on this table are very minimal: we only acquire one lock for the ALTER TABLE inside the transaction, which is very short because it's a catalog change, and a second one after the UPDATE statement, which is also quite fast because it doesn't involve rewriting the table.
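Spelled out, the pattern looks roughly like this; table and column names are invented, and note that on the Postgres versions of that time the single-statement NOT NULL DEFAULT form rewrites the whole table:

    -- step 1: cheap, catalog-only changes, inside the diff's transaction
    BEGIN;
    ALTER TABLE customer ADD COLUMN status text;
    ALTER TABLE customer ALTER COLUMN status SET DEFAULT 'active';
    COMMIT;

    -- step 2: the heavyweight backfill, outside the transaction
    -- (chunked for big tables)
    UPDATE customer SET status = 'active' WHERE status IS NULL;

    -- step 3: only now add the constraint; this scans the table
    -- but does not rewrite it
    ALTER TABLE customer ALTER COLUMN status SET NOT NULL;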
And this process also applies to other diffs that have to change the schema and acquire heavyweight locks: we try to avoid them. We also try to avoid duplicating the definitions in our database files, and here psql can help us, because psql has a way of including other scripts. Basically, here we have a patch, and in this patch we just add a new table. What we could do is simply put the CREATE TABLE inside this dbdiff, but then we would have it twice: we would put the CREATE TABLE inside the dbdiff, and we would also have to put the CREATE TABLE in the data directory, which holds all our schema definitions. So, to avoid the duplication, we just use the \i command, which inserts the full contents of the given file when we run this diff.
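Such a dbdiff might look like this; the paths and names are invented, the diff is applied with psql since \i is a psql command, and I assume the single-argument form of register_patch:

    -- dbdiffs/CD-1234_create_talk.sql
    BEGIN;
    SELECT _v.register_patch('CD-1234_create_talk');
    -- pull in the canonical definition instead of duplicating it here
    \i data/talk.sql
    COMMIT;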
So each dbdiff proceeds through a certain life cycle. First, it's tested locally by the feature developers; we have a Vagrant box for VirtualBox that allows them to run Postgres on their own machines. Then afterwards, we run these diffs on the release staging, where our QA team tests the changes to the database as part of the overall testing. And after we make sure that everything works on release staging, the patch is applied to the patch staging, which is a special environment we use to test hot fixes before they actually go to production, and then it is also applied to the live database. So generally, before a diff goes to the live database, it's applied on at least three databases and reviewed by at least two database engineers. We try really hard to structure the process in such a way that no diff is left unattended and no change is left unnoticed. We also have some tooling to review these diffs: you have a patch, you can put comments on it, and you can mark the patch as either okay or rejected. If it's rejected, then the developer gets an email saying, okay, you have to work on this patch more, here are the suggestions on how to fix it; then they fix it, and then we review it once again. So this makes the process a little bit more streamlined.

And we also have some naming conventions for our diffs. As I said, this Versioning tool is very simple, just one screen of functions, and the power of this tool is that you can put additional policies on top of it and customize it for your specific process and for your specific company. In our case, for instance, we use Jira, and in the names of our versioning scripts we have the ID of the Jira ticket, so before applying the diff we can actually check that the related ticket is in the proper state: not "still working on it", not "waiting for QA", but something like "completed, ready for production". And we also have the names of the target databases inside our diffs, which allows us to write tools to locate these diffs.

So this is about the schemas, but what about the functions? Since, as I said, all our data access goes through the functions, we also have to make sure that the functions are properly tracked and deployed. For the functions, we use the trick called the search_path. The search_path is the feature of Postgres that allows you to switch the function that you are running, instantly and at runtime. So here in the example, if the search_path is set to the first schema you get one function, and with the second one you get another one, while the function name is absolutely the same. We use this trick in our deployments to switch between different versions of our API. As I said, we have one API per week: each week we deploy the new version of the API, and in order to select which API we use, we just switch the search_path inside our application. Just one statement, very easy. And if something goes wrong, we can always switch it back. So we can instantly roll back to the previous version of our database code, or we can test the new version of our database code before the production application picks it up, which is also quite powerful, and we use it.

And our API deployments look like this. Suppose we have two instances of the Java application, and they connect to the Postgres database. Initially they both use the old version of the API. At deployment time, we just roll out the next version of the stored procedures by executing the SQL against the database. And now we have two versions of the API, two versions of our stored procedures, living concurrently in the database. But the application is only using the old one; the new one is just sitting in the database. It's not used yet, and QA can test it, and we can take a look and verify that it was rolled out correctly and that the version in the database is really the version that we want. And afterwards, after we make sure that this is the proper version, we just switch the search_path on one instance. So on one instance, we say, okay, set the search_path to the new API. This instance picks up the new API and serves it, while the old instance is still running with the old one. What is important is that we maintain compatibility in the schema between the old and the new version. Every dbdiff that we roll out must not change the schema in a way that is incompatible with the old API; we always require that new changes to the schema stay compatible with the API still running in the database. This is what lets us run multiple versions in parallel, and actually use multiple versions in parallel. And once we have tested this, once we have made sure that everything works correctly, we can switch the whole application, the other instances, to the new API. And suddenly we are running the new API, and this deployment happened without any downtime and without any trouble from the database side.
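The switch itself is tiny; a self-contained sketch of the idea, with invented schema names:

    CREATE SCHEMA api_week_11;
    CREATE SCHEMA api_week_12;

    CREATE FUNCTION api_week_11.get_api_version() RETURNS text
        AS $$ SELECT 'old'::text $$ LANGUAGE sql;
    CREATE FUNCTION api_week_12.get_api_version() RETURNS text
        AS $$ SELECT 'new'::text $$ LANGUAGE sql;

    SET search_path TO api_week_11, public;
    SELECT get_api_version();   -- returns 'old'

    SET search_path TO api_week_12, public;
    SELECT get_api_version();   -- same call, now returns 'new'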
So the last part I want to cover is how we bootstrap the databases. This is very easy, since the databases live in Git. We actually don't use anything like plan files, because we have very many changes, and these changes are produced by a lot of developers concurrently; we don't want them constantly resolving conflicts in a single plan file. So we just use Versioning and we number our database scripts; basically, the most simple process possible. And when we roll out the database on staging or on production, we just use the find command to find our SQL files, order them by the numbers in the file names, and then apply them, in one transaction. So if anything goes wrong, you still have a database in the same consistent state. And by having these SQL files, we can always make sure that the current code can be rolled out on our testing database with a very simple command.

One more thing we do is data migrations, and since we have tables that are multiple tens of gigabytes in size, we cannot just run an UPDATE on these tables, because this UPDATE would block other processes trying to modify the data in the table. So what we do instead is split the updates into chunks, and we have a special Python script that runs these updates chunk by chunk; it can also run them concurrently on different shards, because we also use sharding. And what is important when you run a big transaction on your database, and what we came across multiple times, is that you may not only block the existing processes; you may also, for instance, run out of storage for your write-ahead log, and then your database will just crash, or you may drive the load on the server so high that it becomes unusable for every other process. So we have these things covered in our tool: we control the load, we watch the WAL storage, and we also run VACUUM after a certain number of chunks, because otherwise we would just lose that space; this is also configurable. These tools are not yet open source, but we are trying to eliminate internal dependencies on some other tools we have, so that we can open-source them eventually. The idea is that you use a SELECT function to get the IDs of the rows that you want to change, and the second statement then changes these rows; you can set the chunk size, and you can also configure the vacuuming.

Actually, in 9.5 it will be very easy to roll your own data migration without such a tool, because it has a new feature called SKIP LOCKED, which allows you to SELECT ... FOR UPDATE some rows from your table while skipping the rows that are already locked. So you can combine this in a common table expression with an UPDATE statement: select your rows, update them, and look at the count of updated rows, then run this command until zero rows are updated, which means you have updated the whole table. So in 9.5 it will be much easier, but so far we have to rely on our own tools.
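A sketch of that 9.5 pattern, with invented names; you run the statement in a loop until it reports zero affected rows:

    -- update one chunk of 1000 rows, skipping rows other sessions hold locked
    WITH batch AS (
        SELECT id
        FROM customer
        WHERE status IS NULL
        LIMIT 1000
        FOR UPDATE SKIP LOCKED
    )
    UPDATE customer c
    SET status = 'active'
    FROM batch
    WHERE c.id = batch.id;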
One thing that we still want to implement is the testing of our changes. We use Versioning, and Versioning doesn't have a notion of testing, but we actually make our changes in two different places: we write dbdiffs for the changes, and we also make the same changes in the schema files, for instance in the CREATE TABLE. If we want to add a column, we write a dbdiff that adds the column, and we also modify the table definition itself to add this column. So the idea is to combine these two sources and to test that the dbdiffs actually put the database into the state that is in our version control system. But so far we have not implemented this, because Git makes it a little bit problematic to find the path from a certain commit to a certain commit. We are also working on schema comparison tools that allow you to find all the differences between several environments, which is quite important if you have a staging environment: you want to make sure that the staging environment is the same as production. You can use something like pgTAP for this, or you can write your own tool to do it, and this is what we are doing. And one more thing that we also want to do is handle the dependencies between the schema changes and the migrations. If you add a column to a table and then populate that column with a big UPDATE statement, you don't want to fire the UPDATE statement before the column is added. So this is also something for the future. We are constantly updating our open-source tools, and we try to present them at conferences, so stay tuned; we are going to roll out new things.

So now, do you have any questions for me? Yes? Can I go back to the slide with the links to the tools? Yeah, here, this one. I will put these slides somewhere, in the Postgres Wiki for instance, and I will also add links to some other talks that cover the same topic. I want to have a comprehensive repository of these practices, so that you can choose between them whatever you want. Yes? One thing I also forgot to mention, as my colleague points out, is that we are not only reviewing all the changes that developers write; we also provide training for developers, so that they know the problems that can happen. Yeah, yeah, we do this. When developers come aboard, we make sure that they can write SQL, that they know what Postgres is and how Postgres works, and that they are aware of our infrastructure; this is part of the developer training. We want to trust our developers. We don't want to be a bottleneck for the development teams; we want developers to do everything by themselves, so that we can focus on other tasks, like helping the community or writing open-source tools. Yes, this is a good addition. Yes? How do we check the performance of the queries? This is a very good question. First of all, we have, sorry, that was too fast, we have this pg_view tool that shows the currently running queries, how long they have been running, and also some other metrics. I think it's somewhere at the beginning; yes, here. It's also open source, on our GitHub. And when it comes to testing, we usually try to test the big SELECT statements that extract data on the same data set that we use in production. We don't have many profiling tools.
I mean, there is a profiler for PL/pgSQL, but we only recently started using it, so it's not very widespread in our company yet. But we try to provide developers with realistic data sets: we periodically take the data from production and populate the staging environment with it, obfuscating it a little bit because of the regulations, and then the developers can test their queries on the real data sets. This is how we make sure that the queries perform adequately. And we also have the PL/pgSQL profiler, as I mentioned. Okay, any more questions? Okay, then I will be here, and if you have any questions about our process, about these tools, or if you have any suggestions on other tools that you use, I will be happy to talk to you afterwards. So thank you very much. Thank you.