I'm a software developer at Postgres Professional, mostly interested in multimaster in PostgreSQL, and this is my colleague, who is very interested in indexes and data structures. In this talk we would like to tell you a few things about compression in PostgreSQL. I'm sure everyone here understands the importance of compression: we work with more and more data every day. So in this talk I'll first explain, from a practical perspective, how PostgreSQL stores your data. Then we'll list a number of things you can check in your database schema to make sure you store it optimally; there are a lot of tricks. Then we'll introduce a new mechanism of page-level compression, which is a feature of Postgres Pro Enterprise Edition, a fork supported by our company. And finally Alexander will present ZSON, an extension which adapts compression to your data; it's open source.

Compression is too big a subject to cover in a one-hour talk, so there is a list of things we won't discuss today, though they are still worth mentioning. First of all, you can find a lot of great books and articles about vacuum tuning and MVCC; I think the next talk will be devoted to this topic. Another thing to know is that the PostgreSQL catalog can also get bloated: for example, if you use a lot of temporary tables or actively change the schema in some other way, there will be a lot of dead rows in your catalog tables. The WAL can also take a lot of space, but it is kind of derived from your data: the less data you write, the smaller the WAL will be. Unless, of course, you are one of those risky guys who don't care about data safety and use unlogged tables; in some situations they can be really useful. And since 9.5 there is an option to enable WAL compression; surprisingly, there are still many people who are not familiar with it, so please check it in the documentation.
You can also use file systems that support compression, but that's a separate topic.

Okay, let's look at how Postgres stores data. It has row-oriented storage; the physical representation of a row is called a tuple. A tuple consists of a header and attributes. Tuples are placed on pages, which in their turn form files. That is the format for all tables, indexes and materialized views that exist in Postgres. So it's clear that meta-information takes some space, but how much exactly? Maybe it's not that dramatic. Let's estimate it. Imagine you have no data at all, only meta-information. Conveniently, Postgres allows you to create a table without any attributes, just the meta-information. Let's insert 10 million rows into such a table and check the size. Any guesses how big it is? Just out of curiosity, who thinks it will be about zero? And who thinks it will be less than, say, 100 megabytes? Anyone who thinks it will be more than 100 megabytes? Okay, so it's not even close to empty: it takes 268 megabytes of disk space. Of course, it's just a synthetic example and has nothing to do with real databases, but still I think it's good to keep in mind that meta-information is not free.

So let's look at what exactly takes this space. Using the pageinspect extension, we can look at the hidden meta-fields. There is information to support transactions and other bookkeeping: a tuple header that takes 24 bytes per row. That may seem excessive for some workloads, but a database is not only about storing data, it's about providing fast and concurrent access to it, so all this stuff is actually necessary.

Okay, another thing that is good to know when you're interested in keeping things small is data alignment. As you might know, integer and boolean, for instance, consume different amounts of disk space, and data in PostgreSQL is aligned. How exactly depends on the type. For instance, integers are aligned.
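A quick aside before alignment: the 268 MB from the empty-table experiment can be roughly reproduced with back-of-the-envelope arithmetic. This is a simplified model of the page layout (constants from PostgreSQL's defaults; the real layout has more details such as infomasks and special space):

```python
# Rough model of heap storage: fixed page size, page header, and per-row
# overhead (tuple header + line pointer). Simplified for illustration.

PAGE_SIZE = 8192     # default PostgreSQL page size
PAGE_HEADER = 24     # page header bytes
TUPLE_HEADER = 24    # 23-byte tuple header, padded to 24 by alignment
LINE_POINTER = 4     # item pointer stored in the page for every tuple

def estimate_table_size(nrows: int) -> int:
    """Estimate heap size in bytes for nrows tuples with no attributes."""
    per_tuple = TUPLE_HEADER + LINE_POINTER
    tuples_per_page = (PAGE_SIZE - PAGE_HEADER) // per_tuple
    npages = -(-nrows // tuples_per_page)  # ceiling division
    return npages * PAGE_SIZE

mb = estimate_table_size(10_000_000) // (1024 * 1024)
print(mb, "MB")  # 268 MB: right around the number observed in the experiment
```

So the per-row overhead alone accounts for essentially the entire 268 MB.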
An integer must start at an address divisible by 4, and a bigint at an address divisible by 8. As this slide shows, just by changing the order of attributes in a table, we can save some disk space that would otherwise be consumed by alignment padding.

Question? [Audience] Is it because PostgreSQL data types are native? Is that why you're getting this kind of benefit? Well, this is because a bigint must be aligned: it must start at an address divisible by 8. If it comes right after an integer, 4 padding bytes will simply go unused. So it's just how PostgreSQL works.

Another interesting thing: on the previous slides we showed tuples, and tuples have a header. The size of the header is 23 bytes, but the data after the header is also aligned. This is why the tuple header effectively uses 24 bytes of disk space. Right after the header there is a bitmap that tracks nulls: it shows whether the corresponding attribute is null or not. This bitmap comes right after the header, and if you have 8 columns or fewer, the header plus one byte of bitmap still fit into those 24 bytes. Because of the alignment, it doesn't matter whether you have this bitmap or not; the header will be padded to 24 bytes anyway. So you can have up to 8 nullable columns basically for free. And naturally, if you have 9 columns, it doesn't matter whether you have 9, 10 or 11: one more byte will be allocated anyway.

More about alignment: in order to access an index tuple, its address also must be aligned to 8 bytes. This rule has one pretty non-intuitive consequence. An index on one integer column, a multi-column index on two integers, and an index on one bigint all have the same size. So probably you shouldn't be afraid of bigints, and probably you shouldn't be afraid of multi-column indexes either.
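The alignment rules above can be modeled with a few lines of arithmetic. This is a simplified sketch of PostgreSQL's alignment behavior, covering only a few fixed-width types:

```python
# Simplified model of attribute alignment: each type starts at an address
# divisible by its alignment requirement, and the row's data is padded to
# MAXALIGN (8) at the end. Only a few fixed-width types, for illustration.

ALIGN_SIZE = {"bool": (1, 1), "int4": (4, 4), "int8": (8, 8)}
MAXALIGN = 8

def row_data_size(columns):
    """Bytes consumed by the attribute data of one row, padding included."""
    offset = 0
    for typ in columns:
        align, size = ALIGN_SIZE[typ]
        offset = (offset + align - 1) // align * align  # pad to boundary
        offset += size
    return (offset + MAXALIGN - 1) // MAXALIGN * MAXALIGN

# A bigint right after an integer forces 4 wasted padding bytes;
# reordering the columns avoids them entirely.
print(row_data_size(["int4", "int8", "int4"]))  # 24
print(row_data_size(["int8", "int4", "int4"]))  # 16
```

Eight bytes per row saved, for ten million rows, is 80 MB, just from column order.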
And that's where another feature of PostgreSQL may come in handy: so-called covering indexes. If your index cannot be made smaller, for example it's a primary key on one integer field, it can probably carry more data instead. A covering index allows you to add non-key columns to the primary key index. It's just another instrument to build a better schema. Like any other instrument, it has its trade-offs: as with any other index, it's not great for write-heavy workloads or a lot of updates, so use it wisely. It's currently available in our fork, Postgres Pro, and hopefully it will be available in core PostgreSQL; I hope it will be committed.

Another good thing to know is that PostgreSQL has many specialized data types. The example on this slide is actually a real-world story. We helped one of our customers migrate to PostgreSQL from some other database system, and it turned out that after the migration the database became much bigger than it used to be. Our investigation showed that this database made heavy use of GUIDs, and after migrating they were stored in the bytea data type, which is not the best idea for storing UUIDs. By using the special uuid data type, we could significantly decrease the database size. And the same applies to various other specialized data types.

Here is another example. There are different types in PostgreSQL for storing times, dates and time spans. Recently I discovered that the type called timetz is stored the following way: a 64-bit integer is used to store the time, and another 32-bit integer is used to store the time zone. That's quite a lot of disk space to store just a time of day. There is another type, timestamp with time zone, which is just one 64-bit integer. So, rather non-intuitively, timestamptz, which is a time with a date, uses less space than just a time with a time zone. But that's how PostgreSQL works.
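These size differences are easy to tabulate. A rough sketch using PostgreSQL's documented type sizes (the 1-byte short varlena header for small text values is a simplification of the real rules):

```python
# Rough per-value on-disk sizes for the same logical data stored in
# different PostgreSQL types. Illustration only.

def short_varlena_size(payload_bytes: int) -> int:
    return 1 + payload_bytes  # assumed: 1-byte short header + payload

uuid_native = 16  # the uuid type is a fixed 16 bytes
uuid_as_text = short_varlena_size(len("550e8400-e29b-41d4-a716-446655440000"))

timetz_size = 8 + 4   # 64-bit time of day + 32-bit zone offset = 12 bytes
timestamptz_size = 8  # a single 64-bit integer; the zone is not stored

print(uuid_native, uuid_as_text)      # 16 vs 37: text costs over twice as much
print(timetz_size, timestamptz_size)  # 12 vs 8: the "bigger" type is smaller
```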
If you're interested, you can check it yourself. Just create a table with one column of type timestamptz, insert some data with different time zones, and then select it. You will see that the time zone has changed: PostgreSQL will display your session's time zone, because the original zone is simply not stored in the database.

Next, PostgreSQL has TOAST. It's a mechanism which, basically, splits your data into small chunks. As was shown on the previous slides, data lives in pages, and the page size is limited: it's 8 kilobytes by default. So what to do if your value is much larger? In this case the TOAST mechanism splits it into chunks and stores those chunks in a special TOAST table. And the reason I'm talking about TOAST right now is that it has built-in compression, an algorithm called pglz, which is more or less comparable to zlib in terms of speed: if I'm not mistaken, it's a bit slower and doesn't compress quite as well, but it's in the same ballpark.

How does TOAST determine whether your data should be compressed at all? For instance, if you're storing already-compressed images, it doesn't make much sense to compress them a second time. So PostgreSQL tries to compress roughly the first kilobyte of your data, if I'm not mistaken, and if that compresses well, then it compresses all the data. If you have values which compress well, for instance large texts, you basically have to do nothing and PostgreSQL will compress them for you. But if you're storing binary data which compresses badly at the beginning and well at the end, you can actually consider reordering it, so that the part which compresses well comes first; otherwise it may not be compressed at all. One more thing to say about TOAST is that it's triggered only if your attribute is larger than two kilobytes, if I'm not mistaken.
So if you have a number of values which are about one kilobyte in size, TOAST won't be triggered. That's why one more trick is to pack data into arrays, into bigger chunks, to trigger TOAST. Of course, check whether that suits your workload.

Basically, what we want to say is that you should know your data and know your database in order to store it in an optimal way. Of course, there are a lot more tricks than we have listed, and experienced DBAs know them, but we think it could be a little bit simpler. And that's why we suggest CFS. CFS is an abbreviation for Compressed File System. It's a feature of Postgres Pro Enterprise Edition, developed by our colleague Konstantin Knizhnik. I think it's a good time to ask: does anyone run Postgres on a compressed file system? Anyone? And which file system?

[Audience] ZFS. It works well. We mostly use it for OLAP, you know, data warehouse type of stuff. It's not as good for transactional systems; of course, there is a write penalty, but for data warehousing it works really well.

And are you using FreeBSD? [Audience] Yes. Okay.

There was a number of such proposals on pgsql-hackers, and the reason to reject them was: why don't you just run Postgres on a compressed file system? We think we can do better inside the database. The main idea of CFS is to apply compression to a page when it's evicted from the buffer cache to disk, and to decompress it when it's loaded back into the shared buffer cache. Since after compression pages have variable size, we need an additional level of mapping between logical and physical page addresses, which is shown in the picture. CFS always writes changed pages at the end of the file. This keeps writes sequential, and it also makes it possible to extend the file in an almost lock-free manner. To remove outdated page versions, CFS has its own garbage collector.
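The CFS design just described (compress on eviction, an extra page map, append-only writes) can be modeled in a few lines. This is a toy sketch with zlib standing in for the real compressor; actual CFS lives at PostgreSQL's storage-manager layer, and the class and method names here are illustrative:

```python
import zlib

# Toy model of the CFS idea: compress a page when it is "evicted" to
# storage, append it to the end of the segment, and keep a map from page
# number to (offset, length), since compressed pages have variable size.

class PageStore:
    def __init__(self):
        self.segment = bytearray()  # the append-only "segment file"
        self.page_map = {}          # page number -> (offset, compressed size)

    def write_page(self, pageno: int, page: bytes) -> None:
        blob = zlib.compress(page)
        self.page_map[pageno] = (len(self.segment), len(blob))
        self.segment += blob        # old version becomes garbage to collect

    def read_page(self, pageno: int) -> bytes:
        offset, size = self.page_map[pageno]
        return zlib.decompress(bytes(self.segment[offset:offset + size]))

store = PageStore()
page = bytes(8192)           # one 8 KB page of zeros
store.write_page(0, page)
store.write_page(0, page)    # an update is appended, never overwritten
print(store.read_page(0) == page)  # True
```

Note that after the second write the segment holds two copies of the page; reclaiming the stale first copy is exactly the garbage collector's job.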
There is a set of background workers which process relations segment by segment: they copy live pages to a new segment, then, with a minimal lock, switch the segments and remove the old files.

Compression is applied at the tablespace level, so if you want to use CFS, just create a tablespace with compression enabled and store your tables there. You can set this tablespace as the default if you want, and work with your tables as before. As for garbage collection, there are background workers, and it can also be invoked manually, just like vacuum and autovacuum.

Let's look at the numbers. On a pgbench test, data loading is several times slower; that's a write-heavy workload and it is an expected result, because now each page is compressed before being written to disk. What is much better is the compression ratio: the database is 18 times smaller. On the next slides I'll explain such an unbelievable result. And the regular transaction workload is within about 5% in terms of TPS; I would say it's about the same. So no dramatic decrease, nothing unexpected in ordinary operation.

Now about the synthetic benchmarks. We tested compression algorithms on a dataset generated by pgbench, and the size reduction looks unbelievable. The reason is that pgbench tables have long runs of identical padding bytes in the column called filler, which, as you can see, contains... We were told there is no microphone. Well, yeah, I'm sorry. Okay, hopefully this works. So, the reason the database was compressed 18 times is that the column filler, which is an 84-byte character column, contains not arbitrary data but 84 identical padding bytes per row, so it compresses extremely well. And the reason I put this slide here is that you should always try any new feature on your own data and decide whether it's useful for your workload. We also compared various compression algorithms on the same dummy dataset; since it's the same data for all algorithms, it's fair to compare them.
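To see why the pgbench dataset compresses so dramatically, it's enough to compress filler-like data yourself (zlib standing in for the real algorithms; assuming the filler is 84 identical padding bytes per row):

```python
import zlib

# Why pgbench compressed 18x: its tables carry a filler column of
# 84 characters per row, all identical padding, which any compressor
# shrinks dramatically.

filler_row = b" " * 84            # one row's worth of padding bytes
table_chunk = filler_row * 10_000
ratio = len(table_chunk) / len(zlib.compress(table_chunk))
print(f"compression ratio: about {ratio:.0f}x")
```

Real data is never this uniform, which is exactly the talk's point: benchmark on your own dataset, not on pgbench.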
You can see the results on the slide, and, by the way, notice that Postgres's internal compression algorithm is not that bad, but it can be improved; I think hackers could think a little bit about that. And while it would be easy to make compression algorithms pluggable, we decided to choose only one and leave zstd as the default compression algorithm, as it provides the best compression ratio at an acceptable speed.

Some more benchmarks on real data. You can see that after applying compression, disk usage significantly decreased. One more point to mention is that Postgres uses double buffering: pages evicted from shared memory do not go straight to disk, but to the operating system page cache. In most cases that is a disadvantage, but for CFS it happens to be a good thing: in some cases the compressed data can fit into the operating system cache and thus avoid a lot of costly disk operations. We can also look at the CPU graph: there is no dramatic CPU overhead for compression, which is quite good. The numbers are about the same as before, because each page is decompressed only once, when it is loaded into the buffer cache, and that's not that frequent.

Just like any other approach, CFS has its advantages and disadvantages. It provides a good compression ratio, because all the information on the page is compressed, including metadata and even the empty space inside the page. It provides better locality, because pages are always written sequentially. And, which is good, it's not that hard to support, because it requires minimal changes to the PostgreSQL core. We tried a couple of different approaches, like compressing each tuple, and we tried dictionary compression, but they seemed too difficult to support. It also provides flexibility: it's easy to plug in various compression algorithms, though for now we use only zstd. Of course, CFS has some limitations. Pages in the buffer cache are not compressed, which is not actually a problem.
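In the same spirit as the algorithm comparison above, here is a quick harness for comparing compressors on your own data. zstd is not in the Python standard library, so zlib, bz2 and lzma stand in; the point is the method (measure ratio and speed together), not these specific contenders:

```python
import bz2
import lzma
import time
import zlib

# Compare compression ratio AND speed on one sample; run this on a chunk
# of your own data rather than synthetic input.

sample = b"account balance updated; some moderately repetitive payload. " * 2000

for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    start = time.perf_counter()
    out = compress(sample)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{name}: {len(sample) / len(out):.1f}x in {elapsed:.2f} ms")
```

As with the slide's results, the best ratio rarely comes from the fastest algorithm, which is why the trade-off has to be measured, not guessed.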
It's just how it works, and it means there is no CPU overhead for cached data. The next issue is WAL and replication: for now it's not optimized at all, so a replica has to perform compression, garbage collection and everything else related to CFS on its own. Also, one more level of mapping means one more level of fragmentation. So CFS needs its own garbage collector, which must be tuned: one more thing for a DBA to worry about. But still, we think it's a good solution. So please, if you want to try it or discuss it, I will be glad to answer questions at the end of the presentation.

[Second speaker] Test, test. Okay, I would like to tell you about ZSON. It's a PostgreSQL extension, it's open source and it's on GitHub. It works with vanilla PostgreSQL, so feel free to use it or just try it. Basically, it compresses JSONB. It was originally developed for one of our customers: the developers at that company, for some reason, didn't want to think about how they represent their data. They just wanted to use JSONB and have it be compact. The idea is quite simple: take the JSONB, find the strings that are used frequently, and replace those strings with 16-bit codes using a dictionary.

I will not show a lot of benchmarks, because this is not a silver bullet: you have to understand what you are doing. But in some cases it improves performance. For instance, if your data can't be stored entirely in memory and you have to go to disk all the time, ZSON works well. If all your data fits into memory, it doesn't work so well, because the data is stored compressed in memory as well, and ZSON just spends time decompressing it all the time. So, again, you have to understand what you are doing.

Let's take a look at how JSONB looks on disk. This is a segment of a table, and you can see that there are a lot of strings that are repeated over and over.
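The dictionary idea behind ZSON can be sketched as follows. This is a toy over lists of strings; real ZSON rewrites JSONB's binary representation, and the function names here are illustrative, not ZSON's API:

```python
from collections import Counter

# Toy version of the ZSON idea: learn the most frequent strings across
# documents, give each a 16-bit code, and substitute codes for strings.

def learn(docs, max_codes=65536):
    counts = Counter(s for doc in docs for s in doc)
    return {s: code for code, (s, _) in enumerate(counts.most_common(max_codes))}

def encode(doc, dictionary):
    return [dictionary.get(s, s) for s in doc]  # code if known, else the string

def decode(doc, dictionary):
    reverse = {code: s for s, code in dictionary.items()}
    return [reverse[x] if isinstance(x, int) else x for x in doc]

docs = [["name", "Alice", "city", "Berlin"],
        ["name", "Bob", "city", "Berlin"]]
dictionary = learn(docs)
encoded = encode(docs[0], dictionary)
print(encoded)                                 # mostly small integer codes
print(decode(encoded, dictionary) == docs[0])  # True: lossless round trip
```

Strings missing from the dictionary simply pass through unchanged, which is why re-learning the dictionary later is safe.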
If JSON documents are similar, for instance if there are similar keys in each document, there are also a lot of zero bytes, which looks like something that could be compressed. So here are the problems: redundancy, wasted disk space, wasted memory, and as a result disk input/output is not great and we get fewer transactions per second than we could have.

There were actually three implementations of ZSON before we found an approach that works, and I will not tell you all the details right now; you can ask me later in the hall. Here is how ZSON works now. It creates a dictionary of strings, replaces strings in the original documents with 16-bit codes, and after that the data is compressed using pglz as usual. So it should not work worse than it did before.

Before using ZSON, you have to train it on your typical data. You can use the zson_learn procedure for this purpose. It's quite simple: you just give it the names of tables and columns, it looks at your data and creates a dictionary. The way ZSON learns from your data is not that complicated: basically, it finds all the strings in the documents, and by all strings I mean keys, values, array elements, et cetera.

Here is how ZSON is used. JSONB can be implicitly converted to ZSON and vice versa, and that means you can use all the usual operators: everything that works with JSONB works with ZSON as well. ZSON is just a wrapper which compresses and decompresses the data; it does nothing more.

Here is how the compressed data is stored on disk. There is one byte for the ZSON version, in case the ZSON implementation changes somehow in the future. Then comes one integer with the version of the dictionary, because ZSON supports multiple dictionary versions: you can re-learn ZSON when your data changes. Then the size of the original data.
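The on-disk layout just described (version byte, dictionary version, original size, then the data, plus a run of zero bytes used as a compression hint, explained next) can be sketched with struct. Field widths and ordering here are illustrative, not ZSON's exact binary format:

```python
import struct

# Sketch of a ZSON-like value layout: one version byte, a 32-bit
# dictionary version, the original size, HINT_SIZE zero bytes, then the
# encoded body.

ZSON_VERSION = 0
HINT_SIZE = 32  # the default mentioned in the talk; adjustable in the code

def pack(dict_version: int, body: bytes) -> bytes:
    header = struct.pack("<BII", ZSON_VERSION, dict_version, len(body))
    return header + b"\x00" * HINT_SIZE + body

def unpack(blob: bytes):
    version, dict_version, size = struct.unpack_from("<BII", blob)
    body = blob[struct.calcsize("<BII") + HINT_SIZE:]
    assert version == ZSON_VERSION and len(body) == size
    return dict_version, body

blob = pack(7, b'{"name": "Alice"}')
print(unpack(blob) == (7, b'{"name": "Alice"}'))  # True: lossless round trip
```

Carrying the dictionary version in every value is what lets old rows keep decoding correctly after the dictionary is re-learned.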
And then comes the so-called hint. It's just a sequence of zeros which is needed to show pglz that, hey, look, this data compresses well, you should probably try to compress it. By default it's 32 bytes, if I'm not mistaken, but you can safely change it in the code; if you define the hint size as zero, it will work as well. And yes, as I told you, I will not show many benchmarks, because it really depends on your data, but we did some tests on data that our clients had, and in that case we gained about 10% more transactions per second. On GitHub you can find a benchmark which describes it in more detail.

And that's basically it. If you have any questions, we will be happy to answer them. ZSON is an extension; it's not in contrib, and it's not in core, at least currently. But it's a usual PostgreSQL extension: you can download it, build and install it with make, then CREATE EXTENSION, and it just works. Please.

[Audience] Is the page compression done entirely in the smgr? Yes, it's implemented mostly at that layer. There's some more stuff required for the garbage collection system, because it requires some locking and other things, but mostly it's in the smgr and md layer.

[Audience] In regards to the page compression, your comment was essentially that we can do better than file system compression, like ZFS. So what are the advantages of doing it this way specifically? Sorry, I didn't get the question. [Audience] Why do the compression in the database? I don't know, actually; for some reason, not all of the DBAs in our company trust compressed file systems, maybe because they're not familiar with them, maybe for some other reasons, but somehow not many DBAs want to use them. That's why we decided to implement it inside Postgres. Actually, as I see it, a customer simply has Linux with ext4 and doesn't want to migrate anywhere; he just wants his data to be compressed. That's all.
[Audience] I have a couple of questions. Do you have any benchmarks of CFS versus, say, ZFS on Linux or FreeBSD or anything like that? And my other question is about the JSON stuff: where is the dictionary stored?

Okay, I can answer the question regarding ZSON. It stores the dictionary in a regular table called zson_dict, and it's quite simple: dictionary ID, word ID and the word itself, just three columns, nothing special. And it's cached in memory. It's a bit more complicated than that; basically, it's a search tree in memory.

[Audience] So you've basically got a dictionary for all the strings in the JSON? ZSON uses the most frequently used strings; you can specify how many, and add strings after learning. By default it uses at most about 65,000 strings, because the codes are 16-bit integers.

Okay, regarding CFS benchmarks: we didn't do them, but I think it's a good idea, to have something to compare with. Tests on real data, on various data, show compression of about five times, up to five times. In some cases it's more, if the data compresses well, if there are some JSONs with frequently repeated fields; and sometimes it's about three times, if the data is normalized and doesn't compress that well.

[Audience] With page-based compression, if you add a record to a page, that makes the page larger. How is that handled? If something on the page was changed, the page is compressed once again and written to the end of the file. So there is only sequential writing, and the outdated page will be removed by the garbage collector afterwards. [Audience] So none of the pages will ever be overwritten in place? Yes, it's append-only. So it doesn't work really well for heavy update workloads. But for archive tables, for example, it provides really good results.

Any other questions? Thank you for your attention.