Okay, I think I'd better start, it's time. My name is Ilya Kosmodemiansky, I work for PostgreSQL-Consulting, and this is probably one of the last talks about PostgreSQL today in this room. It is actually not only about PostgreSQL itself, because these recommendations are more or less applicable to other databases like Oracle, DB2, or even MySQL, but there will be PostgreSQL-specific things in the checklist I am going to explain. I will try to cover the most important topics of Linux tuning to improve PostgreSQL performance. It is a huge subject, enough for a full three-hour tutorial; in a talk format I can only cover a few things, but they are very important if you work with PostgreSQL on Linux.

Unfortunately, a lot of Oracle DBAs say that administering PostgreSQL on Linux is not gentleman's work, because you need to be 20% database administrator and 80% Linux administrator. They think that Oracle is well integrated into the operating system and can do everything by itself, so the database administrator can be a high-level person who only thinks about how to optimize some query. With Postgres that is not so easy, at least not yet, and you need to know something about how Linux works with a specific workload like a database.

So what is this all about? The modern Linux kernel has improved very fast, especially in the last two or three years; if you are at this conference, you probably know this already. There are a lot of settings you can change, and those settings can change the behavior of your database, sometimes very drastically. What I mean is that if you put PostgreSQL on Linux, probably even with a not-so-heavy workload, you will face performance issues, because some default settings, or settings tuned without the database in mind, can seriously affect your performance. That is one thing. The other thing is that if you really have a heavy workload, for example a quite successful website or any other application, you definitely need to tune Linux to take full advantage of the new versions of the Linux kernel.

Basically, when we talk about tuning something in Linux, we have the concept of tuning targets. That part is easy: if you have a CPU-intensive task, you can do something with the Linux scheduler and you will probably win; if you have a memory-intensive task, you can tune memory usage or swap or something like that. But a database is quite a complex thing, and it is practically impossible to change one small thing in Linux and see the database run fast afterwards. All these targets have to be tightly connected and well balanced. For example, if you tune your database for higher performance, the tuning needs to match the hardware you have. If such an installation has a lot of memory, you can tune the memory targets, but if you have bad disks underneath, that probably will not help in any way, because the bottleneck simply moves to the disks. So with a database, the better idea is to use something like a throughput approach.
For PostgreSQL, or any other database, the additional problem with balancing all these targets is that a database is very hungry for resources. You can try to run a database together with a web server or some Java application on one machine, but that brings a lot of resource contention, and usually you want a separate server for the database system. And like I said, we need to think about the so-called throughput approach. What is it about? Your PostgreSQL processes work with user transactions, and the most intensive load on your server goes through the buffer cache. When your database reads data from disk, it reads it in pages, normally 8 KB in size unless you recompile PostgreSQL for some specific need. PostgreSQL uses the operating system kernel buffer cache and also maintains its own buffer cache, so the pages you read from disk end up both in the kernel cache and in PostgreSQL's shared buffers. On modern installations that can easily be dozens of gigabytes of shared buffers, or even more; such machines are relatively cheap today, and there are a lot of such installations.

The basic idea behind improving PostgreSQL performance is to maximize the speed at which PostgreSQL can read a page from the kernel cache and shared buffers to deliver it to the end user, and the throughput at which PostgreSQL can write it back to disk. The second part is quite painful, and you need to tune all these layers, all these steps, and balance them well to get good performance. The other thing is that PostgreSQL has a write-ahead log — I think most of you know what it is; Oracle calls it the redo log — which is basically the log of transactions, and PostgreSQL also needs to write it through the operating system. So there are plenty of bottlenecks that can exist at any level, and we need to look at the tuning targets that improve this page travel time. It is a small part of the problem, but probably the most important one.

To make this travel more efficient we need effective work with memory, effective disk throughput, and of course everything should run on proper hardware. If you have a laptop-style machine, you can probably do some things quite fast, but it will not be as safe as a proper server, and you cannot achieve the same performance as with a proper RAID controller and proper disks under that controller. It is impossible to flip some magic switch in Linux to make PostgreSQL run very fast if the hardware does not match.

So what does more effective work with memory mean? There are three things: the first is NUMA, the second is huge pages, and the third is working with swap. It is relatively simple, but there are a lot of things that can confuse a database administrator and drive them in the wrong direction. So, NUMA. NUMA is often not set up properly, and the symptom looks like one CPU suddenly being very overloaded — the CPU that works with a huge amount of memory — while the performance of Postgres drops drastically. What is actually going on here? NUMA is non-uniform memory access. Here is a simple scheme: for example we have two CPUs, and our processor architecture gives each CPU its own block of memory, and these CPU-plus-memory blocks are interconnected. That is the usual approach in modern processor architectures.
If NUMA is enabled, that means the operating system — Linux in our case — knows what is going on underneath at the physical hardware level. A CPU allocates pages in its own local memory, and if the operating system decides this CPU needs more memory, it allocates it across the interconnect from the other memory node, and our clever NUMA-aware system knows that the access times to local and remote memory are not equal. So the operating system can help an application optimize, being aware of this non-uniform access time. Basically, when it works, NUMA is a good thing, and it is actually designed to improve performance.

But this is the first thing that can be confusing in combination with Postgres. What can happen with an application like a database server that has a huge amount of shared memory? Usually it ends up with all the shared buffers sitting on one memory node and one CPU doing all the work with them, and that is a huge problem. So for Postgres, if your processor supports NUMA, it is usually a better idea to disable NUMA and switch it off. The terminology here is not very clear, because all this enable, disable, switch off and switch on very often confuses Linux and database engineers.

So what do you actually need to do, and how does it affect performance? If you run software like VMware ESX or something like that, it can benefit from leaving the NUMA-related settings at their defaults, which means NUMA stays enabled and keeps working. Disabled memory interleaving in the BIOS means that NUMA is up and running, and the operating system is aware that you have non-uniform access times to your memory nodes. For Postgres that is a bad idea, because you can run into the situation where all shared buffers end up on one CPU. So for Postgres, somewhat counter-intuitively, you need to enable memory interleaving, which effectively switches NUMA off, and for Postgres that usually means better performance.

You can do that in different ways. The simplest is to enable memory interleaving in the BIOS, or to switch NUMA off in the kernel at boot time. Another possibility is to use sysctl and set vm.zone_reclaim_mode to zero. You can follow the blog post by one of the PostgreSQL committers, Robert Haas, where he described the problems with NUMA; that post appeared around the time the Linux kernel introduced the possibility to set zone_reclaim_mode to zero. You can find more information at the address on the slide.
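To make this concrete, here is roughly what switching NUMA off for a dedicated Postgres box can look like; these are standard Linux knobs, whether you do it in the BIOS, on the kernel command line, or via numactl depends on your setup, and the data directory path below is just a placeholder:

    # Option 1: enable memory interleaving in the BIOS (vendor-specific menu).
    # Option 2: disable NUMA awareness at boot by adding numa=off to the kernel command line.
    # Option 3: keep NUMA but stop per-node reclaim, which is what hurts Postgres most:
    sysctl -w vm.zone_reclaim_mode=0
    echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf   # make it persistent
    # Optionally start Postgres with allocation interleaved across all nodes:
    numactl --interleave=all pg_ctl start -D /path/to/data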
Another thing is huge pages. You can diagnose huge page problems with something like SystemTap or perf, but you do not really need them to see whether you have a problem, because if you have large enough shared buffers — I mean 32 gigabytes or more, something like that — you definitely need to use huge pages. What is it about? The operating system allocates memory in small chunks. That is quite effective for small, random memory allocations, but a database needs huge shared buffers, and in that case it is more effective to allocate memory not in small pieces but in huge chunks, mostly because of the way the operating system works.

The operating system needs to translate virtual memory addresses into physical memory addresses. This procedure is not free, so the operating system keeps the results of these translations in a special cache called the Translation Lookaside Buffer, or TLB. The TLB is a cache, and it has all the problems a cache can have: if the amount of memory to map is huge, you get cache misses, and you get overhead in memory just to maintain this structure. So the approach is quite straightforward: you should use huge pages, which have been supported in the Linux kernel for a long time now. The first step is to enable huge pages in the Linux kernel, and then you need to make your Postgres use them, and that can be tricky, because the application has to be aware that it can use huge pages.

How do you do that with PostgreSQL? Before version 9.3 you could use the libhugetlbfs library, after enabling huge pages in the Linux kernel and telling Linux how many huge pages you want for your database installation. For example, on a dedicated PostgreSQL server — meaning only the database runs on it — a rule of thumb is about 25% of total memory for shared buffers, and when you choose the exact value for huge pages you need to be sure the shared buffers fit into them, plus, for example, 10 or 20% headroom.

In 9.3 things improved in some respects of PostgreSQL performance. For example, you no longer need to adjust kernel parameters to increase shared memory limits, because since 9.3 PostgreSQL uses mmap instead of the System V way of allocating shared memory. But effectively, huge pages support in PostgreSQL 9.3 was broken, because the move to mmap took away the possibility to use libhugetlbfs to manage huge pages for Postgres, and there was no other tool to make Postgres use them. In 9.4 and 9.5 things changed again: Postgres now has a dedicated parameter in postgresql.conf with which you can set huge pages to on, off, or try mode. Try mode is the safest one: if Postgres starts and can allocate shared buffers from huge pages, it does so, and if not, it allocates shared buffers from normal pages. It is the safest way because if you force it to on, Postgres may not start at all, for example because huge pages are not available. Try is also useful if your installation runs not only on Linux but, for example, on FreeBSD, where PostgreSQL cannot use huge pages this way — off is fine for FreeBSD because the mechanism is simply not supported there.
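As a rough illustration of the setup just described — the numbers below assume a hypothetical dedicated server with 64 GB of RAM, 16 GB of shared buffers and the usual 2 MB huge page size, so treat them as an example, not a recommendation:

    # 16 GB of shared_buffers / 2 MB per huge page = 8192 pages,
    # plus roughly 10-20% headroom:
    sysctl -w vm.nr_hugepages=9000
    # (put vm.nr_hugepages = 9000 into /etc/sysctl.conf as well)

    # postgresql.conf, PostgreSQL 9.4 or newer:
    shared_buffers = 16GB
    huge_pages = try          # on / off / try; try is the safest mode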
Another important thing is to disable transparent huge pages. That is not a must, actually, but most likely you will need to do it. What is the problem with transparent huge pages? It is a quite clever mechanism: all you need to know about transparent huge pages in the context of Postgres is that it is a smarter way of managing huge pages. Linux can manage them more intelligently — for example, it can merge pages into a huge page in the background when they are not filled to the top. For many applications that can be a benefit, and if you have a lot of memory on your server it can be a big benefit, because memory allocation becomes much more effective.

But with Postgres there have been problems with drastic performance degradation when transparent huge pages were enabled. Most likely you will not benefit from transparent huge pages unless you have terabytes of memory in your machine, but there is a real possibility that your performance drops if you hit one of the bugs in the Linux kernel, which were quite common with transparent huge pages and Postgres in older kernels like 3.4 or 3.6. You can experiment and see whether your particular workload on your particular installation benefits from them, but the safest way is simply to switch them off and work with normal, explicitly allocated huge pages.

What is wrong with swap? Previously we could recommend switching it off completely on a database server — you have a lot of memory on the machine, so what is the purpose of swap? But on modern kernels it is not so straightforward: even with a lot of memory you can still see that some processes are swapping, and if your database server is swapping, it does not work properly and everything becomes very, very slow. What is the symptom? For example, you have 64 gigabytes of memory on your machine, you can see that only half of it is used, and still swap is active and something is in it. That is not the behavior you expect to see.

So what is the better solution for Postgres? If you set vm.swappiness to zero, then when the OOM killer comes it can actually kill the Postgres server process if the machine runs out of memory. From one point of view that is efficient — you use your memory fully, you never go to swap, and everything is fast — but it is quite unsafe, because if you run out of memory, Linux will kill your PostgreSQL process instead of swapping, and that is bad. On modern kernels the slightly better idea is to set this parameter to something low, like one or two. By default I believe it is about sixty. This parameter regulates how eagerly swap is used and influences the kernel's estimate of how applications will consume memory. If you have something like one or two in this setting, the operating system will use most of the memory first and only then go to swap. If you set it to, for example, ten, the operating system will start using swap sooner than you expect. So never set it to zero, but since you do not need frequent swapping on a database server, do not set it to a high number either.
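For reference, a minimal sketch of these last two memory settings — disabling transparent huge pages and keeping swappiness low; the sysfs path is the usual one on recent kernels, though some distributions expose it slightly differently:

    # Disable transparent huge pages (repeat at every boot, e.g. from an init
    # script, or add transparent_hugepage=never to the kernel command line):
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # Keep swap available as a last resort, but use it very reluctantly:
    sysctl -w vm.swappiness=1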
The second step of this page-travel challenge is write performance, and for a database that is the essential bottleneck. Most database performance problems are checkpoint problems, write-ahead log syncing problems, and overall write performance. We normally call it a checkpoint spike. When PostgreSQL runs under a heavy workload, you can see from time to time a huge I/O utilization on the database device — that is the last column of iostat, the device utilization, which I put on the graph, and you can see it periodically spiking to nearly 100%. The second thing is the PostgreSQL system view pg_stat_bgwriter. This view shows, among other things, how many buffers are written by checkpoints — the process which syncs dirty pages, whose changes are already recorded in the write-ahead log, down to the block storage. How does it work? Let me return to the scheme of the throughput approach. Some pages become dirty — the blue pages on the scheme.

That means that if you run an UPDATE in SQL and change one single row residing inside some red page, the page becomes a blue one: the whole page is marked as dirty. The information we need to recover it to the previous state, or to redo it to the current state, is written to the write-ahead log, while the page on disk and the dirty page in memory are inconsistent. You need the information from the write-ahead log plus the old, clean version of the page to put them together and get the new version. That is basically how the write-ahead log works. The synchronization is performed mostly by a special dedicated process, the checkpointer. The checkpointer starts flushing pages to disk: it takes all the pages that are dirty in shared buffers and sends them through the kernel buffer cache to the I/O system. At that moment you have a lot of fsyncs in your system and you get the so-called checkpoint spike — the thing on the blue graph — which effectively stops your database, because when the database is running on a disk that is 90% utilized, everything will be very, very slow. And this happens periodically, every time the checkpointer comes and starts flushing pages to disk.

So if you correlate the two graphs — the iostat utilization and the buffers written by checkpoints from the pg_stat_bgwriter view — you can easily see that the problem is not that someone periodically runs a backup or some users update things in huge amounts. The problem is the checkpoints: you have a lot of buffers written by the checkpointer and huge disk utilization at the same time. At this point you need to change several settings — in your hardware, in Linux, and in PostgreSQL — to make this less painful.

What is going on with the default Linux configuration? By default, for example in Debian, the settings vm.dirty_ratio and vm.dirty_background_ratio are quite crazy. These settings are percentages of the total amount of memory the operating system can see. That means that if you have, for example, 128 gigabytes of memory in your machine, flushing from the Linux kernel buffer cache only starts when at least 10% of that is filled with dirty pages; until you reach roughly 12 gigabytes of dirty pages, nothing happens. If you look at modern fast RAID controllers, a 12-gigabyte cache on a RAID controller is still a fantasy; 2 gigabytes is good, that is well enough. If you have no cache on your RAID controller at all, the problem is even worse, because your operating system does nothing for a long time and then from time to time receives an instruction to push a huge amount of data to disk, and at that moment everything effectively hangs.

The first step is to change these settings. A good idea is to set the limits in bytes, to match your disk subsystem throughput, using the alternative settings vm.dirty_background_bytes and then vm.dirty_bytes. The exact figures on the slide are for a RAID controller with half a gigabyte of battery-backed cache; you can easily calculate them for your own controller. If you have no RAID controller, and no SSDs with a supercapacitor that can protect their on-board write cache, these figures should be even smaller, because your real throughput is much lower, and the crazy defaults common in most Linux distributions will decrease your performance drastically.
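The kind of values meant here look something like the following; the exact numbers are only an illustration for a controller with roughly half a gigabyte of battery-backed cache, so measure your own write throughput and adjust accordingly:

    # /etc/sysctl.conf — start background writeback early and keep the total
    # amount of dirty memory well below the RAID controller cache size:
    vm.dirty_background_bytes = 67108864     # 64 MB
    vm.dirty_bytes = 536870912               # 512 MB
    # Setting the *_bytes variants overrides vm.dirty_ratio and
    # vm.dirty_background_ratio automatically.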
Then, of course, you need to set up your hardware properly. For example, you cannot benefit from the cache on your RAID controller if it has no battery. That is a funny thing, but when you order a RAID controller the battery is usually an optional item. Common vendor controllers — MegaRAID, for example, and its OEM clones — support a battery-backed cache, but double-check that the battery is actually installed. If it is not, the controller by default does not use the cache for writes, because it is unsafe; it is trying to protect you from shooting yourself in the foot. I have a separate talk about setting up the hardware side, and you can check the link to it on YouTube for detailed instructions on how to verify that your RAID controller is set up properly.

These I/O issues are quite important, and a lot of trouble comes from them, but in the modern Linux kernel there are also a lot of nice features that can be very good for a desktop or a laptop and very bad for your database server. One of them is the scheduler settings, which you can change via sysctl. These are relatively new features, and they can really improve your Linux desktop experience, but on a database server they can be very painful.

The first one is a limit in nanoseconds: for that period of time after a process has run, the scheduler considers it hot enough and will not attempt to migrate it to another CPU. Obviously, if the process is a database server process, migration to another CPU is a very expensive operation and will certainly affect your performance. So the intention is to set this option to a reasonably high figure, so the scheduler does not migrate your busy processes to another CPU.

The second one is the autogroup setting. When it is enabled, the scheduler tries to group all processes started from the same virtual terminal onto the same CPU. Now imagine PostgreSQL, which is a traditional prefork-based application and uses normal, heavyweight Linux processes to do its job — each and every PostgreSQL process ends up on a single CPU. That is definitely not what you want to see. This feature is probably good for some desktop environment where you are trying to use GIMP to process your photos, but for a database server it is a bad idea and you need to disable it. There is a good explanation in the pgsql-hackers mailing list, with an example where the problem on some PostgreSQL installation was solved quite well by changing these parameters. You also need a relatively new kernel to work with this, because there were several bugs around these settings in older Linux kernels — I do not know the exact versions, because the parameters are quite new and bugs are inevitable. Here is an example of a very simple, very lightweight test using a default pgbench run that my colleague performed, and you can see that changing these parameters already improves performance slightly even there; with a more complex database workload the difference will be much more drastic. So try to figure out how these settings are configured on your installation — you can probably benefit from this.
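As far as I can tell, the two scheduler knobs being described here are kernel.sched_migration_cost_ns and kernel.sched_autogroup_enabled (on older kernels the first is called kernel.sched_migration_cost); a sketch of commonly used values, assuming those are indeed the parameters meant:

    # Treat a recently run process as cache-hot for longer (5 ms instead of the
    # default 0.5 ms), so Postgres backends are not migrated between CPUs so eagerly:
    kernel.sched_migration_cost_ns = 5000000

    # Do not group all processes started from one terminal/session into a single
    # scheduling group:
    kernel.sched_autogroup_enabled = 0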
Another thing that is not stupid for a laptop or a smartphone, but quite stupid for a database server, is the power saving policy. By default it is usually quite reasonable, but I have seen more than once that hosting providers change it on dedicated servers, and if you do not change it back when you get the server and put your database on it, you will get very unexpected results. The most appropriate choice for a database workload is the cpufreq performance governor: cpufreq is the subsystem that drives the power saving policy, and performance is really the only governor you can use with a database server. If you try to save power, the operating system will throttle and reschedule your PostgreSQL processes, and the result is not just that everything runs a constant bit slower than with the maximum-performance setting — it means unpredictable run times for all these processes. Your database performance becomes very unpredictable: for example, you try to optimize an SQL query, you run it once and get something like six seconds, then you run the same query a second time and get quite different results. So you definitely need to make sure you have nothing like the ondemand, powersave, userspace or conservative governors. Just check it, especially if the server was not set up by your own operations engineers.

I really want to thank my colleagues, who every day do new research on these topics as we meet new problems in field operations. If you have questions, I will be happy to answer them.

That is a good question. With Oracle things are much better now, but in the times of Oracle 9, for example, they were not so good; if you use 11.2 or version 12, it actually helps you adjust these settings during installation, while in older versions the situation was pretty much the same as with modern PostgreSQL. The second thing is that Oracle can use a lot of different approaches to, for example, I/O, while PostgreSQL just does everything through the operating system — Oracle's I/O subsystem is much more complicated. That is why these recommendations are more or less applicable to both Oracle and PostgreSQL.

The second part of your question is how frequently I see these problems — actually, probably every day, because when a new client comes to support, you almost always see some of them. For example, when the ext4 file system was introduced and adopted, every second database server I saw was mounted with the barrier enabled, and with the barrier enabled, when the checkpointer starts to flush your data to disk, everything has to wait until the journal and caches are flushed all the way down; if you have 32 gigabytes of dirty data in the kernel buffer cache, that is very painful. And this setting was not well covered in any PostgreSQL-related documentation. That is a common problem. The same goes for vm.dirty_ratio and vm.dirty_background_ratio — they are quite often left at the defaults. Probably number two on this list is zone_reclaim_mode, and that is a problem which is not so easy to recognize, because "from time to time one CPU is overloaded" sounds like quite a common thing. Or someone disables swap completely, the OOM killer comes and kills PostgreSQL — that happens more often than you would expect. So this set of settings is not really my own invention; it is a list created by emergency cases.
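Going back for a moment to the power-saving governor and the ext4 barrier points above, two quick checks; the device and mount point in the fstab line are placeholders, and dropping barriers is only safe if the write cache is battery- or capacitor-protected:

    # Check and set the CPU frequency governor:
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    cpupower frequency-set -g performance

    # Only with a protected RAID/SSD write cache, relax barriers on the data
    # volume, e.g. in /etc/fstab:
    #   /dev/sdb1  /var/lib/postgresql  ext4  noatime,barrier=0  0 0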
You are welcome. Yeah, sure, the slides will be on the wiki page — the PostgreSQL wiki page for this conference, SCALE. I believe you can Google it: PostgreSQL wiki, SCALE 14. I actually do not know, but I hope there will also be a link on the schedule. This talk is slightly changed in one or two slides, but the slides are quite easy to find on SlideShare by searching for Linux tuning for PostgreSQL performance. Based on feedback I improved some things, because people were confused by all this enable/disable terminology, which is indeed not straightforward. More questions?

On Amazon there are some things you can do, actually, but you do not have that much room to tune anything. Where you can, you probably still want to change the same Linux settings — the power saving policy, the scheduler, or even the dirty memory settings. On Amazon there are also some tricks for managing the provisioned throughput to get the best price-to-performance figures, but that is not really about Linux tuning. What about Amazon in general? I think the answer is quite common for all cloud deployments, and for databases especially there is a rule of thumb about it. If you want flexibility — meaning you do not know how many database resources you will need in two months while your startup is growing — you can definitely choose Amazon, Heroku, or something like that, and you will benefit from that flexibility. But if you have a really heavy workload and you can choose between bare metal and virtual cloud appliances, the cloud will definitely be more expensive than bare hardware plus some kind of support, in-house or remote. So if you need to maximize throughput, most likely Amazon will be more expensive than bare metal.

Personally I have run into the problem that throughput in Amazon terms means transactions per second, but that is not a good metric for a database, because on virtual appliances the latency is often not smooth. Even if the average throughput looks quite nice, a single transaction waiting for a non-trivial time — one or two seconds, for example — can kill database performance, for example if dirty pages are being flushed to disk at that moment, or if the optimizer picks a different access method. For a database it is important to maintain both good latency and good throughput. Today's cloud services can guarantee throughput, but latency will not be as smooth as on bare metal. So if you need performance, better use bare metal with a good local RAID controller.

Anyway, if you have questions, you can write to me and I will answer — just mention that you saw this talk at SCALE. So thank you.