Good morning everyone, let's start. At the beginning, a few words about us. My name is Miha Gudkovski and my colleague is Rafał. We have more than ten years of experience in software development. Currently we are working for TomTom, but our background is in different companies: we worked for telcos and banks, and currently at TomTom we work with GIS systems. We have experience in Java, Python and other programming languages, but I believe that being software developers working in the IT industry means we should solve real-life problems with the right tools, and I believe one of those tools is Postgres. What I want to show you today is our use case, how we use Postgres at TomTom.

We are from Łódź in Poland, the center of Poland, but as TomTom we have offices in multiple locations around the world, also here in the States, in Lebanon, New Hampshire. You may have heard about TomTom. If not: we make those nice navigation devices that route you from point A to point B, also giving you some additional traffic information. Besides that we also sell sports gadgets like watches and cameras, but digital maps are basically our main business. They are the heart of our navigation devices and also of the other location services that we provide to our partners.

According to our sources, the road network changes 15% per year. In today's world people expect accurate and up-to-date maps; location information is very important. That's why TomTom needed an efficient map-making platform, and we built it based on Postgres. You can think about it like a database with special features: you can update geometries, you can search for geometries, and you can retrieve a lot of data, which is not only geometrical or geographical information but also attributes, for example attributes of roads like road class or street names. Of course we have different kinds of objects inside our database. There are roads.
There are junctions, administrative areas, countries. So when we talk about our map-making platform, the purpose was to create a platform that allows transactional and versioned changes, with massive automated tools operating alongside people editing the map. The platform has to allow them to work together at the same time, editing in parallel.

Some facts about our Postgres machines. We are currently running Postgres 9.5 with PostGIS. We have about 200 quite powerful machines. Storage is currently about 80 terabytes, with daily increases of 200 gigabytes. On average we are able to insert 15k rows per second, and if you look at our system as one logical database, it can handle over 600 queries per second. There are different kinds of queries: some of them are CPU-bound, some of them are I/O-bound; some of them are very fast, some take longer because they need to retrieve a lot of data.

So why so many queries? We store a lot of information inside our database. As I said, it may be geographical information but also attributes, and also associations between map objects, for example road elements being connected to administrative areas, and you have to be able to retrieve those objects efficiently. Besides that, every change that happens within our map triggers more than 2000 quality-check rules, which prevent our database from getting into an inconsistent state. You can think about it like software releases: you want your trunk version of the software to be always releasable, and you don't want to allow any quality degradation.

It's not showing, so let me just move it a little bit, sorry, I'm not sure why it is not displaying. So, how did we do it? Basically by the right scaling, by optimizations, and, because when your software is in production you have to monitor it, by maintenance. About scaling: the information in our database has a geographical context.
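Because every object carries geometry, queries (and therefore data placement) can be scoped by region. A hypothetical sketch of such a region-scoped PostGIS query; the table and column names are made up for illustration, not TomTom's actual schema:

```sql
-- Fetch roads intersecting a bounding box around central Łódź
-- ("road_element" and its columns are illustrative names)
SELECT id, street_name, road_class
FROM road_element
WHERE geom && ST_MakeEnvelope(19.40, 51.72, 19.52, 51.80, 4326);
```

The `&&` overlap operator can use a GiST index on `geom`, if one exists, so only rows whose bounding boxes overlap the envelope are touched.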
That's why we can put it into different database machines. Also, our map is versioned, which means that we can partition the data into ranges. We believe that for our use case one master and multiple read replicas is the right architecture choice, and we do some kind of resource balancing; I will get back to it in the next slide. So we do resource balancing in order to be able to utilize the hardware at the maximum level.

About optimizations: we are snapshotting older data to be able to archive it and keep the database size within limits. We use connection pooling to handle clients, and we also did a lot of query optimizations to have fast and efficient queries. In the end, of course, it wouldn't be possible without the right Postgres configuration, which allows us high throughput for writes and for reads. At the end I will talk more about maintenance and our approach to monitoring.

(Answering a question from the audience:) Okay, yeah, that's what we support right now, and it's based on the hardware that we have. No, it's the average of what we see. And about the scaling, I think that with scaling it's always a question of how to balance the cost of your hardware with demand, because we know that if we write software in the proper way, it should utilize the hardware as much as possible, while at the same time costs should stay at, let's say, the budgeted level. At the same time, we know that if you have different clients connecting to a database, their usage profile changes over time. So for example, if we have people editing the map.
They usually work during the day, and at night and on weekends the databases for them are idling. So we believe it's a smart choice to be able to reassign resources to different clients, and also to be able to throttle clients if they cause too much load on our databases. If we have automated tools that can run 24/7, every day of the week, all year round, it means they can also produce too much load. So if we see that there are idling resources in the system, we chose the approach of being able to reassign some of those resources to the tools, and when the people come back to work, we take those resources back.

I think that a database should be seen as a resource. It means that you should be able to scale it easily, scale it in and out, by removing and adding machines. If you have one or two databases, you are thinking: how can I manage this database, my precious resource? But if you are using something like 200 databases, then it's more about the scale and about being efficient with your hardware. So if you need a database, you should be able to click one button and add some more resources, and at the same time be able to remove them without fear of losing any data.
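On 9.5, spinning up such a "one-button" read replica mostly comes down to taking a base copy and pointing it at an upstream server. A minimal sketch of the standby's recovery.conf (9.5-era layout; host names, user names and paths here are placeholders):

```ini
; recovery.conf on the new standby (PostgreSQL 9.5-era layout)
standby_mode = 'on'
; upstream can be the primary or another standby (cascading replication)
primary_conninfo = 'host=upstream.example.internal port=5432 user=replicator'
; optional: fall back to archived WAL if streaming falls behind
restore_command = 'cp /wal_archive/%f %p'
```

The base copy itself would typically be taken with pg_basebackup, which comes up again below.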
So, for the scaling, we are heavily using streaming replication. As I said before, we are using an architecture with a single master and read replicas, and for that we are using physical replication. Additionally, we decided to keep our replicas, or standbys, in a tree; this way we can balance the traffic resulting from WAL generation, and it also gives us more failover options.

With replication and reading from replicas, we expect that all the replicas are in a consistent state, meaning that if you read some data from a standby server, it has to have the current version of your map. (Answering a question from the audience:) 9.5, yeah, exactly. One remark is that we are not using synchronous commit; we are using asynchronous commit, because we believe that waiting for the standbys to acknowledge that they have received and applied the data would be too much of a penalty on write performance. So we decided to go with a different approach: in the application we have logic which checks that a given standby has the version of the map that you ask for. We put that logic into the application, and it checks: okay, this server is fine, I can query it. We are doing that because we don't want to slow down writes, and this replication tree also gives us more benefits.

About replication: according to the Postgres documentation, there are two ways of creating the initial copy of a standby, with rsync or with pg_basebackup, but I think right now the recommended option is pg_basebackup, as along with copying the data you can also copy the incoming WAL files. Does anyone use rsync for creating standbys? No? And what's your experience with using rsync? And the resume option, for example, does it work?
Okay, yeah, because I mean, rsync by definition gives you some options to resume, and pg_basebackup does not. But I think if you want to resume, then at the beginning it has to check which data you already have, and that can also be a lengthy process. Yeah, there is a great post on the Postgres wiki on how to set up streaming replication, so you can do it on your own; you don't need any magic knowledge.

But of course replication is not for free; you have to be aware of some facts. If you are using asynchronous commit, there is always some replication lag. It means that you cannot assume that the data will be transferred and available immediately on the standbys. There is always some delay, let's say from milliseconds up to seconds, because applying the WAL files can also take some time and can be blocked by running queries. You also have to remember that with physical replication the standbys are read-only, so there are limited configuration options: you cannot change users, you cannot change roles, and only, let's say, postgresql.conf can differ. But the question is, do you even want different configurations on different machines?

Also, with streaming replication you take the changes for the whole database, which can cause high traffic on your network interfaces; it can even lead to saturation of your infrastructure if you are generating, say, gigabytes of data per second. It's also worth mentioning that with streaming replication, sequence values on standbys are not fully consistent with the primary. By design, sequence increments are WAL-logged in batches ahead of time, so when you increase a sequence value on the primary, the value visible on the standby can already be increased by a whole buffer, and you may not discover that the sequence value was changed.
So it's not recommended to read sequences from standbys. Also, with this kind of replication tree that we have, you have to monitor your machines, because a failure of one machine can disconnect a whole branch. You also need extra storage for WAL files to be able to send them to the standbys. And with streaming replication there is the known error "canceling statement due to conflict with recovery". The suggested option is to enable hot_standby_feedback, but you have to take into consideration that it can cause higher bloat, because a query running on a standby can then prevent cleanup of dead rows on the master.

So it's important to monitor the replication lag. Along with the lag you should also monitor your system metrics, because lag can be caused by high CPU usage, I/O delays or network saturation. Postgres gives you built-in options to monitor lag; you can do it with queries. You can select from pg_stat_replication on the primary server, and it gives you detailed information about the state of replication on your standbys. For example, you can see the sent location, which tells you which WAL position was sent, and also which position was already replayed on the standby. There can be different reasons for replication lag: for example, your primary is too slow in sending the WAL files, or your standby is not able to apply the changes efficiently, so it stays behind the master.

It's also important to store historical metrics, in our case in Prometheus, and to use something like Grafana to display that historical data. So if some replication lag happens in the middle of the night, then later in the morning you can check what the state of your system was and why it happened: was it only CPU load, or were there different reasons?
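The lag check described above might look like this on 9.5, run on the primary (in later Postgres versions these functions and columns were renamed from `xlog`/`location` to `wal`/`lsn`):

```sql
-- Per-standby replication lag in bytes, measured on the primary (9.5 names)
SELECT application_name,
       state,
       pg_xlog_location_diff(pg_current_xlog_location(), sent_location)   AS send_lag_bytes,
       pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
FROM pg_stat_replication;
```

A large send lag points at the primary or the network; a large replay lag together with a small send lag points at the standby being slow to apply WAL.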
So, yeah, we want to reduce replication lag, and the easiest way to do it is by reducing, for example, the CPU and I/O usage coming from your queries. You can do it both ways: optimize queries, or throttle clients so that they generate fewer queries. Also, doing a lot of inserts and updates in your database causes a higher WAL generation rate, and from the hardware point of view it makes sense to use separate volumes for the Postgres data and for the xlog (WAL) files; this way you distribute the I/O load across different hardware.

About some Postgres configuration settings: you can set max_standby_streaming_delay, which means that a query will not get cancelled for at least the given time, but that can cause replication lag to build up on the standby. On the other hand you can turn on hot_standby_feedback, but then you have to remember about the bloat that can happen on the primary. It's also worth changing the default number of WAL files that you keep on the primary, because if a standby is slow in receiving WAL files, some WAL files may get recycled or removed, and then the standby cannot catch up with the primary anymore, which means you have to create a new initial copy again. You should also configure the number of replication clients; this way you can also control the network traffic coming from WAL shipping.

Any questions? Of course, our data has geographical information; that's why we are using PostGIS a lot. It's a great extension for us. It gives support for geometry types and it allows us to do filtering by geometries.
It also gives processing and analytic functions for geometry types, and queries run faster with indexes on geometries. One of the optimizations we do with PostGIS and geometries is that sometimes we can run a query on a simplified geometry, because in real life some large-area objects can have thousands of coordinates, and querying a simplified geometry is much cheaper.

With queries, it's always worth making sure that they use the indexes available on your tables. First of all, it's important to keep statistics up to date, because otherwise the query planner will not be able to choose the right plan. You can increase the statistics target per table and per column using the ALTER TABLE command, and then of course you have to run ANALYZE on that table again. With query optimization it's smart to run the queries on production data, not to try them on your local machine with a minimal data set, because on production there is of course a bigger volume; and also to use the parameters coming from your production environment, because different parameters can produce a different query plan.

Of course you have to remember that indexes are not for free: if you put too many indexes on your table, it can increase insert time, it increases the size of your tables and data, and it also influences the WAL volume. Postgres allows you to create multi-column indexes, but you have to remember that the order of columns in the definition of, for example, a B-tree index is important, so it makes sense to plan your indexes the right way. About indexes and EXPLAIN plans: there is a great page, explain.depesz.com, that does a visualization of your plans.

About database maintenance: even if you don't remove any data and only do inserts, those writes can still cause bloat on your indexes, and if you have bloat it increases the latency of queries and also increases the size of the indexes.
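One way to observe this index bloat is the contrib extension pgstattuple; a sketch, where the index name is made up and the check applies to B-tree indexes:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- An average leaf density far below ~90% on a settled B-tree index
-- suggests bloat and that a rebuild may be worthwhile
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_road_street_name');
```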
So the searches will get slower, and it makes sense to monitor the bloat. Also, when you create your database, at the beginning you are not sure which indexes will actually be used, so after some time it's worth checking whether there are indexes which are not used; then you can remove them and save some space.

Of course, on an online system you cannot do VACUUM FULL, which would put a lock on your tables. And it's not recommended to turn off autovacuum, because even if you have your own special cleanup script, sometimes you can forget to run it, or there is some other failure, and then the database would not be maintained. It's better to just let Postgres do the cleanup and tune the autovacuum process properly, so that it doesn't hurt the performance of your system and at the same time you are sure it always runs when it's needed. So it's important to monitor the number of dead tuples and to tune the autovacuum parameters: to set proper thresholds for when autovacuum should start, and also the autovacuum costs if you want to limit how fast it can go, because generating a lot of changes can again cause more WAL to be generated for replication. For the autovacuum process it also makes sense to increase the maintenance memory, because this way it will run faster.

On a live system you cannot use the plain CREATE INDEX statement directly; it's better to use CREATE INDEX CONCURRENTLY, which does not take an exclusive lock on the table. Of course it can take longer with the CONCURRENTLY option, and it can still be blocked by other running operations, like for example vacuum. With indexes and removing bloat, it sometimes makes sense to create a new version of the index. Of course, on a live system this needs care.
You cannot do REINDEX, which would lock your tables, so it's better to create a copy of the index, switch it with the original index, and remove the old one. And there is, let's say, a general rule for Postgres configuration: you shouldn't keep the default settings coming with the Postgres installation. It's better to tune it a little bit so you can get better performance out of Postgres.

About connection pooling: if you have a lot of clients connecting to your database, it makes sense to use, for example, PgBouncer to limit the number of active connections. You should also use connection pools on the application level; with Java there are multiple options like C3P0 or DBCP. This way you also limit the number of running queries on a single machine, and there is no overhead of creating a new connection to the database every time you run a query. There are a couple of things to remember and know about PgBouncer: for example, it is single-threaded, which means that with a bigger load it can become a bottleneck, but there is a smart trick with iptables that can redirect traffic randomly to different PgBouncer instances.

(Answering a question from the audience:) Yeah, this is what we saw with PgBouncer under higher load: if you are transferring a lot of data, it really can be a bottleneck. You check your database and, hey, the load is not very high, but if you try to make a query and get a connection from PgBouncer, it spends too much time just retrieving the connection or retrieving the data; at the same time, if you connect directly to the database, the same query runs pretty fast. And now my colleague Rafał will tell you more about configuration tuning.
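A minimal PgBouncer configuration of the kind described above might look like this; database names, paths and limits are purely illustrative:

```ini
; pgbouncer.ini - pooling in front of one map database
[databases]
mapdb = host=127.0.0.1 port=5432 dbname=mapdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000   ; client connections PgBouncer will accept
default_pool_size = 20   ; actual server connections per database/user pair
```

Because PgBouncer is single-threaded, several such instances can run side by side, with something like iptables spreading incoming connections across them, as mentioned above.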
Yep. So, memory and other settings can significantly influence your performance, and these are the most crucial ones. shared_buffers is the place where Postgres stores cached pages. If there is a need to read a page from disk, this cache is checked first, and if the page is already in the cache, its popularity counter is increased and the page is returned. When it's not in the cache, Postgres requests the OS to provide that page, and, well, most OSes have their own cache for data read from disk; what's more, some hardware can have its own cache for that purpose too. So sometimes it's not a drama to read data which is not in shared_buffers.

Then, when the page comes from the OS to Postgres, Postgres looks for a place in shared_buffers to put that page in. It goes around the buffer table, decreasing those popularity counters until it finds a page that can be evicted; this is called the clock-sweep mechanism.

There are a lot of discussions about what the best value to put here is. One school says it's 25% of your available memory if you have more than eight gigs; others say that you would like to keep shared_buffers at the minimum level and let the OS cache do the job. Well, it really depends; there's no simple answer, because it depends on how big your database is, how big your database objects are, what the data distribution in those tables is, and what the popularity of particular values in your tables is.
So you need to do some experiments to find the best values. You can use the contrib extension pg_buffercache, which will tell you how big a portion of which objects is actually stored in shared_buffers. So there is no simple way to say what the best setup is here.

effective_cache_size is actually not preallocated memory; it's just a hint for the query planner to let it know how much memory can be assumed available when considering index use.

Huge pages: the default page size in Postgres is eight kilobytes, and when you have shared_buffers set to, let's say, 200 gigabytes of memory, there are a lot of pages to browse through; that is CPU-consuming and might be ineffective. So there is the concept of a huge page, which by default is two megabytes in size. Most OSes now support this, so you can define in the kernel settings how many huge pages you want the OS to reserve, and when those desired huge pages are provided by the OS, applications can use them; for instance, Postgres might use them. What you need to do is switch huge_pages to 'on' or to 'try'. If you put it to 'try', then Postgres will try to use as many huge pages as possible, and if that fails, it is still able to start up.

work_mem is the amount of memory that is used for a single operation of sorting or building a hash table. It means that a single query might use a multiple of this value, once for each sort or hash-table build. It's important to monitor this, because when a sort is not able to fit in the work_mem amount of memory, temp files start to be written to disk, and this can dramatically lower your performance. So it's worth monitoring whether any temp files are written; per database you can check that in pg_stat_database.
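The temp-file spills described above can also be surfaced in the server log; an illustrative postgresql.conf fragment (the values are examples, not recommendations):

```ini
# postgresql.conf (illustrative values)
work_mem = 64MB          # per sort/hash operation, not per query
log_temp_files = 0       # log every temp file created, with its size
```

Every query that spills then leaves a "temporary file" line in the log, which is an easy thing to alert on.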
Whether a particular query wrote temp data you can monitor in pg_stat_statements. maintenance_work_mem is the memory that a single process of manual vacuum, CREATE INDEX or adding a foreign key uses. It's worth mentioning that each single process uses that much memory; so if you do a pg_restore with multiple threads, every CREATE INDEX running there is going to use that amount of memory. When you do some additional regular maintenance, like reducing index bloat by reindexing or something like that, it's probably a good idea to put a higher value here.

Okay, background writer and checkpoints. This is another crucial thing, especially if you want high write throughput. It works like this: a dirty page is a page that was modified, for example updated or deleted, and this dirty page is not written to disk straight away when it was modified by some backend; it is kept in shared memory. It is kept there for performance's sake: if all the backends wanted to store their changes on disk straight away, the queries would be stalled and queued, so it wouldn't be performant enough. So the optimization is that the dirty pages are kept in shared buffers, and then on a periodic basis checkpoints happen. By default, it's five minutes.
So every five minutes all the dirty pages from shared buffers are flushed to disk. That would be okay, but if you have a lot of changes, then a lot of dirty pages are generated and you would observe, on this periodic basis, peaks in I/O usage, and those result in lower performance during that period.

There is another optimization: when a backend wants to read data and needs pages which were dirtied, those dirty pages are flushed to disk during that backend's query runtime. So some dirty pages are flushed to disk before the checkpoint happens. But it's still not enough, because in some setups and some workloads you might never read those dirty pages between checkpoints, so you would still observe the I/O peaks. So there is another process running in the background, called the background writer (bgwriter). Its job is also to flush those dirty pages to disk periodically, but it does so more often; there is a bgwriter delay between its rounds, and it does not flush all the available dirty pages but is limited by values like bgwriter_lru_maxpages.

About checkpoints: it's good to have checkpoints as infrequent as possible, so you need to find the balance between the bgwriter configuration and the checkpoint configuration. The last two settings, min_wal_size and max_wal_size, control how much WAL can accumulate between checkpoints: even if the checkpoint timeout has not been reached, once that amount of WAL files has been generated, a checkpoint will still happen. As you could see on the chart, changing those values significantly decreased how many dirty pages were flushed by checkpoints, and this also lowered the I/O peaks we had.

About database definition: we are using Liquibase to keep the definition of our databases in a repository. It's worth keeping it outside the database and being able to review what people are changing in your schemas. It gives you the ability to roll back changes and to test those changes on pre-production environments.

About monitoring Postgres: I found this cool picture on the internet; the credit goes to Alexey Lesovsky. He created this kind of view of the Postgres internals: how you can observe performance and what's happening inside. It's called "PostgreSQL observability", and I recommend you share it with your colleagues to show them the internals of Postgres; it's really useful. If you are monitoring queries, there are two great views: pg_stat_activity, which gives you a view of the currently running queries, and pg_stat_statements, which gives aggregated information about queries; in the Postgres configuration you have to load pg_stat_statements as a library. Also, if you want to track long-running queries, you should configure log_min_duration_statement, and then your long-running queries will be logged in the Postgres log. There is also a cool option where you can enable auto_explain with a time limit; it means that all queries running longer than that duration will be logged together with their plan. You can even enable logging of EXPLAIN ANALYZE output, but that is expensive, because the query is effectively executed with full instrumentation.

So, pg_stat_statements is great because it gives you an aggregated view of running queries, but if you have multiple instances, it can be difficult to track how queries are running across all of them. We created this kind of tool that collects pg_stat_statements from the machines; it runs every hour, and after collecting we reset the statistics, so every hour we have fresh data, and we put that data into Logstash. Then it's easier to monitor queries and check whether there is any regression in the system or whether the number of calls has increased. This is a short view of how we present the data in Kibana: we gather the information that pg_stat_statements gives us, so CPU time, total time and the number of calls, for example, and also averaged metrics.

Other tools we use for monitoring: the very cool Munin plugins that enable you to track system metrics, and of course we use AWS CloudWatch. If you want to be able to monitor whole-application performance, we recommend, for example, AppDynamics, which is great; it gives you the overview from the application down to the database, and it shows problems in the Java code or in long-running queries. There are of course the basic Unix tools like htop and iotop, which show you metrics about the machine, and there is this great extension, pg_view, which gives better information than pg_stat_activity about the currently running queries.

A couple of words at the end as a conclusion: we think that Postgres is a great database. It allowed us to create a very large system; it's scalable, and with replication you can easily scale it. Of course, you have to configure it properly to get the best performance out of it, and of course it needs maintenance and monitoring, but I think every production system needs that. Any questions? At the beginning... oh, sorry. But we are around, so thank you for attending.