So, hi, my name is Tomas Vondra. I work for 2ndQuadrant and I live in Prague, over the ocean — I know there are other Pragues in the US, but this one is in Europe. A little bit about me, just so you know who you are dealing with. I first met Postgres in 2003, so I've been working with the database for a fairly long time, solving performance issues and so on in most cases. I also contribute patches to Postgres — some of those patches I will talk about during the presentation. I do benchmarks, quite a lot of them, which is what this talk is about. I often argue with people on the mailing lists, so sorry about that if you are one of those people. And over the last few months I started working on Postgres-XL, which is a distributed database built on top of Postgres — kind of a fork, if you want.

By the way, I'm not a native speaker, so if you don't understand something because of my accent, just let me know and I will repeat it with exactly the same accent.

All right. This is the first time I'm giving this talk, so I hope the timing works out and we manage to cover all of it.

Postgres is a general-purpose database. It's not an analytics-only database, and it's not an OLTP-only database. But from the very beginning the design has been OLTP-oriented, and it works great for OLTP workloads. It has a one-to-one process model — each connection is handled by a single process — which is perfectly fine for OLTP but not that great for analytics. It has a row-oriented on-disk format, which means the rows are stored as chunks of data, not transformed into any alternative format. And the executor uses a model of execution we call "tuple at a time": it gets one tuple, does everything it needs to do with that tuple, and passes it to the next operation. Again, that's perfectly fine for OLTP, but in analytics you are processing large amounts of data, and this doesn't really work that well compared to specialized databases.

But, perhaps surprisingly, people don't do just OLTP. Users also want to do quite a bit of analytics — they want to process the data generated by the OLTP workloads. If you have a big e-commerce site, you have OLTP on the customer side, because customers submit orders, submit payments and so on, but at the end of the month you want to do some analytics on that, to see how much money you made and how much money you lost. And people started using Postgres for that too. There are two reasons. First, people don't really want to pay money for proprietary databases, so they started using Postgres for OLTP. Then they decided that maybe they also want to save money and use a single database to do both. I'm not saying it's the most efficient solution, but it works for a lot of people.

So what are the differences between OLTP and OLAP? OLTP stands for online transaction processing; BI and OLAP stand for business intelligence and online analytical processing. OLTP queries typically deal with small amounts of data. When you read a website, it will fetch just a single article using a primary key.
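To make the contrast concrete, here is a minimal sketch on a hypothetical e-commerce schema — the table and column names are made up purely for illustration:

```sql
-- Typical OLTP: fetch one row through the primary key.
SELECT id, customer_id, total_amount
  FROM orders
 WHERE id = 123456;

-- Typical analytics: scan and aggregate a large part of the table.
SELECT date_trunc('month', created_at) AS month,
       sum(total_amount)               AS revenue
  FROM orders
 GROUP BY 1
 ORDER BY 1;
```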
That kind of lookup is the typical OLTP query. It's usually served by indexes — unique indexes or primary keys — and because you usually work with just small amounts of data, you see nested loops for joins, which is perfectly fine and works very efficiently.

Business intelligence and OLAP are very different. You have huge amounts of data — nowadays usually tens or hundreds of gigabytes in a single table; we have customers running with terabytes of data or more. The queries are complex and usually process large amounts of data. It's not just a few records for a single customer: you are analyzing all the data for all customers, doing aggregations on that, joining it with other tables and so on. If you know what a star schema is, that's typically how online analytics works.
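A minimal sketch of what such a star-schema query looks like — the fact and dimension tables here are hypothetical, just to show the shape:

```sql
-- A central fact table joined against small dimension tables,
-- then aggregated; most OLAP queries have roughly this shape.
SELECT d.year,
       c.region,
       sum(f.amount) AS revenue
  FROM sales_fact   f
  JOIN date_dim     d ON d.date_key     = f.date_key
  JOIN customer_dim c ON c.customer_key = f.customer_key
 GROUP BY d.year, c.region;
```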
So what are we going to look at in this talk? I've been working with Postgres for a long time, easily more than ten years, and we usually only see benchmarks like "this is what 9.6 is doing". You have absolutely no idea how the benchmark would compare across multiple versions, so you don't see the improvement over time, which is kind of annoying, at least for me. So what I will do in this talk is look at the same benchmark — DBT-3, which is the open-source implementation of TPC-H, 22 queries — across the releases since 9.1, which is, I think, about six years old. We will see how long it takes to run the whole benchmark on average. I'm not doing the full TPC-H with incremental loads and everything; I just do the load and then run the queries repeatedly to measure an average time, and then we can see the improvement.

I ran the benchmark on the small machine I use for testing at home: two sockets, 32 cores, 64 GB of RAM, and a 400 GB SSD. Not a huge machine — you could say it's an entry-level server configuration for today.

I will talk about two types of improvements. The first one is "let's use a bigger hammer". The main example of this is parallel query in 9.6: we have 32 cores, so why should we run a query on just a single core? Let's use more RAM, let's use more CPU power. It might actually be a little less efficient, because there's communication between the processes and so on, but that's the approach. The other type of improvement is using the resources — the hardware — more efficiently, or simply not doing things we don't need to do: maybe we can use an index in a different, more efficient way. Those are the things I will talk about.

The first thing to look at is the releases from 9.1 up to 9.5, which were mostly the efficiency improvements — making things more efficient in general. On the chart there are 22 queries for each release; each color is a different query, and you can see that we keep improving. Some of the queries — like the first query at the bottom, which is very CPU-expensive — we can still improve without using multiple cores, multiple CPUs, just by making the code more efficient. Other queries do more expensive aggregations, joins and so on. But in general, over those five releases, we gained something like 20-25% just by making the algorithms more efficient. That's not bad: Postgres was well designed and well engineered from the beginning, and gaining 25% just by optimizing is pretty good. I wouldn't really expect huge improvements in this direction in the future, though — we are getting about as efficient as possible.

One thing I didn't mention is that I ran the benchmark on two data sizes. This chart is the small, 10 GB data set. People often ask me about big data and such, but the truth is that most people have a small data set — 50 gigabytes or less, usually. Maybe the database is larger, but the working set, the part of the database they actually work with, is usually much smaller. So this is essentially in-memory, with pretty much no I/O. You can see the whole benchmark takes about ten minutes — 700 seconds, something like that.

The other data size is 100 GB, on a machine with 64 GB of RAM, so this is doing a lot of I/O — on a fairly good storage system, though, an SSD connected over PCI Express — and still we have gained about 20%. Again, not bad. The analytics does a lot of sequential scans and so on, so the storage handles that quite easily, but we are still gaining about 20%.

So let's talk about the improvements in the individual versions. I don't just want to describe the improvements — what have we done — I'm going to explain them, because you can actually use this knowledge. If you don't know that, for example, 9.2 added index-only scans, you can't design your indexes to benefit from them. So, index-only scans — what do they do? By default, Postgres doesn't store visibility information for each row in indexes, so whenever it found a row in an index, it used to have to go to the table anyway and check: can I actually see this row, or not? That means random I/O, and it's expensive. What 9.2 added is that when all the columns the query needs are present in the index, we try to skip that second step entirely.
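A small illustration, again assuming a hypothetical orders table — and note that index-only scans also depend on the visibility map being reasonably up to date, i.e. the table having been vacuumed:

```sql
-- An index that contains every column the query touches.
CREATE INDEX orders_customer_amount_idx
    ON orders (customer_id, total_amount);

EXPLAIN
SELECT customer_id, sum(total_amount)
  FROM orders
 GROUP BY customer_id;
-- On 9.2+ with a mostly all-visible table, the plan should show
-- "Index Only Scan using orders_customer_amount_idx" instead of
-- an index scan plus random heap accesses.
```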
Index-only scans are a huge improvement, for two reasons. First, they eliminate the access to the table, which is inherently random I/O. Second, they give you the data in sorted order — if you read data from an index, it's already sorted in most cases, and sorted access, sorted aggregation, sorted joins are about the most efficient way to do analytics. You could use hash-based implementations, but they are definitely not as efficient as already-sorted data.

The other improvement in 9.2 is batching tuples in COPY. When you load data — I'll show the improvements in data loading on one of the next slides — instead of inserting each row independently, we batch them into larger chunks, which is again more efficient. 9.2 also included about the first of the sorting improvements by Peter Geoghegan, who gave a talk about sorting and hashing, I think on Wednesday — a really nice talk, I recommend watching it on video if you weren't in the room. Those things combined gave us a small improvement, maybe 10% on average — the queries are complex, so if you improve one of the operations in the whole query, you get a smaller improvement overall.

I will skip 9.3, because it didn't have any significant improvements in this area.

9.4 included some improvements in aggregate function computation. If you have a huge aggregation, we can now share state between different aggregate functions. If your query computes a sum, an average and a standard deviation on the same column, those can internally keep the same state: to compute the sum you need a sum; to compute the average you need the sum and the number of records; and the standard deviation uses the same internal state as those two functions. So why do triple the amount of work when you can do it just once? That's exactly what 9.4 did.
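For example, a query like this sketch can benefit — whether the state is actually shared depends on the version and the exact aggregates involved, so treat it as an illustration rather than a guarantee:

```sql
-- sum(), avg() and stddev() over the same column can share one
-- internal transition state (count, sum, sum of squares), so this
-- costs roughly one pass instead of three independent ones.
SELECT sum(total_amount),
       avg(total_amount),
       stddev(total_amount)
  FROM orders;
```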
9.5 is where Peter really made huge improvements in sorting. Instead of comparing full values — which could be fairly long text strings, or numeric values, which Postgres doesn't store inline — he used the concept of abbreviated keys: comparing only the first few bytes of a value. That's much more efficient in terms of cache accesses, because you can work within the CPU cache, which is about two orders of magnitude faster than accessing main memory. He made further improvements by inlining the comparison functions: instead of effectively calling the function exposed at the SQL level, it's done directly in C, which eliminates a level of indirection.

There's also a patch I wrote, about hash joins. Originally, Postgres kind of assumed that the estimates made during query planning were correct. So when the hash expected one million rows and got 100 million rows, it ended up with a completely wrong size of hash table: instead of a chain of ten rows per bucket, you could have a thousand — and that's exactly what you don't want in a hash table, walking long chains of rows. I started looking at this because a query was running for half an hour; after this patch, it runs in about 30 seconds. That's important because in analytics, in this type of data set, you have higher redundancy and correlation between columns, so the estimates become much more sensitive to that. So that's the hash join improvement, and there are also some improvements to bitmap index scans — or bitmap heap scans, rather — that help when using indexes to scan large amounts of data. That's everything up to 9.5.

By the way, if you have any questions, maybe don't ask until the end of the talk — just ask me directly afterwards.

Now comes the big difference. Instead of improving just the efficiency, Robert Haas and his team did a huge amount of work on the bigger hammer: instead of using a single core for a query, we can now run parallel queries — run a query on multiple CPUs at the same time. You can see that in this case it improved performance by about a factor of three. We can complete all the queries in just over three minutes instead of about ten minutes, which is nice. Of course, it means that instead of running many queries at the same time, you will probably only be able to run, I don't know, four of them, because each of those queries is going to run on eight cores, saturating the hardware. That's the in-memory, 10 GB data set.

On the larger, 100 GB data set, the improvement is still significant — maybe twice as fast, something like that — but it's not three times faster. What I wanted to illustrate is that parallel queries are great for CPU-intensive queries, but once you start hitting other bottlenecks — for example memory bandwidth, or I/O — it's not going to give you a proportional improvement. By the way, I was using eight worker processes, so roughly eight cores instead of one, and it's clearly not eight times faster.

So, 9.6: parallel query. It's a bit difficult to explain exactly what parallel query is, but it allows Postgres to parallelize some operations within a query.
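Here is roughly what that looks like in practice, assuming a large DBT-3-style lineitem table (the GUC below is the 9.6 name):

```sql
-- Allow up to 8 worker processes per Gather node.
SET max_parallel_workers_per_gather = 8;

EXPLAIN
SELECT count(*)
  FROM lineitem
 WHERE l_shipdate < DATE '1995-01-01';
-- On 9.6 the plan should contain a Gather node with a Parallel Seq
-- Scan below it, and the aggregate split into Partial and Finalize
-- stages that are combined across the workers.
```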
Initially we only got parallel sequential scans, some joins — pretty much hash join and nested loop — and some aggregates, which is exactly what you want for analytics workloads.

9.6 also included some additional sorting improvements. Peter Geoghegan looked into some design decisions made years ago, when the hardware characteristics were fairly different; we have simply removed some parts of the algorithm and use quicksort instead of replacement selection in most cases. And we use abbreviated keys for additional data types — in principle it can be applied to pretty much any variable-length data type. Initially it was only done for text and numeric; now it also covers UUIDs, character data types and so on.

The other thing I have on the list is index-only scans and partial indexes. Who knows what a partial index is? Right, okay, about half of the people. It means you can attach a condition to an index. If you, for example, only need to query orders that are not fulfilled yet — not completed — you can say WHERE order_state = 'NEW', or something like that. The idea is that the index will be much smaller: it will not include the 99% of the records that are historical, and it's going to be much more efficient in terms of maintenance, in terms of querying, and so on. Before 9.6 you couldn't actually do index-only scans on partial indexes — or you could, but you also had to include all the columns from the predicate in the index itself, which is kind of pointless, because you already know what the condition, what the predicate, actually says. 9.6 fixes that, which improves the usability of designing indexes for index-only scans.
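A quick sketch of that pattern, again on a hypothetical orders table:

```sql
-- Index only the small, frequently queried slice of the table.
CREATE INDEX orders_new_idx
    ON orders (customer_id)
 WHERE order_state = 'NEW';

-- Before 9.6, getting an index-only scan here required also storing
-- order_state among the indexed columns; since 9.6 the predicate
-- alone is enough.
EXPLAIN
SELECT customer_id
  FROM orders
 WHERE order_state = 'NEW';
```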
9.6 also included some additional improvements in aggregate functions, although those didn't actually help very much on this benchmark.

So now we're talking about Postgres 10. This is a snapshot from about three days ago — we are in the final days of the last commitfest for Postgres 10, with something like ten patches being committed every day, so this is already obsolete — but I think it shows that we're not going to get a huge improvement from Postgres 10 in this particular area. That's on the 10 GB data set. On the 100 GB data set you actually do see some improvement, and I'm not sure exactly why it shows up there and not on the smaller data set. I suspect it's actually because of a bug in parallel query that I managed not to hit on the Postgres 10 snapshot but did hit on 9.6 — I've already reported it to the hackers list, and we're discussing it. So ultimately you would probably get the same numbers for both Postgres 10 and 9.6.

What we do get in Postgres 10 is additional sorting improvements — although based on discussions with Peter, I think he doesn't have many further ideas for improving sort, so we will probably not see huge improvements there anymore. We get additional parallel operations. I'm not going to discuss all the details, because Robert Haas already gave a talk here about that, and he also gave a talk, I think at PGCon 2015 or so, about the whole design of parallel query — I would recommend watching that.

Until now we could only parallelize queries that use a sequential scan as the driver table — the thing used to divide the work between the workers. In Postgres 10 we can also parallelize queries driven by a bitmap heap scan, and by an index scan. We get Gather Merge, and merge join support, which I think is a huge improvement for large data sets, exactly because of the CPU efficiency of merge joins on sorted data. We get hash support for grouping sets — until now, all grouping set computations had to be done by sorting. Andres Freund committed something we call faster expression evaluation, which is effectively just-in-time compilation infrastructure for some parts of the code. I think it's very exciting — I think we can maybe get a 10% improvement from that line of work alone in the future. And there's a question mark on the next two, because they are not committed yet: we might get an improvement in the planner when joining tables where we know one of the tables is unique — that one is already marked ready for committer — and we might get something we call a parallel shared hash table. Remember when I said the bigger-hammer approach, using more resources, can actually be less efficient? This is one of the examples: when you run a parallel query that includes a hash join, each of the workers builds the hash table from scratch. The hash tables are going to be exactly the same, but the workers just don't share them. That's exactly the point of this patch: share that work between the workers — which, by the way, also means we won't use as much memory, because the processes will share the same hash table.

One thing I didn't really mention is that I'm pretty sure we could get better results from the benchmark, because on all the versions I used the same schema, which was effectively optimized for the 9.1 version. I'm sure we could improve it by using partial indexes and by designing the indexes for index-only scans. That's not included in this version of the talk — hopefully next time you see the talk, it will be there. It would actually enable plans like these two: a merge join over two index-only scans, or a group aggregate on top of an index-only scan, and so on.

I also mentioned that you don't just query the database — you also need to load the data into the database first. This is the duration of the data load done in a serial way: you load the tables one by one, then you create the indexes one by one, so you see how long it would take if you only had one CPU. This is the total duration for the two data sets — sorry, I used a 50 GB data set in this case instead of the 10 GB one. You can see we are still improving: taking the 9.1 number as 100%, you get something like this. So we have not only improved the efficiency of the queries, we have also improved the data load efficiency, by roughly 30%, which is also nice. And I think there's more to come in this direction, because this is just a single CPU — none of it is parallel at this point.
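For reference, the serial load is essentially a sequence like this — the file paths and delimiter are hypothetical; DBT-3 generates its own flat files:

```sql
-- One COPY per table; since 9.2, COPY batches tuples internally,
-- which is far faster than inserting rows one by one.
COPY lineitem FROM '/tmp/dbt3/lineitem.tbl' WITH (DELIMITER '|');
COPY orders   FROM '/tmp/dbt3/orders.tbl'   WITH (DELIMITER '|');

-- Then the indexes, one at a time, on a single CPU.
CREATE INDEX lineitem_orderkey_idx ON lineitem (l_orderkey);
```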
I know that Peter is working on parallelizing CREATE INDEX, and we are probably going to get parallel versions of other maintenance commands too.

So, quickly, let's look into the future: what will we probably get in, maybe, Postgres 11? We will probably see additional types of queries being parallelized. Right now we don't have parallel sort, so we might get that. We might get additional utility commands, like CREATE INDEX, for example. One big limitation is that we can't parallelize queries that do any type of write — if you do an UPDATE or something like that, even if it's just in, for example, a CTE, it's not going to work; we disable parallelism because of locking issues. The other thing we might get in Postgres 11 is definitely some just-in-time compilation; that's pretty much why Andres pushed that commit in the last commitfest. We might get something like partial sort, which is another example of "let's reuse previous work": if you have a table, or a stream of tuples, that you know is already sorted by the first column, then sorting by the first and second columns can be more efficient, because you can just reuse the existing partial ordering. There were also talks about improving storage efficiency with compressed indexes and covering indexes and such, which I think is going to make a huge difference in some workloads.

All the stuff I've talked about until now is mostly about making the executor more powerful — allowing the executor to use multiple cores, allowing it to skip some work, and so on. But we are also working on improving the planner: making better plans in the first place, because I'm sure some of you have faced the issue of Postgres coming up with a plan that then performs pretty badly. Part of that is multi-column statistics: we got the first few bits of that in Postgres 10, and hopefully Postgres 11 will expand the capabilities in that area.
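To show what those first bits look like, here's the Postgres 10 syntax on a hypothetical table with correlated columns — an illustration of the feature, not a tuning recommendation:

```sql
-- Tell the planner these two columns are correlated, so it stops
-- multiplying their selectivities as if they were independent.
CREATE STATISTICS orders_state_region_stats (dependencies)
    ON order_state, region FROM orders;

ANALYZE orders;  -- builds the extended statistics
```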
The last slide in this section is something I call long shots. This is my opinion, not anything official: things I think we need to do over the next few years to really improve database performance again. There have already been patches, or attempts, at some of these in the past, and I think we will see new versions of those attempts. Pluggable columnar storage. Probably something like a vectorized executor. Compression and deduplication applied in different places in the database. Probably something like parallel grouping sets, which are not parallelized at this point. And we might see something like materialized views with incremental updates that are automatically used in queries instead of the source tables — I was at the talk just before this one, about ETL, by Corey Huinker, and he was talking about exactly this, something he calls rollup tables, from Oracle. So that's exactly the idea.

Right. So, any questions about BI performance? Oh, he's sitting over there. Any questions about the Postgres improvements? No? Right.

So let me just quickly talk about something I've worked on over the past few months, which is called Postgres-XL. Who knows what Postgres-XL is? Yeah, okay. It's a distributed Postgres — a scale-out fork, or, as Simon, the CEO of 2ndQuadrant, always says, a spoon, because our intent is not to keep it completely separate: we want to move as much of it as possible back into Postgres. It's meant to provide horizontal scalability. Instead of building bigger and bigger servers, you build multiple servers next to each other, and they kind of share the data and execute queries as a distributed database. There were a few projects before it — StormDB, Postgres-XC; you've probably seen talks about some of those in the past. The big difference with Postgres-XL is that it has the PostgreSQL license, exactly the same licensing as PostgreSQL, which allows us to move code and do work on both without legal issues. If you want to find out more, there is the postgres-xl.org website. Our intent is not to keep a proprietary fork or anything like that; we would like to make it as close to Postgres as possible, both in terms of license — which already happened — and code-wise.

What we are working on right now is rebasing the fork onto 9.6, so it has all the features from Postgres 9.6, all the new stuff. That's what I've been doing over maybe the past four months. I don't know how many of you have experience maintaining forks — I mean large forks of fast-moving projects like Postgres. If you look into the commitfest application at which patches were submitted for inclusion into the next version of Postgres, it's hundreds of patches, and some of those patches are extremely invasive in terms of what they change.

So let me walk you through how such a merge from Postgres 9.6 into XL works. First you do the merge: in the XL repository, you pull in all the changes from the release branch in the upstream repository. And you get something completely broken. It doesn't even compile; you get conflicts in the regression tests and in different parts of the planner. So you fix the regression tests — sorry, you fix the merge conflicts — and then you run make, because you want to compile the code, and you get a million compiler errors. You try to fix some of those, but you can't actually test that your fixes work, because it still doesn't compile due to all the other compilation errors. So you start drinking beer. A week later you've managed to fix the compilation errors and it compiles. The next thing you do is run the regression tests — and I think you know what happens: you get a million segfaults, because the fixes you made blindly, without being able to run anything, were actually wrong. So you fix those, you run make check, and you get a million regression failures, because the code doesn't work the way you expect it to. And then you kind of fix that. That's pretty much the current status of the project.
So where are we? All the regression tests fail — oh, sorry, pass — with maybe a ten-thousand-line diff, which is pretty good, because we started with something like 100,000 lines. Most of the differences are fairly small, like a small plan change, something we can live with. We expect to push the branch into the public repository soon. We haven't been keeping it private because we want to make it proprietary or anything — it simply doesn't make sense to push something completely broken and then get email like "it doesn't compile, why have you pushed that?". That was the reason.

What do we get from the 9.6 merge from upstream? A huge improvement: internal parallelism. Postgres-XL has always been able to do parallel queries, in the sense that we've been distributing the query to multiple nodes, which is a type of parallelism. But with 9.6 we also get parallelism at the node level — parallel execution across multiple nodes, and each node can run a parallel query itself, which is great. It also simplifies a lot of the code: we were able to rip out the custom implementation of parallel aggregation from XL and simply replace it with the one implemented in 9.6, which is huge. And just to show you how much closer we got to upstream: between 9.5 XL and 9.5 Postgres, the diff was something like — sorry — 234,000 new lines, and we have removed about 20% of that. We have just removed a lot of code, a lot of changes, which actually makes the code much cleaner. So that's nice.

Right, so that's where the XL project is right now. And — this is my personal opinion — I think what we need to do is build a bigger XL community. Right now the development community of the project is fairly small: if you look at the commit log, you will see like two people there, which is not very good. We're also not particularly great at accepting contributions from other people — someone submits patches to the mailing list, and then nothing happens. I think that needs to change, and I'm telling you that because we want to make the community more open. It's not a project that should be owned by 2ndQuadrant; I would like to see committers and contributors from other companies. Ideally we would get something like the Postgres community, with a mailing list — that would be great. We just need to make it happen: give people commit rights, establish some basic rules for how to do the development. I'm definitely willing to help you get started with the code. We are open to that.

I think the one thing we know we absolutely need to do is move off SourceForge — it is not a good development environment with good tools. The primary repository for the project is on postgresql.org — there is a repository for XL there — but all the other stuff, like the mailing lists and the bug tracker, is on SourceForge. If you want, for example, to look into the archives of the mailing list, the interface is, well, not very usable. So we are not exactly sure how to do all of that yet.
So if you have an idea about any of this — if you would like to become a member of the community, if you would like to do something with XL — there are a lot of problems to solve, I can tell you that. Just talk to us; we will help you, we will find a way to actually make it happen. And that's all from me. I will be around until maybe three o'clock, so if you don't have a question now, maybe you'll have one over lunch or something — I'm available.

[Audience question about how Postgres-XL compares to Citus.]

Well, I can't really speak badly about Citus, right? I don't have much experience with it. I know that the difference between Citus and XL is that they are more about sharding, while Postgres-XL is about having a strongly consistent cluster. One thing we have is a global transaction manager; I think Citus doesn't have that. The other difference is that I think they are sending SQL queries to the data nodes, while we are actually sending plans — planned queries — which has the benefit that we can, for example, reuse the parallel aggregate infrastructure, which they don't do; they need to do a lot of magic to make that happen. But I think it's a good database, Citus. I'm definitely sure there are use cases where Citus is going to be better than XL, and I'm sure there will be use cases where XL is going to be better. I don't know which kind of workload or application you have — I'd like to see such a comparison myself, actually.

Any other questions? No? So thank you for attending the talk, and maybe see you around.