Hi, I'm Robert Haas. I'm a PostgreSQL major contributor and committer, and I work at EnterpriseDB, where I'm the chief architect for the database server. Today I'm going to be talking, with a lot of echo apparently, about the elephants in the room: the limitations of the PostgreSQL core technology. Before I get to that, I want to first say that PostgreSQL is a great database. It has a reputation for stability and for standards compliance, and it has a whole bunch of great features, some of which no other major relational database system has, such as transactional DDL: you can begin a transaction, create an index, drop a table, and then roll it all back and undo whatever you just did. That's pretty cool. PostgreSQL also has a strong ecosystem of supporting products that do all kinds of interesting things; one example is PostGIS, which brings geospatial capabilities to the database. It's got a great community, and it's open source. So please don't take anything I say in this talk to mean that PostgreSQL is bad. That's not my point. My point is rather... wow, that's a lot of echo. I think the problem is I have this mic on too. Okay. Can somebody stand up there and advance the slides when I need that? John, can you? Thanks.
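To make the transactional DDL point concrete: SQLite also supports transactional DDL, so Python's built-in sqlite3 module makes a quick stand-in demo of DDL rolling back, the same behavior PostgreSQL gives you.

```python
import sqlite3

# SQLite, like PostgreSQL, has transactional DDL, so it makes a handy
# stand-in: create a table inside a transaction, roll back, and the
# table vanishes as if it had never existed.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
cur = conn.cursor()

cur.execute("BEGIN")
cur.execute("CREATE TABLE t (id INTEGER)")
cur.execute("INSERT INTO t VALUES (1)")
cur.execute("ROLLBACK")

tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)  # -> []: the CREATE TABLE was undone by the rollback
```

Most other major databases would have auto-committed the CREATE TABLE, making the rollback a no-op.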
So my point is rather that we've gotten to a certain level of success with PostgreSQL, and these are the things I think we need to do in order to get to the next level of success. Next slide. If you're a longtime member of the PostgreSQL community, you may be tempted to look at some of the issues I'm going to talk about today and say, yeah, that's really an issue, but it's not that big of a problem. Well, I think it is a big problem, or it wouldn't be in the talk. On the other hand, especially if you're a newcomer to PostgreSQL, you might be tempted to say, oh my gosh, PostgreSQL has problems, I totally shouldn't use it. That's not the right reaction either. Don't panic. These are problems, but they're things we can fix, and we'll end up with an even better system than we have today. Wow, now there's two people up there; this is getting very crowded. So here's my list of elephants. I'm not going to spend too much time on this slide, because all of these things have their own slides. Next slide. PostgreSQL currently uses buffered I/O, which means that when we read a page into shared buffers, it first gets copied into the operating system's page cache, and from there it gets copied into shared buffers. When we write a page, we write it out from shared buffers to the operating system's page cache, and the operating system eventually writes it back to disk. PostgreSQL is pretty much the only major relational database that does this, and it might be a bad idea; we should at least consider whether it's a bad idea. One of the problems is that the buffers present in shared buffers are very likely to also be present in the operating system cache, and if you cache two copies of something, that means you're going to cache zero copies of something else that you otherwise could have cached one copy of. That's not so good. Another problem, which I've become aware of recently, is that newer SSDs offer a feature called atomic writes, and this is pretty cool. PostgreSQL has a system, actually quite an expensive system, that we use to protect against the possibility that the machine loses power right in the middle of writing an 8-kilobyte page, so that part of the page ends up on disk and the rest of the write never happens, because that's when you ran out of power. This atomic write functionality potentially offers a way out of that: you do an 8 kB write and you say it's all or nothing, either get this whole thing to disk or forget the whole thing. These SSDs have this, and it exists; it's exposed via a Linux kernel API. But you cannot use it with buffered I/O; you can only use it if you're using direct I/O. That sucks for us. So given these things, you might ask why in the world we use buffered I/O, and the reason traditionally given is that we think the kernel is going to do a much better job of scheduling I/O than we're going to be able to do ourselves. We don't want to re-implement the smarts that are in the kernel; let's just take advantage of the kernel getting smarter and smarter rather than re-implementing it ourselves. But I don't think that's really what's going on here. What I think is mostly happening is that buffered I/O allows us to paper over the times when we make bad buffer eviction decisions. Next slide.
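To make the torn-page problem concrete, here is a toy sketch of how a checksum catches a write that only partially completed. This is an illustration of the failure mode only, not PostgreSQL's actual full-page-write mechanism, and the page layout here is invented.

```python
import hashlib

PAGE_SIZE = 8192
SECTOR = 512  # disks traditionally guarantee atomicity only per sector

def make_page(fill: bytes) -> bytes:
    """Build an 8 kB page: a 32-byte checksum header plus repeating payload."""
    body = (fill * PAGE_SIZE)[:PAGE_SIZE - 32]
    return hashlib.sha256(body).digest() + body

def page_is_intact(page: bytes) -> bool:
    return hashlib.sha256(page[32:]).digest() == page[:32]

page = make_page(b"hello, elephants ")
old = make_page(b"previous contents ")

# Simulate losing power partway through the 8 kB write: only the first
# four 512-byte sectors reach disk; the rest still hold the old data.
torn = page[:SECTOR * 4] + old[SECTOR * 4:]

print(page_is_intact(page))  # -> True
print(page_is_intact(torn))  # -> False: a torn page, caught by the checksum
```

An atomic 8 kB write would guarantee that the `torn` state can never appear on disk, which is why it makes the expensive protection machinery unnecessary.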
Okay, so here's a graph that my colleague Jan Wieck made as a result of some pgbench runs that he did. The blue line shows the number of transactions per second we got during each second of this test, and as you can see, it's very spiky: some of the time it's up between 1,500 and 2,000, and other times it's down near zero, which is obviously not what we'd like to see. But then Jan made a really interesting observation explaining why that's happening. The shaded red area on this graph shows the number of dirty buffers at each point during the test, and if you look, what you can see is that every time the number of dirty buffers starts to go down, our performance tanks. Every time that red area is sloping down, because the kernel is doing those delayed writes to disk that I was talking about, our performance dies. When you see things like this, you really have to ask just how intelligent that kernel I/O scheduler we're supposedly benefiting from actually is. I'm not saying there aren't other ways to improve this, but that doesn't look good. All right, that's all I've got to say about direct I/O. If you're not mad yet, wait until you see the next section. Okay, so the next thing I want to talk about is our on-disk format. This is another graph from Jan; it shows the results of running a TPC-C-like benchmark on PostgreSQL 9.3 and also on another database product. You can see that in the first 30-minute run, and this was a series of 30-minute runs, we did pretty darn well, but we couldn't sustain it. As we did additional runs on the same data set, our performance declined, while the performance of the other database product remained much steadier, which kind of stinks. The reason for that is the size of our data: we started out about a third larger, and both databases grew over the duration of the test, but by the end of eight runs we were 80% larger than the other product. Being 34% larger isn't great; being 80% larger is really pretty bad, right?
This is not going to happen on every workload, and it's not going to happen in all situations, but on this test, with this workload, that's what happened, and it's not good. I don't know which kernel version that was... okay, according to Andres, who talked to Jan, it's 2.6.32. No, you can't ask that. Okay, so there are two basic problems here: our data is too big, and it grows too fast. Both of those things happen for a variety of reasons. The data is too big because, for one big reason, our tuple headers are huge compared to most other products: they're 23 bytes. That doesn't sound like a lot, but when you've got millions or billions of tuples, it adds up. We waste space on alignment padding; I'm not sure how significant that is. And there are data-type-specific issues where, in some cases, we just don't do a great job of cramming values down into the absolute minimum number of bytes possible. That doesn't sound serious, and sometimes it isn't, but when you have enough data, those bytes start to add up to very significant amounts of space. And we grow too much. Improving vacuum, as we've done over the years, is good; don't get me wrong, it's very good, and it's made life much better for so many people, but it only contains bloat. It does not prevent it. The other product on that slide can update rows in place, without having to put another copy of the row into the table and then later come back and get rid of it, and that is of course a huge benefit on a test like this. On a table that is update-only, where you just replace one version of a row with another version of the same size, over and over, update-in-place zeroes out your growth on that kind of table. There were tables here where both products experienced an increase in database size, but there were some tables where the other product had no growth and we had growth.
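Some back-of-envelope arithmetic for the two problems above, using toy figures I'm making up purely for illustration:

```python
# Toy figures, invented for illustration; only the 23-byte header is real.

# 1. Header overhead: 23 bytes of tuple header on every row adds up.
TUPLE_HEADER = 23                 # bytes per tuple in PostgreSQL
n_tuples = 1_000_000_000          # a billion rows
payload = 40                      # bytes of useful data per row (made up)
overhead_gb = TUPLE_HEADER * n_tuples / 1e9
total_gb = (TUPLE_HEADER + payload) * n_tuples / 1e9
print(f"{overhead_gb:.0f} GB of headers out of {total_gb:.0f} GB total")

# 2. Growth: MVCC appends a new row version on every update, and the old
#    versions linger until vacuum reclaims them; update-in-place rewrites
#    the row, so an update-only table does not grow at all.
n_rows, n_updates = 1_000, 5_000
mvcc_versions = n_rows + n_updates
in_place_versions = n_rows
print(mvcc_versions / in_place_versions)  # -> 6.0: 6x the row versions
```

The ratios are made up, but the shape of the problem is not: header overhead scales with row count, and version accumulation scales with update rate until vacuum catches up.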
Yeah... so the question was whether... okay, I totally missed something there. Okay, so the question is whether this other product requires that the rows be fixed size. No, it doesn't; it has some method of coping with that. It's complicated, and I'm not here to explain how said other product works. A related problem is that when you're able to update rows in place, you only need to insert new index entries for the indexes where the corresponding value has changed. We've got HOT updates, which are great; they touch no indexes. But if even one indexed column is updated, we then have to go insert index entries for all of the indexes, not just the ones where the value changed. So there's stuff we can squeeze out here; there's an opportunity for optimization. Okay, that's all I'm going to say about the on-disk format. So now I want to talk about replication, and I think the first question about replication is: why do people replicate data? There's a bunch of reasons. There's high availability: you have two copies of your data so that if your main server goes down, you can fail over to another server that has a copy of all that data. You might want to do a database version upgrade: you install a new version of the server, replicate everything over, and then swing your traffic when you're ready. Multi-master replication is a use case if you have geographically distributed systems, and then there's re-scaling. I think it's important to emphasize that all of these are things you can do today, but the only one I think we've really nailed is the high availability use case: streaming replication works really well for that. It's really reliable and does a great job. For these other use cases,
there are lots of tools available, but you may end up with a combination of tools, using different tools for different purposes. The list of ways to do these things on this slide is not intended to be comprehensive, so if I've left out somebody's favorite tool, I apologize. But you do tend to end up with a bunch of different tools. Basically, up until 9.4, you only had two real ways of doing replication. You could use streaming replication, which was great for what it did, mostly high availability, although with additional tooling there are definitely ways to use it for read scale-out. And then you had trigger-based replication solutions, which were a lot more flexible and let you do a lot more different kinds of things, but there were issues around the performance of those solutions, the complexity of those solutions, and frankly, there were also issues with adoption. I mean, how many people in the PostgreSQL community have written a trigger-based replication solution for PostgreSQL? I know I've done it. Anybody else? Yeah, so there are at least ten people in this room who have done that, right? What that leads to is a fragmentation of the developer community: many of these solutions, even really well-established ones that lots and lots of people are using, only have one or two people working on them. That's not a great thing, and the user communities are also split. So in 9.4 we got a big step forward here, which is logical decoding: you can have a process that reads the write-ahead log and turns it back into a stream of inserts, updates, and deletes. Andres did that work.
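The essence of logical decoding, turning a physical change log back into a stream of logical row operations, can be sketched in miniature. The record format below is invented for illustration; the real WAL is a binary, block-oriented format, and the real decoder is far more involved.

```python
# A toy "logical decoding" pass: log records carrying enough metadata
# are turned back into a logical stream of statements a subscriber
# could replay. The tuples here stand in for real WAL records.

wal = [
    ("INSERT", "accounts", {"id": 1, "balance": 100}),
    ("UPDATE", "accounts", {"id": 1, "balance": 150}),
    ("DELETE", "accounts", {"id": 1}),
]

def decode(record):
    op, table, values = record
    if op == "INSERT":
        cols = ", ".join(values)
        vals = ", ".join(str(v) for v in values.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "UPDATE":
        sets = ", ".join(f"{c} = {v}" for c, v in values.items() if c != "id")
        return f"UPDATE {table} SET {sets} WHERE id = {values['id']}"
    return f"DELETE FROM {table} WHERE id = {values['id']}"

stream = [decode(r) for r in wal]
for stmt in stream:
    print(stmt)
```

The hard parts that this sketch skips, and that a real replication framework must handle, are exactly the things a consumer of logical decoding needs: transaction boundaries, catalogs for interpreting tuples, and restart positions.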
It's really cool. Yay, Andres! But we're not out of the woods yet, because you can't use logical decoding unless you have a replication framework that knows how to use logical decoding, and we need a lot more here. We need to be able to answer all of the use cases on the previous slide with simple, reliable, high-performance, well-supported solutions that are in core. We need that. That's all I'm going to say about replication. By the way, the patches for all of these things are due next week, so you guys had better get coding. All right. Okay, so now I want to talk about horizontal scalability, and I think one of the really important questions that comes up with horizontal scalability is: do we really need it? It's not a dumb question. Database sizes are growing rapidly, but you can get really big servers these days. You can get a commodity server with a terabyte, even two terabytes, of memory in it, and then you can put your whole database in memory. If you can put your whole database in memory, you can probably make that database pretty fast, right?
And these aren't even enormously expensive boxes. A lot of people's databases are nowhere near that big: a gigabyte, 10 gigabytes, 100 gigabytes. That's easy to fit in memory if you want to, and if your controller isn't too tight-fisted, you can solve that problem. So while there are certainly people who have databases that are tens of terabytes, hundreds of terabytes, petabytes, there's an awful lot of smaller databases where horizontal scalability is maybe not as big of an issue as we sometimes think it is. One case where it is a problem, though, is write scaling. It is much easier to put more memory into a machine, and make that machine able to cache a larger database, than it is to get that same machine to write lots and lots of data changes to disk really, really fast. That is still a hard problem to solve. Another reason people say maybe we don't really need horizontal scalability is that it's often better to push some of the logic about knowing where the data is back into the application. Instead of just throwing everything into one massive database that is going to internally do all kinds of magic to spread your data across multiple machines, with everything working wonderfully, it might be better to have some application-level knowledge of where all of that data is located. You can then partition the data in intelligent ways.
The reality is that network latency is much higher than memory latency, and that's not going to change, so putting related data together, and being intentional about how you group your data, has a lot of value. If you do that, you may very well find that there is never really a use case for a scale-out architecture within the database. On the other hand, there are certainly existing applications that were not written that way, and sometimes people need to take one of those applications and scale it up. And maybe the biggest thing is that, rightly or wrongly, people who love other systems think that auto-sharding works, and they don't like it that we haven't got it. Whether they're right or wrong is a little more arguable, but they think they're right, and if we don't provide that solution, they may decide they're going to use something else. So we've got a sort of horizontal scalability solution that has been maturing very slowly over the last five years.
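Pushing knowledge of data placement into the application, as described above, often amounts to routing rows by a partition key. A minimal sketch, with hypothetical server names:

```python
import hashlib

# Hypothetical server names; any stable list of nodes works.
SHARDS = ["pg-node-0", "pg-node-1", "pg-node-2"]

def shard_for(customer_id: int) -> str:
    """Send every row belonging to one customer to the same server,
    so queries over one customer's data never cross the network."""
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The routing is deterministic: the application always knows where a
# customer's data lives without asking anyone.
placement = {cid: shard_for(cid) for cid in range(6)}
print(placement)
```

Choosing a key that keeps related rows together, here the customer, is exactly the "being intentional about how you group your data" point: a join within one customer stays on one node.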
In PostgreSQL 9.1, we got foreign data wrappers, so that you could access data on a remote machine using a SQL interface. I think that was an enormously successful interface; lots of people have used it, and it's been improved in small ways in every release since. 9.2 got better planner support and statistics collection and a bunch of other little stuff. 9.3 got the ability for foreign data wrappers to write data as well as read it, which was a very important change; it also got postgres_fdw into the core distribution, so that using only code in core, you could make one PostgreSQL server talk to another. That's obviously really important. 9.4 got trigger support for foreign tables. And 9.5-devel, as of a few days ago, allows foreign tables to participate in inheritance trees, and you may know that inheritance is sort of the way we handle partitioning of data in PostgreSQL. So the ability to put foreign tables into inheritance trees should sound a lot like a way to do sharding in PostgreSQL, which I think it will come to be. But we need a lot more here. There is a slide missing; there's a whole bunch more improvements needed to this architecture in order to really do scale-out with it well, and we are creeping toward those improvements, but we are not getting there nearly as quickly as we really need to. For example, if you're selecting data from two foreign tables and joining them, you would like the join to be done on the remote node rather than on the local node, and right now you won't get that. Probably even worse, if you're doing an aggregate over the data, like the really simple case of a select count(*) from a foreign table, what we're going to do is ship all the rows back from the remote node to the local node and then count them locally. It would be a lot more efficient to do the counting on the remote node and then just bring the count back and give it to the user, but we can't do that yet. And there are
other things, too. To have a real sharding solution, you might, for example, want consistent visibility across a group of servers. Postgres-XC has developed technology to make that happen, but in core PostgreSQL we don't have it. So there's a whole bunch of different things that need to be done to make this technology better and more capable, and able to address a broader variety of use cases. Another one is asynchronous scans: if you have multiple references to foreign tables in the same query, we could kick off all of those scans at once at the beginning of the query and then wait for the answers to start coming back. But today we don't; we kick them off one at a time, and when the first one is done, we kick off the second, and when the second one is done, we kick off the third. Asynchronous scans would be sort of a poor man's version of parallelism, but it's an important performance optimization, and we don't have it today. I'm going to be interested to see whether the slide I just talked through without having shows up later in the deck, or if it's just totally gone. Okay, so another place where I think we need to improve is parallel query. Obviously, a big part of the driver for this is that CPU core and thread counts are still increasing fairly rapidly, but single-core performance is really not going anywhere; it's increasing a little bit, but extremely slowly. So as your data grows, it becomes more and more urgent to be able to operate on that data with lots and lots of processes at the same time. There's good news and bad news about parallel query. The good news is that my colleague Amit Kapila and I have a basic implementation of parallel sequential scan that's pretty close to being done. I don't know whether it will go into 9.5.
I think if it doesn't, it will go into 9.6. There is still work to be done there, but I think we are getting fairly near to the end. Despite Andres shaking his head over there, we are an awful lot closer to the end than we are to the beginning, and I feel pretty good about that. Unfortunately, by itself, parallel sequential scan is not going to get us where we want to be, right? We need other types of parallel scans; for example, somebody mentioned to me last night a parallel bitmap index scan. Imagine you have a huge relation: you can use a bitmap index to figure out that 75% of that big relation you don't want to scan at all, but the remaining blocks you'd like to scan in parallel. So we're going to need that. We're going to need parallel aggregates, parallel joins, parallel utility commands, and probably a bunch of other stuff I don't have on here. There's going to be a lot of work, some in terms of implementing additional parallel operators, but also quite a bit in terms of making the query planner smarter, so that it can come up with the best possible parallel plan using the operators we have. So in one sense, I feel like I'm just about to complete what has been a really long journey; I've been working on this for like a year and a half, and I might have another six months to go. But on the other hand, there is an awful lot more that needs to happen after that, and my hope is that all of the infrastructure I've been building for this project will make building the second installment a whole lot easier than building the first one has been. But we still have to get to number one before we can move on to number two. And the last major area I want to mention is connection pooling. There are a lot of reasons why people use connection pooling, and a lot of people who are using PostgreSQL today are using connection pooling.
Connection poolers are, I think, one of the most popular add-on pieces of software for PostgreSQL. Some people use pgpool; I think a lot of people recommend PgBouncer in cases where you can get away with it, because it's simpler, which of course has upsides and downsides, but if you can manage with simple, that's better. But I think we need to give some serious consideration to why we need connection pooling, and why we tell all of our users that they need to go and install a connection pooler to solve problem X or problem Y or problem Z. To answer that question, we need to list what the use cases are, and I've listed the five that I know of on this slide. Load balancing: you've got a bunch of read replicas, and you're using your connection pooler to direct some connections to one server and other connections to another server. High availability: your connection pooler reroutes connections from one server to another in the case of a failover event. Admission control: you don't want the database server to get slammed by a huge number of processes all trying to do very complicated things at the same time and totally swamp the machine, so you use the connection pooler to throttle the rate at which queries hit the database. Avoiding the overhead of backend startup and shutdown, which is relatively expensive in PostgreSQL. And even replication, where you have the connection pooler run the SQL statements that it thinks are going to write data on multiple machines, so that if the stars align and everything works absolutely perfectly, you will have two copies of your data. All of this is fine; there are a lot of useful solutions that can be built this way. But there are some disadvantages. Having an external connection pooler in the loop increases administrative complexity, and it adds an additional point of failure. I can't conveniently count the number of times people have said, well, I don't have to worry about my database going down, because my connection pooler will reroute the connections to another machine. And then I said: what if the connection pooler goes down? Oh, I didn't think about that. And of course, connection poolers also add latency, because now every protocol message between the database and the client has an additional hop to jump through, in both directions. I think to some degree we're relying on connection pooling to solve problems that we would really be better off solving in the core database server. Admission control, I think, is a particularly clear example of this. Why should it be the user's problem to make sure the database server doesn't get more connections at a time than it can handle gracefully? Shouldn't it rather be the problem of the database server to say: wait a minute, you just sent me a query, but I'm overloaded right now, so you've got to wait? It's exactly the same thing doing that in the database server as doing it in the connection pooler, except that you need one fewer component to make it work, and you don't have the additional complexity, point of failure, and latency. So, PostgreSQL is already a tremendously successful system. It's being used by a lot of people to solve a lot of really complicated problems, and it's being used by a huge number of big companies.
I know that because they are our customers. What we should do is try to make that system even better, and what I've tried to do in this talk is outline some of the areas where I think the big work needs to be done. Obviously, there are a lot of small things about PostgreSQL all over the code base; here and there you can find places where we could tweak this thing or that thing and make it a little better. But these, to me, are the big things: the really large projects, probably multi-year projects, to really start making headway on and get to some place that's really exciting. In the areas of logical replication, horizontal scalability, and parallel query, we've made a start. The logical replication work, as I said before, is really due to Andres, and of course Simon and everyone at 2ndQuadrant who made that possible, and all the people who helped fund it. And I reviewed it; yes, I did review it. The horizontal scalability stuff has been really led by some of the Japanese guys, and they just need more time; we need more people to spend more time on those solutions to make them better. And parallel query is largely my work, with help from a bunch of other people at EnterpriseDB. So we've made a start on all of those areas, but we need to do a lot more in order to ensure total world domination. In some of the other areas that I mentioned, direct I/O, changes to the storage format, and building in connection pooling technology, we really haven't tackled those issues.
We think of those as things where, yeah, it would be really great to do something about that, but it would be so much work that I'm not going to do that project, and then we find some other area of the code to improve. And so I think these are things we really need to think about doing in order to make the product better. My final note here is: please help. You, not somebody else. There's a tremendous amount that goes into building these solutions. I and my colleagues on the PostgreSQL mailing lists who are committers play one role in that, which is to decide when the code is at a level where it's ready to go in and become part of a product that's going to be unleashed on all of you. But that is far from being the only important role. There is review by lots of people, which is needed to get those patches to the level of quality where we feel comfortable giving them to all of you. There is the writing of those patches: someone's got to write the code. And there is the funding of all of that activity. Those patches mostly do not get written by volunteers; they get written by people who are paid by PostgreSQL companies. There are certainly some patches that get written by volunteers, but to make progress on these big problems, we need more than that. We need people whose job it is to write those patches, and that requires money. All of the people here who work at PostgreSQL companies will be familiar with where the money to fund this comes from, and that's from customers, right?
So we need help from people financially, writing the code, reviewing the code, committing the code, and then, and this is also a very important part, discussing what the solutions should look like and testing them once they're available, to make sure they actually meet the needs that you have. So even if you're not a coder, and even if you don't have any money, there are still a lot of things you can do to help make PostgreSQL even better. Because if you just leave it up to other people getting around to doing something about these problems, it will probably happen, but it might take a really long time. So please contribute to that effort in any way that you can. Somehow I'm ahead on time, which may be attributable to the fact that I spent the last week being frantically worried about being behind on time. So I'm done. Thank you very much. And if you have any questions, I've actually got time for those. So, thanks. Right. So the question is how much of the process is deciding what the solution should be, and how much is actually coding it. I think it is both. What I would say is that if you are the kind of person who writes code, and many people in this room are, you may have a tendency to think that writing the code is the important part of the problem, and that deciding what it should do is the kind of BS thing the community makes you do before you get to the important part, which is writing the code. I don't think the evidence bears that out, right?
I actually think that the process of figuring out what the solution should be, which is sometimes called design, is an incredibly important part of the process. Andres and I have reviewed each other's patches a lot, and both of us go crazy with the things the other person tells us we need to change in our patches, but the patches get better. Not everything the other person suggests is an improvement, but that's going to be true in any relationship you have with another person in any situation, right? So yeah, the design, and the discussion of the design, and the discussion of how the feature should work, and all that kind of stuff: that is a really important part of the problem. All of the things I talked about here are very complex in terms of figuring out what the right next steps are. But I don't think that as developers we should step back from that and say, well, geez, there's no hope in working on this project because I'm going to have to figure out a design. You should figure out a design, and even if the design is the only part of it that you can help with, that's okay. That's great.
That's a huge contribution, to help us understand what the design should be. I think that comes out in the logical replication discussions, where one of the things that's come up is: well, we're going to need control mechanisms for this, to control what gets replicated and how it gets replicated and all that kind of stuff, and do we really know what those mechanisms should look like? Because if we don't know the use cases, we might end up putting something into the core product that everybody has to live with for 10 years that really isn't that good, and we definitely don't want that to happen. Yeah. Yeah, I think resource management as a general concept is pretty important, because we've got lots of evidence that when you over-consume a certain resource, the performance of the system goes downhill a lot. In EnterpriseDB's fork of PostgreSQL, we actually built a resource limiting mechanism into our version 9.4, and I think it is very likely that the community will eventually do things in that area as well. So yeah, I think resource limitation is part of it. In many ways, limiting the number of active transactions could be a proxy for limiting a lot of other kinds of resource utilization, and I don't actually know what's best here, which gets back to that whole design question: what exactly is the best thing, and how do you limit, and what do you limit?
Limiting connections is one thing. We've done a couple of other things that you can limit in Advanced Server 9.4, but some experimentation and testing will be needed in order to figure out which of those things are actually most valuable to people, and in which use cases.

It is, although that's only so accurate. It turns out that your plan costing can be way off and you can still get the right plans almost all the time, so adding more dependencies on the plan cost is a little iffy. But yeah, great.

So Greg's question is whether I have anything to say about a particular topic, but more specifically his question is about trade-offs: what do you do when a particular change is going to make things better for some people and worse for other people? It's a good question. I think you have to consider it in specifics. I'm not aware of a reason why using direct I/O would be better for some people than for other people, or why it would regress anyone. It probably would help people on large databases more, because there are an awful lot of Postgres databases in the world where, let's face it, there's not that much happening, right? People spin up a database because they need a database, but it may not actually process very many transactions. We have an actual customer that I was working with on a support ticket, and their transaction volume was, I think, one commit per week. But that commit was really important, right? Okay, and you laugh, but no, seriously, that commit was really important. I can't tell you what it was, but trust me, it was important.

Okay, and we're having a little raffle here for a training, and then I'll take that question. Somebody wins a free training; it looks like it's Ryan Malaki. All right, you win a free training.

You've got a question? Oh, it's a great idea. It would just be better if you didn't have to. Right.
So there are definitely problems with connection pooling, maybe making bad prioritization decisions about which queries to run right away and which ones to put off. Nevertheless, there are people getting enormous performance benefits out of using connection pooling. I mean, really huge benefits. So yes, you could possibly make some of those things smarter if you put them in the core database, and that's why I think we should consider doing that. But people are also solving these problems in the connection pooler very effectively. My colleague Kevin Grittner, who's not here today, will be happy to talk your ear off about what he did at the Wisconsin courts, which basically involved funneling a huge volume of transactions down to something like 30 database connections and seeing response times improve dramatically. And to do that, he had multiple prioritization levels and multiple pools, so that certain things didn't interfere with other things. So it's tricky, and some cases will probably remain tricky even if we have something in core, but the benefits really are there. If anybody in this room has a server with 500 database connections as a typical setup, you should probably put a connection pooler in there, and it will probably be faster if you tune it right and force that down to be shared among a smaller number of connections.

Peter? So Peter's question is about the importance of improving our caching algorithm. I think if we want to give direct I/O a try, it's going to be absolutely necessary. Because if you make a bad page eviction decision when you are using buffered I/O, it's not great, but it's not that bad either: you write the page over to the OS, and then you're like, oh crap, I need that page again, and you read it back in from the OS cache. So you copied an extra 8kB of data out, you copied an extra 8kB of data in, and you had a couple of system calls. It's not wonderful, but it's not terrible.
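The eviction decisions in question come from the buffer replacement policy: PostgreSQL's shared buffers use a clock-sweep algorithm with per-buffer usage counts (capped at 5), where a buffer can only be evicted once the sweeping hand has decremented its count to zero. Here is a minimal, simplified sketch of that idea; the `ClockCache` class is invented for illustration and elides everything real (pinning, locking, the free list):

```python
# Simplified sketch of clock-sweep buffer replacement, the general
# approach PostgreSQL uses for shared buffers. Frequently accessed
# pages build up a usage count that protects them from eviction.
class ClockCache:
    def __init__(self, nbuffers):
        self.pages = [None] * nbuffers   # page tag held in each slot
        self.usage = [0] * nbuffers      # usage count per slot
        self.hand = 0                    # clock hand position
        self.index = {}                  # page tag -> slot

    def access(self, page):
        slot = self.index.get(page)
        if slot is not None:             # cache hit: bump usage count
            self.usage[slot] = min(self.usage[slot] + 1, 5)
            return "hit"
        # Cache miss: sweep, decrementing counts, until a slot hits zero.
        while True:
            if self.pages[self.hand] is None or self.usage[self.hand] == 0:
                victim = self.pages[self.hand]
                if victim is not None:
                    del self.index[victim]   # evict the victim page
                self.pages[self.hand] = page
                self.usage[self.hand] = 1
                self.index[page] = self.hand
                self.hand = (self.hand + 1) % len(self.pages)
                return "miss"
            self.usage[self.hand] -= 1
            self.hand = (self.hand + 1) % len(self.pages)

cache = ClockCache(2)
print(cache.access("A"))  # miss
print(cache.access("B"))  # miss
print(cache.access("A"))  # hit: A's usage count rises to 2
print(cache.access("C"))  # miss: sweep evicts B, whose count hit 0 first
print(cache.access("A"))  # hit: A survived the sweep
```

With buffered I/O a wrong victim choice here costs a round trip to the OS cache; with direct I/O it costs a real disk read, which is why the policy would have to be very good.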
But with direct I/O, if you make a bad decision about writing a page to the disk, where you physically write it out and forget it completely from all of your caches so that you avoid cache duplication, and then you discover that you need that page, it is a lot more expensive to go get that page back. So to have any hope of using direct I/O, we would really have to make sure that our decisions about which things to cache and which things not to cache were absolutely top-notch. Otherwise we'll get creamed.

Yeah, so the Linux kernel developers, to describe why O_ATOMIC required O_DIRECT, I think they were using words like "nightmare". Now, I'm not a Linux kernel developer, so I can't stand here and say those guys are all weenies and they should really just figure out a way to do that. Maybe they will; it would be nice if they did, but it doesn't seem like they have any plans to do that right now. And I do understand kind of why it's complicated. If you do a direct write, all you've got to do is go and write it, and when the write is done, you can forget that it was supposed to be an atomic write. If it's a buffered write that has to go in the operating system buffer cache, you have to remember, whenever it eventually gets written out, that this range of bytes has to all be written out atomically, and by then more atomic writes may have come along, even in overlapping byte ranges. That wouldn't be an issue for us, since we would not have overlapping byte ranges, but in general, if they're going to support that in the kernel, they have to cater to those possibilities. So solving that with buffered I/O is probably a tricky problem. Maybe not totally impossible, but definitely tricky.

Jeff? Absolutely, yes. Jeff is absolutely right, and one way of saying that is: complain early and complain often. Because I don't run a production database; I develop database code, right? I have run production databases,
but I don't do it right now, and I haven't run your production database. So for me, and I think for every PostgreSQL developer, having feedback from people at conferences and on mailing lists, and at every other opportunity that presents itself, is how we know what would be most helpful to you. You've seen the things I talked about today; come up to me or any of the other hackers in the room after this and say which things you think are most important for you and which things are least important for you. That input is always really helpful.

I think that we are out of time, so thank you, everybody, for coming, and I'll be around if you want to grab me afterward.