Well, let's get started. Hi, I'm Peter. I've been a contributor to the Postgres project since 1999. You can reach me at these addresses; feel free to say hi afterwards, or send me a question. Thank you, David, for the laptop, because I didn't have one with HDMI. I'm going to talk about how Postgres is tested. If you were here for the previous talk by Simon, he introduced how you write a patch; this is more of a deep dive into how we make sure afterwards that a feature works and continues to work. It's a bit of an archaeology talk, in a way, of how we got to where we are now.

The first reference to testing is Postgres 4.2, which I downloaded the other day — you can still get it from the website. That's the last release that came out of the University of California, Berkeley, and it already has a test suite in it that we effectively still use today to a large degree: what we nowadays call the regression test suite. So that's not a new thing at all. If you look into the details of the regression test code, there's a lot of stuff like street addresses in California and geometric coordinates of things related to the university at the time. They already did a lot of that work. The source code layout in Postgres 4.2 was slightly different, but nowadays the main regression test suite lives under src/test/regress, and it has subdirectories that look like this — a repeating pattern, which I'll come back to later.

The idea is pretty simple, and it was developed in the early 90s, or maybe even the late 80s, before we had all the fancy testing tools we have today. You have three subdirectories: sql, results, and expected. The sql subdirectory contains a bunch of SQL files. When you run the test, all those SQL files are executed using the psql command line tool, and whatever it prints out is captured and stored in the results folder. Then that is compared, using a plain text comparison, against what is in the expected folder. If it matches, the test passes; if it doesn't, you have to dig in. That's how we've been doing testing for the longest time.

To execute that, you run the command make installcheck. That was the original setup: you first had to build Postgres, install it, and set up a Postgres server — make a directory, run initdb, start it up — and then you could run make installcheck, which runs the test suite I just described against the running server. And that used to be very complicated, because it's about eight steps to actually get there: build the code, install the code, make a directory, run initdb, start the server, make sure it actually finishes starting up, then go back to the other window and run make installcheck, and so on. In the old days you also had to set some library paths and all kinds of other weird stuff. So this was an eight-to-ten-step process just to run the tests, which was very complicated.
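To make the sql/results/expected mechanics concrete, here is a tiny, made-up example of such a test pair — the file name and contents are mine, not from the actual suite. In the real expected files the statements themselves are echoed along with their output; the point is just that the comparison is a plain text diff:

```
-- sql/demo.sql: the input that gets fed to psql
CREATE TABLE demo (a int, b text);
INSERT INTO demo VALUES (1, 'one'), (2, 'two');
SELECT * FROM demo ORDER BY a;

-- expected/demo.out then contains the captured psql output, roughly:
--  a |  b
-- ---+-----
--  1 | one
--  2 | two
-- (2 rows)
```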
One of the first great advances here — which we might laugh about now — is that in the 7.0 release, make check was introduced, which automates everything I just described. And that's what we really want, right? If you have a repeating process, automate it. make check builds and installs everything in a temporary place and then runs the tests. It's basically one command that does everything, and that's what we want.

The other thing make check introduced around the same time, though not really related, is that it runs some of those tests in parallel. There is a certain way these things are scheduled. In that sql subdirectory there are — I don't even know how many — maybe 40 or 50 files right now, and it is set up manually so that some of them run in parallel. Not in a way that they deliberately interact with each other — that's something we'll come to later, how you actually test multi-session behavior. It's more opportunistic: let's just throw a bunch of things in there in parallel and see what happens. Can you even have more than one session? Does the locking basically work if multiple things are going on? Does the system stay up? That was the idea back then. Nowadays the other effect is that it's actually faster: if you have N CPUs, running N things in parallel is going to be a bit quicker, and we want the tests to run as quickly as possible so we can use them as part of development.

All of this logic is in a program called pg_regress, which also lives in src/test/regress. Like many things in Postgres, it used to be a small shell script, then it became a big shell script, and now it's a big C program because it had to be ported to Windows. It contains all the logic of taking these SQL files, running them through psql, starting the server first if necessary, running the diff afterwards, and giving you nicely printed output. So it's basically a custom test driver that we wrote over time.

This approach — we have 20 years of Postgres, plus the university era before that — has held up quite well, but it has lots of problems in terms of what you can test with it. First of all, you can only test things that are somehow exposed through the SQL command interface. There are a lot of tests in there that cover all kinds of things: can you create a table? What about this data type? What if I put a text input into an integer? Geometry types, network types, all these edge cases; insert, update, delete; joins — does it actually join things the right way? — and all the complicated queries. Foreign keys — does it really work if I cascade things? All of that you can test great. But anything that's not SQL, you can't test that way. Anything that is background behavior — vacuum is always a good example — or any front-end tools; pg_dump is not part of this at all. So it's only that.
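Back to the scheduling for a second: that parallelism is driven by a plain schedule file. The test names below are real regression tests, but the grouping here is only illustrative — each "test:" line is a group whose members run concurrently, and the lines themselves run in sequence:

```
# excerpt-style illustration of src/test/regress/parallel_schedule
test: boolean char int2 int4 text
test: create_table
test: insert update delete
```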
The next problem with the diff approach is that the test always has to print out exactly the same thing, because of the way the diffing works: we just capture the output and compare it with the output we had before, and it has to be identical. So anything that sometimes behaves differently is a problem, and there are multiple reasons why that can happen. There is a certain randomness in the way statistics are collected sometimes, or in what happens on the system at the same time: if a vacuum runs in the background, plans might change as a result, or the scan order might change. One easy way around some of those problems is to put an ORDER BY at the end of every query — that's the beginner's suggestion: just ORDER BY, and the order will be consistent. But if you order everything, then you only test plans that actually do sorting, and you don't test the other plans. So you don't want to just order everything; that's a hard balance to strike.

Then there are platform differences, especially in floating-point behavior. There are a couple of ways to work around that. One is to have different expected files for different platforms, which mostly has to do with things like floating point, or time zones being configured differently. That's an annoying thing you have to take care of. If you configure Postgres differently, you might get different output: for example, if you build without XML, the XML tests will fail — well, most of them; some will not — and you need to account for that somehow. Another annoying case is external libraries, for example the XML library, or the Python library for PL/Python: if they change, an error message might look different, and then the straightforward diffing doesn't work anymore. So you have to maintain a lot of different output files, or write your tests in a more clever way. Those are all problems you have to battle with.

And the last point: in a way, this is the wrong way round for test-driven development. The ideal in test-driven development is that you write the tests first and then make the code match the tests. I guess you could sort of do that here, but it's really hard. The normal way you work with this is that you write your code, you run it, and then you write some tests. The tests will fail, because the expected output isn't there yet — it's just going to be a diff of empty versus whatever the test prints. Then you look through that and say, yeah, that looks right, and you copy the exact output file from results to expected. You run it again, and the tests pass, hopefully. So it's not really a proper test-driven approach of "I want the code to do this"; it's more like, "this is what the code did the last time somebody actually checked, and now we'll record that forever." But that's how it works.

Nonetheless, this has been a very popular approach, and it has spread around: we test all kinds of different modules and extensions this way now. The first tests written this way were the multi-byte tests in 6.5, which test the different encodings.
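In commands, that copy-the-output workflow looks roughly like this — a sketch, assuming you are in src/test/regress and "demo" is your new test:

```
make check                            # the new test fails: expected/demo.out
                                      # doesn't exist (or doesn't match) yet
less regression.diffs                 # eyeball the diff: is this really what
                                      # the code should print?
cp results/demo.out expected/demo.out # if yes, bless the output
make check                            # now it should pass
```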
In 7.1, the first contrib modules got tests that work the same way: they have their own sql and expected directories, you run make check in there, and it just tests. They're usually very small; they just check that whatever the module does works the way the author thought. That has spread to almost all contrib modules, and many — almost all — separately distributed extensions have tests like that. If you download an extension and it has a test, it probably looks like that. And if you want to make a change to an extension module that works like that, you do it the exact same way: download it, run the test, edit the SQL file, run the test again, copy the results over if you're happy, and submit the patch back. That's how almost all extension modules for Postgres are tested. (A sketch of how such an extension hooks its tests up follows a bit further down.)

I almost forgot about ECPG, like many people do. The guy who wrote those tests is actually a friend of mine — I'm staying at his house in Manhattan, so I figured if I didn't mention this, he wouldn't let me back in tonight. There was a Google Summer of Code project to test ECPG, which is the embedded SQL preprocessor for C. And it works the same way, kind of, except the input and the output are different and there are two stages. First, you take the source file, run it through the preprocessor, and get an output file, which you can compare against what you expected. Then you actually run the program and compare its output too. It shares some code with pg_regress and takes the same approach: a bunch of input files, run through two stages in this case, and then the output is compared. And it has very good test coverage of all the different ECPG features, so that part is quite well tested.

As Simon mentioned earlier, we have to put all of this together somehow. We have these different test suites, which test features; the other axis is portability. Postgres runs on all kinds of operating systems, CPU architectures, different configurations, old and new versions of operating systems, and so on. In order to know that it runs on all these platforms, what we used to do — before the 8.0 days, in the 6.x and 7.x era when I got started — was that when the release came around, at beta time or so, we sent out an email: guys, it's time to test. If anyone has an odd box with an odd operating system, please build Postgres on it and report back. And we collected those emails and maintained a list: OK, somebody tested on SPARC, somebody tested on Debian, somebody tested on SUSE; nobody has tested NetBSD on Alpha yet, so we'd send out another email — does anyone have a NetBSD Alpha box sitting around? Oh yeah, I have one over here. So that was a very manual process. And then at some point somebody automated that, as we do. Andrew Dunstan has led that effort — is he here? No. The buildfarm is basically the automation of that process.
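As promised, before we go deeper into the buildfarm: this is roughly what the test hookup in a PGXS-based extension Makefile looks like. A minimal sketch — the extension and test names are made up. Each name listed in REGRESS maps to sql/&lt;name&gt;.sql and expected/&lt;name&gt;.out, and make installcheck runs them through pg_regress:

```
# Minimal sketch of an extension Makefile using PGXS
EXTENSION = myext
DATA      = myext--1.0.sql

# pg_regress tests: sql/basic.sql diffed against expected/basic.out
REGRESS   = basic

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```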
So, the buildfarm: people have old boxes lying around somewhere that are not necessarily doing a whole lot, and they put the buildfarm client on them, which builds Postgres on whatever schedule they set and reports back, via a simple little protocol, to the buildfarm server, which you can find at that address. It shows red, green, all the colors, for what happened. I don't know exactly when it got started, but the reason I put these releases up there is that the release notes of 8.0 are the first time we mentioned the buildfarm. So instead of keeping a list of "this guy said he tested on NetBSD on SPARC, this guy said he tested on SUSE on Alpha," we started saying: this was actually just tested automatically on the buildfarm. And in 8.3 we did away with the email round-trips altogether. As of 8.3, the buildfarm is it: if we claim to support a certain platform, the buildfarm proves it, and if it's not on the buildfarm, we can't really say anything about it.

The buildfarm has been a great project and a great resource. There are some issues with it, one being that most buildfarm boxes are old and donated precisely because they're not doing anything else, so they're usually quite slow and sometimes fragile. It's not like a commercial company buying new servers as build servers; we rely on old and weird boxes to do a lot of that work. The other issue is that these boxes sit in someone's basement or data center somewhere, so we don't actually have access; we rely on the owners to maintain them, and if they break, hopefully they fix them — or often the box is just trashed at that point because it's so old. So we can't really control that process; we just hope that people maintain their boxes and report back.

So let's think for a moment about why we test. Obviously, we want to know whether it works. But who are the stakeholders, really? Initially, in the old days, we just needed to know whether it worked at all, and then we needed the buildfarm to know whether it works on all these other variants that other people might need. When the release came around, we needed to see that it all still actually worked. But over time there have been other stakeholders, and there are many. One is developers themselves: if you want to work on a piece of code, hopefully there's already a test for it. If you write a new feature and write tests for it, that's great; but if there's already a test there, that's even better, because then you know you haven't broken anyone else's stuff that they did in the past. So it's for your own development process, it's for other developers to know what you're doing, and it's for the release team to know that all of it together still works months later on all these different platforms. And then it goes out to packagers, who build it in slightly different ways and want to know they didn't mess it up — because they are the representatives of the project at that point. We ship a source tarball, but the packagers take that work and actually deliver it to the end users.
And so they want to make sure they didn't mess it up. And then the users: oftentimes users just take it and assume it works, but many users want to run the tests themselves, to know that however they installed it, it still works. And there are commercial companies that do things with Postgres — they package it themselves, or fork it, or build other products around it — and, as I've learned, they use a lot of the same test drivers for their own stuff. A company might add functionality in the backend server or add extra modules, and in many cases they just use the same pg_regress driver to test it. So all of those different views on testing need to be catered to.

Which leads to my follow-up to what Simon said earlier: each patch should come, where appropriate, with these things. Obviously you want to write some code, and you want to write some documentation so people know what it is. And — I think we're ready for this — if whatever you're sending neither adds a test nor changes an existing test, you should be sad. If it's a bug fix, ideally there's a test that somehow proves there was a bug and that you've now fixed it. And my personal pet peeve: if you send a patch, please also send a commit message with it. Then, if I or some other committer has to commit it, we don't have to write the commit message ourselves and possibly totally misrepresent what you actually meant by the patch. It takes a lot of time when you sit there in the evening wanting to close a patch out and you have to write a little essay on somebody else's behalf about why this is awesome and how it fits in. So if you could just send a commit message with your patch, that would be great.

That was the story up through the 8.x releases. At some point somebody thought: well, we have all these tests, but now I have to go around and find all the different suites. I make a change, I want to make sure I didn't break anything, but there are all these different tests to find and run. So we came up with make check-world, which runs all the tests: it goes through all the different modules — contrib, the backend tests, ECPG, and whatever else — and tests everything. That's really what you ought to use going forward. Obviously, if you know you're hacking on some contrib module, you'll run that module's test over and over as you code; but at the very end, when you're ready to submit, run check-world to make sure everything is still fine.

As I mentioned earlier, the regression test driver has a lot of limits on what it can test. One of the biggest problems that had always been talked about is that you can't test multi-session behavior. You can run tests in parallel and hope they don't break each other's stuff, but you can't test any kind of locking or concurrency behavior, which is obviously a very important aspect of relational databases and of Postgres. There's a lot of detail in isolation levels and locking and the different kinds of locking.
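Before getting to concurrency: to put that patch checklist in command form, here is one low-friction way — my suggestion, not project policy — to run everything and send the commit message along with the patch (the feature name is obviously made up):

```
make check-world        # run every test suite before submitting

# Commit locally with a real commit message; git format-patch then
# bundles the message together with the diff for the mailing list.
git commit -a -m 'Frobnicate widgets in the executor

Explain what the patch does and why, so the committer does not
have to reconstruct your intent in the evening.'
git format-patch -1     # writes 0001-Frobnicate-widgets-in-the-executor.patch
```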
All that concurrency behavior is obviously very important to get right, but we couldn't easily test it. So in 9.1 we took a framework that already existed and ported it into our tree, under src/test/isolation. That is a custom test driver that can run concurrent tests: it drives multiple sessions — this guy takes a lock, and then we see whether that other guy actually waits; or this guy runs at one isolation level and that guy at another, and there are ways to make sure they are each at the right position; and then you check, can this session see what that one did, or should it? All that kind of stuff, and a lot has been added to it over time. The original idea for the framework came from the Postgres-R project, a replication project that was a fork of Postgres and existed for a long time. But I think the urgency to adopt it came when the serializable isolation level — what they call true serializable — was implemented. The people doing patch review at the time said: this is all great, you've done all this work, but I can't test this. It's very hard to set up concurrent sessions manually — OK, I do this, I do that, that seems to work — and there's a lot of depth in that feature that you need to be able to test more automatically. So we added the isolation test driver, and over time other people added their own tests onto it, for locking and things like that.

In 9.2, we added a test for pg_upgrade. I worked on that one myself, and it came about in an unrelated way: I was working on some other feature — some ALTER TABLE feature, I believe — and weeks after it was committed, it turned out it had somehow broken pg_upgrade. If you altered your table in a certain way, pg_upgrade afterwards didn't know what to do with it. OK, we noticed that problem and fixed it, but the real worry was: if we hadn't stumbled on this particular case, other things might have broken pg_upgrade too, and we would have learned about it half a year or a year later, when we were ready to release — did anyone actually check that pg_upgrade still works? Somebody would probably have done it at some point, but at the time we wanted to push people toward these upgrades, and there was no automated test for it. And testing it manually is a fairly complex process: you set up a server, put some stuff in it, configure it, set up another server, run pg_upgrade with all these options — oh, you did it wrong, do it all again. So basically, I automated that. I wrote, essentially, a shell script that sets up a server, loads a bunch of stuff into it, runs pg_upgrade, and then checks whether that worked.
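Stepping back to the isolation framework for a second, here is roughly what a spec file looks like — a made-up two-session example in the quoted syntax of that era; the driver runs the steps in the order given by the permutation, and each session's output is again diffed against an expected file:

```
setup
{
  CREATE TABLE accounts (id int PRIMARY KEY, balance int);
  INSERT INTO accounts VALUES (1, 100);
}

teardown
{
  DROP TABLE accounts;
}

session "s1"
step "s1begin"  { BEGIN ISOLATION LEVEL SERIALIZABLE; }
step "s1read"   { SELECT balance FROM accounts WHERE id = 1; }
step "s1commit" { COMMIT; }

session "s2"
step "s2begin"  { BEGIN ISOLATION LEVEL SERIALIZABLE; }
step "s2add"    { UPDATE accounts SET balance = balance + 10 WHERE id = 1; }
step "s2commit" { COMMIT; }

permutation "s1begin" "s2begin" "s1read" "s2add" "s2commit" "s1commit"
```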
That pg_upgrade test has been quite valuable over time, because it actually exercises a bunch of other things as part of the whole process — a lot of other tools get touched along the way. So whenever it fails, it doesn't necessarily mean pg_upgrade itself is broken; some other piece of the whole chain may have broken, and that's quite useful to know.

In 9.2 we also added tests for libpq, and that was also one of those cases where we just said: we need a test for this; this is all a great feature, but I'm afraid it'll be broken before we know it. A new contributor had sent a patch so you could use URLs as connection strings in libpq — postgresql://, server name, slash, database, and so on. Great feature, great idea. But there are lots of corner cases: percent-encoding, what you do with pluses and spaces, how you specify local sockets, what if you don't have a host but do have a port, IPv6 parsing — a lot of details. And as the patch reviewer, how do I test all this? Do I type all these things in by hand — OK, that works, that works, that works — and then the next version of the patch comes in and I have to try them all again? No. So we said: we've got to have a test for this, and it's pretty straightforward to test. You just have a bunch of URLs, have the code parse them, and see whether the output is right. And I still have good confidence in that feature — it still works the way we originally committed it, because we have all those tests. Unfortunately we don't have many tests for the rest of libpq. Of course, libpq is underneath psql and therefore exercised by the whole suite, so we know it generally works, but some of those details are important to test separately.

That leads to the latest and greatest addition, which I was involved in. We didn't have any tests for the non-server tools. We test the server and all the SQL stuff, we have a test for pg_upgrade, and we know psql generally works because it runs all those files — but we knew nothing about createdb or vacuumdb or pg_basebackup; we had no way to test them. Again, it came out of a need: I was reviewing a patch for pg_basebackup. For those who don't know, pg_basebackup takes a base backup of the server, which you can use as a backup, or more usually as the starting point for a replication standby — it's really the first step of setting one up. Somebody had sent a patch so you could map tablespaces differently on the standby: your tablespaces point at certain directories, but on the standby you may want them to point elsewhere. A reasonable feature. But how do you review that? I have to set up a server, make sure replication is turned on, set up another server, run my base backup with all the pieces configured... it takes 15 minutes to set up a test harness locally, and then something breaks and I have to do the whole thing again. So again, I thought: I've got to automate this.
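Incidentally, on the earlier libpq point: these are the kinds of URL variants such a test has to chew through — sample URIs following the documented syntax, with made-up hosts and names:

```
postgresql://localhost/mydb
postgresql://user:secret@db.example.com:5433/mydb
postgresql://user@localhost/mydb?sslmode=require&application_name=demo
postgresql://[2001:db8::1]:5432/mydb            (IPv6 address in brackets)
postgresql:///mydb?host=/var/run/postgresql     (local Unix-socket directory)
```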
So for pg_basebackup we wrote some tests, and that particular patch was again one with a lot of corner cases in how the mapping is parsed. You say this directory equals that directory — but what happens if there's an equals sign in a directory name? Different cases; all of this has to be tested. And instead of having the patch reviewer type it all in, we just wrote tests for it.

As for how we implemented this: obviously we could have done another shell script like we did for pg_upgrade, but by this time it was clear we needed something a bit more forward-looking that would also run on Windows. That was one of the problems with the pg_upgrade test — it was almost a quick hack: here's a shell script that tests this; well, let's just keep it. Whereas this time, with all that experience, we wanted something that works on different platforms and has a better way to report failures. So we took, basically, the standard test modules from Perl. The naming here is confusing: we call them the TAP tests, but TAP is actually a protocol — the Test Anything Protocol — which just describes how test results are reported. The tests themselves are essentially Perl scripts that do stuff and report what they found. It's a Perl program; you can do anything in it. But generally, the way it works is that you run a program with different options and then check what happened. It gives you complete freedom: run pg_basebackup with a totally fake option that doesn't even exist, and check that there's an error message; run pg_basebackup with the options that set up a standby, and check that something was actually copied over; run it with the new options we were developing at the time, and check that in this combination the tablespace is actually mapped to the right place, and in that combination it reports that the directory doesn't exist. There are a lot of cases like that. And some of these ideas were taken from other projects.

9.4 was the first release where we added those. We added small tests across the different front-end programs — pg_config, pg_ctl, pg_basebackup, createdb, createlang, and all these different things — just to make sure they work. In a way those tests were not super useful in themselves, but they gave us the opportunity to work out how this test framework should actually work and to make it portable. And it turned out quite successful, because people other than myself have added to it greatly. In 9.5 we got a test suite for SSL. That's another one of those things: somebody sends a patch that changes something in SSL — here's an option to check some other certificate in some other way; there are a lot of options like that in SSL. Oh great, so now you want me to set up SSL locally so I can replicate that? That doesn't sound very attractive. And ultimately those kinds of patches just linger around, because nobody wants to set up some SSL rig just to look at your thing. So now we have a test suite.
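To show the shape of these TAP tests, here is a minimal sketch, assuming the PostgresNode/TestLib Perl helpers of the 9.5/9.6 era — the node name, the bogus option, and the assertions are made up for illustration:

```perl
use strict;
use warnings;
use PostgresNode;    # helper that manages throwaway server instances
use TestLib;         # command_ok/command_fails and friends
use Test::More tests => 3;

# Spin up a scratch server in a temporary directory.
my $node = get_new_node('main');
$node->init(allows_streaming => 1);   # also permit replication connections
$node->start;

# A bogus option should make the tool fail cleanly.
command_fails([ 'pg_basebackup', '--bogus-option' ],
    'pg_basebackup rejects an unknown option');

# The server should answer queries.
is($node->safe_psql('postgres', 'SELECT 1'), '1',
    'can query the test server');

# An actual base backup into a scratch directory should succeed.
my $backupdir = $node->backup_dir . '/tapdemo';
command_ok([ 'pg_basebackup', '-D', $backupdir,
             '-h', $node->host, '-p', $node->port ],
    'pg_basebackup succeeds');
```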
The SSL suite does all of that: it runs through all the different cases — make a certificate, make another certificate, make a broken certificate and see that it doesn't work. Does authentication work this way? What about a client certificate plus a server certificate, no server certificate, an outdated certificate, all these things. So now if somebody sends an SSL patch, I'm much more willing to look at it, because I have a way to test it and make sure it doesn't break anything, and a fairly easy way to add more tests — and hopefully the original developer already added tests describing what the feature changes. It's much easier to review stuff like that, and SSL is exactly the kind of thing you want confidence that you didn't break.

And the latest thing, which came in just a couple of weeks ago, is a replication test suite. It's rather important that replication actually works, right? You set up two servers, you put stuff in here, and it comes out over there — good to know. Until now that was usually manual work. Somebody sends a patch — I have a new way to configure replication, say I want a delay of some kind — and OK, I set up one server here and one there, find the documentation for the new option, configure a 30-second delay, run it... oh, it actually came back after 20 seconds, so that doesn't work. That stuff is super annoying to test manually. So for a long time we'd been waiting, and somebody recently contributed the beginning of a test suite for that. There's obviously a lot more to replication in Postgres that we could add, but at least we have something to build on.

So this is the zoo we now work with. You can put it into eras if you wish: in the 7.x era we kind of figured out what we even needed; the 8.x era was perhaps the era of the buildfarm, where we really made sure we had automated platform coverage; and in the 9.x era we added a lot more depth, making sure we actually have tests for everything. And we now have a bit of a proliferation of different approaches, so perhaps a future project is to consolidate some of that. An obvious example would be to take the pg_upgrade tests and re-implement them in the Perl TAP framework, to get rid of a lot of duplication and custom hacks; perhaps the libpq tests could be moved that way too — I haven't looked into it exactly, but we don't need five and a half ways of testing; maybe three would be enough.

And there's more testing outside the Postgres project. The packagers do testing — hopefully. I know the Debian project, with the original packager Martin Pitt and others who have helped, has a very decent test suite for whatever the packaging adds: it makes sure that when you install the package, the files actually end up in the right directories, that the server starts up and actually runs, that you can stop it again, that different locale configurations work, and all these things. It's a very good test.
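On those new replication tests: with the same Perl helpers, a basic primary/standby check can be sketched along these lines — again assuming the 9.6-era PostgresNode API and pre-10 function names, with made-up node and table names:

```perl
use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;

# Primary with streaming replication enabled.
my $primary = get_new_node('primary');
$primary->init(allows_streaming => 1);
$primary->start;

# Take a base backup and start a streaming standby from it.
$primary->backup('b1');
my $standby = get_new_node('standby');
$standby->init_from_backup($primary, 'b1', has_streaming => 1);
$standby->start;

# Write on the primary, wait for the standby to replay it, read it back.
$primary->safe_psql('postgres', 'CREATE TABLE t AS SELECT 42 AS x');
$primary->poll_query_until('postgres',
    'SELECT pg_current_xlog_location() <= replay_location FROM pg_stat_replication')
  or die 'standby never caught up';
is($standby->safe_psql('postgres', 'SELECT x FROM t'), '42',
    'row replicated to standby');
```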
And actually a lot of the Perl TAP stuff was kind of stolen from that Debian suite by me, because I had a lot of experience with it and knew how it worked, and it's been useful — so I just took that approach. I also know that a lot of companies that distribute or redistribute Postgres have QA teams that do automated and manual testing internally, and they sometimes report back and find additional bugs, especially in things that have been hard to test automatically — replication, for example; there's a lot of manual or custom testing they do there.

And as an open source project — the founding myth of open source was that your users test it, right? I think we've learned over the decades that that doesn't really work so well, at least for a database project the way Postgres works. There are other kinds of software — say it's an editor, and the new version of vi comes out and doesn't work. OK, you go back to the previous version; the new version is crap, I'm not going to use it. But if a new version of Postgres comes out and you have upgraded — you took downtime, you upgraded everything and converted the data — and then it doesn't work, moving back is very hard. Feedback from users is always useful and necessary, but we can't have users do the basic testing of whether it works. And there's the who-moves-first problem: users don't want to upgrade until they know it's ready, but we can't really know whether users will be happy with the new code until they upgrade. The best-case scenario is that you have a test suite for whatever your internal application is — your app, your website, whatever — and you just plug the new Postgres in, run your tests against it, and hopefully report back: yeah, I tested Postgres 9.6, works great for us, go ahead. But given how business and software development actually work, not everyone has that, and even those who do don't always have the resources to do it. So it's hard.

There's also a whole bunch of other stuff you may have heard of or seen, which is not testing in that sense but more code analysis. If you're interested in that, there's a talk I gave a couple of years ago — there's a link on the slides, or just look up PGCon 2014. That's maybe another aspect of quality: static analysis, or what Valgrind does — memory-leak checking and things like that, a more low-level approach. I don't want to talk about that too much here.

Test coverage — again, you could talk a lot about test coverage; just as a reminder of how it works: you run the first command to clean up what you had previously, then you run whatever tests you have, and then you run make coverage-html, which gives you an HTML report of your test coverage. Actually — I didn't put this on the slide — you also have to run configure with certain options so the code is instrumented for this.
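Put together, the workflow looks like this; the configure option, which I left off the slide, is --enable-coverage:

```
./configure --enable-coverage ...   # instrument the build for coverage
make
make coverage-clean                 # reset counters from any previous run
make check-world                    # or whichever tests you want counted
make coverage-html                  # writes an HTML report (coverage/ directory)
```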
How to do that is in the documentation, and there's actually — just as of yesterday or so — a new website, coverage.postgresql.org, where that kind of information can be looked at. As Simon mentioned in the previous talk, if you want to get involved in Postgres contribution and you're interested in improving the quality of the tests, just take a look at that; you will surely find lots of places that are not tested, and you can start there. There's also a blog post and a Jenkins server that I run myself with similar information at these URLs. In reality, some people look at this sometimes, but we have by no means any firm idea of what coverage we really have or should be aiming for — at this point it's a bit of an advanced gimmick.

So here's a whole bunch of stuff we don't have good testing for, and a lot of these could easily be a first project or a Summer of Code project — I don't think we're in Google Summer of Code this year, but if you want to pick something up, a lot of these are totally doable. We don't have any good testing for pg_dump, which is really scary if you think about it. pg_dump is indirectly tested as part of the pg_upgrade test, because that test runs the upgrade, dumps the old server and the new server, and diffs the two dumps. So we know pg_dump generally works, but there are a lot of edge cases, as you might imagine: corrupted catalogs, something missing, a table you altered three times — dropped a column, added a column, dropped it again, added inheritance, removed inheritance. Can we still dump that? Can we still restore it? I don't know.

We could also add more variants to the pg_upgrade testing. One thing that would be really good is upgrading across different versions: right now the pg_upgrade test upgrades from the same version to the same version, which is useful because it exercises a lot of code, but what we really want to know is whether we can upgrade from old versions. There are ways to do that manually, but you still have to download and install the old version and then run the thing; it would be great if that were automatic.

Then there's a whole class of clients and tools that aren't covered. For example, I was thinking the other day of pg_resetxlog — a very special tool that you only use very rarely, and we don't really have good test coverage for it. If you change something in the WAL code in Postgres, do we know that all the different pg_resetxlog options still do the right thing? I don't know. And a bunch of other things: indexes, all kinds of functionality. Vacuum — does vacuum even do anything? There's no way to test that. We know it runs, but does it actually remove the things it says it removes? Does it update all the different maps the way it should? There's no automatic way to determine that. Crash recovery: pull the plug and see whether it comes back up and behaves the way we expect. We have virtualization everywhere now, so that would be easy to do that way. Different configurations, with all the different configure options and different versions. And it was pointed out to me last week that the libpq wire protocol would be interesting to test.
It is obviously tested as part of general use through psql, for example, but there are lots of different options in the protocol that would be useful to cover. More replication testing — we could, and surely will, add more and more there. One interesting area is all the different access-control options: Kerberos, PAM, LDAP. There was a patch in the last commitfest, for example, where somebody wanted to add something around Kerberos and the GSSAPI. Well, that's great, but I can't really sit here for three days figuring out how to set up a test Kerberos installation just to look at your patch. If there were some way to automate that, it would be great for those kinds of patches.

We don't do any automated performance testing at all. Many people have tried; it's very hard; we don't really do it. Oftentimes we just get feedback from users — "actually, on my workload this is a lot slower than before" — and then there's a lot of panic. If we could find that out earlier, that would be great. More testing of packaging: as I mentioned, the Debian packages have very good test coverage; other packages don't — I'm not going to name names, but I've been very upset. And extensions: as I mentioned, extensions usually have tests that are quite good, but they don't have buildfarm support and all that other infrastructure. Just last week I was compiling a bunch of extensions in 32-bit mode, and it was not a pretty sight. Apparently nobody uses 32-bit anymore, but we obviously still claim to support it, and I was a bit scared by the results: a lot of compiler warnings about alignment mismatches and pointers being cast the wrong way — the sort of thing you only find out about when somebody runs into it later. And it's not obvious how to fix this, because we can't put more load on the buildfarm and have everyone test their random extensions there. So there's no good solution yet.

I've got to finish, but just a couple of things I'd like to see. As I mentioned, we have this testing zoo; we want to organize it so it's much easier to see what tests we have, with a uniform way to report the results. And we want not only performance tests — tests about performance — but also better performance of the tests themselves: the faster the tests run, the more often people will run them, the more tests we can add, and the better it will be. Being able to run only a subset of the tests would also be great, so we don't always have to run everything at once.

So, that is that. There are a lot of projects here for interested people — just find me later, I'll assign them. Any questions? I guess we've got a couple of minutes before lunch. Sure. Anybody? There's one in the back.

OK, so the question was about pgBackRest, which I believe is a backup manager for Postgres — an external project. He has written a lot of tests around replication, archiving, and timelines, and asks where those tests should ultimately live. Obviously a lot of that depends on the specifics. I think all the interfaces that Postgres exposes for testing should live in the Postgres source code, and we have now started that — we have some frameworks for that now.
So I certainly expect a lot more to come that way for all these details. And ideally — I'm not sure whether we can just move the code, I don't know how you've set it up — a lot of those ideas should certainly be stolen. All that thinking we should definitely adopt; the code might end up looking different, but the ideas are often the hardest part. I'd love to have a look at your code later and see what we can do there. I think so, yes. Yes, definitely — that would be super useful. You could spend the next two years just submitting patches for that. Was there a question over here? No? Cool — all right, let's have lunch. Thanks.