So just in case somebody is in the wrong room, this is about the special setup you get in Debian when using Postgres, or the special needs of Debian in terms of Postgres usage. Okay. Okay, the mouse doesn't seem to work. Okay, let's do it manually. Okay, got it. So just a short introduction. A brief look into the room tells me that roughly half the people know me already. This is from a website called Open Hub. It just shows some of the open source engagements you have, probably for every one of us. It does not show everything, I noticed, and some things I cannot really make out. I mean, this Revelation, I don't even know that I committed a patch to Revelation at some point. I don't even know that I had write access to it, but it seems I did something about it. But anyway, this is meant to show you that I've been doing open source and free software for quite some time, or might I say for too long already. I started as an open source developer in '92, just to get this straight. Yeah, this slide says '93, whatever. Linux, as in kernel work, in '94, but I have stopped since. I actually stopped a couple of years after because that was just too much to combine with work. I've been a Debian maintainer since '95 and started doing Postgres, which is what I actually learned. I got my degree in database science, so it came quite naturally to go into Postgres. Very roughly, I'm going to skip those introduction slides because I'm not sure it's of interest what I do for a living. Just a brief idea. The company I work for, we do a lot of open source stuff, so this is not just about the open source project, but also about experience from the outside, from users. Although for this topic there's not much of it, because it doesn't make sense. Debian is special. Sorry. For those of you who are interested in what exactly it is that's so special about Debian, I'm going to talk about Debian itself, about the distribution and all the specialities, tomorrow. I think at 11:30, I'm not sure.
I just brought a couple of points up here. The thing that really intrigues people most of the time is that Debian is indeed a real community and only a community. There's no company behind it, and it's completely geared towards free and open source software. Also, everything that is Debian is free just by definition already. Some call it democracy, and then, I don't even know if do-ocracy is a real word. I heard people say it, but I guess it's made up, saying, yeah, do the work, and then you own the work. Some might use more negative words like anarchy, but it doesn't work as a democracy. Again, more about that tomorrow. As I said, it's the largest distribution and maybe even the largest project. I'm not sure about that. There's a world map showing all the developers, well, all those that put their addresses in, at least. I'm not sure about those that seem to be in the middle of the ocean, if there are islands there or the coordinates are wrong. Or, just to show some faces: this is last year's developer conference, and no, I don't think we have had as many people at Postgres developer conferences so far. Keep in mind, this is only the group that traveled to Heidelberg, Germany for the conference, not including the hundreds of others that didn't. So back to Debian the project. What Debian is well known for is some special guidelines, the constitution, and the social contract. Well, technically, the free software guidelines are part of the social contract, but I still listed them as an item by itself because the Debian Free Software Guidelines are the basis of the Open Source Definition. The Debian project came up with those definitions first, and then, largely driven by Bruce Perens, who back then was the project leader, it became a much broader movement. The other point I just wanted to briefly point you to is the social contract.
We, as Debian, kind of signed a contract with the open source world, or with the free software world, that we are going to be, and still want to be and keep being, 100% free software. We don't hide problems, we give back what we have, and again, the priorities are our users and free software, at the same level, the same priority. It's the largest distribution of them all in terms of packages, but also in terms of hardware. I just put a couple of things on here. Yes, upper left, the C64, that doesn't run Debian. It doesn't do Linux at all. Anyone here ever worked with one? Oh, more than I expected. I thought I was the only one. Well, Bruce, you're my age then. But, of course, this is only there because it's iconic. My first Linux system was a 486 PC with eight megabytes of main memory, and it worked. And the old Atari isn't the right one, but there are Ataris that do work with Linux on them, and there is a Debian distribution for them. For all of these pieces, there's a Debian distribution available. And, of course, because we have Postgres packages in Debian, that means for all of these hardware pieces there's a Postgres package available. So the way Debian is organized is we have three areas in the distribution. Main is real Debian, free software only, but because it's convenient for users, we also have what we call non-free software, stuff that is available for you to just use but doesn't have a free license. And there's a third one, contrib. A package goes into contrib if it needs something in non-free to build or run. If it's completely self-sustained and it's free, it goes to main; if not, it goes to contrib. As you can see from the numbers, there's not that much non-free and contrib software in the distribution anymore. 20,000 source packages, 21,000, building 250,000 binary packages. I guess that's a number that a lot of distributions cannot compete with. But, again, this is the advantage of being community-driven.
As a Debian developer, you just see something that you'd like to have, you package it, you upload it, and everyone else can use it. So I found this picture to explain the releases. It's not completely up to date. Actually, it's pretty outdated already, but it's a great picture and somebody really put a lot of effort into it. I didn't want to recreate one just because of new names. If you look at the names to the left, there are a lot of pictures in there, well, actually to the right as well. For those who don't know Debian that well, all the Debian releases are named after Toy Story characters. And just in case you wonder, there is no version 1.0. We have 0.93 something, and the next one was 1.1. Because some vendor released a Debian 1.0 before Debian did, we couldn't use that number, because that's a different version, so we had to start with 1.1. Anyway, the thing is, there are a couple of areas in there, so you see experimental, unstable, testing, stable. The idea behind that is, of course, that stable, as the name suggests, is the stable release, the release you usually work with, unless you're me and you use unstable, which is completely in development. Testing is like a rolling release for stable. So everything in unstable that works well enough, that doesn't have release-critical errors, eventually migrates automatically to testing. And at the moment when we freeze for the next stable release, we freeze testing and don't migrate anymore. Experimental is for huge changes, for stuff where you're not sure how it will disrupt the rest of the system. So that one is more or less kept out. By the way, unstable is also called Sid, as in "still in development", or something like this, or the Toy Story character. Anyway, the thing is, you see here, you've got stable, and this goes up to Wheezy, and Squeeze is still listed there as stable. So nowadays, the current release is Jessie.
Wheezy is already oldstable, but since Debian went to long-term support, even Squeeze is still supported. And as I heard earlier, we still have users in the room. So what we do is we actually maintain, not all, but the most important packages in Squeeze and do the security patches and everything for those old packages. However, have a look at this. This shows you the list of the Debian releases and the list of Postgres releases. Now, of course, Squeeze could have been released with a newer version just from the timing of the release, but keep in mind we have a freeze before the release, and Debian only releases when it's ready, meaning the freeze can last six months, 12 months, even more in the worst case. And we don't release a beta version. So that meant when we froze Squeeze, we had to use Postgres 8.4. And I'll do the math. I mean, end of life for 8.4 was July 2014. And actually, you asked earlier, Bruce, it's next month, as you see. So we still had almost two years, well, one year and seven months, to cover with an old Postgres version. And the same problem comes again. Wheezy ends May '18, but the 9.1 version ends September 2016. And Jessie again, although for Jessie it's a very short timeframe by comparison. So what we do about that is we actually take changes from other versions and backport them. By the way, in terms of long-term support, help is always welcome. Not just for Postgres, for everything in Debian. If you're interested, just give us a hand. So we went to 8.4.22, and at that point we were out of it. Postgres had declared end of life; that was the last version. Then, and this is where I'm personally involved, at least partly, in the long-term support thing, we tracked 9.0 until it came up with 9.0.23. And for all the patches made there, we manually checked if they are applicable, if it does make sense. It's usually bug fixing, but some bugs are more important than others.
And then we backport them and make sure they apply to the 8.4 version. Now we've got the next problem. 9.0 is already end of life. So we have to go one step further and do the same with 9.1. Fortunately, only for another six weeks. Because obviously, as everyone knows who ever tried doing that, the further apart the major versions are, the more difficult it is to apply a patch, and you have to change it, which sometimes is a lot of work. Sometimes that doesn't even work. We created a wiki for that where we document which patch worked, which one needs which adaptation, and which one doesn't apply at all. Yeah, say it again. So the comment is that we're not the only people who do that. I'm supposed to repeat the questions, I read here. So yeah, yeah, yeah. You mean as in backporting patches from 9.0 something to 8.4? Okay, except Debian. No, I mean, these patches are not something we or I do just for ourselves. Okay, yeah, that could be, that could be. I mean, if you're not doing Debian but you're running on anything else, and for whatever reason you're still on 8.4, you have to do it yourself, right? Right, that's the point then. That's the point why we put it into Debian. And the next release, as I said, will have the same thing. So as a user, you're guaranteed to have, what is it, five years of usage of that version on Debian, and you get all the patches you need. By the way, just to be clear, a lot of the package development, what I'm saying here, holds for Ubuntu as well, because the Postgres packaging is done by one team for both distributions. Now the LTS thing is different because their LTS cycle is different from ours. So some work we can do together and some we can't. At some point, you will come to the point where you have to upgrade. You might want to do it earlier, but eventually you have to. And then we are facing the typical distribution problem. How do we do that? I mean, the upgrade in itself sounds easy, right?
But just keep in mind, the process of installing a new package is pretty complicated. And there are a lot of places where it could go wrong and you have to back out, reinstall the old version, keep the old version, and so on. How do we upgrade a database? Let's say there are two options. One is you take the data from one to the other. The other is you just dump it before you upgrade the package and then you restore it. That will work, but I'm not sure that's the right way to do it. Keep in mind, if you do the dump and restore thing, you have to have enough room on your disk space, filer, whatever, to store the whole database. You have to do the database upgrade at the same time as installing the packages. You have to have a lot of time. There are databases out there that are not in the few-gigabytes area; we're talking about dozens of terabytes. It takes an awful while to dump that out and restore it. And you're sure that's on purpose? You know, it happened to yours truly before that I just backported to all the versions I had checked out and pushed it, and then somebody approached me and said, hey, you do know that this one had its end of life like last week? Oh, no, I forgot about it. Happens. But back to the upgrade problem. Just think about this whole complex cycle. You dumped your database, and then something goes wrong and you have to restore the old version. It's really, really complicated. So looking at the Postgres docs, there's a good hint. By the way, before I forget, when I talk about the cluster here, it's not meant as a high availability cluster, but it's the original definition that also used to be part of the SQL standard. I'm not sure if it's still in there. The cluster is defined as the group of databases handled by one postmaster. So what this says is, let's say you want to migrate from 9.4 to 9.5.
You have to have both running at some point at least. I'm pretty sure, correct me if I'm wrong, that this also holds for pg_upgrade, right? So it's not sufficient to say, okay, I just dump it and drop the rest. It might go wrong, and then having to restore it completely into the old version and reinstall the old package doesn't work. And for other reasons that we already discussed, we don't want to do that. So we have to set up a system that runs 9.4 and 9.5, ideally even at the same time, so it can copy the data over or use pg_upgrade. But the way it used to be handled was that the relevant binaries, the relevant programs, were just copied aside to some temporary storage and then run to get the data dictionary and the data out and put everything over. You might imagine there are a lot of chances for this to go wrong. So what we came up with is a solution to do it right. Instead of just putting something aside, why not design the whole system to allow multiple versions and multiple clusters of different versions at the same time? So the way the Debian packages work now is you can install whatever versions you want, and they all can have databases on the same system at the same time. So upgrading only means telling them, okay, this is the old one, this is the new one, just push the data over. So how is it implemented? On the left hand side, you see the postgresql package as it used to look. So there's data in there, /usr/lib/postgresql, /usr/share/postgresql, and some other files, of course. This one was changed because instead of the postgresql package, we now have a postgresql-9.5 package that has exactly the same data in it. Well, mostly; some details about that later. The original postgresql package still exists, but instead of having the files to install, it's just a meta package, just a dependency that makes sure you get the latest version, Postgres 9.5 or whatever is up to date by then.
In the package, we also changed the paths: instead of putting the clusters under /var/lib/postgresql as they used to be, and the information about them under /etc/postgresql, we added a version number there as well. So there's another subdirectory for version 9.5, and then there's another one for 9.4, and so on. So they are all separated. By the way, Debian does move some files to /etc that are not there in the standard Postgres install. The reason is file system standards. Configuration files have to go to /etc, so we move them over. And then to handle this, we created a whole bunch of new programs. So instead of calling initdb directly and making sure we handle the version number correctly all the time, we created a program called pg_createcluster, which takes care of all the right configuration things. Same for pg_ctlcluster, which does the same for pg_ctl. And while we're at it, why not do it right? We know something like createdb from Postgres, but there is no drop or list tool for clusters, so we just added those as well. I've got examples for that so you can see how those work. And of course, pg_upgradecluster, sorry, I almost forgot the main reason why we implemented it that way. You have to upgrade it. Let's go through an example in detail. So what I did here is just create a new cluster for version 9.5 on my system. You can see the output is standard. And see the last thing, port 5433, that's because my standard installation already ran on 5432. So during installation, the system decided, okay, that port is taken, let's take the next one. There's no need for me to manually interfere and say, this is what you have to do. I can change the port manually if I want to. So then pg_lsclusters shows us what's available. I did remove some pieces because I wasn't able to get it on the slide otherwise. The green line shows your running cluster. The red line shows the cluster that's not running. And guess what?
The next command, pg_ctlcluster, obviously starts it. And we see it next: pg_lsclusters shows us, oh, there's actually a copy-paste error. It should be online, not down, on the second green line. That comes from copying the lines instead of taking the lines from the terminal. Sorry. And the last one: I don't really need the test database, so let's get rid of it. An interesting side effect is that pg_dropcluster also enables you to stop the postmaster in the same command, so you don't have to worry about all those details. You only get rid of one. The system knows how to stop it and delete it afterwards. I hope this is big enough size-wise. Yeah, seems to work. That's the upgrade command. So I also installed a 9.4 version of test and then upgraded it to 9.5. And you see it goes through all the steps. There are actually more lines there, which I removed because they didn't tell us much. But you can see there's a lot of work to do for migrating one database to the other. But it does work. It does the work when you want it to do it, when you have the availability, when the users are shut out of the system. You can upgrade the database. You can do what you need to do. Maintenance window. And you still keep the old one. So essentially what you do is you upgrade your cluster, then you test it, and after you see it works, you delete the old one. What does that mean in terms of accessing the databases? You can see there are the two again. This, by the way, is the right line. As you can see, it says online. Just starting psql gives you psql version 9.5. We can use the port as we used to do it in Postgres. And you can see in the next line, it still starts psql 9.5, but the server is 9.4.5, because that's the one we just installed. Or, and this is the only change, we added an option, cluster, where you can tell it the version number and the cluster name in one argument. You see it's the same.
It does access the one on 5433, as we can guess from the note it gives us. To implement that, we added a wrapper for all the standard Postgres tools. So looking at the standard path /usr/bin/psql, you see it's just a link to pg_wrapper. And the same holds for all the other tools. They all go to pg_wrapper, and pg_wrapper then parses the cluster argument if there is one, takes the other arguments and just passes them on, and makes a decision about which tool to start, meaning psql, for example, will always be the latest version because that's compatible with the older servers. Some other tools are not, or are not guaranteed to be. So I use dropuser here as an example. As you can see from just the version number: in the first one, by telling it to work on the 9.5 cluster, it gives you the version 9.5 dropuser; by asking it to work with 9.4, it goes to 9.4.5. So same thing there, just because it could be, not sure, but it could be that there are some subtle differences, and with some tools we even know there will be differences. We could say that's it. We solved our problems, right? But why should we stop here? I mean, now we have solved the problem of having different versions at the same time on the same system. Yes, I know that in a production environment you don't want to have a lot of different versions and different databases on the same server, but in development, who knows? And if you have to do tests against different versions of different databases, it might be nice to have them all on the same system instead of maintaining so many different systems. So while we're at it, why not add another system to allow all versions? Or let's say, it's already possible to install all versions, but why not add a system that gives you all versions as Debian packages? So here comes a solution that is handled by a couple of Debian guys and the Postgres people. As you can see from the URL on the left, it's apt, as in the Debian package management, dot postgresql.org.
It's a joint offering for Debian and Ubuntu. Again, the same team is doing the packages, and they do it for both systems. So from this apt.postgresql.org, you can get all the Postgres versions, at least those that are still alive, for all the Debian and Ubuntu versions as binaries. The sources are the same as used in the distributions. So one of the common questions is, shall I use this or that? It doesn't actually matter. The same source is built by the same team. Yes, the build system is different, just because of the amount of packages. Some of the development versions are available, actually, but only for some hardware, some architectures, and only for some distribution releases. But at the moment you can go to apt.postgresql.org and install Postgres 9.6 from it. You want to play around with it, fine. You can install it, and it runs in the same framework as your production 9.5 system, on the same system. The names are different. They added pgdg to the distribution name and a pgdg suffix to the package version number. Okay, we have to change something so you can see the differences. But again, all active Postgres versions for all active distribution versions, now let's do the math. Three to four Debian releases, three to four Ubuntu releases, and it might even get bigger with all the long-term support stuff. Six Postgres releases, all servers, all extensions, all modules, just two architectures, only amd64 and i386. The rest hasn't been added and probably won't be. But still, 143 different source packages and 10,000 different binary packages, essentially handled by one or two people all the time. And yeah, just to see how the size increased: in 2012, when we started, we were talking about something like 2,000 binary packages. And now we are at, well, this isn't up to date. As you can see, the latest graph that was created was from like a year ago. I didn't find a newer one. But still, a lot of new packages have come around. So how do you use those?
I have to take a short detour here and explain a little bit about the Debian package management. Those of you who already use Debian probably see why. By the way, I just see you're taking pictures. The slides are available online, if you want them; they're already uploaded. So far only to my server, or to our company's server, so I have to figure out how to get them here. But worst case, hmm? Okay, so I will add them in there. There's a small change I made this morning, and my hotel wireless didn't allow me to upload more than two megabytes. Every time, 2,112 kilobytes and stop. So I have to make that change and upload it, but it will be on the wiki so you can get it. So the question is, just to repeat it: do we have a procedure to work with changes in libc localization? Because that's a problem for Postgres and the index structure, of course. But I'm not sure how this fits into this area, because the packages are built against a stable release. So there won't be any major update of glibc on a released Linux distribution. No, no, a distribution release is stable. And for a reason. So the only thing that gets changed in a stable release is security bugs. Anything else, new versions, go into the next release. Yeah, it could be a bug fix. I'm with you on that one, but it's probably not serious enough to make it into a stable release. And the other thing is, of course, I have a slide for that, a brief slide about testing. I mean, 10,000 packages want to be tested as well. And you cannot do that manually. So those tests should find problems. And we have a lot of regression tests in Postgres anyway, right? Just a short introduction. So these lines show you how, in Debian, you define your repositories. So the first line says it's for binaries. It gives you the Debian mirror. Actually, it's a German mirror. The path, the beginning, the root directory, is /debian. The release is jessie. I could have used stable instead right now. I only want to have main.
This is not because I don't want to have non-free software on my system, but because with contrib and non-free it didn't fit the slides. Those would just be added afterwards. The next line gives you the same, just for sources. And then, for good measure, the third one is the security system, where all the security updates for the stable release are. So then you add another line to this. The next block here is apt.postgresql.org. And then the path, whatever it is. And you say jessie-pgdg, and you get all the Postgres packages that are available for Jessie from apt.postgresql.org. Or you want to have a specific version. And if you're talking about the latest development version, 9.6, you have to do it, because that's not automatically activated. You have to use the last line here and, after the main, add the version number. So it's completely up to you what you want to do. You're completely flexible in changing and deciding which version you want to use. And then of course, standard Debian: once you have it configured like this, apt-get update gets the new package list, and apt-get upgrade will upgrade the packages you have for which there are new versions. And that's it. Just use it. Now there's one problem. If you keep it like this, you might run into a situation where one version looks newer than the other to the system, despite you not wanting to change those. The way we solve this problem in Debian is by giving priorities. apt-cache policy shows you the priority, or the information, essentially, for the postgresql-9.5 package. You see which version is installed, and you see there's a candidate version with a pgdg added. And due to the way version numbers are compared, the version 9.5.0-1.pgdg+2 is considered newer than the standard 9.5.0-1 version. But we might not want to make that change every time, and might not want to use apt.postgresql.org for that, because we only want to use it for a different, older version.
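The version ordering mentioned above can be sketched in a few lines of Python. This is a simplified, hypothetical re-implementation of the dpkg comparison rules, written only for illustration; the function names are made up, and the version strings are the ones spoken in the talk, not copied from a real archive. On a real system you would ask `dpkg --compare-versions` instead.

```python
# Simplified sketch of Debian version ordering (assumes ASCII version strings).
# A version is [epoch:]upstream[-revision]; each part is compared by
# alternating non-digit runs and digit runs.

def _order(ch):
    """Sort weight of one character, following dpkg's rules."""
    if ch == "" or ch.isdigit():
        return 0                  # end of string and digits carry no weight here
    if ch == "~":
        return -1                 # '~' sorts before everything, even the end
    if ch.isalpha():
        return ord(ch)            # letters sort before other symbols
    return ord(ch) + 256          # '.', '+' and similar symbols sort last


def _at(s, i):
    """Character at index i, or '' past the end."""
    return s[i] if i < len(s) else ""


def _cmp_fragment(a, b):
    """Compare two upstream or revision strings; returns -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Step 1: compare the non-digit prefixes character by character.
        while (_at(a, i) and not _at(a, i).isdigit()) or \
              (_at(b, j) and not _at(b, j).isdigit()):
            oa, ob = _order(_at(a, i)), _order(_at(b, j))
            if oa != ob:
                return -1 if oa < ob else 1
            i, j = i + 1, j + 1
        # Step 2: compare the following digit runs as integers.
        k, l = i, j
        while _at(a, k).isdigit():
            k += 1
        while _at(b, l).isdigit():
            l += 1
        na, nb = int(a[i:k] or "0"), int(b[j:l] or "0")
        if na != nb:
            return -1 if na < nb else 1
        i, j = k, l
    return 0


def _parse(version):
    """Split a version string into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 if version a is older than, equal to or newer than b."""
    ea, ua, ra = _parse(a)
    eb, ub, rb = _parse(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_fragment(ua, ub) or _cmp_fragment(ra, rb)


if __name__ == "__main__":
    print(compare_versions("9.5.0-1.pgdg+2", "9.5.0-1"))   # 1: the pgdg build sorts newer
    print(compare_versions("9.5.0-2", "9.5.0-1.pgdg+2"))   # 1: a plain -2 revision sorts newer still
```

The first comparison comes out in favour of the pgdg build because the two revisions share the leading `1` and the longer one then continues with `.pgdg+2`, while the shorter one has simply ended.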
So what you do in Debian is the so-called apt pinning, as in the red part here. You put a file into preferences.d that says everything that comes from apt.postgresql.org gets priority 200. And then we call apt-cache policy again, and you see the 500 up there went to 200. 500, by the way, is the default. And now it's not a problem anymore. It doesn't think about upgrading the package anymore. There's also the priority 100, which is the currently installed package. I don't think that one is really needed, at least not for us right now here. Yep. No, so the question is, is the pgdg version supposed to be not newer, although it looked newer? Yeah, due to the plus two. No, this is just because that was the current version, the plus two. In this special case, it would be newer. You don't want to play ping pong. So the next release might be a dash two version in the main release, which is absolutely the same source as the plus two on pgdg, and you get a new one installed again. But the main reason for this is, you make your decisions about which package you want from where, and you want the system to keep to that rule. If your decision is to use pgdg, do it. If your decision is to use the standard debian.org package, please do that. But don't switch behind my back without me even noticing. That's what the apt pinning is for. You can make the decision. You can also increase the priority, same thing. Then it will not go away from pgdg. It will not use the standard Debian packages anymore. But it should always be your decision. And in this special case, I agree. The plus two version might be newer, but I would have to look into the changelog to find out. As soon as the dash two version comes out in Debian, there will be a dash two pgdg plus one version again in the pgdg archive, or it might be a day later, depending on timing and so on. So how are these packages built? As I said, nobody's going to test more than 9,000 packages.
So you have to find an automatic way to test the packages. There are a lot of tests available in the Postgres regression suite, although that is not enough. But more on that later. But in general, we need test suites for almost everything. And Postgres is one of the open source projects that's good at creating test suites for what they need. But we need some more. So what we do is we use Jenkins for continuous integration. Essentially, Jenkins is a server to run scripts, right? We need two jobs. We need to make sure it builds the packages, the binaries, from the sources. And we need another job that, no, wrong. The source job is to build the source package from the information we have, the tarball and the VCS information; it's all in Git. So we have Git, we have the Postgres tarball, so we have to make one source package from it first. And then we have the binary job that takes the source package and creates the .deb files from it. Let's take a brief look at how it looks, the dashboard. As you can see, there's green there, so these things worked. It shows the duration, shows the last failure, and so on. So there's a lot of stuff that goes on behind the scenes, but it's also a lot of work. And finally, I'm coming to the end here already. As I said, regression tests are not enough. They are good enough to show all the problems or the bugs that might be there in the source code because it doesn't run the way it should. And the way Debian works is we have a lot of different architectures, as I mentioned, and we have build servers for all those architectures. So all the packages are automatically built on all those build servers, right? And depending on what kind of hardware there is and what kind of setup there is, the source code of the package might not even cope that well with it. So there are quite a few bugs that were found by running the Postgres regression suite through all those architecture builds.
But again, regression suites won't catch the packaging errors. So we also need tests on the installed packages. And what the guys did, I'm not involved in that, so I don't take credit for that one either, is use a tool called autopkgtest, and you see a small run of it here. So it tests all the packages by taking a standard system, a new system from scratch, and installing all those packages, doing all the stuff that you do automatically after installation, and uninstalling them again to see if they are correctly removed. Testing things like, are all files gone after I uninstalled it, and so on. This is all done automatically behind the scenes. So we are pretty sure that the packages on apt.postgresql.org are really up to the standard of quality we expect from them. And that seems to be it. I'm a little bit early. Any more questions? I'm not sure what you're asking there. So you're asking if pg_config has different versions as well, installable at the same time. And it was a problem, you said, to install extensions for different versions at the same time, but wasn't that before we split it out into all the different versions? Oh no, then not. I have to admit I don't know. I have to look it up. So the general approach is, use the latest version all the time if possible. But if you need the program in different versions for different Postgres versions, then always use the one that has the same version number as the server. Yeah, right, but you weren't talking about installing the extensions. Yeah, yeah, yeah, right. You can have the extension for 9.5 and 9.4 and 9.3 installed at the same time. Yeah, but then, as I said, it should be using that wrapper, and you should always automatically get the right version for the server version you have. Now, whether that has been correctly implemented, I have to check. I cannot tell you off the top of my head. No, if it wasn't, I mean, did you tell anyone? Why?
Because that would be interesting, because they should have a reason for that, right? Yeah, you might need a different version of the extension, might be. But still, the tool should work with all the servers. I might have lost track of that again. I have to check it. Anything else? Thank you.