Let's welcome Wookey, who is talking about infrastructure updates and whether we can change anything in less than two years. I hope so.

Okay, can everyone hear me? Well, thank you all for coming back after dinner and not just going to the pub; I'm impressed by your enthusiasm. So, yes, we have this interesting problem that it's quite hard to change things quickly in Debian. We all knew that. But I just wanted to talk a little about some experience I've had over the last few years trying to change one particular thing, which has not gone particularly well. I would have liked to think we could have done a bit better. So I really do want to talk about the general problem here, of the way our infrastructure works and how, if you miss a stable release, that can be a real problem. But I shall use the example of build profiles, just because that's happened recently and I think it illustrates the point, or at least some points. I'd very much like to discuss things rather than blither on for too long, so I'll try not to blither on for too long.

Just to clarify, in case anybody's not clear: all of our infrastructure basically runs on stable. There are a few exceptions to that for practical reasons. So, if it's not in stable, you can't use it in the infrastructure.

Now, I've been standing up and talking about build profiles for some years. Just in case anyone's not familiar: it's basically an addition to the Build-Depends syntax, letting you put angle brackets with a profile name in them, so that particular build dependencies do or don't apply depending on whether you've set magic variables. And this lets us do arbitrarily complicated things. Primarily it makes bootstrapping easier, which is the original reason we started this, but we've generalised it rather, and hope it will be useful for other things in future.

So, because we've changed the Build-Depends syntax, tools that read that syntax have to understand it, or at the very least ignore it. That mostly means dpkg, but it also means apt, and it means sbuild, and we actually found 11 other things that read build dependencies themselves rather than using a tool. That's possibly a few too many. Some of them are quite obscure, like the Haskell one; we don't care about that. There's a list on the web page, which I did have, but I've just lost.

Just to give you some idea of the timeline here. In 2010 there was a UDS, we had an argument about this problem, and Loïc Minier wrote a spec for Build-Depends stage1 and stage2, which was a very primitive version of this concept. I first talked about this at DebConf11, which is now three years ago, and eventually wrote some patches, about six months later. Those went in, and implemented, again, this fairly simple-minded version. And then there was a long argument about exactly how this would best be done, and about how what I'd done was a bit crappy, which was true. The dpkg maintainers pointed out that because you added an extra field for every stage, stage one, stage two, stage three, it made for a very bloated implementation in dpkg. So there was quite a long back-and-forth about whether it should use angle brackets or square brackets, and if we used angle brackets, that was the last character left: if we ever thought of something else crazy, we'd have run out of metacharacters to use. And this went round and round. To be fair, I think that debate made the final solution much better. It's much more general; it's generally a good thing.
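To make the final syntax concrete, a profile-restricted build dependency in debian/control ends up looking something like this (a minimal illustration; the package names are made up, though stage1 and nodoc are real profile names):

    Source: foo
    Build-Depends: debhelper (>= 9),
                   libbar-dev <!stage1>,
                   bar-doc-tools <!nodoc>

For a normal build nothing changes; for a bootstrap build you activate profiles, roughly like:

    DEB_BUILD_PROFILES="stage1 nodoc" dpkg-buildpackage -uc -us
    # or, with a dpkg-buildpackage that knows about profiles:
    dpkg-buildpackage -Pstage1,nodoc -uc -us

and the restricted entries are dropped, so the package can be built before libbar-dev itself exists.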
However, it did mean that whilst we were having that argument, we missed the wheezy release. So it isn't in stable now, because we were still arguing about the syntax at the point where it could have got into the stable we now have. That didn't seem to be too much of a big deal at the time.

So, eventually we got a better implementation in dpkg that everybody was happy with, and we rewrote all our patches again. By this time Johannes Schauer, my GSoC student of a couple of years ago, had more or less taken this project over and did all the patching and general fettling of things, which is good, because it meant it happened in a timely fashion, which it wouldn't have done if I was in charge. And so, yes, by the end of 2013 we had new patches for most of those 15 packages, certainly all the important ones, and we sent them in.

And then Helmut Grohne got involved and said, let's actually upload something to test this, for the very good reason that if you don't try it in the real archive, you don't know whether you have in fact found all the things you need to fix. And if any of those need to be fixed in stable so that it actually works, you need to find out before the next stable release, otherwise you're screwed for another two years. So he uploaded doxygen, because that was one of his packages, and it needed a build profile adding to a build dependency to make it bootstrappable. And he got rejected, because python-apt, used somewhere in the archive software, said: I don't understand this, it's not right, throw it away. So we'd found something that breaks. Fair enough, I'll supply patches for that. Wait, we already had patches for that. The problem was how to get them into the archive.

Now, by this stage, as I say, I wasn't really pushing this. Unfortunately neither Helmut nor Johannes is here, but we did have a sprint last week, so I talked to them about this. They were basically the ones trying to make this happen, and they're both smart people. I think Helmut's a DD; Johannes isn't yet, but only because the AM process is very slow. They decided to do this mostly on IRC, or Helmut did, primarily. I suspect that was a mistake. There's something in a process somewhere that says if you want a stable release update, you should file a bug about it. They didn't start by doing that. They started by asking people: is it okay if we do this? Which is kind of informal and polite, and it's polite to ask people whether you may do something before actually doing it, perhaps. But I don't think it actually helped their case, particularly.

Anyway, they did try reasonably hard to get this done, and we had all the patches written. The point is that they weren't trying to get all the functionality of build profiles into the stable release, just enough to be able to ignore things with angle brackets correctly, so nothing would blow up. Anything that gets used in the infrastructure needs to not explode when it sees some angle brackets, just ignore them, and then we can upload stuff and test things. So they tried putting something in wheezy-proposed-updates, but the stable release managers didn't fancy that much. Then: okay, maybe we could put it in backports, backport the new version of dpkg, which has all this functionality in it. And they said no, that pulls in loads of new things as well, which we don't want. How about just a patch that ignores angle brackets and doesn't implement any of the extra functionality? That's all the original patches were trying to do anyway: just ignore angle brackets, not implement the whole thing, because clearly that would be wrong in a stable update. The idea was the least possible patch that would let this stuff pass through.
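To give an idea of how small the change being asked for was: the restriction semantics, as eventually standardised, fit in a few lines. Here's a rough sketch in Python of the evaluation rules (an illustration, not the actual dpkg or python-apt code): angle-bracket groups are ORed together, terms within a group are ANDed, and '!' negates a term. A stable-tools-only fix merely had to parse this far and not choke.

    import re

    # Sketch of build-profile restriction semantics (illustrative only,
    # not the real dpkg/python-apt implementation).
    GROUP = re.compile(r'<([^>]*)>')

    def entry_applies(entry, active_profiles):
        """Decide whether one Build-Depends entry applies, given the
        set of active profile names (empty for a normal build)."""
        groups = GROUP.findall(entry)
        if not groups:
            return True                      # unrestricted entry
        for group in groups:                 # <...> groups are ORed
            ok = True
            for term in group.split():       # terms in a group are ANDed
                negated = term.startswith('!')
                name = term.lstrip('!')
                if (name in active_profiles) == negated:
                    ok = False
                    break
            if ok:
                return True
        return False

    print(entry_applies('libbar-dev <!stage1>', set()))        # True: a normal build needs it
    print(entry_applies('libbar-dev <!stage1>', {'stage1'}))   # False: dropped when bootstrapping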
But there are rules about stable backports, and that didn't comply with those either. All right, how about we just stick these somewhere quietly so they get used in the infrastructure, in much the same way as the version of sbuild we use isn't actually the version of sbuild in stable? But I think people didn't like that much either. And then there was a bug filed which eventually said: dear tech committee, can you help, what should we do? Which again perhaps wasn't the smartest move, but there you go.

So everyone involved was very helpful and polite, and reasonably said: well, what we normally do is this, and this doesn't really comply, so we don't really want to do that. But the result was that we couldn't even upload something to experimental to try this, and it seems to me that we probably ought to have been able to do that. Each individual group had a point: the stable release update people said this isn't really a stable release update sort of thing, the backports people said this isn't really a backport sort of thing, and so on. And it's easier for each of them to say, no, don't do that, because, well, each was arguably exactly correct. But the collective result of all these individual groups saying "it's not really something we want to take on the maintenance burden of" is that we couldn't do this at all. We still haven't done it. Now, as it happens, at the bootstrap sprint we found a bit of a problem, so actually maybe we want to change the design, so maybe this delay is a good thing. But in general it would have been nice to be able to try this a few months ago and go: yes, this all works, and we have all the pieces.

So one question is: was all that IRC chat the wrong way to go about this, and should someone have filed a bug at the beginning and had a much more public discussion? I think they probably should have. Where the bug reports say the stable release update people said no, they respond: we didn't actually say no, we just said "I'm not sure about that", which isn't quite the same as no. So to some degree that's perhaps just communication failure. On the other hand, these two are smart people who know how the project works and did fairly sensible things. So if they didn't do the right things, then maybe it's hard to do the right thing. Arguably, you could say that in fact we don't want to introduce build profiles quickly at all, and you do have to wait another two years, and that's just tough. I don't know. How much do we want this to be discussion? I've almost finished. So: can we improve on any of this process? One of the things I hadn't appreciated four years ago was just how much it matters if you miss getting something you need into a stable release.
You really do have to wait quite a long time. Maybe I'm just slow on the uptake and everyone else realised this years ago, I'm sure Steve has, but if I'd been more aware of that, I would have worried a lot more about the fact that we missed the wheezy opportunity to get this in. I really wasn't sufficiently on the ball. And all of these people are right to say: there's a small risk this will break something, so we don't want to do it. Because, you know, the version of dpkg that people get from, say, a stable release update really does matter, people are right to be quite conservative about that. Now, when Helmut asked the tech committee for help, saying "I don't know what to do", some people thought that was really not a nice thing to do. Guillem, for example, hates the technical committee, and went, well, no, that's... Yeah.

Yeah, not to interrupt right before you finish, but it is worth noting that what he actually did was to informally engage tech committee members in an IRC conversation, which is really not the same thing as invoking the technical committee.

Precisely. But it still annoyed Guillem. So: bikesheds. PPAs are something that maybe we should be using more, to enable this sort of thing: having some software in use by the machinery which isn't in stable. We already do that, in fact, for several packages, and maybe we could do it for more when it's expedient.

That's approximately what we do for the same kinds of things in Ubuntu, and it works fine. All you need is... So, I totally understand the need for the admins to have the thing maintained somewhere. And, you know, that's not always necessarily something that you want to push to Debian users. There are cases where we want to be running something different, but you want to make sure that the thing you're running on important production systems is actually cared for by somebody. And having it carry the relevant Debian team's name, having them able to push security updates and that kind of thing, and having them be on the hook, is, I think, sufficient for this kind of purpose.

Yes. So, that's the end of my slides. Over to the audience. I just need to be closer to where the mic is... Exactly.

Hi. So, from my perspective, the first thing that went wrong in this particular process: I was always of the opinion that Loïc's original proposal, regardless of any lack of aesthetic desirability from the dpkg upstream's point of view, was the right thing to do, because it would have got us where we needed to be. So I think the original sin here was in letting Guillem derail it by going for the perfect solution, which necessarily meant it was going to take longer to implement. And I did voice this opinion at the time of that discussion. So I think we could have got there a lot faster by being more pragmatic about it.

That's true. But that was a decision for the people who were working on it to make, and I was not the person working on it, so that's fine: you make whatever decision you want to.

As for the knock-on effects of that: yeah, as Colin says, if we picked one thing to overrule here, it would most likely be the SRMs. I tend to agree with that.
But we were in a situation where, in the absence of any particular group being willing to stick their necks out and take on this work, there needed to be somebody in the project who did have the big-picture view of what Debian as a project needed, as opposed to what the individual teams needed, and that is supposed to be the technical committee's responsibility. So I actually asked, I don't remember at the moment if it was Helmut or Johannes, but I did ask somebody to assign that bug to the technical committee. Now, in the meantime, we discovered that maybe the conversation they'd had with the SRMs was not what they thought it was, and maybe the SRMs did want to review it, which is why the technical committee didn't act on it. The other aspect was that in the course of discussing it, it was pointed out that, as a consequence of everybody on these teams now being a delegate, the technical committee doesn't have the authority to overrule them. So we have a bug here; I think it's an emergent bug that the technical committee doesn't actually have the authority to overrule delegates on matters like this. And I think that's something we should probably constitutionally fix, because we've used delegations a lot more than we used to, so I think this is an unintended consequence, and we should have a project-wide discussion about whether it should be fixed. Obviously, all things being equal, we should have an amicable discussion and consult people anyway, but sometimes an overrule might be needed, and there should be somebody with the power to make those calls. We can overrule individual maintainers, but the moment a team gets a delegation, so that they're blessed by the DPL, suddenly you can't overrule them any more.

I certainly hadn't realised that. I guess a lot of other people haven't either.

One thing that occurs to me is that perhaps you were trying to solve the wrong problem.

Quite possibly.

Because if you think about the way this would be managed in a sort of corporate environment: you're basically trying to test the tools that are used in the infrastructure, to make sure that you'd caught everything. So the solution to that would typically be to set up a copy of the infrastructure.

To be fair, that's what we were told to do ultimately: go and set up your own copy of everything and test it there, which, as you say, is a perfectly correct way of looking at it. The problem is that it's really hard to set up our infrastructure, because half of it's not packaged.

That's actually not true. It's not particularly hard to set up sbuild and all that. sbuild is not horrible.

Maybe a wiki page that takes you through it step by step? Yeah, I'll write one next year.

But, anyhow, for their specific case, saying "you can set up parallel infrastructure" ignores half of the reason they want to do this. It's not about testing that it works in the archive. In fact, it wouldn't, because the tools would just ignore those fields in the packages they want to upload, and the infrastructure would not in any way be improved to support these features. What they want is to be able to upload packages to unstable that support build profiles, so that when unstable becomes stable, you can bootstrap it.
That's what they want, and parallel infrastructure doesn't solve that, because they'll never be able to upload that stuff to unstable until a release has happened.

Finding things you've forgotten about is one thing, and also, as you say, having maintainers able to upload stuff which currently does nothing and is ignored, but which will be useful in the future.

So the problem with doing that, when you haven't set up the parallel infrastructure, is that that code has effectively never been tested. And using our production infrastructure for testing new code is probably not the best idea.

Isn't that what experimental's for, in this case?

No, because you're not testing the infrastructure.

Right, okay.

Well, we need both. For this kind of thing, if you want a way to exercise the infrastructure without risking the integrity of ftp-master, what you need is a separate staging instance of ftp-master that people can upload stuff to and that will be published.

So, we're going down into the specifics around build profiles in particular, and the talk is actually general. One comment I wanted to make on the general overall thing is that my experience, having run large infrastructure at Stanford, was that whenever you have large infrastructure, there are going to be things you want to customise or tweak or fiddle with that are not in Debian proper. And I don't think Debian's own infrastructure is any different from any other enterprise environment running Debian, in that there will be things Debian's infrastructure itself will want to tweak. So, apart from the problem of the code not being tested, there's a trick we used at Stanford, where we had a private apt archive, not only private to us, but where each role of service had its own specialised distribution inside that archive. You could upload packages targeted at stable-ldap, and only the LDAP servers would see those packages. That gives you a bit of flexibility to expose packages with custom tweaks to only a particular part of your infrastructure, without risking pulling them into other places where they may not be appropriate. I don't know, Tollef can probably comment on what kind of mechanisms you already have for doing those kinds of overrides, but we found that mechanism extremely useful at Stanford in unblocking little problems like this, where you need a custom hack on a particular set of machines: you still want the whole packaging infrastructure around it, with upgrades and everything else, but you don't want to expose it to all the other systems.

You said Canonical does exactly the same thing. Okay.
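To picture the per-role archive trick: a machine in the LDAP role would simply have one extra line in its apt configuration pointing at the role-specific suite, something like this (hostname and suite names invented for illustration):

    # /etc/apt/sources.list on an LDAP server (illustrative names)
    deb http://apt.internal.example.edu/debian stable      main
    deb http://apt.internal.example.edu/debian stable-ldap main

Packages uploaded only to the stable-ldap suite stay invisible to every other role, but still get the normal dependency handling and upgrade path.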
Yeah, and another case of the general problem is cloud images. I'm at Google right now, but it also applies to Amazon and every other cloud out there, as well as private clouds like OpenStack; it's not just the public clouds. Everything there is changing at a faster pace than the two-year cycle, and maybe you want to integrate with the host environment, get some bit of metadata, or access some API. If we put a version of a tool into Debian (assume no licensing issues) before the freeze, then it's already halfway through its lifetime as of the release, and at some point we have to convince people to pull things in from backports to keep things up to date. Similarly, some of the settings that are great defaults on a local workstation may not be suitable as defaults for the cloud, so there are a lot of settings and features that need to be added sometimes.

So it seems like, again, we're focused very much on Debian's infrastructure here as one thing that blocks us from making changes in less than two years, but there's a related problem with dpkg, where there's a certain class of in-archive changes that we can't make for a couple of years, because first dpkg and apt in stable have to understand them so that you can upgrade. I'm wondering if, to solve both problems, we might want to say: well, our packaging infrastructure, as awesome as it is in that it can upgrade itself, maybe we should pull the packaging tools out, make them a little bit special, and say they upgrade themselves first and only then parse the rest of the packaging metadata for unstable or the next stable. Then we could upgrade whenever we feel like it.

Certainly, as you say, sometimes you want to be able to upgrade dpkg and apt first, and we've basically done that before by telling people to do it in the release notes. I don't know if we have a mechanism; obviously Canonical does. There used to be one, but it was years ago, potato era: we had a special archive that you were supposed to upgrade the toolchain from before upgrading the rest of the system. ...Lost the thought.

So, just in general, with this kind of infrastructure problem we're talking about here: this one was in fact very narrowly circumscribed, in that the only compatibility issues between stable and unstable would be in the parsing of build dependencies, which the average system doesn't need to do at run time, so you could say it's very likely to be fairly harmless for that reason. Right, and sure, there would be compatibility issues for people trying to build stuff, but then you could just tell them: ah, here's the stable update package that you pull in. You could do that, but even that we've found has been difficult to do, for a variety of reasons.
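For reference, the manual version of "upgrade the packaging tools first" that release notes have sometimes described is roughly this two-step dance (a sketch; the exact recommended commands have varied from release to release):

    # after pointing sources.list at the new release:
    apt-get update
    apt-get install apt dpkg     # upgrade the packaging tools themselves first
    apt-get dist-upgrade         # then upgrade everything else

The catch, as noted above, is that this only works if the old apt can at least parse the new archive's metadata well enough to fetch the new tools, which is exactly the class of change that build profiles ran into.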
I'll cede the floor to Colin now.

Just going back to Josh's point: for a succession of Debian releases we've had release notes with a procedure for performing the upgrade, and that's been as varied as the types of changes involved, scaling up to something like the libc transition of yore. When we were thinking about this for Ubuntu upgrades, we had the idea of executable release notes. I don't know if Michael's in the audience, but this turned into the release upgrader that we use in Ubuntu. The idea is that you have a blob which the upgrader fetches from the archive, and which knows how to upgrade to a particular target, and that procedure may involve going and fetching a different set of packaging tools from somewhere else. Rather than having to ignore bits of metadata, all of this happens before you even look at the next set of index files at all, and you have the opportunity to install whatever preparatory tools you like. I would love to see something like that in Debian as well, although it's a lot of work to maintain.

So in particular, if we have something like executable release notes, there's no good reason they need to be limited to stable releases only. We could have incremental steps, much like incremental database upgrades, where you say: to do this upgrade, you need to apply this thing outside the packaging system and outside any particular package, just run it, and then you can proceed to read the Packages file. apt would need to do that before it goes and reads Packages.gz or similar, so that if it needs to upgrade its parser to handle a new version, it can. Upgrading from stable to stable might involve applying 27 individual executable upgrade scripts, but it would mean, for example, that we would never again have a problem like "how do you upgrade from /usr/doc to /usr/share/doc" without a five-year plan.

That would definitely work.
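As a sketch of the mechanism being described (nothing like this exists in Debian today; the URL and layout are invented for illustration): the upgrader would fetch and run a per-target preparation step before ever parsing the new index files.

    #!/usr/bin/env python3
    # Illustrative sketch of "executable release notes": run a published
    # preparation script for the target release before reading its
    # package indexes. Not a real tool; URL and layout are invented.
    import subprocess, tempfile, urllib.request

    MIRROR = 'https://upgrade.example.org/debian'   # hypothetical location

    def prepare(target):
        url = '%s/%s/prepare.sh' % (MIRROR, target)
        with urllib.request.urlopen(url) as resp:
            blob = resp.read()
        # A real implementation would verify a detached signature here
        # before executing anything it downloaded.
        with tempfile.NamedTemporaryFile(suffix='.sh', delete=False) as f:
            f.write(blob)
            path = f.name
        subprocess.run(['sh', path], check=True)

    # e.g. prepare('jessie'), then proceed to apt-get update/dist-upgrade
    # against the new suite, now that the tools have been brought up to date.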
I guess I'd like to bring the discussion back to the earlier point about staging of infrastructure, because regardless of what we might do with upgrader scripts, we still have to solve the problem of how we get these changes in, and obviously the right answer is that we should have a staging server. But I guess I would view it as: whoever is running the production service should have a staging service where they can stage this stuff. Does DSA, does ftp-master, have a way to do that today? Should it really be the responsibility of every individual who wants to patch it to provide the staging infrastructure? Obviously you should test your patches before submitting them, but shouldn't there be a way to test the environment before things are pushed into production, and shouldn't that be handled centrally?

I guess there are two things. As you say, we could probably just better document the process of setting something up yourself; Tollef says it's not actually that hard, and he's probably right, but it seems like a scary idea, and if it turns out it's not that difficult, just telling people how to do it, maybe a page somewhere, would help a lot. The second thing is that maybe we should run something ourselves anyway, and then it's easy to try. You do get the same problem as experimental, that you've got 16 different experimental things in it at once, so it might be a bit broken.

Which is exactly why you want an integration branch before you push out to production.

Okay. Is that what you mean by a staging instance?

We use a staging instance because we're putting software from development onto staging, so staging is running the tip of all the different projects that it's actually meant to run, and we watch that for a week before it goes through to a production release. But the problem we've had with that is: yes, it's fine for the developers who are interested in these kinds of problems to submit data to the staging instance, but that isn't real data, it isn't what's actually going through the production instance, and you end up with corner cases, with failures, and so with false assurance, and therefore bugs in production, simply because the data you're putting through the staging instance isn't a mirror of the data going through the production instance. In Debian's case it might actually be easy to get that right, because you could just mirror incoming. But make sure you think about that as well: if you're going to have a staging instance, the data going through it has to be the same as the data going through production, otherwise you've got an invalid test.

Okay, although in this particular case you explicitly want to test some other things too, but I guess you could test those as well. So yeah, as you say, it's pretty easy in Debian's case: you would mirror incoming, or have the uploaders actually upload to two places.
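As an illustration of the "upload to two places" option: with dput that's just a second target stanza, so uploaders (or a forwarding hook on the queue host) could feed a staging archive in parallel. The staging host here is made up; the ftp-master stanza mirrors the usual default:

    # ~/.dput.cf (staging host invented for illustration)
    [ftp-master]
    fqdn = ftp.upload.debian.org
    incoming = /pub/UploadQueue/
    method = ftp

    [staging]
    fqdn = staging.example.org
    incoming = /pub/UploadQueue/
    method = ftp

    # then: dput ftp-master foo.changes && dput staging foo.changes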
To answer Steve's question about staging infrastructure: we generally don't provide that unless people ask for it. We do have quite a bit of free capacity, so a full extra copy of ftp-master is probably doable, it would stretch things a bit, but doing a reduced-scope ftp-master is absolutely not a problem.

I guess that's the question: how much resource does one need to set up something fairly realistic?

The big problem with ftp-master is really the size of the archive. I'm not entirely sure how big that is nowadays, but I'm guessing it's growing towards a terabyte, so that's expensive in terms of actual disk space, but nothing else. And you don't necessarily need to do all the architectures; a couple might suffice for test purposes.

Yeah, exactly. So, you've given me an opportunity to bang the drum that I actually started beating in New York, which is: all of the questions about how we do this in Debian aside, those people out there who are running Debian and want to include some local packages all run away from dak screaming, because no one has any idea how to set it up or how to run it. If someone could document how that stuff works, you would solve problems for so many people outside of Debian. Really, you would be a hero if you could just explain: how do you start from scratch and set up dak and a buildd network? There are numerous enterprises that would love to have that. How many people know how to do it? To stop them being quite so busy, go volunteer to do their jobs for a couple of weeks. And it would be easier to find people to trust with it if it were documented.

I documented the buildds; again, that's quite fiddly and was not very well documented, so I wrote a page, which is slightly half-arsed of course, and now we've got the old page and the new page, which don't quite agree with each other. I was talking about doing it for Debian Ports, and they were talking about doing it for Debian, and now of course you really want to merge those. But yes, I did a bit of that. I guess if I could be persuaded to actually try and set up a thing, I'd write down what I did, which would be useful, but I still wouldn't necessarily understand it.

A smattering of random points here. To your point about staging and the size of it: in theory you could throw away all of the packages in the archive as you were setting it up, because the only things you're actually testing are the infrastructure, and you care about the Packages files being valid and so forth; the actual contents of the archive you could process once and get rid of, and that would be perfectly fine for a quick staging thing if space was a concern. Second, we do have one member of the ftp-master team here at DebConf this week, Paul Tagliamonte, so we should find him and talk to him about whether he, well, whether they want a staging setup. And again, to the point about the ease or difficulty of setting up dak: there seem to be a variety of opinions here. I think people who have done a dak setup need to understand that a README that tells you how to set it up is not the whole picture, and it's not the quality of documentation we're usually talking about when we expect random people to be able to bootstrap themselves into a thing, unless there's a linear series of "these are all the things you have to set up". And as far as I'm aware, we're not using dak out of packages for the archive, so even something as basic as where you find the right software you're supposed to be running is an issue. These are all things where, if you expect people to get a handle on this stuff, there needs to be a clear set of steps that anybody can follow from start to end. It may well be that most of the off-puttingness is just that it's mysterious, rather than that it's actually difficult.

Well, I've operated dak and later tried to set it up from scratch, so I in theory know what to do, I know the conceptual steps involved, and I still couldn't do it. It's not merely a problem of lack of documentation: at least as of the middle of last year, as far as I can tell, it was not possible to set up the database from scratch given the git branch; the database schema did not exist outside production.

But what I was going to carry on to is that this is exactly the kind of problem that you solve by having a staging server which is not operated by the developers. You must have your staging server be operated automatically, in such a way that in order to get changes onto production you have to have your schema changes in git properly, because otherwise they can't happen; otherwise it's way too easy for this kind of thing to creep onto production just through perfectly normal people doing what they need to do.

You mean the dak developers?
Somebody else needs to.

So, I guess I'm a little bit naive here, but I was expecting that most of our infrastructure could be set up simply by running Puppet on a blank server, and somehow magic happens and things work again, so that if our hard drives break, or an asteroid crashes into the building that houses the server, we can get up and running again in a day or so. Maybe that's not true. But if it were true, that might also solve the problem for people who want to contribute: they could just run the same deployment scripts, maybe modify them to change host names, and get the thing running the same way it's running on our systems.

And I've had quite a few instances where I wanted to change a little thing in some piece of infrastructure, in this case debbugs, but I couldn't test it, and the amount of effort it takes to find out how to set up an instance and how to test it just outweighs the benefit of the little but still useful change. If all our infrastructure were reproducible, reproducible in a chroot, or in a Docker instance these days I guess, with one script or one command, I'm sure we'd get a lot of nice little patches for various annoying things.

I think you're quite right. I want to go back to what Wookey said a little bit before, about the buildds not being easy to set up. I think one of the problems is also related to Puppet. In Debian it's very great that everything is integrated in Puppet; you can set up a buildd in less than one hour now. But at the same time, a user who wants to set up a buildd doesn't have the Puppet setup, and doesn't get the