So we're good to start now. We have Colin and Steve, and they're going to talk about Ubuntu daily quality. Enjoy.

All right, hi, welcome. Hope you've all had a good lunch. I'm Colin Watson; this is Steve Langasek. We've both been working on Ubuntu for many years; among other things, we're both in the Ubuntu release team. And we've both been Debian developers for rather more years than that. We were co-release managers back in the sarge cycle, and so we share an interest in the Debian release cycle, in its more detailed mechanics, and in how they might be improved. We'd like to talk with you today about some of the things we've been doing in Ubuntu to help improve what's been called our daily quality: that is, the ability for reasonably skilled users, people who know their way around a computer but who aren't necessarily developing the distribution itself, to use our development release from day to day and not have to worry about things breaking on them all the time. This is very important to us, obviously. We want as many of our developers as possible to be dogfooding the next thing we're going to release, and we really don't want to cripple their velocity by making them deal with avoidable problems all the time. This is meant to be a BoF, not a lecture, although I have half a dozen slides or so at the start to explain what we've been doing and start things off; but this is DebConf, and one of the main things I want to do is figure out how to use our experience in Ubuntu to help improve Debian.

So, we used much the same workflow from the founding of Ubuntu back in 2004 until about two years ago. Please excuse my appalling graphics skills. Warty, that was Ubuntu 4.10, was a copy of unstable that we had about 10 or 15 Debian developers beat on until it worked. From Ubuntu 5.04 onwards, we started each of our six-monthly release cycles by syncing verbatim copies of anything we hadn't changed straight from Debian unstable, and we went around merging as many of our modified packages with unstable as we could possibly manage. We did this for a few months, then closed the floodgates again and settled down to release. Once or twice we tried syncing from testing instead, in the hope that that might make things a little more stable for us, but that had its own problems, as it turns out, and I suspect some other derivatives may have had similar issues. We often found that bug fixes we cared about were significantly delayed, either due to the unstable-to-testing lag or sometimes just because they were waiting for us to get around to manually merging something. Also, the safeguards inherent in testing help out Debian, but they don't help out derivatives as much as you might think, because they often depend on things like the build order in Debian and so on, which we don't automatically inherit. So in practice we mostly stuck with unstable and dealt with the problems that resulted from that.

So yeah, this kind of worked. We generally managed to get everything settled down just by release, but it really sucked in many ways. We used to have an alpha 1 release, our first alpha, and that was more or less the first thing that we managed to get to build at all. Before that, only the really scarily brave people ran our development release, and to be honest, if it broke, we didn't have a great deal of sympathy for them. We told them not to do that; they did it; it fell over.
But this did of course mean that there was a certain amount of inertia to overcome to get people to start running our development release once it got to that point and things actually worked. And we spent an increasing amount of time on support, dealing with stupid things that really should have been sorted out automatically. With apologies to my co-presenter, when we enabled multiarch in 11.10 for amd64 users, that made things worse in some ways as well, because it now mattered very strongly that amd64 and i386 were exactly in sync, and that was often not true during development. And if Debian switches on multiarch by default for a large population of users, I suspect we'll see the same thing there.

So at the start of the 12.04 cycle, I persuaded Canonical's engineering management to let two or three people at a time rotate, for a month or so each, into what we called plus-one maintenance: spending most of their time just doing packaging-level maintenance of Ubuntu plus one, the development release. Their job would be to tackle build failures and installability failures, to make sure our images kept building reliably, and generally to keep our backlog of technical debt under control, so that we in the release team wouldn't have to panic for a week or so any time we had to put out a milestone release. A side goal of this was to spread more knowledge among our engineers who weren't necessarily as familiar with Debian packaging as some of the rest of us, and to get more of them contributing patches to Debian. This all coincided with a general push among our management to sort out quality problems, so we started to see things like the director of Ubuntu engineering looking at the uninstallable-package list and nagging people about it. That certainly made my job a lot easier.

In the cycle after that, 12.10, we did a lot of preparatory infrastructure work, particularly on our main infrastructure system, Launchpad. Most of this was moving some of the archive admin scripts, the equivalent of the interactive parts of the dak suite (things like override changes, new review, that kind of thing), out of the core of Launchpad, where we could only run them by SSHing into a scary privileged machine, and out to general-purpose APIs, so that we could let people who weren't Canonical employees use them. This also had the useful effect that we could maintain our own tools a lot more easily and build better ones.

As a side effect of all of this, at the start of the next cycle, 13.04, we realized that this would be pretty easy to do. I don't think the name's actually mentioned on the slide, but britney is, of course, the program that Anthony Towns wrote way back in 2000, I think it was, to manage the propagation of uploads from unstable to testing, and various people in the release team have been hacking on it ever since. In October last year, I hacked it up to work in Ubuntu, and convinced our infrastructure, sort of with a hammer, to redirect all uploads from raring, which was the development release for 13.04, to raring-proposed by default. This meant that for us, raring plus raring-proposed became more or less equivalent to unstable, and raring on its own, which is what we were telling all of our development users to use, was functionally equivalent to testing.
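To make that split concrete, here is a sketch of what the two populations' apt sources would look like, assuming the standard Ubuntu archive layout (the exact lines and component list are illustrative, not quoted from the talk):

    # The "testing" equivalent: what most development-release users run.
    deb http://archive.ubuntu.com/ubuntu raring main restricted universe multiverse

    # The "unstable" side: developers who also want the unmigrated uploads
    # additionally enable the proposed pocket.
    deb http://archive.ubuntu.com/ubuntu raring-proposed main restricted universe multiverse

Uploads land in the proposed pocket, and britney copies them into the release pocket once the automated checks pass.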
There have been some teething problems with this as developers got used to it, but the reactions have been, I think, overwhelmingly positive. People have very quickly settled into the assumption that our testing equivalent, by and large, doesn't break on upgrade. We miss the odd one, of course, but the cases where this has broken any sizeable number of people have been few enough that we've been able to go and explicitly investigate each one, as opposed to the previous state, where we'd just kind of come to accept breakage.

One of the things that came up in several of those investigations was the need for packages to be able to make assertions about the behavior of their dependencies. People wanted to be able to say: my graphical application needs to keep working in the following ways, even if GTK changes, let's say. And they wanted to say: if those tests start failing, then keep the depended-upon package out of testing. So in 13.10, the current development cycle, we've started using Ian Jackson's autopkgtest tool for that; that's also Debian Enhancement Proposal 8, DEP-8. Autopkgtests are triggered for us now whenever a package has built on enough architectures. So you can use this for package-local tests, and some people do, but the real point is that autopkgtests are also run for a package's reverse dependencies whenever it's changed.

There are various other things we've been doing. We've started doing phased updates for some of our stable releases, so that we can roll out changes to a subset of users to start with, rather than to everyone at once. Packages uploaded post-release get a special control field applied as an override, and that's gradually ramped up to 100% over time; there's a sketch of this below. Whenever our error tracker, errors.ubuntu.com, indicates that there's an increased number of crashes coming in, or new crashes, we assume that's a regression, back off the phasing of the update to users, and start investigating manually.

And of course we've got other future plans. We do lots of continuous integration, and we want to make that go as fast as possible. We can now turn around a source upload to installable image builds in under two hours, even on ARM, depending on the complexity of the package, and we think we should be able to go faster still. We also need to make sure that developers can understand all of this and have some idea of where their changes are. I don't know exactly what we're going to use by way of a dashboard; I've talked to Raphaël briefly about the changes to the package tracking system, which perhaps we can use. And maybe I can persuade people that it's safe to have shorter freezes now that we have all this kind of stuff in place.

Both of us, I think, were in the Debian release team for long enough to have a general idea of what we didn't want to happen. We didn't want to end up with a huge backlog of work in devel-proposed (for that, read unstable) that's blocked for weeks from landing in our equivalent of testing. That is exactly technical debt, and the less of that we have, the better. We also didn't want to rely on humans running proposed and reporting problems to us, because half the point of all of this was to minimize the disruption caused to users of the development release.
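The phased-updates sketch referred to above: the override surfaces as a Phased-Update-Percentage field on the binary package in the -updates index, and update clients that honor the field offer the update only to roughly that fraction of machines. The package stanza here is hypothetical:

    Package: mypackage
    Version: 1.2-0ubuntu0.1
    Phased-Update-Percentage: 10

The percentage is ramped toward 100 over time, and dropped back down if errors.ubuntu.com shows a spike in crash reports, while the suspected regression is investigated by hand.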
So we decided to use proposed purely for automated tests, and not to do any checking for release-critical bugs, which would mean users would have to be running proposed in order to find them. We also decided not to have any baseline delay before a package could migrate. This seems to be working pretty well for us in practice so far. It means we can keep the delta relatively small: when I wrote this, I went and looked, and there were 280 or so unmigrated uploads in Ubuntu, versus about 2,600 in Debian. And most of our human developer attention is focused on what's in devel, that is, what we're going to release, rather than on what's in devel-proposed.

I think there are several things Debian could improve on too. Well, this is the point of having a BoF, so hopefully people can suggest others. Keeping testing as current as we can is something that Debian developers should be doing, and it's something that's in all of our interest to do. It usually involves fixing bugs. It makes our release process run more smoothly, which we all complain about when it doesn't. And it's better for our users. One reason I think we haven't cared so much is that our users are split between testing and unstable, and they have to be: they're a vital input into the decision of whether to migrate an upload to testing in the first place. Lucas's opening presentation at this DebConf showed data from ftp.be: 12%, I think it was, of hits to testing and 11% to unstable. Look, we've divided and conquered our own user base. This is ridiculous. Working on testing migration problems is a really slow process (I've done a lot of it), and it's often very frustrating. You generally have to wait a very long time for anything to happen, although you can sometimes make some predictions. We find that hooking automated testing into britney is a very powerful tool. There's a small but growing number of packages in the Debian archive that have autopkgtests in them, and there are other automated possibilities such as lintian, adequate, and piuparts. We might as well use all the tools available; there's absolutely no point in having humans pay attention to things before automated systems have done their work on them.

So, as a straw man to start off the BoF, I'd like to ask what you think of this. Cut the migration delay for testing in half, starting now. Encourage developers, especially any developers who object to this, to write autopkgtests, or whatever other automated tooling they want, for the things that we're currently relying on users to catch by being our human safety net for unstable, and ramp down the delays as that happens. I'd be willing to bet that this makes it much more attractive to work on migration problems, because you'll get feedback more quickly. I'm also willing to bet that it makes testing more attractive to users, and we should be able to test this by looking at similar kinds of mirror stats. And our end goal, I think, should be for unstable-to-testing migration to consist entirely of automated tests, so that we can honestly, with a clear head, start encouraging all of our users to use testing, not unstable.

So, does anybody have any comments, or wish to throw tomatoes at me? Joey. He's looking for the tomato. Somebody's passing it now. I think this is a great proposal. I had a couple of questions.
How many architectures is Ubuntu currently migrating into testing at the same time? Four at the moment; it will be five soon. Right, so that might be one reason that your number is a lot lower, because we have things like a broken s390 that breaks all Haskell packages. This is obviously not a complete solution to all the migration problems, but I think it is something we can easily tackle. Yeah. The other point I want to make is that individual maintainers can already cut the migration time in half. Obviously you know this: you just change the urgency to medium. But that doesn't help your dependencies migrate. There's also kind of a social pressure against doing so. I mean, sometimes certainly the... I do, whenever I feel like it. Nobody's ever said a thing, so... Well, that's true. That's because you're immune to social pressure. And you want to solve the general problem. Right. And I know there are teams that hold off on doing that when they feel it would be inappropriate. Yeah, okay. And in general, I think it's a brilliant idea, and why shouldn't we just try it and see what happens? What's the worst that's going to happen? We get a logjam, or maybe things are a little bit worse in testing for a while because they're not tested as much, right? Right, yeah. That's the worst-case scenario. The best-case scenario is that we have half the packages waiting to migrate, or something. Like I said, I think we should wean ourselves off the idea that we have to have all this human testing of unstable first.

Yeah, so expanding on the architecture point: one of the comments that Colin made in his presentation was that we do autopkgtesting on, I forget how you worded it, as many architectures as possible. What in practice... I was a little vague to avoid overloading that. Well, great. I think it's worth mentioning that what is being done in Ubuntu right now is that those packages are being tested on, I think, amd64 plus i386. Correct. And so we don't have any testing, for instance, on ARM, which is a very important architecture; but in order not to slow things down, we consider the testing on x86 to be sufficient to move a package through the chain, and that way we don't have to wait until the end of the ARM build to start our testing. So there's a little bit of parallelization there, which we think is a sweet spot in terms of the trade-off between automated testing and the delays. I do consider that a compromise, but it's a... I think a reply... It's the current sweet spot.

Yeah, just on the urgency thing raised before: maybe we could just make medium the default, which would also have the advantage that you can pick a longer period if you think an upload is likely to break something. But my question is something different. I was just thinking that the human testing part is also quite important, and there are some kinds of problems you can only catch that way, and we want to have those problems out of our release, whether that is testing or testing becomes the release. What do you think of having a stage before unstable that just does these automated tests, in the same way you're doing it for Ubuntu? Yeah, I was arguing with Niels about that on Monday or Tuesday, I think.
My worry about that is that it runs the risk of fragmenting our user base even further: some users will decide that they're going to use that stage, because that's where the package they want is. So in practice we'd end up getting lower-quality testing, because we're now divided among three targets and not two. I agree that we clearly need human testing before we can actually release something, and it's not like we ignore release-critical bugs filed on testing. I just think that we should wait until things have passed all of the automated stages before we unleash them on humans.

Right, so this is really interesting. It's probably worth talking for just a second about how we ended up with this thing between unstable and stable being called testing. When I first dreamed up the notion of using the concept of a package pool, and having in effect different releases as sort of references into it, it was specifically because I thought the vast majority of our users wanted to run something that was a little more structurally guaranteed to be correct than unstable, but didn't want to have to wait for a stable release cycle on their client devices. It's just that the person who initially implemented the code for doing pools, and who created the thing that ended up being called testing, was a release manager trying to figure out how to solve a rolling release candidate problem, i.e. AJ at the time he wrote that code. And so it's kind of interesting that in Debian, this thing between unstable and stable has always been thought of as the rolling release candidate for the next stable release, and we call it testing and all of that. But in fact, I've been hoping that someday we would get back to the original objective and vision of it being something that most of our users would want to run most of the time on their client devices. So I'm actually very enthusiastic about this; it's a great idea. And if I thought there were a need to have some additional human intervention and testing, I would want to put it downstream of testing and before stable, as opposed to upstream of unstable, if that makes sense.

It's interesting. I think the first time I ever met Manoj in person was at a USENIX technical conference in New Orleans years ago, where he and I had a lengthy discussion over lunch about whether it should ever be the case that things automatically promote out of unstable at all. Because I had this notion that we wanted to have some release that was more structurally guaranteed to be correct than unstable, I was a big fan of automatic promotion into such a release. He was thinking of it much more from the perspective of what will release as stable next time, and he had this notion that a package maintainer ought to have to consciously make the decision that these are the bits I want to see in the next release. So I know that we're always gonna have that tension; he and I in fact got to the end of that meal agreeing to disagree over how that should work. I'm quite sure, yeah. Yeah, and so this is one of those places where I think we'll continue to have this debate and discussion, but I would really love to see more of this happen. And in terms of shortening up the times: that's always been a hack to balance stability against the ability to catch RC bugs, so I'm not bothered by that at all.
I'm actually kind of surprised that people haven't really played with the time very much since AJ, but... So I did, yeah. My feeling about the delay that happens between unstable and testing now is that we have painful and long transitions. You said that these painful and long transitions don't happen, or happen less, in Ubuntu because you don't have users using the things before the transitions, so you don't have RC bugs. Is that right? Not quite. We have taken the human factor out of the migration process, which is not to say that we ignore bugs; or at least I hope we don't ignore bugs. But the main difference is simply that, if you've got a stack of things backed up, which I understand is often the case for the release team nowadays, with a queue of half a dozen or more transitions waiting to happen, it makes a big difference when you can do a transition within maybe a couple of working days end to end, versus having to wait for weeks just for everything to get old enough, with the clock resetting any time anybody uploads anything. This is a fairly regular complaint among maintainers, and certainly the release team do sometimes intervene and poke things by hand. But it's really a lot easier when your aim is to ensure that everything is automatically tested for structural correctness (if Bdale doesn't mind me appropriating the term), rather than relying on whether any human has happened to notice that something is wrong yet.

So if I understand correctly, if we take the delay down to zero for all the packages involved in a transition, they will transition as long as we've solved all the failures to build from source, and then the transition happens faster, right? Right. I'm not expecting anybody to do that immediately, but I think it would be a nice goal. Ian? So I'd be interested in hearing... there are a couple of members of the release team here in the audience, I see at least two of them, and I don't know if they'd like to comment. Colin's and my experience with britney is from the bad old days, before britney knew how to handle NBS libraries and deal with untangling library transitions. It used to be that you would have to get a library transition run through, and you'd have to be at the helm driving it, because if you didn't get it done and somebody else who wasn't paying attention uploaded another library, then suddenly you'd have both libraries going through SONAME transitions, you'd have to get them into testing at the same time, and you'd have expanded the set of things that have to be in sync to transition. That makes the transition take longer, which increases the chance that yet another library gets added to it. I believe britney now in fact lets us keep old and new binaries in testing at the same time to untangle some of that. So I wonder what the release team's initial reaction to this proposal is, and how much they think it will help Debian with its problems with transitions. Do they see this as a good thing? Is it a labor-saving thing for the release team, and what do they think the outcomes would be? He's hiding behind a desk. I guess, Julien, you're the only one who can answer that question. Yeah, so this feature of britney helps a lot, being able to move things not just in one big blob of packages.
One thing that delays transitions today is that we usually don't allow a transition before the old library is decrufted in unstable, which means all reverse dependencies have been rebuilt on all architectures against the new version of the library. Sometimes we force it through anyway, or we ask the ftpmasters to ignore the reverse dependencies and remove the old library from unstable. Maybe you should do that more often, I don't know. But do you see this as still being something that would improve your experience as part of the release team, having more automated tests and cutting the delay? Yes. I don't want to put anyone on the spot, so I think it would be more appropriate to take this to a mailing list if people like it, but yeah. It certainly sounds very interesting and possibly...

Ian had a question just before, or a comment. I just wanted to make a comment about your automatic testing of the rdepends. I think that's very interesting; it's almost like a social hack, because what happens there is that if you're annoyed by the way some other thing keeps breaking your package, you can write the tests that will prevent that other thing from migrating. This is a really nice way of encouraging people to write the tests that ought to be cared about. I haven't seen anybody use it explicitly that way yet, but... We kind of did, a bit. There were concerns about, I think, GTK and PyGTK breaking in certain ways on major upstream updates, so there's some defense against that. How adversarial this is, I wouldn't like to comment.

So, at first, when this autopkgtest testing was introduced, I didn't like it very much, because you have to put some effort into writing these autopkgtests. And it's not very uncommon that we have different environments for our builders and for the test environment, and then you spend a lot of time determining which tests fail in the test environment and which ones fail in the build environment. I mean, we have something similar on our buildds, which do not have a common setup at all. So you have to put some extra work into that. And the second thing is that you need somebody running these autopkgtest runners and monitoring them. And, well, at least if you strive for migrations within some hours, it gets noticed if the autopkgtest runner doesn't work for eight hours; it's a problem. I don't know what the capacity on things like... We're using Jenkins to run these tests as the driver. I don't know what the capacity on jenkins.debian.net is like, how many architectures it has available, et cetera.

Sorry, Axel. Try again. OK, now it works. Thanks. I'm glad that we're back to the tests, because my question would be: can you elaborate a little bit on how that autopkgtest thing works? What's the maintainer's part in that? As far as I understood it, every package needs to have such tests; otherwise, your scenario wouldn't work. Well, only if we used autopkgtests as the only mechanism. That's what we're doing in Ubuntu right now, but I think we could easily add things like piuparts into that as an extra guard. But as for your question about the mechanics: at the moment, one adds a file called debian/tests/control. You can go look at Ian's documentation, which is probably the best thing, but the idea is that the tests run against the installed packages, and you can run your package's unit tests, if you like, or you can add...
More commonly, I think you would add some kind of integration tests that exercise the behavior of the whole thing, rather than testing at the level of C functions. I do think we need more coverage than just manually added integration tests in order to do a good job of this. Is there a number for the current coverage? I believe we have on the order of 100 to 200 packages with autopkgtests at the moment. Right, and that includes such key packages as eglibc, Python 2 and Python 3. In Ubuntu, we're currently working on getting some good integration tests into Upstart, which is a little bit tricky because the testing environment means we require nested KVM to accomplish it, so we're running into some implementation difficulties there; but a lot of the core stuff is tested, which, when you think about it, is also the stuff that... When you're talking about a seven-day delay for britney by default, if you look at the edges, how many users can you actually count on installing that package and testing it before it migrates anyway? The seven-day window is a heuristic to catch most of the stuff that's common, that's frequent, and that breakage is going to be most severe for. It catches brown-paper-bag bugs. It doesn't catch subtle, really critical things at all. I'm sorry, which doesn't catch that? I think that the time delay catches brown-paper-bag bugs pretty well, but anything that will only show up once a broader audience starts using it, it doesn't tend to catch at all. I had a question. Sorry, a lot of the things that it does catch are things that piuparts in particular would catch anyway.

Right, so I got approached a while back about adding autopkgtest support to one of my packages, and I pushed back fairly hard, because that particular package has a really deep internal regression test suite, and that left me struggling a little with the notion of whether I really want more than one regression suite for this package. I understand conceptually that there are some things you would want to test about the packaging that are different from things you might want to test about the packaged software. I was wondering if either of you could speak briefly to what the situation's been like. Are you trying to externally run any of the tests that are part of package regression suites as part of this process? I certainly have been in the mode where, any time upstream has a regression suite, I turn that on in the package build, but I don't know that there's a very strong ethic around that in Debian package building generally. What are your thoughts on this? How should we think about those two kinds of testing, et cetera?

So with our implementation, one of the definite advantages of running that very same test suite that you might run at build time again as an autopkgtest is the fact that we run the autopkgtests when a package you depend on has revved, which means they do get used to pick up regressions in your underlying libraries; or perhaps not regressions, but they identify unexpected assumptions in your package, things like that. So that's a... Is there any particular harness the package maintainer is required to include in the package for that to happen, or are you just noticing internal regression suites and using them? How does that actually work technically? It does have to be declared. You would use the DEP-8 stuff, which is basically Ian's documentation, to say how you wire it up; a sketch follows.
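A minimal sketch of the wiring-up just described, following the DEP-8 debian/tests/control format; the package name, test name, and the way the upstream suite is invoked are all invented for illustration:

    # debian/tests/control
    Tests: upstream-suite
    Depends: @

    # debian/tests/upstream-suite (an executable script; autopkgtest runs it
    # from the root of the unpacked source tree)
    #!/bin/sh
    set -e
    # Run the upstream test suite against the *installed* package rather
    # than the build tree, undoing any build-tree path assumptions.
    cd tests                                  # hypothetical location of the suite
    TEST_PROGRAM=/usr/bin/mypkg ./run-tests   # hypothetical runner and variable

'Depends: @' pulls in the package's own binaries, which is what lets the same test be rerun, as described above, whenever one of the dependencies of those binaries changes.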
As Colin mentioned, you create debian/tests/control, and one of the things you can say in that file is: to run my autopkgtests, I require a copy of the original source package, unpacked, for me. And then you can go digging around in that unpacked tree and do everything else. So it's fairly flexible. I mean, like I said, we're currently working through how we can use Upstart to drive VMs from the harness. So... Can I just... Oh, sorry. I'll just say something quickly. Go ahead. I know from my work with GNOME packaging that GNOME upstream is starting to push some of this, with installed-tests as part of their release criteria. So what we've started harnessing, where they've been making these available, within the GNOME team in Debian, but mainly in Ubuntu, where we actually run this stuff, is running these installed-tests as the DEP-8 tests. What they do is essentially what we were just talking about here: it basically runs the entire, relatively comprehensive, test suite as an autopkgtest, using the provided runner, and the methods for installing that are now starting to be provided by upstream. So where upstream is providing this stuff, it can become pretty easy to actually implement it in the DEP-8 tests, which at the end of the day are just shell scripts. We just have to say: install the test runner, run the test runner on my package, and then you get all of this for free, if upstream is helping you out here.

And I wanted to make another comment, which is for the Debian release team. You might be interested to know that in the Ubuntu release team, where we're currently starting to bootstrap this process, we have to spend a reasonable amount of time dealing with broken tests. So you might want to be aware that maintainers may not always be keeping on top of their autopkgtests, and if you're having these run for rdeps and then preventing propagation, and there are things you actually do want to migrate, you're gonna need to be watching. Colin actually added some force hint types to britney to make it ignore the autopkgtests, both for individual packages and for when they're triggered by rdeps. So it's a bit of work that you may have to take on. It depends on how you go about it. I took a slightly absolutist approach and said that all tests had to pass, end of story, and I figured that we had few enough tests in the archive that this wouldn't be a big problem. In practice, we're a couple of months in and there are still a bunch of failing tests. Maybe I should have used a ratchet instead, so that you have to improve the... Yes, but also, I don't know. If you get tests regressing, and the maintainer happens to have gone MIA for a bit while the tests regress, then, I mean, any sensible implementation is probably going to be preventing migrations in this situation. In which case, if the release team is looking to push transitions into testing, they're gonna have to be... If a test is failing, somebody has to own resolving that, whether resolving means the package is broken and should be removed from testing, or: oh, it's a regression. You might file bugs, treating these as RC bugs, so that people can NMU to fix them. So you're expressing concern about a regression where the test suite is in a package that is not the one that got uploaded, right? Because if that maintainer is MIA, they didn't upload it.
So you've got your underlying library package, whose upload ran the tests for the reverse dependency, and that reverse dependency's tests didn't pass. So somebody owns that issue, and whether it's the maintainer of that package, or the release team, or the maintainer who uploaded the library, at that point you have to have a human involved to figure out what exactly is going on. But I don't perceive that being a huge burden, at least not any time in the near future. No, I wasn't trying to make any assertions about the size of the burden. I'm just saying that it might be a new thing the release team has to become interested in. You're right. Test failures, and deciding how they're gonna have to be dealt with. Okay. Ian had a...

I'd like to just respond to that before I make my response to Bdale. Yes, this is a potential problem, but it seems to me it's the same kind of problem as when, if you're a library maintainer, any rdepends of yours has an RC bug that prevents your package propagating. And ultimately, you might very well end up doing an NMU of the rdep, and that's just what you should do. And if indeed the test is broken, then you can always NMU the rdep to fix the test.

So, in response to Bdale's question about how you have to wire something up for DEP-8: it is true that we don't automatically spot upstream test suites or somehow run them in the as-installed environment. The reason for that is that there's no standard way of taking 'make check' and causing it to run on the as-installed version of anything, and I don't think it would be practical to invent such a thing. But for most packages, when you actually want to do that, you have to write a very, very small amount of code: a tiny little shell script saying, well, actually the test thing is over here, and we'll set some environment variable, mess with the path. Normally you just have to undo the thing that the upstream 'make check' has done to cause it to run out of the build tree. The first package I did like this was Gork, and it's probably quite easy to see how I did it in Gork and do a similar thing. I think there's probably about ten lines of diff to make that work.

Joey? Two. I've been waiting for a while, and now I've got two. Okay. So Bdale sort of asked, I think, how many regression test suites are actually being run, and one thing that we may not have thought about is that now that we're mostly using CDBS and dh, they go off and find test suites and run them, and they can do this for standard things. So it may have gone up, I really don't know. I was also kind of thinking about how, with DEP-8, you were talking about regression testing, running the tests of the things that use the library against the new library. Could you just rebuild them, or could you just say: build them, then run their tests, and get the same result, do you think? We actually don't rebuild them, because if it isn't a SONAME transition, then the natural effect on the archive will be that the rdep is not rebuilt. So we rerun the rdep's tests to make sure that it still works. Sure, so you're making sure that it hasn't broken compatibility. You're not making sure that, yeah. In particular, we explicitly want to find out whether it breaks without rebuilding it, because that would be a problem.
Right, because you may have an API change in a header such that, if you did rebuild it and ran the tests, they would pass, but the actual binary in the archive is broken; so we check for that. Another thing that's happening now, as far as autopkgtests originating out of Ubuntu land, concerns stuff that Canonical is upstream for in particular: we're running a lot of the graphical stuff using a harness called Autopilot now, and there's talk about... I don't know that any of those are currently wired up to autopkgtests, but there's been... They're not, and I think there's some technical difficulty, which I haven't wrapped my head around, in making that work. Yeah, I mean, there are some technical challenges making that happen, but Autopilot, it's not a new concept, but we run graphical applications under test harnesses that hook into the toolkits and test for correctness at that level as well. So that might be another thing that particular maintainers in Debian might be interested in, as a way to add autopkgtests to packages that don't currently have testing. Autopilot, as I understand it, to the best of my knowledge, is open source. We're using it in Ubuntu itself, so it should be.

I have one more thought, which is, I think you said you only have about 100 autopkgtests right now, is that... Okay. And they're in important packages, but we can talk about adding more tests, and that'll clearly make things better, and having the framework there to do it is a good thing. So maybe talking about more tests isn't the most productive use of this time, I don't know. Right, I mean, I guess I view more tests as a consequence of making that be the way you ensure that you improve the quality of testing. I think it's probably a futile exercise to wait for those to arrive before we change anything, because we could wait forever. I would much rather put some kind of forcing function in.

Any more questions? I think we're almost at time. Only one minute left, not enough time for you to ask questions and for us to answer them. All right, come and find us afterwards if you want to talk about this more. Thank you very much. Thanks.