OK, I was saying that there are a lot of QA-related sessions at DebConf, and that QA work seems to go in a lot of different directions, but we have a bit of team-like interaction inside the QA world. The QA world is not really a team, but... So for this session, we have the chance to discuss what we are doing across the various efforts we are working on. If you go to the wiki and look at the QA page, you'll see it's a wiki page with items from 2006, so maybe you can try to refresh it a bit today, and also think about what could be done and how to improve what we do. So first, I tried to list what the QA team does. There are two big categories. Feel free to just shout if I'm forgetting something. First, we maintain infrastructure to detect problems and report the issues found using that infrastructure, either as bugs or through other means: for example lintian and lintian.debian.org, piuparts and piuparts.debian.org, ci.debian.net, jenkins.debian.net, which is related to reproducible builds, and the archive rebuilds. And then we also maintain infrastructure to expose information to maintainers and teams: mainly tracker.debian.org, packages.qa.debian.org (the old PTS), the Debian Developer's Packages Overview and the Ultimate Debian Database. And yes, there's also the MIA team, which is part of the QA team. So did I forget anything major on that list? I don't think so. Then I tried to think about new stuff that happened since last year. There's gating: piuparts being used for gating migration to testing. ci.debian.net could be too at some point; it could be one of the easy things to do in a similar way. I tried to do that this release cycle. So ci.debian.net, cool, nice. Another thing is archive rebuilds happening again on a regular basis; that's what I've been mainly working on over the last year. And then, since this is a BoF, I have a list of discussion items. I'm just going to go through them quickly, and then we can go back to the ones that you think are the most important. The first one is the qa.debian.org machine, quantz, and the QA SVN repository. There we have many services glued together, many unmaintained services too, and we could discuss what we want to do with that, especially in the context of the stretch release: we are probably going to be asked to upgrade it to stretch at some point, and that's not going to be fun to do. So I'm of the opinion that we could probably kill qa.debian.org entirely and move the useful services to separate VMs or containers; it doesn't really make sense to have all of this glued together. We can discuss that later. Another thing we could discuss, from the infrastructure point of view, is that something we duplicate quite a lot is the scheduler for QA checks: we use a different method for each test infrastructure, and clearly we could do better than that and have at least a similar way to schedule jobs. About tools, something that is a bit lacking is a common tool to manage mass bug filings. I have quite a lot of things for archive rebuilds, so if we had better tools for that, it would probably make it easier to file issues for other kinds of failures as well. Then we could discuss additional checks to do; I have two ideas. I need to be careful about the size of the screen.
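As an illustration of the "common tool to manage mass bug filings" idea: Debian bugs are opened by mailing submit@bugs.debian.org with pseudo-headers, so a shared helper is mostly templating plus usertags and a dry-run mode. A minimal sketch; the sender address, usertag, package and log URL are made up for the example:

    # Hypothetical mass-bug-filing helper: compose a report with the usual
    # pseudo-headers and either preview it (dry run) or hand it to a local MTA.
    import smtplib
    from email.message import EmailMessage

    SENDER = "qa-rebuild@example.org"      # assumption: submitter address
    USERTAG = "ftbfs-rebuild"              # assumption: usertag for this run
    DRY_RUN = True                         # always preview before a mass filing

    def file_bug(package, version, log_url):
        msg = EmailMessage()
        msg["From"] = SENDER
        msg["To"] = "submit@bugs.debian.org"
        msg["Subject"] = f"{package}: FTBFS during archive rebuild"
        msg.set_content(
            f"Package: {package}\n"
            f"Version: {version}\n"
            "Severity: serious\n"
            f"User: {SENDER}\n"
            f"Usertags: {USERTAG}\n"
            "\n"
            "The package failed to build during an archive rebuild; full log:\n"
            f"{log_url}\n"
        )
        if DRY_RUN:
            print(msg)                     # review the full report before sending
        else:
            with smtplib.SMTP("localhost") as smtp:   # assumption: local MTA relay
                smtp.send_message(msg)

    file_bug("somepackage", "1.2-3", "https://example.org/logs/somepackage.log")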
Let me try to do something like that. We could discuss additional checks to do; there's a tool listed there, but we could brainstorm about that. Then there's the question of broken packages in unstable. There was a discussion on the debian-qa list a few weeks ago about that. What was done was: I filed bugs against all packages that were neither in jessie nor stretch, still not in testing, and not uploaded since the beginning of the year, asking if they should be removed. We could go a bit further. Clearly we don't want to aggressively remove tons of packages; the goal is just to identify the ones that have clearly been unmaintained for years and that don't have anything to do in unstable. But we could have a kind of rolling requirement that packages cannot stay out of testing for more than one or two years, and target those, or at least go through those and ask if they should be removed. Lucas? Regarding this last comment, this morning I actually discussed a use case where we could have... so I'm thinking of the bikeshed PPA thingy, where packages basically have an upstream release cycle way faster than what we have in Debian, and old versions don't make much sense. I could imagine that we have something that is in the archive, remains in unstable, and is provided via a bikeshed or a PPA, and then this requirement would basically make that impossible. So I don't like this requirement, because of such a future use case. The important part here is "except exceptions": when I filed the bugs on packages not in testing nor stretch, I excluded all those that were uploaded at least once since the beginning of the year. So if you have that kind of thing... I think the goal is not to force maintainers to remove their packages. So the requirement is "and not updated", I guess. Right. I was going to say the same thing: we need to distinguish packages that are not suitable for a stable release because they're not stable, in the sense that they are not suitable when they don't change, from packages that are not suitable for a stable release because they're broken. And we have a decent number of packages that can never be in a stable release because it would just be useless, but we still want them; potentially we even want to compile them against stable, and it would be good to be able to have backports or something similar. I think we are never going to just blindly remove everything that matches a list of criteria. In that case, I filed a bug against each of the packages in my initial list, saying that if you think it should remain in unstable, just close the bug. If the bug remains open for more than one month, then I will remove the package, and before that, do another pass on the packages and identify the ones that should really stay. It would be nice to have a bit more in the way of guidelines for what we as a project consider a reasonable way of doing these "should this package be removed" bugs. I know I've filed a lot of those from bug squashing parties, looking at an RC-buggy package and going: I don't want to fix this, I don't want this in Debian. So, I'm relaying one question from IRC, from Raphaël Hertzog. He says: as the tracker.debian.org maintainer, I would be interested in some shared infrastructure to schedule jobs and to run them. And then as the CI maintainer, I'm also interested in that, I think. To do what? Sorry. He's interested in some shared infrastructure to schedule jobs and to run them, like the topic you mentioned earlier. And as CI maintainer, I'm also interested in that.
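For context, the "not in jessie nor stretch and still not in testing" candidate list is the kind of thing that can be pulled straight out of UDD. A rough sketch, assuming the public read-only udd-mirror.debian.net credentials and the usual sources table layout:

    # Candidate sources that are in sid but in neither stretch nor jessie,
    # using the public UDD mirror (host, credentials and schema are assumptions
    # based on the documented udd-mirror setup; output needs manual review).
    import psycopg2

    conn = psycopg2.connect(
        host="udd-mirror.debian.net", dbname="udd",
        user="udd-mirror", password="udd-mirror",
    )
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT DISTINCT source FROM sources WHERE release = 'sid'
            EXCEPT SELECT DISTINCT source FROM sources WHERE release = 'stretch'
            EXCEPT SELECT DISTINCT source FROM sources WHERE release = 'jessie'
            ORDER BY 1
        """)
        for (source,) in cur.fetchall():
            print(source)    # candidates only; check uploads and bugs before filing
    conn.close()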
So right now, I use a forever-running batch of code: is there a new version of this package, and then running the tests. But I don't really need that; if there's something else that's going to request the tests, then that just works for me as well. Yeah. The difficult part is that I'm not sure what a good way to do that is. I mean, from a design point of view, designing something that suits all needs is not going to be really easy. Yeah. So someone needs to think through all the requirements. Yeah. Yeah, so I have to rewrite that thing for debci anyway at some point, because the current implementation is not optimized at all, it's very slow. So maybe I'll throw that in and try to think of a way of actually also being able to trigger stuff elsewhere, and we don't even need to run that in debci itself. Yeah. Maybe what we could do about that is: everyone doing some kind of scheduling explains on the list what their scheduler is currently doing, because that's something that's usually quite internal to the checks, not exposed publicly, and it would be interesting as a starting point just to compare what people are doing. For example, I can think of two different approaches: for the archive rebuilds, I generate a list at the beginning of the rebuild with all the packages to build, and then schedule using a central process that just SSHes to the various nodes; and in UDD there's the uscan scanner, which runs on a regular basis and just looks at which packages need to be checked. Yeah. Raphaël also says this is something that we are reinventing everywhere, which is the point we made already, and we should be able to have some sort of high-level description: run these scripts in a sid chroot and send me back the logs and the artifacts. Maybe we can talk a bit about possible checks to run during the rest of the release cycle. There are two things that annoy me quite a bit. The first one is packages that fail to build twice in a row. A long time ago I did some work on filing bugs for those. That's typically what you run into when you try to debug a package that fails to build: you try once, it fails, you change something, you try again, it doesn't work. But probably there are lots of packages in that case nowadays. How relevant is that check still?
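To make the "shared scheduler" idea concrete, here is a toy sketch, not a description of any existing Debian service: poll the unstable Sources index, detect version changes, and hand a generic "check this source at this version" job to whatever QA check wants to consume it. The mirror URL and polling interval are arbitrary choices.

    # Toy shared scheduler: watch unstable's Sources index and emit generic
    # "run your check on source X version Y" work items.
    import gzip, time, urllib.request
    from debian import deb822                  # from the python3-debian package

    MIRROR = "https://deb.debian.org/debian/dists/unstable/main/source/Sources.gz"
    seen = {}                                  # source -> last version scheduled

    def current_versions():
        with urllib.request.urlopen(MIRROR) as resp:
            data = gzip.decompress(resp.read()).decode("utf-8", "replace")
        return {p["Package"]: p["Version"]
                for p in deb822.Sources.iter_paragraphs(data.splitlines(),
                                                        use_apt_pkg=False)}

    def schedule(source, version):
        # Placeholder: a real implementation would enqueue this for debci,
        # piuparts, an archive-rebuild node, and so on.
        print(f"schedule check: {source} {version}")

    while True:
        for source, version in current_versions().items():
            if seen.get(source) != version:
                seen[source] = version
                schedule(source, version)
        time.sleep(3600)                       # poll hourly; real schedulers can do better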
How relevant is that check still, in the way we currently build? I think everybody just builds fresh from the source package every time, so I think this check made a lot of sense in the past, and I'm wondering... I mean, as a maintainer I really don't check for this anymore. So I run into this when I work on a package and try to debug a build failure, for example. I know two of my packages which are extremely complicated to build properly anyway, and I'm not sure that they would build twice in a row; depending on where you are, actually, I think for my packages it would be a lot of work for me to guarantee this, for extremely little gain. So I do see the point, but it makes the requirements on my build rules a lot higher, because upstream doesn't help me there and I have to intervene a lot for my Debian packaging. So with how we build stuff today, and how even I operate, I think it's a hard requirement. I kind of wonder whether, if we have a QA requirement that clearly doesn't work for a reasonable number of upstreams, and there's an easy way to not need this requirement, like: don't use debuild, use git-buildpackage or whatever, I kind of wonder whether the answer is just to not have that requirement, like admit defeat and go: well OK, you're expected to build from a clean git tree, deal with that. We could do that. That's probably a discussion to have on either the debian-devel or the debian-policy mailing list, because currently what policy states, I think, is that all the targets are supposed to work even if you call them twice in a row without a clean in between: you can run debian/rules build twice and it's supposed to work, right? But policy doesn't exist to make us happy; policy exists to help us build an operating system, and any time we have a policy rule that's really annoying to comply with, we should consider the cost-benefit: is the cost of having this policy rule greater than the benefit? And if it is, throw it away. Exactly; in this case I think the policy is old, and in those days I think it made sense. I think now rebuilding is just a lot cheaper and the time of the developer is more expensive. In that respect, I think it's still quite annoying to have to completely clean your git tree each time you want to test a small change. It's extremely annoying for me to guarantee it. So I agree with you that this can be really frustrating: you're in some random package and you want to fix that particular bug, and after you build, the tree is so screwed up that you basically have to do apt-get source again afterwards. So I think the requirement still makes sense for several packages, but when you for example have packages in a git repository, I think it could be relaxed, because the repository then helps you a lot to get back to a safer state. That's not the typical workflow we advertise when doing NMUs, for example. On balance, this build-twice-in-a-row, is it a must in policy? I think it's a should, so it's not really mandatory, as far as I remember. So maybe we are... I don't remember those bugs being RC. I think there were some. The RC-ness is decided by the release team, so yeah. I found another interesting corner case where I went to create a source-only build and it failed; it can be tested in sbuild, I think. Sorry? So you don't build arch:all or arch:any, just the source. I downloaded a source package, wanted to fix something, created a source-only upload, not of course to unstable but for testing, and it failed.
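For reference, the "build twice in a row" check being discussed here can be scripted in a few lines. A minimal sketch, assuming dpkg-buildpackage and an already unpacked source tree; it is the second invocation, which runs the clean target on an already-built tree, that usually triggers these failures.

    # Double-build check: build the same unpacked tree twice and flag packages
    # whose second build fails (the policy "should" under discussion).
    import subprocess, sys

    def build_once(tree):
        # dpkg-buildpackage runs debian/rules clean and then builds the binaries
        return subprocess.run(["dpkg-buildpackage", "-us", "-uc", "-b"],
                              cwd=tree).returncode == 0

    tree = sys.argv[1]                  # path to an already unpacked source tree
    first = build_once(tree)
    second = build_once(tree) if first else False
    if first and not second:
        print("FTBFS when built twice in a row")
    elif not first:
        print("FTBFS even on the first build - a different class of bug")
    else:
        print("builds twice in a row just fine")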
It failed just in the clean target, not after a full build, but right before I did anything. That would be surprising, because when I do archive rebuilds I start by building the source package. It was something odd in the clean target. I will check that, I'm interested, because I think I should catch that. OK, maybe it was temporary. It's only relatively recently that sbuild supports source-only builds. I have a build tool that wants to do this and I had to have some really horrible workarounds while I was using sbuild from jessie-backports, or from the sbuild branch or whatever it was, but in the stretch version that should work. I tried the jessie-backports sbuild, so maybe there was a problem there. Could be. If I remember correctly, in that version you had to tell it to build all the things, except don't build arch:all, and also don't build arch:any, and then it thinks it's doing an amd64 build or whatever that happens to produce no binaries, and it gets really confused; and if you look in the git history of this build tool there's this horrible hack involving a Perl one-liner to rename the .changes file sbuild thinks it should have produced to what it actually produced. But in stretch that just works fine, so you might find that this has been fixed. Cool, thanks. The other thing is packages that fail to build randomly. By that I mean packages that fail to build, and if you retry it works, and you try again and it fails; sometimes it's 10%, sometimes 90% of the cases, and there are quite a lot of those, probably about 100-200 packages in the archive currently. But my goal here was more to discuss additional ideas for checks that you have in mind, have no time to implement, and would like someone to pick up, because this is a good time of the release cycle to do that. So, somebody mentioned the case where, apart from testing the built stuff, if there's an upload of a library, you actually want to check that everything that reverse-depends on it still builds at that point; sort of what I'm now trying to do with autopkgtest, actually having that sort of gating before it migrates to testing. So rebuild... I think it doesn't scale, but it would be great if it could rebuild everything that reverse-depends, as it is in testing, with the library from unstable. There are two ways to do that. The first one: in the Ruby team we have scripts to automate that, so we can just rebuild all reverse dependencies with the new version of the package and run the autopkgtests of all reverse dependencies with the new version of the package. That's something we should probably push so that others can use it, because it's really useful. The other way: I do archive rebuilds for new versions of interpreters or compilers before they are made the default or before they reach the archive; for example for GCC I do that quite frequently. So that's an option. From a maintainer point of view, it would maybe be nice if there was a way for the maintainer of a package to opt into this, like set a flag in your changes file or whatever, to say: I think this one is kind of risky, please do the big rebuild thing and don't land it until we have the results. Yeah, that would be really nice, but it's probably something that touches CI and sbuild and wanna-build and all these things that like three people in the world understand, right?
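A hedged sketch of the "rebuild everything that build-depends on the new library" idea, using sbuild's --extra-package option to inject the new binaries into each build; the library name, .deb file names and Sources path are placeholders, and the Build-Depends parsing is deliberately simplistic (it ignores version constraints and alternatives).

    # Find reverse build-dependencies of an updated library in a Sources index
    # and rebuild each of them with the new .debs made available to sbuild.
    from debian import deb822                    # from the python3-debian package

    LIBRARY = "libfoo-dev"                       # assumption: the updated -dev package
    NEW_DEBS = ["libfoo1_2.0-1_amd64.deb", "libfoo-dev_2.0-1_amd64.deb"]

    def reverse_build_deps(sources_path):
        with open(sources_path) as f:
            for src in deb822.Sources.iter_paragraphs(f):
                deps = src.get("Build-Depends", "") + "," + src.get("Build-Depends-Indep", "")
                names = [d.strip().split()[0] for d in deps.split(",") if d.strip()]
                if LIBRARY in names:
                    yield src["Package"]

    for source in reverse_build_deps("Sources"): # an uncompressed Sources index
        cmd = ["sbuild", "--dist", "unstable", "--arch-all"]
        cmd += [f"--extra-package={deb}" for deb in NEW_DEBS]
        cmd += [source]                          # sbuild fetches the source itself
        print("would run:", " ".join(cmd))       # dry run; swap for a real invocation to build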
OK, sorry, one more thing: as you mentioned, there are only like three people in the world understanding those tools. I think exactly because of this we might want to do something like that, because then people really wouldn't need to understand them, since there would be a way to do those checks automatically. So it would be a great thing, I think. Well, it doesn't have to be done using the same software as the one used to run the real archive; it can be done using aptly or whatever is easier to use. Yeah. For a smaller set of packages there is ratt, "Rebuild All The Things", so you can rebuild all the reverse dependencies by just typing one command, but you need some CPU time, of course. And I would like to help with making the scripts performing the rebuilds on Amazon a bit easier to use, because I wanted to try them but it was confusing, so I gave up and used some other stuff, and it would be nice if others could use them easily. And it's really cheap, I think. Do you have some numbers, how much does it cost to do a full rebuild of the archive on Amazon? Well, it's zero, because Amazon is offering us unlimited credits. OK, that helps, thanks. Another thing I would like to mention here: a nice byproduct of the rebuilds would be a report of the lintian warnings that go away with the rebuild of a package. For example, I worked on the transition to PIE binaries and I had to check that lintian reports no warning against the binaries after a rebuild; some debhelper update could also help packages resolve things which are checked by lintian. So, in the past I was running lintian after each build; it just takes a lot of time to run lintian on some packages, it probably multiplies the total build time by 1.5, to give an idea. Especially because the total archive rebuild duration is limited by the longest packages, so everything else fits in the time it takes to rebuild... I don't know what the longest package currently is, if it's still LibreOffice, but basically you can rebuild everything else during the time it takes to rebuild LibreOffice. So if you run lintian after the build of LibreOffice, on LibreOffice, it just adds more delay until you can file bugs. But I note your offer to help with the scripts. The full story is: I did not work on archive rebuilds for about two years, and someone else was interested in taking over that work. The difficult thing is to find the right balance between being able to file bugs on a regular basis and doing work to improve the infrastructure part. Over the last year I mostly worked on filing bugs; the work that was done to improve the scripts never actually reached a stage where it could be used, so I'm still using my old scripts from three years ago. I agree they are far from perfect, but yeah, I just need someone to spend some time getting them into better shape. Another really nice addition would be running the autopkgtests on the built binaries, and GM would like me to do it before we can enable that for the whole archive, so there would be some incentive for doing that, but I'm not sure how far we are from that possibility. You mean being able to upload arbitrary binaries to CI and have it run on those binaries? I was thinking of bootstrapping a CI node on Amazon's infrastructure after rebuilding the whole archive, and using the rebuilt archive to run all the autopkgtests on this set of packages. Yeah, it's something we can think about; I'd need help to do that, but it should be possible.
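The "lintian warnings going away with a rebuild" byproduct could look something like the following: run lintian on the .changes file of the binaries currently in the archive and on the freshly rebuilt one, and diff the emitted tags. File names are placeholders.

    # Compare lintian findings before and after a rebuild of the same package.
    import subprocess

    def lintian_tags(changes_file):
        # lintian prints one finding per line, e.g. "W: foo: some-tag extra info"
        out = subprocess.run(["lintian", changes_file],
                             capture_output=True, text=True).stdout
        return {line.split(None, 3)[2] for line in out.splitlines()
                if line[:2] in ("E:", "W:", "I:") and len(line.split()) > 2}

    old = lintian_tags("foo_1.0-1_amd64.changes")           # binaries from the archive
    new = lintian_tags("foo_1.0-1+rebuild1_amd64.changes")  # binaries from the rebuild
    print("fixed by rebuilding:", sorted(old - new))
    print("introduced by rebuilding:", sorted(new - old))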
Something that I think is sometimes missing from these big-rebuild or big QA-across-all-the-things kinds of tools is having a way for individual maintainers to reproduce the problem. If your package is failing the piuparts test, or failing the autopkgtests in the particular environment that CI uses, or failing to build in sbuild, it's not always obvious how you can determine whether the version that you think fixes it is actually going to pass, because some of these pieces of infrastructure are like: well, this is some magic cluster somewhere in Amazon, and it's not clear to an individual how to get a small version of it. You have something like that with piuparts; I'm not fully sure how to run it, I've done it a couple of times. So, I don't mean to be self-advertising here, but something I've been working on for a while that might interest people is a build tool called vectis; there's an ITP bug open. The idea is that it's primarily a build tool for maintainers, but it also does QA checks like piuparts and things. It creates a throwaway autopkgtest VM, installs the infrastructure you need for this particular task, like sbuild to do your build, or piuparts, or LXC to run an autopkgtest in LXC or whatever, does the build or testing, and throws away the VM. So for the things that I've added to it, it's also reasonably good executable documentation of how to get a particular test environment. For some packages it's quite obvious how to set this up; thank you for documenting the ci.debian.net LXC setup so well, that was quite easy to reproduce closely enough. But for some of them, like sbuild, I had to reverse engineer from the puppet modules how the real sbuild instances are set up and then go off and do the same slightly strange setup. Well, for the archive rebuilds I just use the standard sbuild setup script, there's nothing strange about what is being configured. I mean, there are ways to configure all those tools that are super complex, but also ways to configure them that are simple; for the Ruby team scripts, I think we just do the basic default stuff and that's enough. Right, and my concern with doing things the simple way is that it's fine as long as what Debian actually does in production is equally simple, but it isn't always: the real Debian sbuild for a long time was not stable sbuild, and it wasn't jessie-backports sbuild either, it was some minor fork of stable sbuild. That's true, but your starting point was reproducing the failures that are reported in bug reports, and those are not necessarily from the real buildds; in most cases they are either from me or from the reproducible builds folks. Right, so there is kind of a question here: if it fails in your infrastructure and it doesn't fail in the real Debian infrastructure, how critical is it really? I mean, it fails in a reasonable situation that someone might want, but is it release-critical if Debian's real production infrastructure is perfectly happy with it? I don't get your point, because the fact that it fails when I do archive rebuilds doesn't mean it doesn't fail on the official infrastructure. It could just be that the package that is in the archive was built two years ago, at a time when GCC was an earlier version. Your GCC example is a good one: you can't rebuild all packages in Debian when there's a new GCC version, so the only way to detect issues introduced by a new GCC version is to do a full rebuild of the archive, like I do.
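On reproducing ci.debian.net-style failures locally: assuming an LXC container created beforehand with autopkgtest-build-lxc (the "autopkgtest-sid" container name and the package name are placeholders), a small wrapper around the autopkgtest runner might look like this:

    # Run a package's autopkgtests from the archive inside a throwaway LXC
    # container, roughly mirroring what the CI environment does.
    import subprocess, sys

    package = sys.argv[1] if len(sys.argv) > 1 else "somepackage"
    cmd = [
        "autopkgtest",
        package,                          # bare source package name: fetched via apt
        "--output-dir", f"/tmp/autopkgtest-{package}",
        "--",                             # after this comes the virtualisation backend
        "lxc", "autopkgtest-sid",         # container built with autopkgtest-build-lxc
    ]
    print("running:", " ".join(cmd))
    sys.exit(subprocess.run(cmd).returncode)   # exit code encodes pass/fail/skip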
Right, sure, but what I mean is: if we were to rebuild the package that is failing on the real Debian infrastructure right now, it's not always clear that it would produce the same result as the rebuild on Amazon. Maybe in one percent of the cases. Usually what I do, when there's someone who cannot reproduce the failure, is ask for their build log; sbuild is quite good at providing a lot of information in the build log, and usually just diffing the build logs with a diff tool gives you a good idea of where it's broken. There are some strange cases; all the random failures are annoying because of that, but when I file bugs from archive rebuilds, failures are automatically retried once, so I don't see most of them, I only see random failures that failed twice in my testing environment. I don't see it as a big problem in practice. It's not a big problem, but usually when digging into the failure you end up finding a root cause that is a real issue. It's really rare that it comes from the smaller memory size of the VMs you use for rebuilding, I think, or from a single CPU where some special tests fail. The other case is more frequent: at some point I was using really large Amazon VMs, 263 cores, 266 gigabytes of RAM, because that was the only CPU with the special TSX instructions enabled, and that caught a few bugs: some packages, I would say a dozen maybe, fail to build when doing parallel builds on a really huge number of cores. So that happens, but that's a real problem; it means there's a dependency missing somewhere in the Makefile or something like that. I think in those cases the severity can be reduced to important or something. I would fix those, but if it doesn't fail on the real infrastructure, I usually decrease the severity a bit, to give me some more time before my package gets removed from testing. I'm not saying it's not a bug, I'm saying it's not necessarily a release-critical bug; there are bugs and there are bugs. I don't think that's our problem: in the end the release team decides, and I'm always fine with letting the release team decide on the severity of bugs. Except for auto-removals, if you file them RC. Yeah, but usually if the maintainer decides to adjust the severity of the bug and the release team agrees, I'm perfectly fine with that, I don't care, I have nothing against those packages. On the services that we provide: I think there are a couple of the checks that we do that we could push more towards the maintainers than we currently do. I mean, for the CI tests that we currently run, as a maintainer I need to go look for the results instead of being notified. I'm particularly interested in the CI tests, but the DUCK check as well: I'm never notified, the only way to see it is on the tracker, and there's an awful lot of information on the tracker; on my page there are currently like 50 packages showing up, there's just too much information sometimes, especially about the current state, because that's typically something that you could accept, whereas what you want is to be notified of regressions. And I think we could always push more on these kinds of issues, but I guess that maybe needs an opt-in kind of mechanism, or at least an opt-out, for being notified. I think the standard mechanism to report problems in Debian is bugs, and we should not try too much to invent new things outside of the BTS. It's not that hard to do mass bug filings; for the DUCK ones it could be minor bugs, or even wishlist. Wishlist, yeah.
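In the spirit of "just diff the two build logs": a small helper that strips an assumed per-line timestamp prefix and prints a unified diff of a known-good log against the failing one, so the point of divergence stands out.

    # Diff two build logs after normalising away timestamp prefixes.
    import difflib, re, sys

    TS = re.compile(r"^\[?\d{4}-\d{2}-\d{2}[T ][0-9:.Z+-]+\]?\s*")   # assumed log format

    def normalised(path):
        with open(path, errors="replace") as f:
            return [TS.sub("", line) for line in f]

    good, bad = sys.argv[1], sys.argv[2]        # e.g. buildd log vs. rebuild log
    for line in difflib.unified_diff(normalised(good), normalised(bad),
                                     fromfile=good, tofile=bad, n=2):
        sys.stdout.write(line)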
But that's actually quite easy to do. But "I think we should", that's the dangerous Enrico word, "we should". Yeah, but no, I think... well, one of the things I really would like to do, at least by the next DebConf, is turn the tools I use to do a mass bug filing into something that is easily usable by someone else to do a mass bug filing on something else, which for instance could be used for the case of CI. Yeah, CI could be a good target; I mean, I hope it's not needed for CI soon, but still. Does someone have strong opinions about qa.debian.org and the QA SVN repository? There's one... So that's the diffstat of what was changed since last year, so there are still quite a lot of things moving, but it's really scary when you look into it: we have tons of different stuff that has been completely unmaintained for the last 6 or 7 years. So, I don't know... one question I had, maybe Raphaël can answer on IRC since he is listening: what are the current blockers for removing the old PTS, are there still some? Maybe... does this also cover DDPO and the thing in UDD that looks like DDPO but isn't? Because those are obvious duplicates of each other. Unfortunately, each of them has some information that the other does not, last I looked. I'm not sure what's in DDPO and missing in DMD; I think it might just be the colour coding. I think it might just be the colour coding, actually; DDPO is really quite information-rich at a glance, because it has this angry-fruit-salad output model. Raphaël said that the blocker is pabs, who keeps maintaining it. Pabs, Paul Wise, he keeps maintaining the old tracker, so it's there; Raphaël is in favour of dropping it, and Paul will probably point out a few questions that he would like to see addressed first. I think we should communicate to DSA that we don't want to keep the machine as is, rather than just upgrading it to stretch and delaying the problem for the next two years. DSA has this plan of having a container-based infrastructure, so we should probably think about moving each part of the old server to its own thing. So, I talked a bit further with DSA about that; their point is that they don't want to split qa.debian.org into 10 different virtual machines, but if it's not 10 but 3 or 4, it will probably be acceptable. The problem with the container approach is that they have nothing ready at the moment, so it could take some time until that's something usable. For the DDPO page, I found one thing, the results of the builds, that's not on the UDD page: if a build failed on a specific architecture, you have it there. A package that fails... it's not in the table, but it's in the todo list at the beginning. OK, fair enough. Should we stop now, or do we have... maybe we could try to list some action items we want to get done by, quote, "someone", next year. So, action items: who wants to do something for next year? One thing is this scheduling of checks... yeah, common tools to schedule checks. What else? People want to volunteer? Simpler scripts for the rebuilds? For the rebuilds, yeah, sure. Oh yeah, sure, OK, that's fine. Yeah, if you want to work on that, probably by the end of the conference we should just meet and I can show you how I currently do the rebuilds; it's clearly far from perfect. OK.