Hello, welcome, and thank you for coming. This is a BoF, which is why I'm turned around facing you a lot: I'm not expecting to do all the work here. This is about CI: autopkgtest, ci.debian.net, and other related things. There's a Gobby document (thank you, Andreas, for fixing my URL); that's on IRC. I think we possibly have some people watching via the live stream; I can't remember whether that's going to be Antonio or Paul or both of them. I guess we'll find out on IRC if they write back. Could I have a volunteer to watch IRC and check whether anything comes in that we should relay? No, nobody wants to do that. Okay, well, I will try to. You can do that, Andreas? Thank you.

So I'll just do a quick intro. I'm not really involved with ci.debian.net. I wrote autopkgtest a very long time ago, over a decade ago, when I was working for Canonical, and it's very nice to see that code finally deployed for its intended use in Debian. For my own work in Debian I have found it a massive improvement. I can now throw my packages, at least the ones that have a fairly good test suite, at the CI; I don't have to do a lot of manual checking, and the automation does all the right things. My testing migration is generally much faster, and I've had at least one serious data-loss regression spotted before it migrated to testing. I was then able to work with the maintainer of my dependency to resolve that with an appropriate Breaks field, and that interaction went relatively well too: there was a little bit of friction, but not too bad. The thing is still very new, and because I'm quite a demanding user with a tendency to explore edge cases, I've discovered a number of them; I don't think I need to go through them in detail here. What would probably be most useful in this BoF is for people to share their experiences, particularly the good parts.
Also, I've got a couple of questions suggested by Antonio: if anybody tried to add autopkgtests to their package and came across problems, we can maybe try to help resolve those, or collect them as bug reports or feature requests. And if you came across a situation that you were able to solve, collecting and sharing that experience with us would probably be helpful. Yes, go ahead; you put your hand up, Mike, please.

Hello. So, I think autopkgtests are great, so thanks a lot for that. I use them a lot, I try to add them to all of my packages, and I encourage my friends to use them too. However, I came across one problem, and it also concerns the integration of autopkgtests into testing migration: failures of tests which are due to unsatisfiable dependencies of the test cases. I think these should be treated differently from tests that just fail, that is, tests for which you can install all the dependencies and which then fail when executed. The reason is that I have one particular use case where I have alternative dependencies, because the package can cooperate with various back ends, and for each of these back ends I have a test. On any particular architecture it's perfectly okay that some of these back ends just cannot be installed, and I don't want my test cases to fail just for that. For those tests I just skip that architecture in the test and install the dependencies in the test itself. It's not nice, but I think it's a reasonable mid-term solution. If a test depends on architecture-dependent packages, I either install them in the test or skip the test and report it as passed in the test suite. The test wouldn't be triggered by uploads of those particular packages, but you wouldn't have to change anything in the infrastructure. But something relying on the architecture would be nice to have. I'm not sure I understood.
The point is also that, of course, I do not want to encode in the control file of my tests which packages are available on which architecture; that should not be hardcoded in the control file.

Maybe we can discuss this later, because I'm not sure I understood completely. I just proposed a workaround, and it's fine that you're not happy with it.

Your workaround, as I understand it, is that you install the test dependency in the test script. If you cannot install the dependency, you can't just make the test fail, because that was the whole point; but you shouldn't make it pass either, because it hasn't passed. You need to be able to skip it, and support for a test script itself deciding that the test has been skipped was only added very recently. Is that what you're using?

Basically, yes. I can skip based on the architecture: I know the package isn't installable there, but I still want to run the other tests. Never mind, it's just a workaround.

So you've removed the test dependency from the control file and put an architecture restriction in instead. I can see why that's not brilliant.

It's basically a similar idea: you can check whether the package is installed, and if not, you can also skip the test.

I think what I'm accidentally, implicitly suggesting is that you move the dependency out of the test's Depends and have it installed in the script with apt. That's annoying: you now need to declare needs-root, because you'll need to run apt, and you also need to declare the new restriction for 'I might return the skip status' so that you can return it. I'm not sure that's... this all feels like... I would like to suggest a different alternative, which is that you negotiate on the list to define a new test restriction which doesn't have this bug with test dependencies.
Then, when autopkgtest is fixed so that it no longer has this bug, your tests will be run; in the meantime you can maybe run them manually with a force option, and at least your tests won't be run and won't break things. There's a comment on IRC from Elbrus, who is obviously watching: you can use skippable now.

Okay, thanks a lot. I was told by Paul Gevers that there would be a flaky restriction, which I used in my package, and I just discovered three minutes ago that it didn't work; so apparently there is skippable now. Excellent, thank you. Skippable defines an additional exit status for your script. I forget what it is, but you can read the docs; if you declare skippable, you can exit with that special status to mean 'I know you ran me, but actually the result is skip'.

Was that the only problem anybody had with autopkgtests?

I have one more issue, which is not really an issue. I started adding autopkgtests a few years ago, as soon as they were reasonably supported in Debian, but most of my autopkgtests have very poor coverage and are not as extensive as I would like. So far so good. Now that testing migration is affected by the results of these tests, my very minimal tests make my packages migrate faster to testing, and in some cases I don't feel comfortable with that. Basically, I would like or need a way to express: I have this test, it's better than nothing, but please don't reduce my testing migration delay just because it passes.

But you would want the delay increased because of a fail?

Yes.

Right, that's a really nice suggestion. Have you spoken to anybody about it? I don't know what we'd need to implement it; I think it would need support in multiple places.

No, not at all.
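As an editorial aside, the skippable mechanism mentioned above can be sketched roughly like this. This is not any particular package's actual code; the back-end and test names are hypothetical, and the exit status 77 is the one mentioned later in the discussion (check the autopkgtest documentation for the authoritative value).

```shell
#!/bin/sh
# Sketch of a "skippable" autopkgtest script. The corresponding
# debian/tests/control entry would declare something like:
#     Tests: backend-foo
#     Restrictions: skippable
# With that restriction, exit status 77 tells autopkgtest the test
# was skipped rather than failed.

# Return 77 if the given back-end command is unavailable, 0 otherwise.
check_backend() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not available on this architecture; skipping" >&2
        return 77
    fi
}

# The real test script would then end with something like:
#     check_backend foo-backend || exit $?
#     foo-backend --self-test
```

Note that the skip only works through the declared restriction: without `Restrictions: skippable`, exiting 77 would simply count as a failure.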
I just noticed this last week. With this upload I actually wanted it to stay a little bit longer in sid, and I tried setting urgency to low, but that was not enough, because it seems to be overridden by the testing migration magic somewhere.

Yes. For now, you could put a restriction in that stops your tests running on ci.debian.net; you just invent a restriction. If you're going to do that, you could ask on the list and preemptively define a new restriction meaning 'these tests have poor coverage': badness is definitely bad, but goodness is not necessarily very good. If you agree on a name for that, or just decide for yourself after a small conversation on the CI list, you can define that restriction, and until the required feature is implemented your tests wouldn't run. That would not be brilliant, but it might be an improvement.

I added these tests precisely because I want them to be run when my build dependencies or dependencies change. Having them disabled on ci.debian.net... I prefer the current behaviour to disabling the tests, but I guess that's a poor maintenance decision to have to make in the current state of things.

Right. I can see that this is a needed feature. There are some comments on IRC; I'll read them out. From Elbrus: trivial tests can also be skippable, plus always exit 77. Terceiro says: if I recall correctly, there was a discussion about marking some tests as trivial so they don't speed up testing migration. Elbrus: there is a bug open about trivial tests; Terceiro says the bug number is 904979. Should I repeat the bug number? 904979. Thanks a lot. Yeah, that's great.

Okay, well, I wanted to share one thing that happened to me. I had a problem: the way tests are triggered based on your dependencies uses just the test dependencies and the dependencies of your packages.
All of those direct dependencies will trigger tests, but some of my most important dependencies are indirect, and I didn't want to add direct dependencies, because there might be alternative versions and I didn't want to restrict which version the test ran with. So, in the absence of a sensible way to explicitly request additional test triggers, I discovered that the Testsuite-Triggers field is generated by dpkg-source, when you build a source package, out of the dependencies of all of your tests. dpkg-source doesn't know anything about restrictions, so if you add a new test with a dummy restriction, meaning 'this is just here to modify the dependencies', its dependencies get included in your Testsuite-Triggers line; and because the test has this restriction, it's never run anywhere and costs no additional resources. This is a bit of a bodge, but the mailing list seemed to agree that it was probably the least bad bodge for the situation as it is. I think the patch to add this new only-for-dependencies restriction has been merged; maybe not, but it's definitely a technique. If you have additional indirect dependencies that you want to trigger your tests on, you can add a special test which has all of those as test dependencies, plus a restriction saying 'don't actually run this'.

Maybe it's an RTFM question, but can you add, let's say, triggers on source packages this way? src:linux, random example.

No, I don't think so. That's an interesting question. You could try adding a dependency on linux-image-amd64, though, which is always updated, I think.

Anyone got anything else?

Are there restrictions on who can ask for failed tests to be restarted?

Not really, I think; you may need your Debian SSO to get an API key.
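For reference, a sketch of what the dummy-test bodge described above looks like in debian/tests/control. The test name and dependency here are placeholders, and the restriction name shown is the one I believe was later adopted in the autopkgtest specification as `hint-testsuite-triggers`; check the current spec for the authoritative spelling.

```
# debian/tests/control (sketch; names are placeholders)
# This "test" exists only so that dpkg-source copies its Depends into
# the source package's Testsuite-Triggers field; the restriction tells
# runners never to execute it.
Tests: hint-triggers
Depends: some-important-indirect-dependency
Restrictions: hint-testsuite-triggers
```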
Okay. I've seen significant friction from the autopkgtests in Ubuntu, because tests are flaky, and somebody who has an interest in a package may not have the authority to trigger retests; uploading a new version just to do that is kind of a pain. There's a perfectly good retest button right there, but in my opinion they've overly restricted it.

Right; I wanted to make sure that Debian didn't make the same mistake. I have not seen any complaints about somebody who wanted a retest and was not able either to push the button themselves or to find somebody who would push it. I guess if that starts to be a problem we might have to think about it. Another comment on IRC from Elbrus: I apparently forgot to file a patch for my bodge, so it's not merged. That means I have a to-do list item. Sorry.

Another question from myself. I've heard there is a hook for pbuilder to run the autopkgtests; I haven't checked yet. Is that true, and should it be a bit more visible? Because I think it's a good idea, if you build the package, to run the autopkgtests in that chroot.

Somebody gave me the mic as if I know the answer to this question. I don't use pbuilder; why would I use pbuilder?

Yeah, whatever builder you are using.

I don't think sbuild has such a hook, and I think it would be really annoying to me: many of my tests are very time-consuming. Also, in the package of mine that has the best tests, I have a way to run the tests out of my git tree directly, without needing autopkgtest. That is a really good thing to do; it makes your life a lot easier, because otherwise you occasionally end up debugging weird autopkgtest problems about how the environment changes. Most of the time you can just run the script on your working tree, and you can use the same test for development testing as you do formally. As for running it in pbuilder, is that not just a waste of time?
Why would you run that on your laptop when we've got a perfectly good server out there in the cloud? We're no longer in the situation of uploading stuff to the archive, crap included, and letting it maybe migrate to testing with bugs in it because of the migration delay: you can send it to ci.debian.net, and ci.debian.net will run the tests for you.

I would prefer to run it in pbuilder, because then I would notice immediately if the test is broken and would not upload a buggy package.

Your tests must be much faster than mine if you say 'immediately'.

Well, some of my tests are fast, others are not. And if it's a hook, you can disable it.

I don't know; have you read the manual?

I should. I've only just heard about it.

sbuild has support for running the autopkgtests.

Great, great. So the answer is: switch to sbuild, which has it, because we have someone here who has read the sbuild manual. That's kind of confusing, though; this is very amusing. You run sbuild, sbuild runs autopkgtest, and then autopkgtest probably runs sbuild. Probably it just runs schroot, doesn't it? It does do the build again, though: you now have sbuild doing the source-to-binary build twice. Great. I see Terceiro and Elbrus confirm that both pbuilder and sbuild have the option; you just need to check the examples. I need to check them. I guess that's the advantage of an informal BoF: we can ask daft questions. Anybody else want to ask RTFM questions? It's much faster this way; you're only wasting the time of, like, 10 people for 30 seconds. That's totally fine.

Well, I sometimes have the problem that autodiscovery discovers a test in my Python package which for some reason doesn't run. Can I switch that off? The autodep8 thing, I think it's called.

Yeah, probably; I don't know. I'm pretty sure there's a way to switch that off. I think if you just provide your own... what should I put there? Well, the working tests.
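As a pointer for the sbuild option mentioned above: my understanding is that recent sbuild can run a package's autopkgtests after a successful build via settings along these lines in ~/.sbuildrc. The option names are from the sbuild documentation as I recall them, and the schroot name pattern is an assumption about a typical local setup; check sbuild.conf(5) for the exact spelling and defaults.

```perl
# ~/.sbuildrc (sketch): run autopkgtest after each successful build.
$run_autopkgtest = 1;
# Arguments passed to autopkgtest; the schroot name pattern here is an
# assumption and should match your own setup.
$autopkgtest_opts = [ '--', 'schroot', '%r-%a-sbuild' ];
```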
That's not always trivial.

Well, if you don't know which tests work and which don't, then I don't know how the computer would know.

No, sometimes they're just broken somehow.

What, all the tests, or just a few?

They're meant to run on the author's computer, with some environment which we don't have, and I just don't want to run them.

You don't want to run any tests at all?

No. Especially when there are trivial packages... I don't know what happens if you provide an empty debian/tests/control. Maybe IRC knows; I don't. That would be the thing I would try. If that didn't work, I would provide a tests control file with one test carrying a restriction that says 'this is not a test', because I think that if you have an explicit set of tests, the automatic test generator won't generate its own. I remember a thread on the list about somebody who wanted to add a test in addition to the automatically generated ones, and that requires some complexity where you merge the two things together.

A comment from IRC: ci.debian.net has a whitelist of Python packages that are supposed to work with the automatically generated tests. Elbrus: britney doesn't use that. That's right, so it's not used for migration. Elbrus lets us know they want the whitelist to be empty. I don't think I understand that. Me neither. 'Please don't add the empty thing', says Elbrus. Okay, so my suggestion is wrong. I don't know why you would want the whitelist to be empty; that seems backwards. Maybe the whitelist should become empty by having all of those packages declare their tests explicitly? Well, hopefully IRC will answer this. Yeah, maybe; they're not so fast at typing. Or are they asleep, on European sleeping time? It's a very bad time of day at the moment. I think the answer is you'll have to ask on the list. 'Whitelist is unsupportable.'
That is the answer. Right, that's great.

So, Terceiro says the whitelist should be empty because every package should declare its tests. Right, so you should declare your tests. But if there are no tests... if I don't have a test, I can't declare a test. Right, and we were just told not to add the empty thing. So how do you declare that your package has no tests, if the auto thing is adding tests?

Usually I would think the auto tests shouldn't take effect directly; if they succeed, that should lead to some visible prompt meaning 'please add this test'. In principle you can generate the control file for the tests.

Yes. We're told that the people answering this question have a delay on their live stream, so maybe we should wait. Right: Elbrus says yes, declare your tests by creating a tests control file, or a debian/tests/control.autodep8. I don't know what a control.autodep8 is.

I'm not really sure what it is either, but as I've learned it: for a given language, for instance R, autodep8 generates a test automatically, something like 'load the module, and if that works the test passes'. And then in an additional tests control file you can add tests which do more intense testing, more specific than just loading the module. At least that's my understanding.

Elbrus is explaining this whitelist to us on IRC. Terceiro says: if the package's tests are being run but shouldn't be, it's because it was incorrectly added to the whitelist, and you can ask for it to be removed. Right, this makes sense now. As I understand it, the automatically generated tests are only run for your package if it's on this whitelist, which is maintained in the infrastructure, and which they want emptied because everybody should declare their tests.
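One concrete way to declare your tests without relying on the central whitelist is the Testsuite field in debian/control, which tells the infrastructure to run the autodep8-generated tests for that packaging ecosystem. The source name below is a placeholder; the Python value is shown, and my understanding is that other values such as autopkgtest-pkg-perl and autopkgtest-pkg-r exist as well.

```
# debian/control (sketch): source stanza declaring autodep8-style tests.
Source: mypkg
Testsuite: autopkgtest-pkg-python
```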
So certainly don't add a thing to override the whitelist, because that's silly. Okay, makes sense. Right, I think we're there now.

So, I asked earlier whether anybody from the Salsa CI team is here; apparently not. But are any of you using Salsa to run the autopkgtests on every commit? Because I want to do this. My own situation is that the dgit package has all these tests, which I run on my laptop whenever I commit, and it's very tiresome; it wears my battery out. I'm told that with GitLab CI, if I add the right metadata and the right little script or something, this is possible. And I spoke to someone in a bar, I think, who told me, oh yeah, that's totally possible. We should have an example of how to do that.

Okay, I'm doing it now: during this talk I read up on how to do it, and I've started.

Okay, when you're done and you know what you're doing, can you post it? The list is a good place: just pointers to the things you did and how it works.

I'm going to make a merge request to improve the Debian CI team's documentation, and then I'm going to post a link to that merge request.

That would be awesome.

A follow-up question: is that the same as autopkgtest, or is it independent of it?

I think it's independent, isn't it? Well, if you have your tests in a kind of autopkgtest-y form, then quite likely they will run relatively well in this infrastructure, either via autopkgtest with the null virtualisation server, which just runs things right here, or possibly there's a way to run them directly. Often what an autopkgtest script is actually doing is running some existing test suite in the package, and you just need to override the part of the autopkgtest that makes it use the installed version, so that it uses the tree version instead; then you have a test in a nice format that you can iterate over.
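For anyone wanting to try the Salsa setup discussed above: my understanding is that the Salsa CI team publishes a shared pipeline that builds the package and runs autopkgtest among other checks, pulled in with an include along these lines. The exact path has changed over time and may well differ from what is shown here, so check the salsa-ci-team/pipeline repository for the currently recommended form.

```yaml
# .gitlab-ci.yml (sketch; the include path is an assumption and may
# have moved, so verify it against the salsa-ci-team documentation)
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml
```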
Well, when I upload something to Salsa it's in source form only, and before I can run any autopkgtest I in principle need to build it first. But during the build I usually already run essentially the same tests on the build image, so it would make more sense to have something on Salsa which builds the package when I push something, rather than running some specific test.

Yes, maybe. I think that depends on your package. My package doesn't need a build to run the tests, but obviously some do; it seems to be about 50-50 whether a package needs that. There's a build-needed restriction you can declare to say so; script packages often don't need it, and if you have compiled code you usually do. Whether plumbing your autopkgtests into Salsa is the best way of getting testing on Salsa will, I think, depend. One of the advantages of the documentation patch we're about to get (that's great) is this: you probably already know how autopkgtest works, and it's a standard thing rather than package-specific, so documentation saying how to wire your autopkgtests into your GitLab CI lets you infer how to add all sorts of other tests, because it won't have much package-specific stuff in it. So I think it will be a good documentation example even if it's not necessarily the way you want to do it for your package; you can just throw away the part that runs autopkgtest and replace it with something else.

There are some more comments on IRC. Otto, are you reading IRC? Because there's a hint for you: you should add the wiki link to the Gobby document for your tests. I don't know if that's answering a question, and I'm not really sure I understand it either.
This IRC channel is the DebConf18 one, but maybe you should simply write an email and ask for clarification. Elbrus also said: try to avoid build-needed, and try to only build the tests.

Answering what? I think Elbrus was just prompted to say that by the discussion. Build-needed is sometimes genuinely needed, but Elbrus's concern is that, depending on the package, actually building it may be very time-consuming and waste the CI's time, so you should only do it if needed. They've also added a link to the autopkgtest best-practices wiki page to the Gobby document; I'll paste that into IRC as well. And yes, if anybody has best practices and things that don't necessarily fit in the docs, or maybe even things that do, add them to that wiki page.

Anyone else?

I could raise something, if you like. I'm sorry, I missed the other session, so please tell me if this was already discussed. For context, I maintain the autopkgtest infrastructure in Ubuntu, and we run it a bit differently from Debian. In Ubuntu we block on regressions: instead of adding a delay to testing migration as Debian does, we block permanently until regressions are fixed or overridden by the release team. The problem we have is that in a lot of cases we don't manage to pin a regression on the actual change in the archive that caused it. We run the tests for the package itself when it's uploaded, plus the tests of anything it could break: anything that depends on you could be broken by you, so we run your reverse dependencies' tests when you upload your package; for libraries, their consumers are tested when the library is upgraded. But this means you only catch first-level dependency breakage when it's introduced; we miss second-level dependency breakage, and breakage coming in via the base system or test dependencies, that kind of thing. I don't know if
you are seeing the same kind of thing in Debian, or if anyone has clever ideas about how to improve this situation, because it's quite painful for us. In fact we've ended up training our developers, instead of fixing a test, to try to argue that a particular regression was misidentified and that the test regression should therefore be skipped and ignored. That means you end up kicking the can down the road forever: because we didn't identify the regression, it slipped into the release, and now nobody is on the hook to fix it. So this is a problem of attributing the regression to the right place, and we're not attributing it correctly. I guess the same problem exists in Debian, so I'm wondering if anyone has thoughts.

In Debian, because it only adds a bit of delay, I think we have this problem much less severely, though it still exists in theory. There's a comment on IRC about this: Elbrus says the intent is that it's going to go to blocking; that's not where we are right now, but it's definitely the intent. But right now, if you impose a testing migration delay, you don't necessarily impose it on the person responsible. Indeed, absolutely, that is true.

In my day job I also maintain a CI system, and some of the same principles apply there. One of the things the Xen Project CI sometimes does is to re-test the supposedly good baseline, and that can help you discover that something is not in fact a regression in the thing you thought it was: you re-test the supposedly broken thing against the supposedly good baseline, and now the baseline is broken too.

I did think about this. In principle, you could take the whole testing migration process and do it quite differently: you could take whole batches of packages, like
a whole night's updates, and run all of the tests of all of the packages, even the ones you're not updating; then, when something fails, you start bisecting the upload batch. The result of that, with our current infrastructure and probably with yours as well, is that you'd manage one migration of some set of packages a week, because you'd have to find a set that passes. You could maybe prefix that with the normal, more ad hoc approach that both of us currently have, which might mean the full test run would usually pass. Then, if a regression did get through the first stage and cause massive delays in the second, you'd at least eventually know whom to blame and which test ought to be souped up so it didn't happen again. I think that would be the only theoretically sound way to solve this problem.

It would require a massive amount of compute power.

Exactly. You could cut that down by running only tests with reasonable run times, or something like that, and it might still let you spot more of these.

It would maybe be good to have that option in autopkgtest directly, because I often have the case that I get a failure, it's not in the primary dependencies, and it's then hard to find out what else could have changed to break the test; automatic bisecting would be great locally.

Yes, that's a really good suggestion. We already have something: ci.debian.net publishes what is alleged to be the set of packages installed when the test ran. But I was told, I think on IRC while trying to debug some test, that this is not necessarily accurate: my understanding is that the list you see there is the list of packages installed in the base image, not the things your test installed on top of that.

Right. Are we running out of time? Ten minutes? Yeah, we should be starting to wrap up, but
what that would be is a feature in autopkgtest to print out the list of packages installed at the end of the test; then you would be able to diff that, and you could probably have some kind of robot, a wrapper around autopkgtest, that would somehow bisect. That would be cool. A small diff-ish thing would be quite nice, because then you could use it for bisection purposes.

The problem, sometimes, is that a very long time has passed between the two test runs, the regression happened somewhere in the middle, and the intermediate states are not all co-installable.

That's more stuff that's interesting from work: at work we have a cross-tree bisector that can bisect changes that occurred across multiple different trees in multiple different VCSs. I think what you'd need to do to make this work is take the first package set and the second package set, and have some algorithm that turns them into a graph of possible updates, ideally a line, but some other kind of graph would be okay, where each individual update is as small as possible. I think we have tools that might be able to do that. I can't think of exactly what the algorithm would look like, but I don't think it's that awful: you take un-updated packages one at a time, with some kind of opposite of a greedy algorithm.

Look at IRC, I'm told: 'we run tests every week' was the comment.

The question is: if you can construct that graph, you could make a little stunt git branch out of it and run git bisect. Even if you could only construct it after the fact, maybe you could construct it in time as well; that would be good.

The thing is, you wouldn't know which tests to run, and the problem is that the regression has already gone into the archive.

You could use this procedure to catch regressions as they occur, but it probably ends
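The diff-ish idea discussed above can be sketched with standard tools. The recording step is an assumption about how one might save the package list from inside a test (dpkg-query and the AUTOPKGTEST_ARTIFACTS directory are real, but wiring them up this way is not an established convention); the demonstration below just shows the diff step on two made-up saved lists.

```shell
#!/bin/sh
# Sketch: diff the package sets of two test runs. Inside a real test
# one might record the list at the end of the run with something like:
#     dpkg-query -W -f '${Package} ${Version}\n' | sort \
#         > "$AUTOPKGTEST_ARTIFACTS/pkglist"
# Here we fabricate two such lists to demonstrate the diff itself.
sort > baseline.txt <<'EOF'
libbar 1.0-1
libfoo 2.3-1
EOF
sort > current.txt <<'EOF'
libbar 1.0-2
libfoo 2.3-1
EOF
# comm -13 suppresses lines unique to baseline.txt and lines common to
# both, leaving the packages that are new or upgraded in the later run.
comm -13 baseline.txt current.txt   # -> libbar 1.0-2
```

Each line that comm prints is a candidate update for a bisection robot to try reverting or applying individually.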
up being, though I haven't thought about it very much, 'run the tests of the entire reverse-dependency tree for every upload', which maybe becomes a computationally infeasible task. But one thing you could do, and a thing we do in Xen, is that all regressions are automatically bisected. Sometimes the automatic bisection doesn't produce a conclusive answer, in which case it goes to some list that nobody reads; when it does produce a conclusive answer, it sends a mail to the list that everybody reads, a message from the bisector saying 'this commit here, you bad people'. You could run exactly this automatic bisection procedure every time you detect a regression, and then, at the very least, when one of these regressions slipped into the release, you would find out after maybe a few days, once the bisector had finally managed to pin down which package it was. You would also discover that the update you were blocking was fine and could be let in, so nobody would have to be bothered with it.

In policy terms, what do you think the action to take against the introducer of the regression is? Just filing an RC bug on their package, or something?

In Debian, I think you would do the same thing to their package as for a normal autopkgtest failure, which I think we haven't really quite agreed on, but I guess a temporary RC bug until they fix it. The other thing is that you can take it as a cue to improve the test, so that the next time a similar regression occurs it is caught before it goes in; that's normally possible, for example by adding the relevant package as a test dependency somewhere.

I think we're getting quite close to out of time. We were supposed to finish two minutes ago? No, two minutes from now. We're told on IRC that the stream has cut off. It has? It's not meant to be.
Maybe it's just one of the people watching. So, unless anybody has any final thoughts... okay, well, thank you all very much for coming. I hope we've captured most of this in the Gobby notes, and we'll probably post those to the Debian CI list; I think that's probably the right thing to do. And I'm happy to talk to the Launchpad people outside; I think that will probably be helpful.