[unintelligible] ... it looks at continuous integration: installation, removal, upgrade, reinstallation, testing, all that kind of stuff. That's a very useful tool, but it concentrates on one package at a time. ci.debian.net also concentrates on one package at a time, but runs the unit tests, or runs a test suite, for the installed package. There's some overlap there. Some of the packages have to run the upstream test suite, which is actually testing the code in the source, not necessarily the installed package. That's because the idea of testing just the installed package is not something familiar to how a lot of the upstream unit tests are actually created, so the support isn't necessarily there to run them against a different path. Another big part of our validation is the archive-wide rebuilds. We can do more with that: we're getting faster and faster machines for multiple architectures, so we can do archive rebuilds on different architectures in the future. So each of those concentrates on one package at a time. LAVA is the new part of that equation; it can already test kernels and bootloaders, and what we're hoping is that it can start to test the actual system. So you're starting to test multiple sets of packages against other sets of packages, and you can do it across a whole range of different nodes as well. The full details of LAVA will be in the talk at the end of today, so you'll get a full understanding of what LAVA can do there. But think of it as a test environment initially for kernels and bootloaders, which then escalates up through the system to do testing across multiple nodes, allows you to actually control the data being exchanged between different machines, and is cross-architecture. So there's a whole range of things you can actually do there. Once we've gone through some of those methods we can talk more about what kind of targets we want. We still want individual packages to be validated, and we need the continuous integration loop back to the maintainers; individual packages are where most of this stuff comes in. But we also need to make sure that we are able to test sets of packages. Sets of packages are either dependencies, which is somewhat what piuparts can do and again ci.debian.net can do, or packages that are related by something that's more of a user-space thing: people who tend to use this tool also tend to have that tool installed. There's no dependency relation, but do they work well together? Can we actually test these things side by side? When you've got forks of the same source code, or of the original source code, can you install them side by side? Can you build things against one and against the other? Can you switch between them? And can you actually compare them? Those kinds of tasks.
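That side-by-side question has a cheap first half that can be automated: are the packages even co-installable? Here is a minimal sketch in Python using python-apt; the package names are just examples, and whether the tools actually work well together would still need a functional test on top of this.

    import apt

    def co_installable(pkg_a, pkg_b):
        """Return True if apt can mark both packages for installation together."""
        cache = apt.Cache()
        try:
            cache[pkg_a].mark_install()
            cache[pkg_b].mark_install(auto_fix=False)
        except KeyError as missing:
            print("not in the archive:", missing)
            return False
        # A non-zero broken count means the combined dependencies cannot be met.
        return cache.broken_count == 0

    # Two conflicting MTAs should come out as not co-installable;
    # unrelated tools normally should be fine.
    print(co_installable("postfix", "exim4-daemon-light"))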
So the microphone's come up. Hey, there we go. That's nice and loud. Yeah, so Antonio made the comment there that ci.debian.net can actually help with collections of related packages.

So, the installer and bootstrapping. The installer testing is quite manual at the moment: Steve does a lot of testing with the CD images, Kibi does a lot of testing of his own d-i builds, and we actually have the daily builds there that people could test. Bootstrapping: we are regularly bootstrapping new architectures, we're putting new tools in place that can actually help, and developing a proper way of bootstrapping. So we need to validate whether we can actually bootstrap these things. And it's not necessarily a case of waiting until the next new architecture string turns up in dpkg; it's, can we pretend that there's a new architecture called, say, foo, and can we build for it? Can we simulate these things? Can we actually go through the logical flow of the tools that we've got and show that they do actually work? Can we identify the dependency loops in advance, independent of which architecture we're looking at at the time?

Kernels and bootloaders: well, yeah, there are lots of different kernel builds that all need to be validated. They all need some kind of proof that they do actually boot. A lot of that will happen quite easily on x86, but what are we doing for the other architectures? Are we even testing a lot of the kernels that we've actually got up there? And bootloaders: we're probably only testing GRUB on x86 with the standard Linux kernel. What can we do to extend that?

And then something that comes up again and again and again from the release team point of view is: yeah, that's all well and good, but you're starting with a freshly installed system. You've got no user data on it, you're not messing around with it and changing the configuration of stuff. How do we validate a system that's closer to what our users have actually got when they come to upgrade from one stable release to the next? Can we anonymise dirty systems in a suitable way? Can we get people to trust us enough to give us their systems, anonymised, and then run them through these kinds of tests, keeping them there as rolling dirty systems? Can we develop tools that will artificially create changes to configuration files for LDAP and other complex tools? Can we artificially break someone's Exim or Postfix config and see whether the other tools survive? These kinds of aggressive testing: can we do that?

And then all this validation is not much use if it doesn't come back to the developers. We need to integrate it, we need to allow the developers to get proper notification of what's going on, and how do we do that? How often do we test? Generally it comes down to how much hardware we can throw at it and how quickly we can push through the tests. How exhaustive do you want the tests to be? We get this problem in LAVA a lot: a lot of the Android people put in tests that run for 24 hours and then they put in two jobs in one day. That doesn't work. If you're submitting jobs regularly, you've got to at least balance the length of time it will take to run the test against the time between the submissions, or add more hardware. It's relatively easy for other places to just throw money at the problem; it's a bit harder for Debian, we might not have that ability to add yet more hardware. And how do we get the developers to be aware? It's not necessarily enough to have an RSS feed or a pointer in the PTS that says something in your package is broken.
It's not necessarily enough just to tell the maintainer that this package is broken. The maintainer, much like an upstream developer, is going to need context. They're going to need to know: why is it broken? Where's the tail of the test log from the run you did? What kind of breakage is it? Is it something that I can fix easily with my next commit because I know what went wrong, or is it something where, actually, that's a real problem that's going to have to go up to someone else? And once we start getting to the notifications part, how do we stop overloading maintainers with four emails a day saying you've got critical failures in this package and that package and this package and that package?

So let's start thinking about where we are with the tools as they stand. What do people want from these validation methods? Are there other tools we're not using yet, or other kinds of validation that we need to think about? Who's got some ideas on what's missing in Debian at the moment? You have a microphone? And please, someone make sure that these comments go into the gobby document.

For the moment, what we are doing is running this kind of test after the packages are uploaded, and I believe that the right approach, to do real CI, is to validate the packages against these tests before they get into the archive. I think that would be the first thing we should work on: to have this kind of infrastructure in the FTP masters' dak, or something like that.

Yeah. Well, there's the idea that unstable shouldn't necessarily be a free-for-all for absolutely everybody to break every package; if there's an autopkgtest in the package when it is uploaded, then why not have a short delay? There's a delay anyway, because I've just uploaded a new version of a package and it's not going to show up on the mirrors until tomorrow. Not only autopkgtests: piuparts and everything. The whole set? Yeah. That gets back into the... Even if it's not rejecting packages, at least notify the developer at the time of the upload. Yeah. You get into the time delay there, because piuparts in particular can take quite a long time to run. But yeah, we shouldn't necessarily be waiting for these systems, whichever ones they are, to poll the archive and say, tell me what's happened since I last tested; the upload should poke the CI and say, something's happened, queue this.

We could also try... Is it working? The microphone? Cool. Yeah, okay. The other thing that could be interesting is being able to trigger a test from, for example, a code review tool, like if you plug your Gerrit into the CI. I don't know of any teams that do code review that way, but that's something that would probably be interesting to explore too.

Antonio, any comments on whether we can have something like lava-tool, something like its XML-RPC interface, for a maintainer to poke a test into the CI? Yes. In the ci.debian.net documentation it is documented how to run the tests locally. It's debci: you can even install it on stable and use its helper scripts to create a test bed locally. It's just one command, and then it runs really easily and can integrate into your workflow. I do that: every time a package has a test, I run the test before uploading.
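For reference, the lava-tool route mentioned above is just an authenticated XML-RPC call to the LAVA server. A minimal sketch of such a submission follows; the server name, user and token are made up, and the job definition fields are illustrative assumptions rather than a recipe, since the exact format depends on the LAVA version.

    import json
    import xmlrpc.client

    # Hypothetical LAVA instance and credentials; like lava-tool, authentication
    # is done by embedding the username and API token in the URL.
    USER, TOKEN = "maintainer", "SECRET-API-TOKEN"
    server = xmlrpc.client.ServerProxy(
        "https://%s:%s@lava.example.org/RPC2" % (USER, TOKEN))

    # Illustrative job definition: boot an image on one device type.
    job = {
        "job_name": "boot-smoke-test",
        "device_type": "beaglebone-black",
        "timeout": 900,
        "actions": [{"command": "boot_linaro_image"}],
    }

    # scheduler.submit_job() queues the job and returns its id.
    job_id = server.scheduler.submit_job(json.dumps(job))
    print("queued as job", job_id)

A "put this source package into your queue" call for a Debian CI would need something similar: an authenticated endpoint that accepts the request, checks the signature, and enqueues the test.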
The limitation with running the tests locally like that, Antonio, is that if it's a third party who has run the test, and it's my package, well, the data about the failure is on their computer, not on ci.debian.net or on mine. So it would be nice for someone to be able to just poke ci.debian.net, if there are enough resources available, and say, put this into the queue for a normal test run that is then visible to everybody. Okay, so it's like making it possible for users to provide results, as in, I ran this test on my machine and the result was this? Or just to add it to the queue. Just like lava-tool does with XML-RPC, can we do something that actually does the authentication, so that someone with a GPG signature or whatever can just say, this source package, put it into your queue to be run as soon as you can? You mean an arbitrary source package from anywhere? From the archive; it would need to be a package from the archive. If it's in the archive, it will run anyway, as soon as it can. My point was that there's a lot of value in enabling all our test infrastructure to run against packages that are not in the archive, or versions of packages that are not in the archive. Another case where this would be useful is mentors.debian.net, because if you have a new package and you see that it validates against all the tests that we have, you are much more likely to dig into sponsoring it than if it doesn't.

How long does the ci.debian.net run take for the entire archive now, Antonio? Three days. So we may well need to throw more hardware at it. Yeah, that's a hardware issue; we can just throw hardware at that and get the time down.

So I have two comments to make. One is on this stuff: I don't know how close PPAs are, but if we get them it would be really cool if... PPAs are supposed to have an automated step where you can say, put these packages in unstable now, right? So it would be really cool if at this step, or before this step, you could say, please run these tests on my PPA packages, and only if they are okay, then do the copy into unstable. My second comment is about what you were saying about dirty environments. All of these tests, and I'm thinking especially of the CI autopkgtests, run in basically a chroot, which is not actually a very realistic environment. So even one step before dirty environments, we could run these things in clean desktop systems, maybe: take a desktop ISO, boot it, and then run your tests inside that. Then at least you have things like logind and the proper init system and all this stuff. I think of it as an intermediate step; it's not full systems that have been used by a user for six months.

The difficulty with using a desktop like that is that a lot of the time, tools that are useful for a desktop user are difficult to automate and script. If you've got something which causes a password prompt, or requires some kind of clicking rather than the command line, then it's harder to script that inside the desktop OS. So then you start writing wrappers, and then you're actually testing the wrappers, not the actual software. There are difficult issues around that, because that's where you start to think more about system testing, and you're starting to think, well, okay, now we need a graphics capture card and we need to be able to compare static reference screenshots against the next set of screenshots.

I just want to say something quickly. In Ubuntu we have a tool called Autopilot which can do some of this kind of stuff, which is like... So what's the GTN queue currently?
And you can say things like: open the application, click this button, and then check that the application is displaying this thing, and so on. But you quickly get into problems where you have to mock parts of the system, because you want to be able to compare the outputs and you don't want them to depend on the real hardware and things like that. So this is where there are problems, but there are tools which can do some of this. I'm just saying it might be a step up from a chroot, but it does introduce its own difficulties. It's true.

So who wants to see something more than that for Debian? And can you make sure it's getting into the gobby document as well? Please, someone, make sure all these comments are getting in.

One thing that I would love to have in Debian is a kind of patch review system, probably Gerrit, because that's the most advanced in terms of features. If there is something better and easier to maintain, because Gerrit is hard to maintain in production, I wouldn't be against it, but I only know Gerrit. Gerrit is tied to git, so we could use dgit, for example, and then have a representation of all our packages in Debian on Gerrit, so that anyone could propose a patch against a package. Once the patch is sent, we would run all these individual tests, so rebuilding, piuparts, adequate or whatever you want, autopkgtest and so on, and only when these have passed and been approved does it get a plus one in Gerrit. And then any DD, for example, could be considered a core developer and approve patches, if the maintainer of the package agrees with that. I'm already on the low-threshold NMU list, but I would love to have that, and to say that any DD can approve patches. Of course it would need a lot of CPU and infrastructure, but we have support from HP: HP has HP Cloud, which they already use this way for OpenStack, and I'm sure that we would get a lot of support from them. So if somebody wants to implement it, then we would have it. So that's the first thing.

Yes, it's the same comment as on the first section. A few of us at the cheese and wine party have already been working on a prototype of what you're suggesting. It's currently sitting with DSA, as we are requesting the corresponding resources, but we have an idea, a vision, of getting the Gerrit integration to do exactly what you proposed. Does it work with git-buildpackage? I mean, it's completely unrelated to how you build it, then. The idea is that you can also bring new developers into Debian by letting them propose stuff, and the project owners, the ones with the commit permissions on the repo, can then decide what to do about it. But the idea is, of course, that it works with git-buildpackage and whatever.

I just wanted to make the point that the best way to start experimenting with Gerrit is to convert an existing team to using it. You don't need full project-wide approval to do that: just start with a team you are involved with, get those people to work on it, go ahead, ask for the service to be set up, and then start with your team. Yeah. I don't think I'd have the time to do it myself, but anyway, the idea is out there; hopefully it's in the notes and then someone else can pick it up.
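To make the Gerrit idea a bit more concrete, here is a minimal sketch of how a CI job could push its verdict back onto a proposed change through Gerrit's REST review endpoint. The host, credentials, change identifier and log URL are placeholders, and whether the vote actually gates a merge depends on how the project's labels are configured.

    import requests

    GERRIT = "https://gerrit.example.org"        # hypothetical Gerrit instance
    AUTH = ("ci-bot", "http-password")           # hypothetical HTTP credentials

    def report(change_id, revision, passed, log_url):
        """Vote Verified +1 or -1 on a Gerrit change once the CI run finishes."""
        review = {
            "message": "CI: %s, log at %s" % (
                "rebuild, piuparts and autopkgtest passed" if passed
                else "tests failed", log_url),
            "labels": {"Verified": 1 if passed else -1},
        }
        url = "%s/a/changes/%s/revisions/%s/review" % (GERRIT, change_id, revision)
        requests.post(url, json=review, auth=AUTH).raise_for_status()

    # Illustrative call; the change id and log URL are made up.
    report("mypackage~master~I8f3c2a7", "current", True,
           "https://ci.example.org/logs/1234")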
Another thing that I think would be really helpful: when I upload a package, it may break others, but I don't know, because I don't have enough time to check for it. For example, I do a Python module update and so many other Python modules depend on it. Are you talking about direct or indirect dependencies, or are you talking about completely updated software? We're talking about build dependencies.

Yeah, so dependencies are where ci.debian.net comes in, because it will trigger a test when any of the dependencies of the package changes. So you don't just get tests when you change the package; you get tests run when another package in your chain is updated. Yes, but it should be done the other way around: when I want to update a package which has reverse build dependencies, it should try to rebuild those other packages before my upload is accepted. That way, I could fix the reverse build dependencies before uploading my package. Are you talking here about migrations to new libraries and things? Yeah, okay. That would work well with C libraries, but it shouldn't be limited to that. Thinking about this idea of rebuilding, I'm not sure why you'd rebuild all of the dependency chain. Maybe because I'm too focused on Python, since I've been doing that for a long time, over the last two years, but for the reverse build dependencies, most of the time you have unit tests, so the build would fail if there's a problem. Yeah, but that's what ci.debian.net is currently doing: it's running the unit tests. I see, so it's to trigger the unit tests of the actual dependencies as well. But isn't that what happens, Antonio? If a package fairly low down, say in the Python toolchain, changes, then all of the packages that depend on it will be tested. Yeah, so if anything in the dependency chain for your package changes, then your package gets tested again, and the lower down that change happens, the more tests get added to the queue.
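The dependency-triggered scheduling just described boils down to walking reverse dependencies. A minimal sketch with python-apt, covering binary dependencies only (reverse build-dependencies would need the source indexes as well); the package name is just an example.

    import apt_pkg

    apt_pkg.init()
    cache = apt_pkg.Cache(None)   # None suppresses the progress output

    def reverse_depends(pkg_name):
        """Names of packages whose Depends or Pre-Depends point at pkg_name."""
        rdeps = set()
        for dep in cache[pkg_name].rev_depends_list:
            if dep.dep_type_untranslated in ("Depends", "PreDepends"):
                rdeps.add(dep.parent_pkg.name)
        return sorted(rdeps)

    # When a low-level module changes, everything that depends on it goes back
    # into the test queue.
    for name in reverse_depends("python-setuptools"):
        print("re-queue tests for", name)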
But currently that happens after the upload, right? Okay. What would be nice is to have it before the upload, to make sure I don't break anything. Yeah, well, that ties in again with the request for mentors.debian.net and the other changes along the way. Just to follow up on that: that's why we have experimental. As soon as we have enough CPU power, we can add experimental to the CI and run everything on experimental. So if you have a package and you're not sure whether it's going to break anything, you can upload it to experimental and let the infrastructure run the tests for you. That's not a satisfying answer, because I don't know if I'm breaking things; I want the tests to tell me whether I broke it, I don't want to have to test it myself. Well, if you're testing changes that haven't been uploaded yet, you need to do that locally; you can't expect the whole thing to kick off from there. I know I could do everything myself, but I just don't have the time, there's too much work, and it has to be automated to make sure it always works.

Well, again, it's our bias within Debian: we drift towards testing individual packages all the time, because that's where we focus our work and our input. And I want to get a feel for what we think we can do to test combinations of packages. What kind of combinations can we come up with? Are we testing an MTA chain, and could we test that from the ground all the way up? Not just the actual dependencies: can we actually test that the system itself is working, beyond the scope of unit tests? Unit tests are, again, inside one individual package. Can you test that the actual operation of the chain is working? What kinds of collections of packages can we come up with and think, yeah, that needs a test? Because that's the kind of thing that LAVA can actually do: you can set up a system with lots of different packages installed, your own configuration, however you want to configure it, any way you can do it via scripting. You set that up, start it running and then throw some data at it. Now, what kind of stuff is that going to help us validate and test and get feedback on back to developers?

As a reference point, the HPE hLinux team has put together a test framework where we pull in specific packages related to supporting HPE Cloud and then test against that specific set of packages for all sorts of things: conflicts between packages, missing dependencies, proper operation in a cloud environment. So we actually do that at a more holistic level than a package level, because we know that Debian proper focuses on the individual package. So if you assume that the individual packages are going to operate reasonably properly, you put them into a collaborative operation, see whether you get correct results, and then you make the assessment, right? Yeah.

The other thing you might need to do is start tracking performance figures: on the same system, which shouldn't be getting any faster or slower, is your software actually performing faster or slower? Again, that's the kind of stuff that LAVA, certainly in its current state, is doing for kernels and bootloaders and other lower-level stuff like that, and we're looking at escalating up from there. So one of the things I'm looking for feedback on here is what kinds of things we can do on performance testing, not just perf and other things that are kernel-related, but little groups of packages that are known to be problematic: can we track those and see what we actually need to improve?

So, one data point I know of: Tails, the live distribution, is doing black-box testing using a mixture of Ruby, Cucumber and Sikuli, which is a framework that basically does OCR from the screen, and then you can say, click that button with that thing, or, do you have this? And one thing we do have in Debian is metapackages for tasks, and it seemed to me that autopkgtest was flexible enough that we could put tests for, say, the desktop into the desktop GNOME task metapackage, for example. So we have all the framework we need to reuse here if we wanted to script actual testing, running a virtual X and trying to figure out whether the desktop is basically running.

Okay, so what about how we get these results back to the developers? How do we want to be notified when these things break? What information, what context, do you need as maintainers when a test, whether it's package-related or actually system-related, tells you something is broken?
What information do you need to be able to work on that bug without necessarily having to go all the way up to the top level, whether that's ci.debian.net or whatever else it actually is, the piuparts log, the archive rebuild log, or a LAVA log? All these different places: there's lots of data up there, but how much of it do people need in order to start work on this without chasing that resource, so that the information gets directly into the hands of the developers? Is it enough just to leave it on a random website, or do we actually want notifications?

Of course this is my personal view, but I started adding autopkgtests and I have no idea what the current status is, because it's not where I'm looking, so unless I get an email about a status change it's basically non-existent for me. So I'm not afraid of overloading; I guess we can all manage email, well, we have to, otherwise we wouldn't still be alive in this project, I guess. So I would be very happy to receive lots of emails about any status changes, about any piuparts or CI or whatever things you have. I like what's done for the archive rebuilds: I get a bug report, and that's useful. For those I would wish that they also closed the bug reports automatically if the package builds again, because sometimes it's not a problem in the package itself but rather something else. I've seen ci.debian.net actually marking a run as a temporary failure when one of the dependencies doesn't install properly; it says, I couldn't actually get the system up to the point of installing your package. I'm a little bit more hesitant about automatically closing bugs just because a temporary failure went away, because with the next upload it's going to come back. There are packages I maintain, one where the test suite deliberately feeds in randomised data, and, well, sorry, but every once in a while you're going to find that you get a bug and then you can't reproduce it. It's a pain with that particular package, but I can see why they do it; more often than not it doesn't cause any problems, but every once in a while it does, and I never have time to go back to a porterbox and actually see what's going on on that particular arch. I probably need to patch the library to fix the seed. It seems like you should be able to say: don't generate random data, use the same data you used for this run. Yeah, I know, I'd like the time to do that, but it doesn't happen often enough.

Just a clarification about archive rebuilds: the bug filing is manual. There are tools that help with writing the bug report, but it's still manual, so I don't think it makes sense to automatically close the bugs later. However, there are other scripts that take care of diffing the list of current failures against the list of open bugs, so it's easy to go through it, read the bug and close it if it doesn't apply anymore. I'm not doing archive rebuilds anymore; I'm not sure whether David Suárez, who took over, has been running them recently.

Actually, on that point: when the archive rebuild bugs are filed manually, I know there's a fair amount of manual checking to make sure that the bug actually exists, that it tested the right version, and things like that. So that raises the issue of how reliable the tools we're using are, and how much we actually need to manually check these results before we notify the maintainers. So keep this in mind as well: these tools are only as good as the code that we write, they will let us down from time to time, and you'll get false positives or, worse, even false negatives. So yeah, having just done my first ever mass
bug filing, I got a couple of complaints going: oi, this doesn't even apply to my package, you idiot, and, there's a new version, you didn't check that either. I suspect we have an awful lot of tools lying about that people have hacked up for all the various purposes, and we probably haven't done a very good job of collecting them, to save us all writing more crap tools instead of some slightly better tools. I know doko has some, and so on, and I've just been comparing notes with the other porters and discovering that we've all got different scripts. So there's definitely some room for, as you say, a library of stuff for doing standard checks on mass bug filings; that would be a jolly helpful thing for me right now, especially if it was in Perl, which it probably isn't.

Personally, it depends very much on the frequency of emails: if it's going to send you too many, they'll just get ignored. I get a lot of boring email from computers telling me that something's broken and I ignore a lot of it. So I like stuff in the PTS personally, on the page that kind of collects everything together: have one email and a notification in the PTS, so that you're prompted to look at the PTS. I tend to look at the PTS anyway, just to check general status: what is currently broken in my package, how much whinging have I got from the various Debian tools that run? Personally I like the DDPO summary, because that has columns for ci.debian.net, and you can review all your packages on one screen and see all the information. So, say, for archive rebuilds, if you're going to get a mail once a year, that's fine; if it's going to be every couple of weeks, that's probably not fine. Or, like with ci.debian.net, if your package fails because of something wrong in your package, and then another package is uploaded for completely separate reasons and that triggers another run, then if your package hasn't been updated and fails in exactly the same way, you probably don't want another email. So that's the more difficult thing to do. Yeah, that's the general question of edge-triggered notifications versus level-triggered notifications, and whether you want to receive a notification when your package comes back into a good state, so that you don't have to investigate it further. Everyone will have a different opinion on what they want to receive, so maybe that should be something that's configurable per maintainer, but that increases the work at that level.

I just want to make a quick point on one thing that would be interesting to me personally: regardless of how we get notified about these things, how do we go about reproducing the problem without having to set up the full environment again? I don't want to have to set up ci.debian.net, let's say, even if it's really easy, or whatever other tool it is. I'd like to be able to run this one test, get the failure, fix it, run the thing again, show that it works, and then close the bug or whatever it is. That's tricky; if it's happening on a MIPS box, then it gets tricky, because you actually have to have the hardware or you have to get onto a porterbox and replicate it. There are tools that can do this kind of stuff; I mean, Antonio, we've looked at PRoot and CARE from ST before. There are tools that can track all of the binaries that were touched or accessed by a particular test run, pack them all up into one big lump and give you the package back, and then you just deploy that over a clean
chroot, and you're back to where that test was. So things like that exist, and we can think about whether we actually put those in. I found, when debugging autopkgtests recently, that the test runner gained an option to drop you into a shell when a test fails, so you can then go into the environment and play around with the actual state; that's really useful. This kind of thing is very useful, but obviously it doesn't work if the test is not running on your system. So maybe something like this, so that the system can deliver the environment to you in its bad state and you can poke at it; that would be very nice. Yeah, okay.

Wookey, as far as bootstrapping is concerned, what can we do to actually... are you happy with the mic? Okay, your comment first. I was wondering how much we could make that available as a service: you know, you have a web form and you upload the .changes file, or an archive with the signed .changes file and a couple of .debs, and it runs debci or some other test for you. That would solve Thomas's problem, for example, because you could use the testing framework before uploading the package. Yeah. I was just thinking, what can we do in terms of simulating the bootstrapping stuff, and have we got tools for identifying these dependency loops in advance? Yeah, so we just uploaded botch at long last, after many months of it not being in the archive, like two days ago. It has 40 scripts of various sorts: is this buildable, are its dependencies installable, what's cross-buildable today, all that sort of stuff. Oh, I'm supposed to stand up, am I? Hello. So there are useful tools in there, and as I said: who's got mass-bug-filing scripts or libraries and things? Hands up, anybody? Nobody's actually written any code for this? There should be lots of it lying about, because people have been doing this stuff for ages, and that would be helpful. So yeah, there are some tools now in the archive; I don't know if it's exactly what you need, and we've still got some pieces missing, but yeah. Okay. You have a question? Don't go.

How do we make sure that autopkgtests are kept up to date? Currently I see a lot of packages, well, at least when imported into Ubuntu, which fail their autopkgtests, and not just once but several times, and I don't see some of those tests ever succeed. And that's a lot of pain if such a test is triggered by some common package: for example, with setuptools, all Python packages are triggered, and that triggers all the non-working tests; uploading GCC triggers everything. That's something we should avoid, because otherwise none of the core packages will ever be green. This is what we have for build failures already: only block if you didn't have a failure before, that is, if you change from good to bad, I guess. I mean, in Ubuntu we had a problem where we let lots of packages in which were bad, and now they are still bad and we're blocking for that reason, because the infrastructure wasn't solid from the start. But I guess in Debian we should take care, before we turn on the naughty switch, to make sure that everything's working correctly.
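That "only block when a package goes from good to bad" idea is essentially edge-triggered notification: remember the last known state per package and only mail the maintainer, or block, when a result flips. A minimal sketch, with the state file and the source of the new results as placeholders:

    import json

    def transitions(previous, current):
        """Yield (package, old_state, new_state) for packages whose state changed."""
        for pkg, new_state in sorted(current.items()):
            old_state = previous.get(pkg, "unknown")
            if old_state != new_state:
                yield pkg, old_state, new_state

    with open("state.json") as f:          # e.g. {"foo": "pass", "bar": "fail"}
        previous = json.load(f)
    current = {"foo": "fail", "bar": "fail", "baz": "pass"}   # latest run results

    for pkg, old, new in transitions(previous, current):
        if old == "pass" and new == "fail":
            print("regression in %s: notify the maintainer, consider blocking" % pkg)
        elif old == "fail" and new == "pass":
            print("%s is back to a good state" % pkg)   # optional 'fixed' mail

    with open("state.json", "w") as f:      # remember the new state for next time
        json.dump(current, f)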
One minute left. Antonio, do you mind? No. So, hopefully we got most of this into the gobby document; I'll be using that as the basis for whatever we actually work on from here. If you've got particular questions about new tools and how we can work on this kind of thing, particularly with LAVA, or if you just want to know more about LAVA, that's the evening talk today, seven o'clock. I forget which room it's in; it's on the schedule. Okay, thank you very much.