I imagine most of you are probably on Twitter, so feel free to take photos and post them. Use hashtag FOSDEM, F-O-S-D-E-M, and let's get it trending, because there's no official registration and FOSDEM doesn't really get good exposure on Lanyrd and websites like that, so we should make it a trend. Okay, so this talk is about continuous integration at the distribution level, and here we are with Martin Pitt. Martin works on Ubuntu; he works on Linux plumbing upstream and in Debian and Ubuntu, on systemd and things like that, and he created and maintains the distro-level CI in Ubuntu. Please welcome Martin for the talk. So welcome, everyone. And yeah, it's a pleasure to speak in front of so many people, I'm overwhelmed. And of course lots of thanks to the FOSDEM organizers and the volunteers; it's a great conference, so thanks a lot. So yeah, I've been an Ubuntu developer since pretty much its inception in 2004, and a Debian developer even longer. And in Ubuntu we practice continuous integration at the distribution level: we do test-based gating for the roughly 30,000 packages that we ship. And as far as I know, we are still today the only major distro which does it with that rigor, test-based gating of the development of the distribution. So I want to share some thoughts and experiences and maybe convince you that it's a good idea. So what I'm going to do: first, why did we do this? Then, what do our tests look like these days? How do we use them for gating? Then I want to say a couple of words about the infrastructure where we run all of this, because at this scale, as you can imagine, it's not exactly easy. And then, how did doing all this change our life? And then we should have about 10 minutes of Q&A. So where did we come from? In the first few years of Ubuntu we had this six-month development cycle. In the first four months it was basically anything goes: toss everything over the fence, feature development. Then we hit feature freeze, and in the remaining one and a half to two months we tried to fix about half of the regressions that we had introduced in the first four. And during that time, as a developer running the development release, it became kind of a daily morning exercise to fix your broken boot, or fix your broken X server, or reinstall half of the packages that the upgrade removed underneath you because you weren't paying attention. So while you learned a lot about how the whole system worked, it was certainly not a very enjoyable experience for non-developers. And as a result, we did not have a lot of non-developers using the development release, and as a result of that we were losing a lot of potential feedback about how the development series worked in real-life scenarios. We didn't get many critical bug reports, and so a lot of regressions slipped into the stable release. Another problem was that in a distribution you are facing a lot of archive-wide changes: things like library transitions, or a new GCC or Qt, or a major Python version. Back then, these things were often not finished. People basically tossed them over the fence half done, saying, oh my God, I have this deadline to make, or there's feature freeze next week, so let's dump it all in and sort it out later. And in the end, it became someone else's problem.
And usually, in the end, the release team had to clean up all these bits, kick out packages, and desperately rebuild packages against the new API, and it was just a horrible mess. So once Ubuntu became popular enough and was being used in mission-critical deployments and commercial products, this was simply not good enough anymore. So we set down a strict goal: the development series must be stable, usable, and safe to use at all times. Nobody is able to knowingly break it anymore. And by using this kind of continuous testing and continuous integration, we basically want to ratchet towards perfection: we never, ever want a bug which we can automatically detect to end up in the development release. And yet another problem was that many upstreams already have test suites; it's just that every project does it differently, so there is no uniform, and thus automatable, way to actually run them in a distribution at the time when you need it. So that was the situation. Initially, like everyone else, we started with some rather naive approaches. We set up a QA team, and that created a whole collection of standalone test suites, like ubuntu-desktop-tests, ubuntu-server-tests, and whatnot, tossed them into Jenkins, ran them once a day, and then basically pestered people: you broke the tests, please fix it. Well, of course, none of that really worked, because it's too late; at the point where we detected it, the damage was already done. Also, it's an endless blame game: developers point to QA, "it's your tests that are broken", and QA points to developers, "no, it's the software that broke". And none of it was really reproducible and easy to replicate for a developer. So the conclusion from that was that the only people who are able to take responsibility for the tests in a meaningful way are the developers of the software itself. So developers are responsible for testing, they are responsible for gating, and the QA team basically only provides the infrastructure, making sure that the test results actually arrive and are presented to people. If someone has a question like "how do I simulate, I don't know, a Wi-Fi card, or X, or whatnot", the QA team can help with that. But it must be possible for every developer to replicate these tests easily, without reading lots of manual pages. So the idea was that instead of having centralized tests, we would put the actual tests for a source package into the source package itself, where developers can develop them by themselves without going through any QA process. And these tests would then be triggered whenever either the software itself or any of its dependencies gets updated. And we would, of course, use these tests for actual gating: once the tests break, the package doesn't land, period. And this was named autopkgtest. This is both the name of this whole kind of test and also the name of the tool that you actually run to execute the tests. And back then it was submitted as Debian Enhancement Proposal number eight, which is why you can still hear a lot of people calling them DEP-8 tests, so don't be confused. So how do these look? This is one of the simplest and oldest tests that we still have in the archive, the kind of toy example where you begin. Every source package which has tests needs to have debian/tests/control, which gives the test metadata.
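(For illustration, such a toy test looks roughly along these lines; this is a hedged reconstruction, not the literal files from the archive.)

    debian/tests/control:

        Tests: simple-gzip

    debian/tests/simple-gzip:

        #!/bin/sh
        # trivial smoke test: compress and decompress a file and check the round trip
        set -e
        workdir=$(mktemp -d)
        trap 'rm -rf "$workdir"' EXIT
        echo "hello world" > "$workdir/hello"
        gzip "$workdir/hello"
        gunzip "$workdir/hello.gz"
        grep -q "hello world" "$workdir/hello"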
Basically it enumerates all the tests that we have, and it says, in order to run the simple gzip test, we need to have these and these packages installed. You can also give other properties, but I'll get to that later; this one doesn't need anything. And for every test that you enumerate there, like simple-gzip, you need an accompanying debian/tests/<testname>. And as you can see, this can be very simple: it's basically a smoke test that gzip and gunzip do what we expect them to do. And the contract is: if the test that you run exits with zero, then it passed, and otherwise it failed. So you can write the test in any language, you can compile it if you wish, so that keeps you pretty flexible as a developer. So once you have the tests, how do you run them? That's the autopkgtest program. This is the thing responsible for creating a temporary testbed, copying the tests into it, running them, making sure that all the logging is right, and getting the results and any artifacts back. You can influence its behaviour in various ways, but the details aren't that important. And it supports various backends in which you can run the tests. One of the oldest ones we have is schroot, which you see in the second row here; schroot is basically Debian's tool to build and manage a collection of chroots. Very simple, but of course this limits the things you can do: you can't start services, you can't interact with the kernel, because that doesn't work in a chroot. So after we created autopkgtest, all these new fancy technologies came along: QEMU, containers, whatnot. These days we have backends for LXC, LXD, QEMU, arbitrary SSH hosts or cloud instances, and adb if you want to run something on Android. And in production these days we are actually using QEMU, by way of OpenStack instances, so you have a full virtual machine to mess around with. We use that for x86 and ppc64el, and for the other two architectures that we support, ARM and s390x, we run the tests in LXC because we don't have OpenStack integration for those yet. A slightly more useful example is the schema that we use in a lot of libraries, where we basically do a simple compile-link-run cycle to make sure that the library that we ship or update still works. This is the simplest test metadata; it should be straightforward, I hope. And this is what the test does: basically it creates a simple C file which, in this case, interfaces with systemd's logind. We instantiate the logind monitor, call it, and basically make sure that this smoke test works. Then we call GCC on it and run the program, and if it succeeds, the test is good. At first sight this looks very simple, but on second look there are actually lots of things that can go wrong. For example, we update the library to a new upstream version and our development package is missing a newly required dependency, so the test tries to use a header file which is just not there. Or the package forgets to install new header files. Or the pkg-config file is wrong; we did have new upstream versions which shipped with a broken pkg-config file. All of that happens. So as I said, you'll see the same pattern in lots of libraries these days. And a more intricate example is the one from the systemd package, where we run an upstream test: upstream ships test/networkd-test.py, which exercises networkd.
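(A rough sketch of what the debian/tests/control stanza for such an upstream test can look like; the values are illustrative and the Depends line is omitted here, the real systemd packaging differs in detail.)

    Tests: networkd-test.py
    Tests-Directory: test
    Restrictions: needs-root, isolation-container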
And as I said, normally the tests live in debian/tests. But with the Tests-Directory field we can say that the tests should be looked up in a different directory, in this case test/networkd-test.py. And this is how we can interface with and run upstream tests, by basically just providing the metadata for them, and that's it. You also see the Restrictions field here. This is where you say how well isolated your testbed needs to be. In this case we need root, and we also need to run at least in a container, because networkd needs to set up a couple of veth devices and start services, and that, for example, wouldn't be safe to run in a chroot. And there is a stronger version of that, isolation-machine, which we use for example in the NetworkManager tests: we load the mac80211_hwsim kernel module, which simulates Wi-Fi devices, and this is the kind of stuff that doesn't work in a container. Also, our kernel tests do really nasty things to the testbed and beat it to a pulp, so you really want to nicely contain that in a VM; it's just not safe in a container. But I don't want to go too much into the details of that, I just want to give a broad overview of what's possible there. For example, in systemd we have tests which simulate suspend to see how logind handles it, or closing and opening the lid. Or we create a loop device with partitions to make sure that systemd integrates with that properly. Or we install a couple of mission-critical packages like D-Bus, NetworkManager, Xorg and LightDM and make sure that, whatever that means in each case, after a systemd upgrade everything starts, there are no failed services, and that you can reboot the testbed 20 times without failures. So, that's it. When we started this in 2012 or so, we still had very few tests, as you see. Then we started to introduce this into Debian, and since 2014, I'm happy to say, Debian also runs these tests, except that it doesn't use them for gating yet. And you see the adoption curve is really nice. These days we have more than 6,000 packages which have autopkgtests, and considering that they also cover their dependencies implicitly, that's quite a nice coverage already. I mean, it's still far away from the 30,000, but it points in the right direction. By the way, the big leap that you see here was sort of cheating: we figured out a generic way to test Perl and Ruby and DKMS modules, because they all look very similar, so we have a central way to run them, and that explains the nice jump. So, now that we have the tests, how do we use them for gating? Initially, there's a developer who prepares a new package update, say GTK, and uploads it to the distribution. That happens with dput, the standard Debian tool to upload a package. But this does not land directly in the development series. Instead it goes into something which we call the proposed pocket, which is a kind of overlay on top of the development series where all the new stuff gets uploaded, and it acts as a staging area, basically for CI. And this one has no human users, because by definition that's the bit that is broken. And then we have a thing called proposed-migration, a set of scripts; if you're from a Debian background, you might also know it as britney. And this then does all the necessary checks on it.
So it tests that the new GTK builds on all the architectures where we expect it to, that it is installable everywhere, and that it doesn't break the installability of other packages. We do this for all 30,000 packages, and that ensures that library transitions can only land complete and that you don't knowingly introduce uninstallability. And, for the purpose of this talk, it also triggers tests, both of GTK itself and of everything that uses GTK, i.e. all the reverse dependencies. And only once all of that is good does proposed-migration land either GTK itself, or the group of packages that belong together, into the development series. With all of that we ensure that packages never regress on architecture support, that they are always installable, and that, in theory at least, the tests never regress. Well, that's the theory part, but it holds for the most part. And of course, in this case, if GTK breaks something, the developer might need to do further uploads to adjust GTK's users to a new API or whatnot. So, this is a page which we generate automatically, where developers can check the status of their packages. In this example with GTK we see, for instance, that it didn't build on ppc64el yet, that on i386 there is an uninstallable binary package, and that one reverse dependency, unity, just failed its tests. This is a very simplified output: normally we test on five different architectures, not just one, and we also wouldn't actually start the tests before we know that the package builds and is installable, because otherwise it's pointless; but it should illustrate the point. And this is actually still a very simple case. Consider updates of GTK, Python, Perl or apt: they literally trigger thousands of tests. Every time we upload GTK, we trigger 5,000 tests times five architectures, so our machinery takes about two days to grind through that. But the nice thing is that after these two days you can actually land the stuff with confidence, instead of just saying, yeah, I hope it breaks nothing, because it always does. And of course the exact same thing applies to updates for stable releases as well. It's the exact same machinery: people upload to, say, xenial-proposed (Xenial is our current LTS), it goes through all this machinery, and only if it's green can we actually publish it. So that's the structure of the tests. Now I want to explain a little bit the infrastructure where we run this, both because I haven't really seen this kind of structure anywhere else and, to be completely honest, also because I'm a little bit proud of it. Like many other people, we of course started with Jenkins. That was okay when we had like 20 tests in the beginning, but already then it was quite brittle: with every update of Jenkins you'd read through three screenfuls of Java exceptions. It's pretty hard to maintain, and it's really not easy to replicate locally if you want to develop the infrastructure. We were losing a lot of test requests, and it's a single point of failure. At the scale of 30,000 packages, that just doesn't work; we needed something better. So I sat down one day and designed something along a micro-service architecture, which should use standard cloud technology as far as possible, and I wanted to have small and loosely coupled components. Where it starts is what we could call a policy entity.
So this is the thing which wants to trigger tests, waits for the results, and then makes decisions based on these test results. This can be the proposed-migration I was just explaining for the distro, but it could also be GitHub, for example, where you want to use it for gating pull requests, and there are a couple of other consumers of this. And the only thing that this policy entity does is put test requests into an AMQP queue. We use RabbitMQ for that, which is basically the standard implementation. For those who don't do AMQP: very simply put, it's a job distribution system. You have a couple of queues where you can put in requests, consumers can take requests out, and AMQP ensures that all of this is very robust. It's atomic, you can parallelize it arbitrarily, and it's a very simple API: as a consumer you take a test request, do your thing, and then acknowledge it; it's basically five lines of Python, or you can do it in a single line of shell. And the AMQP system ensures atomicity: if anything breaks and a request doesn't get fully processed, it gets handed back to the queue and the next consumer basically re-attempts it. So what happens in our case: we have a bunch of workers up there which take requests from the queues that they can service. We have workers for x86, for example, and separate workers for ppc64el, because those tests run on a different cloud, so you configure the workers a little differently depending on what kind of requests they can serve. They take a request, assemble the autopkgtest command line, run the test in the cloud, copy back the results, store them in a permanent place, which is Swift in this case, and at the end acknowledge the request. If anything goes wrong in the middle, it goes back to the queue and you never lose a test request. At the moment we have many dozens of these parallel workers, so it can scale with the size of the infrastructure, and it's quite painless to set up. And for those who don't know Swift, it's OpenStack's standard object store; with a bit of simplification, it's basically a distributed network file system, but with a simple REST API for querying, uploading and downloading, so you can basically give someone a URL that they can open in their web browser or download with curl or wget or anything. It's pretty simple to use because it's a standard component of the cloud: I, as a consumer of the cloud, don't need to worry about it, and it's separate from all the instances you set up, so the data is safe there. And finally, once the logs and artifacts are all in Swift, the original requester can poll for the results, wait for them to arrive, make its decision, and the loop is closed. Finally, there's also a web UI, a results browser, which developers can use: it presents the test results, has links to the artifacts and logs and whatnot, and shows an organized history. So it's mostly a developer tool, but it's completely independent of the workers and the policy entity, so it's replaceable and it's not critical. And the whole infrastructure is deployed with Juju charms, so that's the Ubuntu way of cloud deployment management.
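(As an aside on the "five lines of Python" consumer mentioned above, a minimal sketch with the pika AMQP library might look like the following; the broker host, queue name and the run_autopkgtest helper are invented for illustration.)

    import pika
    import subprocess

    def run_autopkgtest(request):
        # placeholder: in reality this assembles the autopkgtest command line,
        # runs it against a cloud instance and uploads logs/artifacts to Swift
        subprocess.run(['echo', 'would run test for:', request.decode()], check=True)

    def on_request(channel, method, properties, body):
        # body carries the test request (package, trigger, ...); format assumed here
        run_autopkgtest(body)
        channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only after full processing

    conn = pika.BlockingConnection(pika.ConnectionParameters(host='rabbit.example.internal'))
    channel = conn.channel()
    channel.basic_qos(prefetch_count=1)   # take one request at a time
    channel.basic_consume(queue='debci-xenial-amd64', on_message_callback=on_request)
    channel.start_consuming()             # unacknowledged requests go back to the queue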
So it's very simple to deploy all of this into three local LXC containers, and with the exact same one command line you can also deploy it into production, say onto OpenStack, and you can redeploy the entire infrastructure within minutes without losing any data. And yeah, the whole thing is maybe two or three hundred lines of code, which is about the same size as a single XML job definition for Jenkins, just saying. So now we have all this; how did it change our life? It provides an effective carrot and stick for developers. The carrot is, of course: the better you make your tests, the harder it is for other people to break your software, because of this reverse-dependency triggering. For example, new kernels have a tendency to break LXC, new X libraries or servers have a tendency to break Qt, and new versions of Qt in turn tend to break KDE. And since we have tons of KDE tests in the distribution, our Qt maintainer always has a hard time landing new Qt versions, and I guess they are swearing about this all the time, but as I said, once you actually get it to green, you can land it with confidence. That's nice. The other effect is that these cross-package changes I mentioned, library transitions and so on, either land completely or not at all; they will be forever stuck in -proposed if you don't finish them all the way through. This both ensures that you always have a good development series, and it also makes the release team's life much easier. And of course, as a developer you can whine all you want at the machine about the deadline you need to meet and feature freeze and whatnot; the machine is very patient and it won't give you anything, so you need to do your job properly. But of course there is no free lunch, it comes with a cost. As a developer, if you have a large number of tests, then you need to keep the tests actually running. For the most part they break with new upstream versions or changes to the code, but sometimes they also break for entirely unrelated reasons: sometimes the cloud configuration changes, or an external website that your tests talk to changes, in which case it's probably not an actual regression in your software, but still, people are chasing a lot of test regressions which are not entirely obvious at first. And of course, having test infrastructure which is able to process tests at this scale is not entirely free either. We are basically building a reliable CI service on top of a naturally unreliable pool of hardware and clouds, so you need to invest a little bit in keeping it running. This fine gentleman over there took over the maintenance of that, and he can probably tell you a lot of the gory details of tracking down crash loops and whatnot. And another big problem that we face in Ubuntu as a downstream is the import of broken tests from Debian. As I said, we have so many tests now, and most of them come straight from Debian, but Debian doesn't gate on tests, so eventually the failures land on us. For example, every new Ruby version that we import tends to break a couple of modules, and in Ubuntu we just don't have the manpower, or quite honestly the interest, to track them all down, so we tend to ignore those. But by and large, after a few months of using this, people got used to it and nobody really discusses the "if" anymore. I mean, people do see the value.
So the things that we do discuss are tweaking the policy, making tweaks to the infrastructure, maybe how to add a new architecture to it, and so on. So in general, people feel a little safer working this way. And this works really great for software which is native to Ubuntu, where we are the upstream: say the installer, or Juju, or Unity, or Snappy. But as I said, it doesn't work so well when we are just the downstream. Of course these tests find bugs and keep them from landing in the development series all the time, but then we need to deal with them, file bugs upstream, carry our own patches, and so on. And we are doing this, but it turns out that running the tests in Ubuntu only is too late for most of these bits, because most of the code in Ubuntu basically just happens to us: it gets imported. So what we really need to do is push all of this upstream. The tests are already running in Debian, and we see here that KWin failed like half a year ago and nobody noticed, because nothing gates on it. But there is work underway to enable gating in Debian as well, and yeah, then this will get a lot better. But the real place where you want to hook this in is upstream. I'm heavily involved in the systemd community as a developer, both upstream and downstream, and back then, when we did an upstream release of systemd, it took me like a week or two to figure out all the test regressions it caused once I had packaged the new upstream release, and that was a big pain point. So one of these days I tweaked our packaging to be able to build an unmodified upstream source tree directly from a pull request, without any patches, and then adjusted the tests so that they drop some Debian/Ubuntu-specific expectations, so that we are basically able to run our downstream systemd tests straight on the upstream source, and then integrated all of this with GitHub. GitHub has a wonderful webhooks interface which makes it quite easy to interface with. So nowadays, every pull request needs to pass exactly these tests. This is an example of how that looks, and now it lands in the developer's lap: when this happens, the change is still fresh in the developer's mind, and it won't land until it's all fixed. And as a result of that, every commit, and by extension every release, of systemd is buildable everywhere and passes the tests, and we finally have good releases. And that easily enables things like daily PPAs: we basically have a script which takes current master, applies the Debian packaging, runs the tests, and uploads it to a PPA if everything is green, so it's fairly safe to use these daily builds, because we know they are not going to break your computer. And basically, packaging a new upstream release now becomes an exercise of reviewing what actually changed, which is really where you want to be. And this is not really limited to systemd; this is a generic facility, and we can interface with other projects on GitHub. Of course it's always a capacity issue, and you need to negotiate the exchange of some credentials, but by and large this is possible. Okay, well, thanks for your attention. We still have a couple of minutes left for questions, and if we don't manage to do it here, grab me in the hallway, or write me an email, or find me on IRC; I'm happy to talk about this stuff, I love it. How do you handle it if a package has to land together with another package?
That means, for example, there's a new Qt version which has a new interface, and there's also a new version of something which uses Qt, and they have to land together. Yeah, so the question was: how do you handle groups of packages which need to land together? This is done by the proposed-migration bits. So basically, for example, you upload a new GTK which breaks the old unity, and you then upload a new unity which depends on the new GTK. This britney script is able to figure out: GTK alone is uninstallable, unity alone is uninstallable, but both together are installable, and both together, if I run the tests of the new unity against the new GTK, it's green, and then it lands them both as a group. So basically, if you do a transition from Python 3.5 to 3.6, you will get like a thousand uploads sitting in -proposed, and once you fix the last one, it all lands as one giant chunk. And this is the real point: to get these cross-package updates done atomically, in a safe way. This involves a lot of reasoning about dependency trees, and it's all a bit of black magic. I didn't write this; this is mostly Debian's work, and I'm very happy that we could just reuse it. And Debian does use that part as well: stuff that migrates from unstable to testing also only lands once library transitions are complete and packages build and are installable, but it doesn't gate on the tests. So the tests that do get run, they are run via the debian/tests/control file. How do you personally feel about the distinction between tests that are testing the upstream code versus tests that are testing the code as it's packaged and installed on the system? So do you care whether the tests run against the code as it's put into its packaged location, or is it fine if they only run against the build tree? I'm not sure whether I got that, but the only thing autopkgtest exercises is the actual binaries that are in the archive. We upload something to -proposed, that gets built, the binaries get installed into an actual system, and we only test those. So you test the actual installation of the binaries. Yes. But when the test suite itself is run, it's often testing a source tree of the upstream code, not the actually installed version. It can be the same version, but it's actually not the same file. Oh, you mean if you run like a "make check" kind of thing? Yeah, because you're actually in the development environment at that point, so they should be the same thing, but you haven't actually got that assurance, because a lot of the test suites that we need to use can't actually test in /usr and /lib; they have to test an uninstalled version. Right. So the question is: when you try to run upstream tests which are designed to run against the build tree, how do you do this? And this is indeed one of the challenges. If you write your own test, like the one shown here, it tests the system-installed package, not the source tree. But if you have upstream test suites which only run in a "make check" kind of style, you have to modify them to use the binaries installed in the system, because otherwise testing the build tree is not testing enough: you might have a perfect development build tree but then mess up the packaging, for example by forgetting to actually install a file. So the tests need to exercise what's in the system. Depending on the test, is that simply not always possible?
Sure, for some upstream test suites it's easy to do: you introduce something like a PATH override or maybe an "--installed" switch or something. Even automake has a "make installcheck" interface, but not a lot of upstream software uses that. So if it's easy to change the upstream test suite to run against the installed system, then of course it's always preferable to do that. Sometimes it's hard, and then I would say: just write a smoke test. The point of this is not really to exercise all the little details of your API; that's what unit tests are for, and those are fine to run in "make check". The point of this is to test integration and packaging, so you can write your own, let's say, smoke test which only makes sure that the package by and large works. Say, for Apache, you install the package, you put an index.html into /var/www, you do a wget, and you make sure that you get back the file you expect. And in this case that's maybe even the easier approach. But I do hope that, since more and more people are doing this, upstreams become more willing to take patches which make their test suites run against the installed system as well. For example, GNOME takes patches for installed tests, and the glib packaging actually ships a binary package with GLib's installed tests which you can just install and run, because they were doing the same thing for GNOME's own continuous testing. So it becomes more popular. But yes, it is a problem. And it's not something you currently try to monitor? You don't look at the actual test scripts and work out what they're trying to do; that's left to the maintainer? Yes, that's right. There is a potential problem there with the gating, because sometimes the gating is actually operating against something other than what you think it is, whether you asked for that or not. And I must say, even if you just run the upstream test suite against your build tree, that's still better than doing nothing at all, because it still catches the case where a dependency changes and your build doesn't work anymore. For example, Qt changes its API and KWin doesn't build anymore; it still points that out, and it holds back the new Qt until you fix KWin. So it's useful; it's just not, let's say, using the full extent of what it's supposed to do. But these build tests are definitely useful by themselves, and in fact that's what KDE actually does: they basically only run "make check". And yeah, that's okay. There's one question right here. Ah, okay. So lately more and more programs have test suites which in some cases produce intermittent failures. So how do you manage those? Intermittent issues, like in the test suite? So sometimes it's going to work, sometimes it won't. I know that it's ugly, but it's a reality for some software. So these are always our favourites, of course. And this happens a lot. Like everyone else, if it fails like one out of ten times, quite frankly, we push the retry button. In reality, the report that we have here actually has little retry icons behind the regressions, so a developer can just retry their own tests. If it happens one out of ten times, it's okay-ish; if it happens more often, someone has to actually sit down and debug it, sorry. And as I said, the point is that it needs to be really easy to reproduce this locally, basically with these autopkgtest command lines that I showed.
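(For reference, a rough sketch of that local workflow with a current autopkgtest; the release, architecture, package and image names here are just examples, and older versions spell these tools adt-run / adt-buildvm-ubuntu-cloud.)

    # build a testbed image once
    autopkgtest-buildvm-ubuntu-cloud -r xenial -a amd64     # QEMU/cloud image
    autopkgtest-build-lxc ubuntu xenial amd64               # LXC container

    # run a package's tests against such a testbed
    autopkgtest gzip -- qemu ./autopkgtest-xenial-amd64.img

    # drop into a shell in the testbed when a test fails, to debug interactively
    autopkgtest systemd --shell-fail -- lxc autopkgtest-xenial-amd64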
And there are tools which build the VM images for you; these VM images are basically almost the same as our cloud images. We have tools to build the containers, too, and there are existing tools to build chroots. So there aren't really a lot of steps between you and reproducing this yourself. And it has switches to, for example, drop into a shell when a test fails, so you can attach to the testbed, debug your test, and iterate faster. And yeah, this is basically the daily gory bit of what a developer needs to do to fix their tests, because broken tests are useless. Anyone else? This one over there? Oh yeah. Okay, I'll come back after this. Can you tell me a bit about the underlying compute farm to support all this? I'm sorry, I can't understand you. Can you tell a bit about the underlying compute farm to do all this? The compute farm? Oh, okay, the question was: tell me a little bit about the build farm. So we currently have three OpenStack installations at Canonical, and basically whenever we have hardware, we toss it in there. We currently have, I think, 70 or so parallel workers for x86, and they can serve i386 and amd64, and about 28 parallel workers for ppc64el. And as I said, for ARM and the IBM z Series we can't run them in OpenStack yet, so we use LXC containers for those. And basically you need to scale this to the degree where the maximum possible queue time is still acceptable. This essentially determines the delay that you're going to face when you try to land big updates like glibc, and obviously, if you size it so that you can run a glibc update in half a day, then most of the time your infrastructure will just sit there doing nothing. So this is the capacity trade-off. And as I said, we don't just run distro tests; we also run tests for PPAs and we run tests for upstreams, and they all need to share the same capacity. So this is indeed a bottleneck; you can basically never have enough. As a developer you need to be a little patient. If you have a small package, fine, it can go through in like an hour; but for glibc, no, there are no fast updates of glibc. But in return, the more tests you trigger, the more important your package is, and the bigger the gain in confidence you can expect. Because if your new glibc breaks three packages which nobody looks at, you aren't going to discover this until the release. So it's worth doing. I'm not sure whether that was your question or you have more specific ones, but yeah. So the point is, you're running on OpenStack, so that maintaining the infrastructure is someone else's problem? Exactly, I don't want to maintain a data center; this is our IS team's job, and they do it fantastically. By and large, as a consumer, I do "nova boot" and "nova delete" and I'm happy, and it just magically happens in the background. Who had a question here? In your control file, you had "isolation-container". This one? Yes, exactly. So is that a fixed keyword, or do you have to define it in another control file? No, these are fixed keywords; these are defined capabilities in the autopkgtest project. So all of these, needs-root, isolation-container and a couple of others, have fixed names. Sometimes we need new ones, but yeah.
So they basically say: if you try to run this test in a chroot, it will just be skipped, with "sorry, your testbed doesn't provide container isolation". And likewise, if you try to run the NetworkManager test in LXC, it will say, sorry, this is not a full machine, and it gets skipped. So it's basically safe. Whereas if you force it to run on your own machine, it will do bad things. So yeah, if you want a new restriction, then you need to talk to the autopkgtest people, like me, and we need to define a new one, and that does happen. Any other question here? Do you have some infrastructure to handle tests of complex applications, for example ones which interact with the user? For example, you bump GTK by one minor version, and in that minor version they change the location of some of their modules, which are normally part of the paths they ship with the application, or your application just doesn't handle encodings correctly. Do you have tests for those applications? Yeah, so I didn't get the whole question, but I think it was about how we test GUI applications, and we actually do. So initially you only have a cloud instance available, so there's not a lot installed; in particular there's no Xorg installed by default. But the test can do whatever it wants in the testbed. So we do have tests, for example for the installer or for Unity, which basically install all the test dependencies and then configure the dummy Xorg driver, so that you can start LightDM, you can start Unity, and then maybe take a screenshot and make sure that it works. We do have tests for GTK, and you can use Selenium for browser tests and whatnot, so that actually works rather nicely and we do this. But of course it's limited to what you can do with the dummy driver, so you can't rely on a real graphics card there. In order to do that: we do have use cases where, for example, we'd like to run kernel tests or Xorg tests on real hardware with a real graphics card, and we did talk about this, and in principle we even have a plan for how to implement it, so that tests can declare "I need to run on actual hardware"; we just haven't implemented it yet. This would require setting up real iron, like with MAAS or, in the Red Hat world, Satellite, so that you can automatically interact with real hardware machines and run our tests there. We just don't currently have that capability, but the architecture would allow it. So some tests will require manual guidance by humans? Sorry? So some tests can't be used in that infrastructure because they require some human interaction? Yeah, anything which requires human interaction is obviously not the target audience here. This is not solving every kind of test case you might have, but for the majority of cases you should actually try to write your test so that it runs automatically. There is, for example, Autopilot, which can test desktop applications in a non-interactive way, because humans don't scale. There will always be a few of these manual kinds of tests, but it's just not a plan that scales, okay? Thank you. Okay, well then, thank you, I guess, yeah?