So, this talk isn't about testing a cloud itself, like compliance testing or checking the functionality of a running cloud. This talk is about testing the cloud, and what I mean by that is testing cluster software that can't run in containers: it requires either a bare metal machine or a full virtual machine, something that looks like real hardware. In fact we need more than one of them, so we can't run it on a single VM inside a cloud. Furthermore, we require full access to the network, which means using a number of virtual machines inside a cloud is going to be problematic at best.

So the software I'm talking about is OpenStack on OpenStack, or simply TripleO. Our goal with this project is to deploy OpenStack making use of OpenStack itself wherever possible, so this means that to deploy a cloud, we need a cloud. We also want to scale out to a ridiculous number of machines. And rather than using configuration management software to try and keep the machines up to date, we're making use of golden images, which means we roll out a new image for upgrades and for package changes. You usually don't need to do that for configuration changes; we've got other, out-of-band mechanisms for that.

So some of the constraints for testing this are the ones I've already stated: it has to be bare metal or a full virtual machine, with full network access. But we also have around 50 developers working full time on this, with a large number of patch sets coming in, and if we don't have a CI system that can scale out to handle this, we can end up blocking someone, or everyone, because their fixes can't land.

So the CI system we've built hooks into OpenStack's CI infrastructure, which is actually a very interesting topic all by itself. But I'm only going to touch on it very briefly because, like bacon and eggs for breakfast, the chicken is involved and the pig is committed, and in this case the OpenStack CI system itself is the chicken.

So the diagram shows all the pieces of the OpenStack CI system. A contributor pushes up a patch or updates an existing one, and we go from Gerrit to Zuul. Zuul does the queuing, making sure that patches which depend on other patches are queued in such a way that nothing ends up stuck waiting on something in front of it. Each patch results in a number of test jobs being created, maybe up to 20, but we're currently maxing out at about five for TripleO, and we've got plans soon to double that to cover more cases like HA. Zuul then talks via Gearman to Jenkins to assign a slave to run the tests. At this point TripleO diverges, since the slave we create is on a cloud the TripleO project runs, rather than the Rackspace or HP public clouds that the OpenStack CI system uses.

Since we have five jobs I'm not going to go through all of them, but they pretty much share a common baseline. The job uses that machine to run our disk image building software, so we create fresh images for each test run. This has some problems. There's a concern that doing this is quite slow, but that's pretty much solved by caching and local mirrors. Another issue is that Ubuntu uses Upstart and Fedora uses systemd at the moment.
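To make that concrete, here's a minimal sketch of the sort of init-system branching this forces on the image-building code. It's illustrative only, not the actual TripleO element code, and the distro-to-command mapping is an assumption:

```python
# A minimal sketch, not the actual TripleO element code: the distro IDs and
# the command mapping below are assumptions for illustration.  It shows the
# kind of init-system branching that building per-distro images forces on us.

def _distro_id():
    """Return the distro ID from /etc/os-release, e.g. 'ubuntu' or 'fedora'."""
    with open("/etc/os-release") as f:
        for line in f:
            if line.startswith("ID="):
                return line.split("=", 1)[1].strip().strip('"')
    return "unknown"

def restart_service_command(name):
    """Return the command used to restart a service on this image."""
    distro = _distro_id()
    if distro in ("ubuntu", "debian"):
        # Upstart (or sysvinit-compatible) world
        return ["service", name, "restart"]
    if distro in ("fedora", "rhel", "opensuse"):
        # systemd world
        return ["systemctl", "restart", name + ".service"]
    raise RuntimeError("unsupported distro: %s" % distro)
```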
In practice we end up with some rather horrible case statements in our code that assume Ubuntu means Upstart, which is probably going to involve large amounts of hair tearing when Ubuntu moves to systemd, or when things change such that it's a bit of both. Just as a note, we support more than just Ubuntu and Fedora: we also support Debian, RHEL and openSUSE, but at the moment they're not tested in CI.

Now, test environments, with my terrible, terrible diagram done in Inkscape. The slave opens a connection to a Gearman server that we run. There's one of these per cloud that we run, and it's completely separate from the Gearman that OpenStack CI runs. From it the slave gets handed the details of a test environment. In each of these, the control plane is a virtual bridge running on a test environment machine, and the seed VM talks to that control plane so that it can talk out to the cloud and other parts of the network; the other VMs must communicate via that seed. This also means that the test environments can't talk to each other, but they can talk to, for example, the CI cloud, so they don't really tend to influence each other except by putting load on the hypervisors.

Once we have the details of the test environment, we can start to build out our cluster. This is actually quite a pretty diagram and I quite like it. It's pretty close, except that the terminology we're using has moved on from "undercloud" and "overcloud", which give the false impression of nested virtualization. So this contains a seed, which has a minimal OpenStack installation on it. The idea is that you bring the seed in on a laptop or a virtual machine, something like that, point it at the hardware and tell it to go. With IPMI credentials for a bunch of machines, we can at the moment go out and deploy up to, I think, 80 machines successfully. And in the interest of eating our own dog food, we use our own scripts to build out our own CI cloud, so we're using our own stuff when we're running CI as well.

Using the services on the seed, we build out a deploy cloud, which is what's showing up here as an undercloud. Once the deploy cloud is up and running, the seed can in fact go away, but we don't do that, since we'd rather like to keep talking to the VMs over the network. The deploy cloud is a complete OpenStack install, containing things like Horizon, the web interface, and is fully expected to be built in an HA fashion. Why can't I scroll? We then use the deploy cloud to build out a workload cloud, shown here as an overcloud. You can see that the overcloud is using more hardware than the undercloud, and that is to be completely expected: in a usual sort of deployment you'd only want to give three machines to the deploy cloud, since it doesn't run virtualization and you're really only going to use it to build out another workload cloud. The workload cloud is also pretty much required to run in an HA fashion, and it's what you point people at to say: here's our cloud.

So once we have all that running, we test the workload cloud by spinning up an instance on it. At the moment, since we're running inside virtual machines, we can't use nested virt, and that means we're stuck with full emulation under QEMU. The instance is CirrOS, which is minimal and rather small, but it's still so terribly slow that we can't really do anything with it.
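The smoke test amounts to something like the following. This is a hedged sketch rather than the real CI script: the credentials, endpoint, image and flavor names are placeholders, and it assumes the classic python-novaclient interface of the period:

```python
# Hedged sketch of the smoke test, not the real CI script: boot a tiny
# CirrOS instance on the workload cloud and wait for it to go ACTIVE.
# The credentials, endpoint, image and flavor names are placeholders, and
# the constructor signature assumes the older positional python-novaclient
# interface (it has changed across releases).
import time

from novaclient import client

nova = client.Client("2", "admin", "SECRET", "admin",
                     "http://workload-cloud.example.com:5000/v2.0")

image = nova.images.find(name="cirros")      # the tiny test image
flavor = nova.flavors.find(name="m1.tiny")

server = nova.servers.create("ci-smoke-test", image, flavor)
while server.status not in ("ACTIVE", "ERROR"):
    time.sleep(5)
    server = nova.servers.get(server.id)

assert server.status == "ACTIVE", "workload cloud failed to boot an instance"
```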
At the end of the test run, we connect to each virtual machine and rsync the log files off, as well as state about each cloud, for example what the orchestration system is doing and what Nova thinks is running (there's a rough sketch of this step after the Q&A). And if, for instance, the deploy cloud fails to come up, we stop at that point and then still do all of this. Then we can release the lock on the test environment, tell Gearman that we're done with it, and in Jenkins the cycle begins again and the environment gets reassigned. We don't usually bother stopping the VMs inside that test environment, because our scripts will do that for us, and we don't bother clearing their state, because they're about to get reinstalled by having new images written straight over the top of them anyway.

Whether the job passes or fails, we'll comment on Gerrit and give the submitter feedback about their review. TripleO core reviewers, of which I am one, will look very carefully at the CI results for evidence that the code is not going to break master. As you can see in this particular check here, the Fedora 20 job has failed. The Jenkins tests up the top are all about the code itself, things like flake8 checks and running the unit tests, whereas the check-tripleo jobs, the three of them down there, are about putting the code into a full deployment, running it up and seeing what happens. Since we saw the Fedora job fail, the submitter usually has to determine why, and since the patch I've been using for screenshots is my own, I get to determine why it failed. It's actually rather an odd failure: we didn't get any state from the host, and the workload cloud failed to come up. So clearly we have a little bit more work to do on our CI system. And I think we've got plenty of time for questions.

I think everyone's in stunned silence.

How big do you end up? How many machines do you end up having to dedicate to actually testing the whole thing? Is it the test environments themselves?

You usually have to dedicate more than you think. The test environments are, in fact, the bottleneck. We don't tend to run out of compute for, say, running up the instances we use when we build the images; the resource that tends to be under the most contention is the test environments themselves. The VMs that are running are three gigabytes each, so that limits you to about five test environments per 96-gigabyte machine, because we don't want much overcommit. I think at the moment we've got about 20 test environment hosts, and only about four or five actual compute nodes in the workload cloud.

I think Nick had a question. It may be more of an Ironic question than a TripleO question: the boot loader support for bringing up bare metal machines, is that currently x86 only, or have you got support for some of the other boot loaders for the different architectures? Feel free to take that one offline.

I don't know. We are pleased to have some people within HP working on this stuff for ARM, and I know in the TripleO stuff we've been adding support for things like U-Boot, where, for example, you need to point out the kernel address offset and all that madness, which I thought I'd left behind when I left Canonical.

Well, we just rewrote Beak, and I was wanting to talk to some of the Ironic folks about it. Okay. But I think it works. I'm not sure. It'd be great to see it for POWER.
So we should totally talk about that. Yes, yes, we should. Maybe you could, you know, lend some hardware. Ah, hardware, I've heard of this thing. This thing that hurts when you drop it.

Hi, I had a question about the check-in process. You check in the code, and then Gerrit does the code review, and then the code gets checked in to the master git, doesn't it? There are multiple layers of git involved. Sorry? Can you show the first graph? Which graph? The first one. Oh, that one? The git, the code store, where after a good Gerrit code review... Oh, the git repository. Yeah, what's the process? What's the process for the code going from the Gerrit code review, after the review, and then...?

So there's actually, I was trying not to go into this because it's also related to the OpenStack CI stuff. Our CI system runs for the check queue. There's also a separate queue called the gate, where stuff that passes its tests will land; that gets marked in the Gerrit review, it merges into the git repository that Gerrit has, and that then gets pushed up to both GitHub and git.openstack.org.

So is everything automatic, or manual? Automatic. So from the Gerrit code review it's automatic to the git repository, and it's pushed to the upstream, isn't it?

So once you push up code, we check it, and that will cause Jenkins to vote either minus one if the tests failed or plus one if the tests pass. At that point you're waiting for other humans to come and look at your code, to check whether what you've stated in the commit message matches your intent, whether there are any bugs in your code, whether there are any other issues in that code, and they'll vote with either a plus one or a plus two.

So there are two cases, right? For example, the fail case: you check in, and the merge from the Gerrit code review back to the git repository fails at that stage. Do we then run a Jenkins job, or...?

So keep in mind that the patches uploaded to Gerrit are only on Gerrit until they get marked as merged in Gerrit. The only place to get them from is Gerrit until they're marked as merged, at which point they're in master, and they're on GitHub and on git.openstack.org.

So when does the build process start? Sorry? When does the build process start in Jenkins, when does the Jenkins job start? Does it start to build after a successful commit? The build is triggered by the review. The build is triggered by the review. Ah, okay, right, thanks. And if you upload a new patch set, that will trigger a completely new check: something has changed, so we need to check everything again.

Any more questions?
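As mentioned before the Q&A, the end-of-run log collection amounts to roughly the following. This is a sketch for illustration, not the real CI scripts; the host names and destination path are made up:

```python
# Rough sketch of the end-of-run log collection, not the real CI scripts:
# rsync the logs off every VM in the test environment into per-host
# directories on the slave.  Host names and the destination are placeholders.
import os
import subprocess

TEST_ENV_HOSTS = ["seed", "undercloud", "overcloud-controller0"]

def collect_logs(dest="/tmp/tripleo-ci-logs"):
    """Pull /var/log from each VM into a per-host directory on the slave."""
    for host in TEST_ENV_HOSTS:
        target = os.path.join(dest, host)
        if not os.path.isdir(target):
            os.makedirs(target)
        subprocess.check_call(
            ["rsync", "-az", "root@%s:/var/log/" % host, target + "/"])

if __name__ == "__main__":
    collect_logs()
```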