So I'm going to go ahead and get started. We're on time, and I want to keep us on time since I have another talk right after this one. I'm Tim Orling. I work for the Open Source Technology Center, which is part of the Software and Services Group at Intel. I've been there two years, but I've been working on the Yocto Project since before it existed; I was working on OpenEmbedded before that, so I've been in that space for about ten years. I'm also a Fedora package maintainer, so I've been in the Fedora space for about ten years as well.

Today I'm going to talk about living on master, and I should really say confidently living on master. In this case I'm using the Yocto Project, Jenkins, and LAVA for what is basically a rolling release; in my case it's a weekly release. You can ignore the Yocto Project part if you'd like it to be something else, a different operating system or some other open source project; that's absolutely fine. That part just happens to be what I was actually using. This past year I've been working on the real-time Linux project, and that's what drove the need for the hardware testing and so on.

So let me ask: how many people are using the Yocto Project? OK, great. How many came primarily because of the Yocto Project part of this? OK. How many people are using Jenkins for your build system? How many are using Jenkins pipelines? Awesome, OK. How many are using LAVA? How many are doing hardware testing at all? How about automated hardware testing? All right, so we've got a lot of manual testing. That's what I would expect. OK, thanks.

So I'm going to guess that you're doing everything wrong, and I don't mean that to be offensive, but here's why: this is what I've been seeing over and over again from lots of teams I work with. The traditional approach is what I call "fork it and forget it." This is a problem, and it's in fact what the vendor kernels have been doing to you for years if you're working on anybody's vendor kernel. We actually have a vendor in the audience that's very good about not being that way. What I tend to see with teams I get involved with, and usually I get involved way too late, when they already have problems, are close to release, and are panicked, is that they pinned everything at a fixed upstream "version." I put version in air quotes because I have literally seen projects that cloned at some random point in time, froze on that commit-ish for the rest of the project, and never even took a release. Then they make all their changes in tree. If you're lucky, they actually commit them and use git. They never send any patches upstream. And I would add to that: git clone poky, cd poky, git clone meta-foo, then tar cvfz whatever, and then email. I am not kidding; I have worked with teams where that was exactly their model, and luckily those products no longer exist at Intel.

The other part of this is why it matters: you were obsolete from day one. The moment you chose that fixed point in time, the rest of the world didn't stop, so you obsoleted yourself instantly, and what you're creating is huge future technical debt. The cost of that is unbelievable.
If you have not been on a team that shipped a release and then, two years later, tried to update it to a newer version, you have no idea what you're in for, because it is extremely expensive. So what I'm describing today is not easy, but it will save you that huge future technical debt. It will also save you more than that: I worked on a military project where I ended up out in the field with a radar system rotating at 30 RPM six inches or a foot in front of my face, because that's how long the serial cable could be for the debugging I had to do. If we had simply done then what we did six months later and had unit tests running, I wouldn't have had to be there, because we would have caught the bug beforehand. That's the kind of thing I'm talking about.

So let's follow the Linux kernel release process, the Linux kernel release model. If you have not read the blog post by Greg KH on this, go read it. It's really interesting, because he lays out where we've been and where we are now, and how things changed. Who remembers back in the 2.6 kernel days, when every vendor had their own funky patch set in a tarball off somewhere and nobody was ever bringing anything together? In those days there really wasn't anything that was stable. Then we got to the 3.0 timeframe, where we tried to create stable releases, and we have finally really learned something. What we've learned is a mantra I think you've heard; you just heard it in the Debian talk if you were in here: upstream first. What I mean by that is, when you catch a bug in something upstream that you're consuming, fix it on master, on their tip, upstream that patch, and then backport it. That way you have a constantly moving code base, and this is in fact what the Linux kernel is doing right now. It's extremely successful, in part because there's a lot of testing going on with 0-Day and kernelci.org and people like that.

The other pain point I'm seeing, and another reason for this whole concept I'm talking about today, is that everyone has their own solution, and it's time to stop that, because there are only so many of us with the skill set that makes this crazy world of ours go around. The number of embedded devices and other devices using Linux is exploding, and there just are not enough of us to go around. We can do what we can to train more people, but we have got to share as much as we possibly can, and we're all open source people, so let's do it.

So my motivations for this project, which is basically the outcome of the last year of work: it had to be open source, and I'd say it has to be upstream, not just open source in-house somewhere; I needed it to be external, with a community around it. It had to be enterprise class, something you can replicate if you're actually going to use it for real products and not just hobbyist stuff. All the hardware, all the parts I'm buying, had to be off the shelf. I'm not going to create new things; I'm a maker guy and I've got the skills to make parts, but I'm not going to do that when I need to replicate these things. And it needs to be distributed; it needs to be shared with everybody we can reach. So my inspirations were two projects from Intel.
The first was from former Nokia folks in Finland who created an operating system called Ostro. If you went and looked at the Ostro Git repository, you wouldn't really understand what was going on there, or think it was even doing what I'm talking about, but the reality is that behind the scenes they had continuous integration and continuous testing going on. That project didn't live a whole long time, and it turned into what became known as IoT RefKit. That project also didn't really see a whole lot of light of day, but they did a lot of the things I'm talking about here, so I was very much inspired by that activity. The other thing is that kernelci.org and validation.linaro.org have really inspired me in terms of how to do distributed hardware testing and a bunch of things like that. But the main thing was actually two to three years of ELCE and the ELCE hallway track, because I kept having these conversations with people over and over again about this stuff. By the way, I chose Jenkins because I got tired of everybody asking, "Are you using Jenkins?" And I used LAVA because I got really tired of everybody asking, "Why aren't you using LAVA?" It turned out those actually weren't the worst problems, but that all came out of those hallway tracks.

So here's a system overview. I thought a lot about how much technical content to put into this, and I want to keep it at a higher level, but we can always get into questions later. Basically what our system has is some kind of a trigger coming in: a timer, a Git review if you're using Gerrit (read that as a pull request if you're used to GitHub or something else), or a manual trigger, which is a human turning it on. It doesn't really matter which. The trigger kicks off a job, the Jenkins master schedules it, and that job actually goes out to Jenkins build slaves. In this case the Jenkins master is a VM that's managed by our internal IT, and the build slaves are something like a 32-core slice that's off doing the work for us. The outcome of that is some kind of artifacts, so we want to stage those. In my case I'm making a complete, full OS; I'm doing the Yocto Project, so I'm making a full distribution with every single build. Yes, I know I could do it other ways, but that's what I'm doing. So we end up with the images, the kernel, the modules, all that stuff, and some logs. We put all of that into staging, hold it temporarily, and it goes off to be tested. We've got a LAVA server, which is like the master; it deploys the test job to a LAVA dispatcher, which you can think of as a slave or a worker, and that sends it off to the boot farm, the devices under test, or DUTs. The tests that actually get run come from some kind of Git repository; this is how LAVA is designed. The outcome of that, the results, then gets added to the artifacts, all of that gets deployed to some kind of artifact server, and the result gets emailed out to everybody.

And this is happening for everything, if you look at the triggers. For timers, I'm talking about nightly and weekly: every night we're building the complete day's work over again and running it through testing. Every single commit is actually getting built 100% and tested 100%. Then we also have a weekly build, which turns into an actual release, and I can trigger a manual build any time I need it.
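To give a flavor of that hand-off from the build side to LAVA, here's a minimal sketch of submitting a test job over LAVA's XML-RPC API from Python. The server address, user, token, and every field in the trimmed-down job definition are placeholders rather than our actual configuration, and a real job needs device-specific deploy and boot actions plus sensible timeouts.

```python
#!/usr/bin/env python3
"""Minimal sketch: submit a LAVA test job over XML-RPC (all names are placeholders)."""
import xmlrpc.client

# Hypothetical server and credentials; LAVA accepts "https://user:token@host/RPC2".
LAVA_RPC = "https://ciuser:0123456789abcdef@lava.example.com/RPC2"

# Heavily trimmed job definition; the real deploy/boot matches the PXE/initramfs flow.
JOB_DEFINITION = """
job_name: nightly-core-image-rt
device_type: x86
visibility: public
priority: medium
timeouts:
  job: {minutes: 60}
actions:
  - deploy:
      to: tftp   # placeholder for the network-boot deploy
  - boot:
      method: ipxe
  - test:
      definitions:
        - repository: https://git.example.com/ci/image-tests.git  # hypothetical
          from: git
          path: lava/pytest.yaml
          name: pytest-suite
"""

server = xmlrpc.client.ServerProxy(LAVA_RPC, allow_none=True)
job_id = server.scheduler.submit_job(JOB_DEFINITION)
print(f"Submitted LAVA job {job_id}")
```

In a pipeline like this one, the job definition would typically be templated by the Jenkins job, with the staged image URLs filled in before the submit call.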
So, a little bit about the repository layout. I put this in here basically because I want to show you some best known practices and give you a little idea of how I did this. My original intent was to open source absolutely everything I'm showing today, including this whole Git repository, but for various reasons, including scheduling, we just couldn't pull it off this year. We're still hoping to be able to do that.

At the top level we've got the top-level repository, and underneath that you've got the build-specific stuff. For pipelines in Jenkins you use a Jenkinsfile, so that's actually in your Git repo, and it's basically just a wrapper around some scripts that do the individual stages to build things. We added some Groovy scripts and so on, which let us use a job-default YAML, so if you don't have any other definition of what the job is going to be, that's what gets run, and things like proxy variables live in ci.yaml. I could also add some other .yaml file; we actually tooled it up to pick up the name of the job in Jenkins and grab the YAML file based on that.

Then we've got some upstream submodules. There are a lot of other ways of handling upstream Git repositories: there's the repo tool, there are setup tools, there's combo-layer, there's a whole bunch of ways of doing this. All of my team is extremely comfortable with Git, and we want to be able to know very, very rapidly which commit we were on, so we use Git submodules. Notice I'm not cloning Poky; I'm cloning BitBake, OE-Core, and meta-openembedded, because I'm creating my own distribution. Then we've got local directories that live in this Git repo: the distro layer, the BSP layer, the app layer, whatever, and then my image tests. Rather than have a completely separate repository for the image tests, I keep all of them in this repository with everything else, which lets a local developer run them on their own machine. And then we have an internal submodule, which is just a bunch of CI helper scripts that help this all run. So it's a combination of the helper scripts and the Groovy shared libraries that bring this all together. The Groovy shared libraries are brought in by the Jenkins configuration; they're called by the Jenkinsfile, but they're not actually in this repo.

What I'm doing, though, is using automation to stay up to date, because I'm consuming 14 layers right now and all of them are moving, including the ones we have internally. So we have a job that runs every day that auto-syncs those Git submodules. It generates a commit from the shortlogs of all of those submodules, and that commit is sent in as a Gerrit review, so the bot itself sends in its own review. We originally had it committing directly. Anybody worried about the robot apocalypse? Yeah. Having the build system committing to itself, we decided that wasn't a good idea: if your policy is that humans can't just push, that they have to get a review by another human, why on earth should the machine be able to push? Again, Gerrit is a Git review tool; a pull request is the GitHub flavor of the same thing, or whatever else you have.
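As a rough illustration of what that daily auto-sync job amounts to, here's a sketch in Python. The submodule list, branch, and remote names are hypothetical stand-ins, and our real helper scripts do more than this, but the shape is the same: bump each submodule, build the commit message from the upstream shortlogs, and push the result to Gerrit as a review instead of straight to master.

```python
#!/usr/bin/env python3
"""Sketch of a daily submodule auto-sync job (names and paths are hypothetical)."""
import subprocess

SUBMODULES = ["bitbake", "openembedded-core", "meta-openembedded"]  # illustrative subset

def git(*args, cwd="."):
    """Run a git command and return its stripped stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

def main():
    message = ["Auto-sync upstream submodules", ""]
    for sub in SUBMODULES:
        old = git("rev-parse", "HEAD", cwd=sub)
        git("fetch", "origin", cwd=sub)
        git("checkout", "origin/master", cwd=sub)  # or whichever branch the layer uses
        new = git("rev-parse", "HEAD", cwd=sub)
        if old == new:
            continue
        # Record what moved, straight from the upstream shortlog.
        message += [f"{sub}: {old[:10]} -> {new[:10]}", "",
                    git("shortlog", f"{old}..{new}", cwd=sub), ""]
        git("add", sub)
    if not git("diff", "--cached", "--name-only"):
        return  # nothing moved today
    git("commit", "-m", "\n".join(message))
    # The bot gets reviewed like everyone else: push to Gerrit, not to master.
    git("push", "origin", "HEAD:refs/for/master")

if __name__ == "__main__":
    main()
```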
That review actually triggers a build, and when I say trigger, I mean literally: it's triggered with a Git hook or something like that, so it's automatic. Every single time a commit is pushed in, it triggers the build. So what happens with that? On every single build, and in this case it's Gerrit, so that means every single commit, a LAVA job is launched. These are Intel devices, so all of them boot off the network with PXE boot, and that's fully supported in LAVA. We may have gone off the deep end a little bit because we were new to LAVA, but we did things a little differently than some people do: we created a custom initramfs, and that is what's actually being pulled down as the PXE network boot. Your device comes up, the BIOS is set up to do a network boot, it grabs a DHCP address from the server, and it grabs the image it's going to boot. That initramfs boots and then sits there and waits for the production image, or whatever image was just built, to be downloaded. We then flash that to a USB stick. If you use the right kind of sticks, USB 3.0 and the right brand of SanDisk, that can be very quick. Then we reboot, and we actually have power control on each device, so it's a hard reboot, not just a soft boot. We also decided we were going to write all of our tests in PyTest. Again, we've got a group of people that's very comfortable with Python, so that's what we wanted to do. In case you're not familiar with it, PyTest is one of the various open source testing frameworks available in the Python world.

So a little bit about what the system looks like. If we take our view from earlier, we had the triggers and the build and everything, with the trigger going to the Jenkins master and so on. Now we're going to focus down on the lower right-hand corner, because this is what's really going on. Kevin Hilman and others have given lots and lots of talks about LAVA, so if you want a whole bunch more detail, go look at some of those. But basically we've got some kind of trigger that tells the LAVA server it needs to run this job. It figures out where it's going to schedule it: what kind of device, or which specific device, it's going to run on. It sends that off to the dispatcher that's in control of that particular device, which grabs the image from wherever, and that goes out to the DUT LAN, the network for the devices under test. We do this so that the DUT LAN is 100% isolated; it has no contact with the outside world, and the LAVA dispatcher is the DMZ, if you want to think of it that way. In our rack we've got a network switch, a PDU, which is a power distribution unit or you could call it a power control unit, and a USB hub. The way we're actually controlling the device under test is that it gets its image from the network; in this case it's wget-ing the image from the LAVA dispatcher. The LAVA dispatcher is also controlling the power, so it's actually flipping the power off and on. We ended up using these little Ubiquiti mPower units because they're cheap, and the eight-port version actually lets you measure current per device. And, as anybody who saw Kevin Hilman's lab-in-a-box talk earlier knows, we use FTDI USB-to-serial cables exclusively to get to the device under test.
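Just to make that network-boot-and-flash step a little more concrete before moving on: here's a loose sketch of what the deploy logic inside a custom initramfs like this boils down to. The image URL and device node are placeholders, and the real logic lives in init scripts with a lot more error handling; treat this as an illustration of the flow, not our actual code.

```python
#!/usr/bin/env python3
"""Loose sketch of the initramfs deploy step (URL and device node are placeholders)."""
import subprocess
import urllib.request

IMAGE_URL = "http://dispatcher.dutlan/images/core-image-rt.wic"  # hypothetical dispatcher URL
USB_DEVICE = "/dev/sdX"                                          # hypothetical USB 3.0 stick

def deploy():
    # 1. Fetch the freshly built image from the LAVA dispatcher on the DUT LAN.
    path, _ = urllib.request.urlretrieve(IMAGE_URL, "/tmp/image.wic")
    # 2. Write it to the USB stick the device will boot from next.
    subprocess.run(["dd", f"if={path}", f"of={USB_DEVICE}", "bs=4M", "conv=fsync"],
                   check=True)
    # 3. The hard reboot then comes from outside: the dispatcher power-cycles the
    #    device through the PDU rather than trusting a soft reboot.

if __name__ == "__main__":
    deploy()
```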
One other thing I want to talk about, though, is the test repository up there; you might wonder how that gets in. One of the things LAVA does that I like is that it takes your production image, untars it, opens it up, grabs the test repository, grabs its own static test infrastructure that it always adds, puts all of that in an overlayfs on top of your image, tars or zips it back up again, and then sends it off to the device under test. So rather than building all of my production images with all the test stuff in them, which would be kind of weird, I'm benefiting from something that was already going on in LAVA.

The next step for us: I said we were using PyTest. The problem with that is that if you just run PyTest, it picks up all of the tests in your directory and runs all of them, and that looks like one great big test case to LAVA. So we said, wait a minute, we understand what LAVA is actually doing. Once the dispatcher sends the image in, boots your device, and says go, it's hands off; all it does is watch, and what it's watching is standard out. That's why it needs the serial port. It turns out it's just looking for certain messages to come out, so we're emitting those messages ourselves. Right now we're doing the test case events; I wrote a PyTest plugin to emit those test case events. The next thing we need is to emit measurements with units. Since I was doing real time, we run a lot of cyclictest, and we might want to look at the maximum latency or something like that, so there's a particular measurement with a particular unit we want to get out of the run. I'm almost done with that aspect of it, but I do need to go through our internal process to get it fully open sourced and out the door.
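In the meantime, here's a minimal sketch of the idea rather than the actual plugin: a conftest.py hook that reports each PyTest result as a LAVA test-case signal on standard out, plus an illustrative real-time test that attaches a measurement and units. The signal format is my reading of the LAVA documentation, and the cyclictest flags and the 200-microsecond budget are just examples.

```python
# conftest.py -- sketch of reporting PyTest results to LAVA via stdout signals.
import re

def _lava_id(nodeid):
    # LAVA test case IDs are restrictive, so squash the PyTest nodeid into
    # something safe like tests_test_boot_py__test_ssh_login.
    return re.sub(r"[^A-Za-z0-9_-]", "_", nodeid)

def pytest_runtest_logreport(report):
    # Report only the test call itself, not the setup/teardown phases.
    if report.when != "call":
        return
    result = "pass" if report.passed else ("fail" if report.failed else "skip")
    print(f"<LAVA_SIGNAL_TESTCASE TEST_CASE_ID={_lava_id(report.nodeid)} "
          f"RESULT={result}>")


# test_latency.py -- illustrative real-time test reporting a measurement with units.
import re
import subprocess

def test_cyclictest_max_latency():
    out = subprocess.run(
        ["cyclictest", "-q", "-m", "-p", "90", "-l", "100000"],
        capture_output=True, text=True, check=True).stdout
    max_us = max(int(m) for m in re.findall(r"Max:\s*(\d+)", out))
    # Hypothetical 200 usec budget; emit the measurement so LAVA can record it.
    print(f"<LAVA_SIGNAL_TESTCASE TEST_CASE_ID=cyclictest_max_latency "
          f"RESULT={'pass' if max_us < 200 else 'fail'} "
          f"MEASUREMENT={max_us} UNITS=usec>")
    assert max_us < 200
```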
Another thing on my list, and this probably doesn't quite sit with the Linaro LAVA folks, is that I would like industry-standard test report formats to be available straight off the LAVA dashboard. Two that I've used a lot are TAP, the Test Anything Protocol, which is very common in the Perl world, and JUnit, which is common in a lot of places. PyTest will emit either one of those as a file or a report, and there are ways of getting arbitrary files off of your LAVA job, but I'd just like to see it a little more standardized. What LAVA gives you now is JSON- or YAML-based output, and it's just not enough for what I particularly need.

So I said earlier that CI wasn't really the problem and LAVA wasn't really the problem. Originally, over the couple of years I've been talking about this, lots of people were having issues with CI and how Jenkins could really work for them. That turned out not to be the problem for us; in fact, we could swap just about any build system you'd want into that slot. LAVA is not easy, but it's doable, very doable, and it has a lot going for it: a lot of open source history and a lot of security features that are great.

The next problem is that you're generating massive amounts of data. I've got this boot farm, and I want that thing continuously 100% utilized, right? If I don't have a regularly scheduled job that needs testing, then I'm just going to run the regular testing again, replicate the results over and over and over, and generate massive amounts of data, and then do what with it? That's the problem. So my next step, and this is one of the next things I'm going to be working on, and I've talked with Kevin Hilman and Olof Johansson and some other folks inside Intel about it: we really need to do the data analysis and visualization after this. We're probably looking at something like the Elastic Stack; I don't know if anybody's heard of that. Anybody used Elasticsearch before? Has anybody ever used MetaCPAN? There are probably not very many Perl people around, OK. Elasticsearch is just a full-text search engine, so all those logs can be shoved into it and then you can search for what you want. There are a couple of tools that make that happen: Logstash basically takes your logs and shoves them into Elasticsearch, and Kibana is a visualization tool that lets you graph it all.

On top of that, what else do I need? Machine learning, right? This can't just be some static script. I need to train this thing to learn what I look at when I look at those results: what is it that I'm seeing that tells me that was a good or a bad build? I need to train the system to do that. So that's the big problem we've got to tackle next.
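Just to show the shape of the Elasticsearch idea, here's a small sketch using the official Python client. We haven't actually built this yet, so the endpoint, index name, and fields are all made up for illustration; the point is only that every test result becomes a searchable, graphable document.

```python
#!/usr/bin/env python3
"""Sketch only: index one LAVA/PyTest result into Elasticsearch for Kibana to graph."""
from datetime import datetime, timezone

from elasticsearch import Elasticsearch  # official Python client

es = Elasticsearch("http://elastic.example.com:9200")  # hypothetical endpoint

def index_test_result(job_id, device_type, test_case, result,
                      measurement=None, units=None):
    # One document per test case; field names here are illustrative, not a schema.
    es.index(index="lava-results", document={
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "job_id": job_id,
        "device_type": device_type,
        "test_case": test_case,
        "result": result,
        "measurement": measurement,
        "units": units,
    })

# Example: push a single cyclictest latency result so it can be trended over time.
index_test_result(12345, "x86-dut", "cyclictest_max_latency",
                  "pass", measurement=14, units="usec")
```

Once results live there, Kibana dashboards, and eventually some machine learning on top, become a search and aggregation problem instead of a log-grepping one.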
So let's do this. Moving forward, what are we going to do? I said earlier that we need to stop doing this individually, siloed off, replicating the same damn thing over and over again and solving the same problem in different ways. We're beyond that at this point. What I was trying to do here was create an end-to-end, enterprise-class reference system; I'm talking about 14, 15, 16 different tools involved. My original idea was orchestrated containers, because they're supposed to be easy to deploy and it should be fairly effortless to replicate them. I realize DevOps people might rather use Ansible playbooks or something else, but the problem was orchestrating the containers. I went down a Kubernetes path, couldn't figure out how to get anything out of the cluster, and gave up on it because I ran out of time. We also had a few other issues with some of our containers talking to each other, so we ended up having to go bare metal. I didn't really like doing it, but I kind of had to.

Then, specifically in the OpenEmbedded and Yocto Project space, I think we want an internal layer index, because as a corporate entity we've got our own corporate layers and Git repositories, but nobody can really see what's in them, and a layer index lets us share that. Then obviously you need a Jenkins master or some other CI system, you need some kind of artifact server, you need all the LAVA pieces, and you need the data visualization. The big part I think we need to do next: Linaro has been running a public LAVA server, kernelci.org has had a public LAVA server for a long time now, but it's mostly been focused on ARM, because that's what Linaro does. We're in a world now where everybody needs this; across the board, the whole embedded space needs this.

So the only thing that really makes sense to me personally is that we move some aspect of this to the Linux Foundation. Maybe we have a public LAVA server where individual contributors can share and reuse their test definitions. We're not going to create a Linux Foundation boot farm; there's not going to be a rack of your boards sitting in some Linux Foundation server room. What we would do is make it easy for you to run these tests on your own device and have it connect into this system. Or take it smaller than that: inside your own company you've got different groups doing this, so it makes sense to containerize it, and kernelci.org has already done that. The goal is better hardware coverage and runtime testing. Look at what 0-Day and projects like kernelci.org have done for us: I don't know if you've noticed how stable the kernel is the day Linus says "this is the release," but we basically have an extremely stable kernel immediately, on the first day. That's why it's kind of the poster child for rolling. We need to get to the point where we're doing the same thing with runtime testing on hardware, so people can get away from that prehistoric nervousness of, "Oh, I've got to wait for that release." It's a vendor release, so it's already six months out of date, and now you need to wait a few more months for it to stabilize, so now you're a year past master. And then suddenly Meltdown and Spectre come down, and what are you going to do? Guess where all the fixes, the CVE fixes and everything, go? They all go upstream first, and then you've got to wait for somebody to backport them for you. So this is the idea I've been talking with people about lately.

So, any questions? Yes. Most likely this will either be Yocto Project hosted or on Intel's public GitHub. Oh, sorry; the question was, when this stuff is released, where will it be hosted? The PyTest LAVA plugin is going to go out to PyPI, and my goal is for it to be hosted by Linaro or the LAVA project. The rest of the stuff, the CI parts and the container orchestration and all of that, my best guess right now is that it would either be hosted by the Yocto Project or on Intel's public GitHub, unless we launch some new project within the Linux Foundation, which is also possible, and then it might all get contributed there. That remains to be seen, but that's the intent.

Any other questions? You can go as technical as you want; I didn't go very technical for a reason. How many different devices are you testing with this method? Right now our boot farm was designed for 16, because we just rolled it out and that's what we've got at the moment. It's a combination of boards; our workhorse is the MinnowBoard Turbot, which just really, really works for us and has been easy to integrate. Then we've got some pre-release boards that are headed for automotive and things like that, plus new pre-release desktop systems and pre-release servers, all that kind of stuff. So it's still a work in progress, but that's the idea, and the whole thing should be scalable. Each LAVA dispatcher, because it actually has to download the full production image, open it up, tar it back up, recompress it, and send it back out, has to do a fair amount of work.
So I would not want to go much beyond eight devices per dispatcher. Maybe four is actually the right number; I'm not sure yet, because we haven't been hitting it hard enough that it's doing a whole bunch of them at the same time. But the idea is to make the whole thing scalable. I'm using an off-the-shelf industrial PC as the LAVA dispatcher, and I ended up getting one that has two gigabit Ethernet NICs, so I don't need any other networking gear in order to have it be the DMZ. Any other questions?

Hi. Let's say you're working on a project and you're always using Yocto master. Doesn't that mean you run the risk that some library version gets updated in Poky and all of a sudden breaks compatibility with your application, your build fails, and you have to stop whatever you're doing and fix your application? In practice, how often does something like that happen?

So we had two models this past year. We actually started off building on top of RefKit, which meant that "living on master" actually meant living on Rocko, so living on a stable branch is what we were really doing. In six months we only had four breakages, ever, the whole time, and that's what started giving me the confidence to do more than that. At that time I also ran a nightly job that was a true rolling release, and we started not seeing any failures in the rolling release, so we started asking, why in the heck are we not trusting it? And so we just went ahead and moved forward. Yes, recently there was quite a bit of pain when some things changed and a few things took a while to get to master, and so on, but that's the reality of it. That was painful, but there's no way I want to be involved two years from now trying to take that code base and get it up and running again on something new, or ripping my entire code base apart and getting it to work again, when I can do it bit by bit, paper cut by paper cut, over time, and you get better and better at fixing those things rapidly. And by the way, because you're upstreaming your patches, everything goes to mainline or to the upstream layers, whatever the upstream thing is that you're consuming, so all of your fixes go there and get tested by their infrastructure as well.

So I understand that there are probably oodles of people in your company, in certain situations, all writing code that might be brittle, but I would rather know right away that it's broken. Maybe the model for you is that you've got a small team, or you just have a CI job running on master that lets you know, oh, by the way, something broke, put it in your queue for future work, and what you're really doing is living on a rolling release branch. But I still think you need to move in this direction and not just be frozen in time. What might be freezing you in time is a vendor kernel, so get on them and say, update your kernels. Again, we've got a vendor in the room that is doing a great job of that. If your OSV is frozen in time, ask them why. That kind of thing.

Yeah, I think the main reason that's been holding us back is just a lack of people.
I mean, if there are only one or two embedded engineers on the team and the managers are saying scramble, scramble, we've got to get this out the door, it's been hard for me, at least, to justify to the folks who aren't intimately familiar with the Linux system on the device, hey, we've got to update to this new kernel version or this new Yocto release. Hopefully, now that my team isn't really in crunch mode, we have more of the resources to bring all of our stuff up to date, but in practice it has been very difficult for me to manage actually building an application while keeping everything up to date and building and working.

I understand it's a challenge. I just think we don't have a whole lot of choice anymore; we need to move in this direction. And all the work I'm talking about was three people. That's all we had, and as the lead I didn't spend a whole lot of time doing any coding; I was mostly just feeding the build system and keeping it alive. Any other questions?

Question: can you talk more about what happens when your system detects a failure and the problem is assigned to some engineer? What does the workflow look like? How does the engineer go and debug the problem, understand what the test actually was, and reproduce it? For example, if some process crashed, does your system retrieve a core file and present that information to the developer?

Sure. Since we're using PyTest, everything is output as a JUnit XML file. That file gets brought back to Jenkins, so we can look at that build. Every morning you come in and see whether you've got a green build or a red build and what happened; the first thing you do in the morning is look at the result of last night's build. Or, if you just made a commit, you come back later to look at the result of that commit and find out that something broke. Jenkins has a display on that particular build job showing you there was a failure, and then you go into the logs or whatever else and you fix it. Sometimes that's half an hour to fix and sometimes it's three days, but the thing is, if you didn't fix it immediately, it was just going to come back and bite you later and you wouldn't even know it.

Right, but as the engineer, you have this setup and you run the tests, and I need to be able to reproduce the issue. Sometimes I need a way to look at it locally and understand what exactly the test was doing. If I can't reproduce it in my setup but it's reproducible in yours, can I go reserve the device, stop your system at the point of failure, log into it, and go fix it? It's a simple thing to say, but in reality we know these problems require a lot of attention.

Right, I get it. So failures happen at different levels. There are failures that are build system failures: Jenkins itself messed up, or we couldn't fetch, something like that. Then you've got things at the system level, the image level, the hardware level, right?
But if you weren't doing this continuous testing, you wouldn't even know those were coming in. The idea, and the reason we went with PyTest, is that we're literally running exactly the same tests on the developer's desktop. If they're writing new code, they run it through QEMU, so they're running the same PyTest suite, just emulated. So they've already got pretty good confidence that everything's fine, because, by the way, they also started with last night's build; everybody starts every day using the most recent build. So again, I'm not trying to say it's easy, but the problem is that if you don't do this and stay on top of it, then a week from now, or a month from now, or six months from now, you've got 10 or 100 or 1,000 or 16,000 failures just waiting to be discovered. You're not going to be able to get your product to the next release and update to it, so now your product is frozen in time forever, and then some major CVE or something like that comes down and you have no capability to fix those security flaws. I really, honestly believe we've got to just roll with it constantly. It's the only thing that's going to work.

So this whole solution, I think you said, has about 14 technologies put together. How brittle is that? And how often do you yourself update Jenkins and LAVA and all of that?

So Jenkins and our plugins are updated weekly, and I have yet to see any of our tooling fail because of one of those plugin updates. Some of our stuff is indeed running in containers, and we have seen some of those containers have trouble. Jenkins itself, and a couple of other technologies based on Java, absolutely: there's a memory bloat problem and other things like that, and it just stops working and you've got to reboot it. We've also had a few issues with the Gerrit plugin, where it doesn't always pick things up, so we have to trigger things manually. So there have been issues. I'd say it's maybe one issue per week where something had to happen, and I'd like it to be way more robust than that, but we're not exactly 100% sure what the causes of some of them are. There's one that's quite annoying where a couple of characters, and it's not random characters, it's always the same one or two, get dropped on the way from the dispatcher to the DUT, and we have no idea why right now. The solution is just to rerun the job. It is definitely a problem that the system itself has so many moving pieces; the solution to that is actually more tests for the system itself as well. Any other questions? There's one back there, hold on. Yeah.

I'm curious whether you're doing a rolling update out to your products in the field; that's the whole point of this, so are you periodically pushing out monthly or weekly updates to your customer units?

That could be the model. In my case I'm not doing that, because this is inside Intel. It's the real-time Linux project, which is sending those images out to other groups, which are developing things that then go out to customers, who use that to develop real products. So all of this is quite early in the development cycle of a real, true, honest product.
So in the case of something that's actually out in the field and that you have the ability to update, I would recommend perhaps living on a stable branch. But I think we need to not have a whole bunch of devices that were set-and-forget and just launched out into the field, because then all of a sudden something as bad as Meltdown and Spectre drops into our lap and all those edge devices are just vulnerable. That's a different problem; if you weren't building and testing, you wouldn't even know it had happened until you got hacked.

Is the current work published somewhere so that we could have a look and get started? No, unfortunately, due to the realities of time, schedule, and limited resources, we were not able to do that yet, but that is still the intent, so more on that later. Everything was already written with open source licensing, so I don't have to do anything to scrub the licenses themselves, but I still have to scrub out some internal-only IP and then get it out through the lawyers and the security people. We're a very large corporation, so there are a lot of steps to that. Any other questions? Also, for the off-the-shelf components I'm using, we're planning on making the bill of materials available somewhere as well, and I'll be working with the kernelci.org folks on that type of stuff. Go ahead.

When you do your CI builds with Jenkins, are you reusing downloads and sstate from previous builds? It's mixed. The nightly builds are reusing sstate. The auto-sync job, if that build was successful and we commit it, actually updates sstate. So more or less: the nightlies just use sstate, and most of the other builds actually update sstate, because things have actually changed, so we're trying to keep it moving as we go along. The weekly builds, which are actually the release candidates, the ones that become the promoted build that somebody turns into a release, with actual release notes generated and an email sent out, are built 100% from scratch with absolutely nothing; everything is wiped clean before that build starts. But we intentionally do those on Friday night so that nobody even worries about it; you just come in on Monday after it's done. Then we run a bunch of other tests over the weekend that run pretty long.

OK, so the Yocto Project is a very open, welcoming, and helpful community, so if you've been afraid to ask on IRC or email or anything like that, come approach us while we're here, or whenever. You can ask me questions directly in email, but I will very likely send you back to the mailing lists, because I can't give you free consulting when I could be sharing it with hundreds and thousands of people. But I'm very happy to talk to anybody about any of the parts here. I'll share as much as I can of what I've got, and some of it just has to wait until I can actually open source it 100%, but there's a lot of detail here that I just couldn't cover in the amount of time we have today.