So let's start. Welcome to my talk. I'm Jan-Simon Möller, and I will give you some insights into a research project I did on how to combine Fuego and LAVA to run tests. In case you wonder, DL9PF is actually my ham radio call sign and not some weird code name. So in case you want to reach me, you can use one of those emails. I'm working on the Automotive Grade Linux project as release manager, taking care of the CI infrastructure and the automated tests. That's why I'm looking into this topic. If you say "AGL what?", then come over to the showcase next to the registration desk. There's our table, and you can see what we have been up to.

So what do we want to talk about today? How do we test today? I will look at a few existing tools to set the stage, discuss a few problems that we have on real hardware right now (there are probably way more), and how we can solve them. If everything works, we'll have a demo, and then I will show you how to replicate the setup and how to work with it. A few words on what was good and what was bad; I left out the ugly stuff on purpose, I'm still working on that part. And what's next. A few questions then at the end.

So, who recognizes their desk? Raise your hands. Usually testing means you need all the hardware in place, you need everything wired up, you need to upload the right build, and then make sure everything works as expected. That's of course a lot of work, a lot of manual work, a lot of, well, swear words if it doesn't work. And reproducibility is a topic. We can do better, right? So who recognizes their drawer? Once you get more boards, it can get a little messy, and that's just one level of the drawer, by the way. There's also always the problem of getting exactly the right hardware to you, in your lab or wherever it is. On the other hand, your colleague might have the board.
And he ran the test, so where's the result? Those are two things where I see AGL being a little bit in trouble as well. So I basically asked myself: we code with code review systems, distributed. But how do we test? In the lab, well, somewhere in the basement, probably. So there's a disconnect between how we develop the code and how we actually do the verification, the testing. Of course, I'm simplifying; it's not that easy.

So let's look around: what tools do we have? There are things like Autotest, which comes a little bit out of the distro and server space; that's a framework used there. Then Jenkins plus some homebrew, or Buildbot plus some homebrew stuff; that's pretty common. Or manual testing, which is the precise method: nothing is better than two pairs of eyes. But, to be honest, you do that once, you do that twice, and then you don't pay attention anymore, right? All of that is mostly non-distributed, in a lab. And then, in the last couple of years, we got more and more distributed systems. It started basically with LAVA, which is a lab tool; you can say it's a tool to manage your boards in a lab. Fuego, which is, in principle, Jenkins with some extra test tooling. And KernelCI, which is basically a front end. So we are going into the distributed space here: we are aggregating and using remote labs.

LAVA: there were quite a few talks during this week, including one about bringing up a lab, which was very nice. You find the main site at validation.linaro.org; it's maintained by Linaro. What it is very good at is orchestrating your board farms and managing the boards that you have. It can use a lot of deployment mechanisms: fastboot, U-Boot, UEFI, network boot, all of that is supported. And the newer version, version 2, has a very strong master/slave model, and the slave is very lightweight; it even runs on a Raspberry Pi. Yeah.
The downside is that the initial learning curve and setup are quite steep. You need to dig into it and dive through the documentation, probably twice, until you have all the interesting spots set up. And you need a Debian machine, as it's only packaged for Debian right now. So that's that part.

Fuego is basically a Jenkins which has a set of ready-made tests, basically an abstraction script and pre-packaged tests that run within that environment. You can run the tests, collect the results, process the results and, in the end, create reports. That's the big goal here. It evolved out of the LTSI kernel test project. It has a very large number of tests, from simple things like... okay. So it has a large number of tests, and it's very good at reporting. Now, the downsides: it's a different approach. Here we really started in the lab, which means we assume local access to the boards. It uses Jenkins, so Java. It might run on a Raspberry Pi, but... And it really started out as a local system.

Okay, so the goal now is how to distribute the tests and then aggregate the results. That's my idea here, because I cannot have all the boards in my drawer; it's already full. And what we actually care about in the end is: does it work on one of the boards that we want to test? I don't need to have that thing right next to me. So how can we make that happen? Okay, so: distribute the tests. With the latest LAVA v2, they split it, so now we have a master and a slave, and those slaves can be small satellite labs. They use an encrypted ZeroMQ pipe, so the satellite lab communicates with the master over TCP. Which is nice: I don't need to have all the hardware in my lab anymore. So what do we need? We need a way to control power for the board, we need serial, and we need network.
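As a hedged sketch of the power-control requirement just mentioned: on a Raspberry Pi satellite node, a relay HAT can be driven through the sysfs GPIO interface (deprecated in newer kernels, but still common). The GPIO number and paths here are illustrative assumptions, not the talk's actual setup:

```python
import time

class RelayPower:
    """Toggle a relay wired to a GPIO pin via the sysfs interface.

    gpio and gpio_root are illustrative assumptions; dry_run records the
    writes instead of touching /sys, which is handy off-target.
    """

    def __init__(self, gpio=17, gpio_root="/sys/class/gpio", dry_run=False):
        self.base = f"{gpio_root}/gpio{gpio}"
        self.dry_run = dry_run
        self.log = []  # (path, value) pairs, recorded in dry_run mode

    def _write(self, path, value):
        if self.dry_run:
            self.log.append((path, value))
        else:
            with open(path, "w") as f:
                f.write(value)

    def on(self):
        self._write(f"{self.base}/value", "1")

    def off(self):
        self._write(f"{self.base}/value", "0")

    def power_cycle(self, off_seconds=2):
        # Hard reboot: cut power, wait, restore.
        self.off()
        if not self.dry_run:
            time.sleep(off_seconds)
        self.on()
```

In a real lab the LAVA device dictionary would call a command like this as its power-on/power-off hook.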
Well, the minimum is one network interface, but I always set it up with two: one is my public one, and one goes to the device under test, which simplifies things like firewalling in the end. So such a small lab, which you can build on a Raspberry Pi, has this setup here: Raspberry Pi, USB serial, USB-to-Ethernet, and some way to switch power; there are nice HATs for the Raspberry Pi out there, for example this one with a relay. So that's a full satellite lab environment already.

Okay, that's nice: I can now distribute my test hardware. But one thing which is a problem with LAVA, to some degree, is that I can run all those tests, but what do I do with the results? If I want to post-process the results, I need to do that elsewhere. So let's see. We are looking into Fuego here because it has the ready-made tests and it can post-process the results. We have those existing tests with existing parsers, we can post-process and generate reports; basically, we have the whole machinery that Jenkins provides in place here. So how to mash that up? We already had a hook in Fuego called target_setup, which is called the first time we need, or try, to access the target. Okay, good: that's a good place for me to hook something in, in the main script. The teardown is a little more tricky because we have no finalize hook yet, so I added a target_teardown hook. It's working; it's probably not the right place, but more on that once we discuss it. What we need, of course, is a LAVA job template. Parts of it need to be filled in to make that work: for example, which kernel, which device tree file, which rootfs. To keep it simple, that could be a static file, but eventually you want it, for example, per commit in your code review system, so it needs to be dynamic in the end. And a settings file: we need to know what board and what credentials we use for the board, and so on.
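To illustrate the "dynamic template" point: the per-build values (kernel, device tree, rootfs) could be substituted into a LAVA job definition with plain string templating. This is a minimal sketch; the field layout and URLs are made-up examples, not the exact template from the talk:

```python
from string import Template

# Fragment of a LAVA v2 job definition with placeholders (hypothetical layout).
JOB_TEMPLATE = Template("""\
device_type: $device_type
job_name: fuego-$test_name
actions:
- deploy:
    to: tftp
    kernel:
      url: $kernel_url
    dtb:
      url: $dtb_url
    nfsrootfs:
      url: $rootfs_url
""")

def render_job(device_type, test_name, kernel_url, dtb_url, rootfs_url):
    """Fill the per-build values into the template. In a full setup these
    values would come per commit from the code-review / CI system."""
    return JOB_TEMPLATE.substitute(
        device_type=device_type, test_name=test_name,
        kernel_url=kernel_url, dtb_url=dtb_url, rootfs_url=rootfs_url)
```

Starting from a static file and only later wiring the values to the CI system, as suggested above, keeps the first iteration simple.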
And in the end, everything needs to be sent as a job template out to the LAVA instance. There are a few gotchas in this setup right now. I still rely on the SSH transport: Fuego has the notion of a transport to the board, and we do, I think, three things: get, put, and command. I still rely on SSH, so I still need to cheat a little, because I have a VPN between Fuego and my satellite nodes. So that's the big boo; I need to get rid of that to be fully distributed. I also use a standard hacking session. What does that mean? In principle, the target comes up with an SSH server and waits for me to finish. It works, but it's not the final solution.

Okay, let's see. Here's the Fuego instance that I've set up for us. I created a board description for the Raspberry Pi, so that's my device under test. Let's just start one of the jobs here. I'm looking for... where is stress? Let's take that one and build it. Okay, so we start the job and eventually we end up in the target setup, as expected. Now what happens? We submit a job to the LAVA system, and there it needs to be scheduled. That is what we see over here. It takes some time to spin up, and we see the job going on here. It downloads the requested files and so on. It powers on the board, and we already see the bootloader prompt. It boots, and then we are ready for Fuego to do its test job. Once SSH is up, it's as if the board were local. I did not deal with how to go really distributed; that's still something to look into, but there are some ideas, and that's fine. That's the RAM disk; we don't use that. In principle, yes: that's the difference in assumptions. In Fuego, we basically have a board up and can connect over SSH. So basically I juggled the SD card, made sure it's what I want to test, and then we just run the whole thing.
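LAVA v2 exposes an XML-RPC API for job submission, which is the kind of call a target_setup hook can make. A sketch, assuming a reachable LAVA server and the documented scheduler.submit_job call; host, user, and token below are placeholders:

```python
import xmlrpc.client

def rpc_url(host, user, token, scheme="https"):
    """Build the authenticated RPC2 endpoint URL that LAVA expects."""
    return f"{scheme}://{user}:{token}@{host}/RPC2"

def submit(host, user, token, job_yaml):
    """Submit a job definition to the LAVA scheduler; returns the job id."""
    server = xmlrpc.client.ServerProxy(rpc_url(host, user, token))
    return server.scheduler.submit_job(job_yaml)

# Usage against a real instance (credentials are placeholders):
# job_id = submit("lava.example.com", "someuser", "SECRET-TOKEN", job_yaml)
```

The username and API token are exactly the credentials the talk says have to end up in the settings file.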
I did it in this way: one test job is one invocation, and then I close the connection. Of course, now we could ask: can we pool this, how can we stay on the board or keep it open? Okay, now the board is booted; Fuego can connect and do its job. And here we go. The test run is going right now; once it's complete, we basically tear everything down and kill the job. Coming back to your question about pooling: I tried that, but there is one issue. There are also timeouts in LAVA, and if I pool jobs and one runs very long, then I hit the timeouts over there. So it's a little bit of a game to sync that up. Okay, and we are done.

So how can you try that out? I cloned Daniel's fuego-next branch and worked on top of it. Here's my project on Bitbucket; I rebased it again just yesterday. It's the same instructions; you just use those two repos. And I created templates for the Raspberry Pi. What you need to add is, of course, the LAVA bits: the credentials for the server go in there, username and token. That all needs to end up there.

So, what's the good? The session-per-job approach actually works quite reliably. It's not super fast because of the frequent reboots, but it works very well. Now, the question is: would it make sense to pool jobs, or do you want to stay on the same file system? Those are valid things to keep in mind, but I found it actually works out. The board handling is stable; I get the board up and down reliably. The only thing is syncing both systems up: right now it's basically polling. I have no way to really query the LAVA system whether the board is up. The latest discussion I had with the LAVA developers was that there is actually a notification possibility, so I could, from within the job description, say: okay, now send a signal, and then catch that.
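The polling-versus-timeouts tension described above can be sketched as a loop with its own deadline. The status-fetching function is injected, so the same logic works against LAVA's scheduler.job_status XML-RPC call or a stub; the finished-state names are assumptions based on LAVA v2:

```python
import time

# Assumed LAVA v2 terminal job states.
FINISHED = {"Complete", "Incomplete", "Canceled"}

def wait_for_job(get_status, timeout=600, interval=5, sleep=time.sleep):
    """Poll get_status() until the job reaches a finished state.

    The local timeout must stay below LAVA's own job timeouts, otherwise
    the two systems drift apart, which is exactly the syncing game
    mentioned in the talk.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in FINISHED:
            return status
        sleep(interval)
    raise TimeoutError("gave up waiting for the LAVA job")
```

Injecting sleep also makes the loop testable without real waiting.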
I didn't have time to look into that yet, but something like that is possible. Then we can say: yeah, we are ready, and synchronize better. For Fuego, with the two hooks it then works mostly out of the box, because whenever we need to access the board, we bring it up, and then it's as if it were in-house, sort of.

The bad: we had predefined timeouts in each Fuego job, and those were, of course, set with the idea in mind that the board is local and the board is up. Now, if I push the setup into another system, and LAVA in this case has to download the binary files first, I completely blow past any timeouts that were defined as valid for a local setup. Also, in Fuego, we have to work on when we really need to access the board, because right now, out of the box, we would start the job, grab the board, and once everything is up, we say: wait a minute, I need to compile the test first. The assumption in Fuego is that the target doesn't have the test on it: I have to compile it, copy it over, run it, and grab the result, which is a fair assumption. But in this setup, we keep the board open; we grab the board before we have compiled everything. So that's something to work on: basically, have more fine-grained steps here. That can also help to parallelize the build part better and, in the end, utilize the boards better.

For LAVA, we need to expose more information about what's going on, so that I can, for example, say: yes, the board has booted successfully, even in the LAVA command-line tool, and then I could say for a job: yeah, it's booted, the kernel is up. We actually have that information in the log; we just need the signals out on the command line. What would also be cool? Well, that boils down to the whole transport issue. Right now I'm using SSH, so for having it completely distributed, I'm cheating, I admit. We would need some sort of pipe towards the device under test, whatever that is.
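Until LAVA exposes a proper "booted" signal on the command line, one stopgap (a sketch of my own, not something from the talk's code) is to scan the job log for a boot-success marker. The marker strings below are hypothetical examples:

```python
# Hypothetical markers that suggest the kernel came up and userspace is ready.
BOOT_MARKERS = (
    "Linux version ",   # kernel banner
    "login:",           # getty prompt on the console
)

def board_booted(log_lines, markers=BOOT_MARKERS):
    """Return True once any boot marker shows up in the LAVA job log."""
    return any(marker in line for line in log_lines for marker in markers)
```

This is fragile compared to a real signal or the notification mechanism mentioned above, but it works with the information already present in the log today.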
Another concern is security, of course. On the other side, we could completely skip that step if we build the artifacts up front, push them out onto a web server, and the device just pulls from there. Then we get rid of the whole SSH problem; Fuego would just have to wait for the result and pull that, right? So that's what I just figured, basically: why we may not need that at all.

So what's next? We are already discussing how to improve this, and I will have to clean up my code and make it more generic and better usable. As we just said: do we need SSH at all? How do we deal with the transport here? Or can we work around the whole issue and work in a fully distributed environment? For example, another idea would be: instead of making the target_setup hook execute a kind of dummy job in LAVA, just make the LAVA job the actual test job. There are multiple ways to skin the cat here; we need to figure out the best way forward. And then pooling, of course: pool one LAVA session for multiple tests, so we stay on the same booted board, for speed reasons.

On the other side, if we look at the Fuego interface: if we use such an, let's call it, orchestrator like LAVA, or board tool, whatever you use, then in Fuego one board is basically one execution node, one executor in Jenkins. Now here's an idea. How do you add a board here? The traditional way, you would have to add a node, right? But in the LAVA world, I actually have a device type, Raspberry Pi 3, and multiple boards can back this device type. So if we think that through, this is actually my device type, and with LAVA in the background handling the boards, I could just say: wait a minute, let's just crank up the number of executors for this type. So I actually have two Raspberry Pis I can test on.
With target_setup and target_teardown in the job, that is an easy way to distribute the tests across multiple boards and speed up the whole process. The rebooting each time, yes, it hurts, but with this approach I can distribute across, say, ten boards. Now, the downside is we still have some... well, I left out the ugly slide. We still have some assumptions here. For example, we still hard-code the SSH IP and so on, so that would break right now, because I don't have two IPs. But that's something to work on. It's a different set of assumptions from where we start. In Fuego, we assume the board is booted, but we also assume it can be a production file system, so there are no test binaries on it. That's the assumption Fuego comes from. In the LAVA world, we basically have the tests on the file system and just say: run, and grab the result. So that's a different starting point. But that might work out if we just fetch the tests anyway. So that's this idea: if we can scale here, you don't have to add an executor per board, which makes it very nice and easy, because you set it up once and then just crank up the number of executors. That was another idea.

Okay, so where to find all of this? There is a page at elinux.org/Fuego, which will point to the Fuego website that Tim set up. LAVA you find at validation.linaro.org. I did not mention KernelCI much: KernelCI is basically a front end which can aggregate test results. If you haven't seen the project, check out kernelci.org. They do a lot of boot testing around the kernel, and they aggregate those boot results across a large number of targets.

That brings me to... oh, I skipped a slide. That brings me to those slides over here. As I looked into KernelCI, Fuego and LAVA, with my tinfoil hat on: I'm interested, first and foremost, in the result. Yes or no?
That's very brief, but in the first shot that's what I want to know. If I have to dig deeper, we now end up having multiple output formats. So I was like: hmm, okay, I cannot mix and match here. It would be a very good thing if we could work out a way so that, whatever test system you have set up, whatever you use, we can get at the results, because that's what we are most interested in. And it would make sense to aggregate those, for multiple reasons. A single test result doesn't tell you anything beyond thumbs up or thumbs down, but if we have multiple results, we can actually draw conclusions from them: how does it evolve over time? You also need a good baseline; if you have a couple of results that say, okay, that's the level you will see on that board, you can say: wait a minute, I'm way off. And if we can share those results, we can raise the bar in the whole ecosystem and, instead of reinventing the wheel and redoing the tests on our own workbench, learn from each other. The problem is the output format. We have been discussing that in the last couple of weeks. One thing we are looking into is, for example, the Test Anything Protocol, TAP; Google it, testanything.org. It might be a way to have the output of all our tests in the same format, so we can aggregate it, for example with the tooling at KernelCI, and evaluate it much more easily than right now.

All right, questions. Yeah, Tim, you already asked your questions. Let's use the mic. "Could you tell us a little more about what kind of tests Fuego is currently running, please?" Okay, the question was: what kind of tests are in Fuego right now? I can show you the ones that are in the test plan; that's not everything, we have some more, I would have to bring up the shell. It ranges from benchmarks to functional tests; LTP is in it. That's a subset; there are more.
If you look into fuego-core/engine/tests, that's the whole set. More questions? Oh, yep, go ahead. "You mentioned the AGL program. What are the links between AGL and Fuego? The list of tests here is quite generic, I would say. What is done specifically for AGL in Fuego, please?" So right now we are looking at the infrastructure; we have no AGL-specific tests yet. Yes, it's work in progress. All right. Thanks for joining, and enjoy the showcase tonight. Visit us; we are next to the registration desk. Thank you.