I think most of the audience is in now, okay, so I think we'll just start. Just a little bit about myself: I'm Jan Lübbe, I work at Pengutronix, and in about 2008 I started with open source in public by working at Openmoko, building a mobile phone. Maybe some of you heard about the anniversary: 10 years ago Openmoko started. Since 2012 I work at Pengutronix as a kernel developer, a system integrator and so on, helping our customers integrate Linux. Last year at this conference I presented about long-term maintenance, how not to do it and how to maybe do it better. Today I'll talk about some experience we had with test automation and automation in general, and what we tried to do differently with our own project in that area.

So, some background on what we do at Pengutronix: we build embedded Linux systems for customers, basically everything below the application. There are a lot of components in such systems which are constantly in development: obviously the kernel, Mesa for graphics, Wayland, GStreamer for video acceleration, Qt itself, and a very large project from Google called Chromium. There are thousands or tens of thousands of lines of changes per day, so everything changes all the time and often stuff breaks as well. So we need to test stuff, and we do for our customers, so we notice when stuff breaks. Basically that level of testing has, I think, been solved by using something like Jenkins for build automation and Lava for running the tests. But as we'll see later, we think that's not actually enough.

During the talk you can just interrupt me for short questions; for longer questions please wait until the end, and I hope we'll still have some time for discussion.

So maybe a short survey like I did last time: who here has developed embedded Linux systems? 85% maybe? Who of those has rolled out a major system upgrade, so updating the kernel, updating larger user space components? 20%? But you have to: if something in your system is broken, you need to fix it, so you need to update major components at some point. You need to have a process. Who does it at least once a year? 15 people? 15 people maybe? Not enough. Who has automated that? For the application? 8. And the kernel? Because that breaks as well. 6. Who can update their update installer? 5, 6, yeah, okay. And roll back if the update of the updater breaks things? Oh, the same, okay. So I keep talking about automating stuff, because if stuff hurts, automate it and do it more often, because then you're more sure that it will work. So the 6 people maybe who are left, what do you use for that? Just shout it out. In-house, I heard. Jenkins? Buildbot? Sorry? Okay, yeah.

So, the current state of automation. We have things like Lava, Fuego, Autotest, Avocado, TI's VATF; U-Boot has its own tool. Earlier today we heard about CI-RT's R4D, "remote for device", for tests on real hardware; there was a talk about that today and I think there's another talk tomorrow. Then there's automation during development: I sit at my console and I want to run a git bisect, something like that, so I do scripting via SSH, maybe expect. And there's also automation in production: a factory produces hundreds, thousands, tens of thousands of boards per day, and those need software, and they need some testing before flashing the software. There it's also often in-house, custom or vendor-specific tools, or ad hoc scripting, so it's not reproducible and we can't share the experience.
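To illustrate what I mean by ad hoc scripting, here is the kind of quick, throwaway expect script people write for this. It's purely illustrative; the host, prompt and commands are made up, not taken from any real setup.

```python
import pexpect

# Quick-and-dirty, non-reusable automation: log into the board over SSH
# (assuming key-based login and a root shell prompt), run a command, reboot.
# Host address, prompt and commands are made up for illustration.
child = pexpect.spawn("ssh root@192.168.1.42", timeout=60)
child.expect("# ")
child.sendline("uname -r")
child.expect("# ")
print(child.before.decode().strip())
child.sendline("reboot")
child.close()
```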
So basically what we noticed, and I hope you see it as well, is that all of these are currently separate, but they need similar or the same low-level functionality, basically controlling the hardware. So I have a shortish wish list of things we should have in common, shared software.

We want short turnaround times for development use. If I run a git bisect, I can't wait 10 minutes for each step to submit a Lava job, get it scheduled on some board and wait for the feedback before the next step; I want to do that in tens of seconds. I want to support other use cases, like I said, installation in the factory and so on. I want to use the same board for CI and for development, because in early development for our customers there is often only one board which works. Maybe ten were built, but only one actually works. So during the day I want to use it for development, and maybe at night the CI should use it for testing. I want to have more complex state transitions than are currently supported; I'll come back to that later. And to make it reusable, it should be a library interface and not some daemon running on a server.

I want to control additional interfaces, because many of our boards have interfaces more complex than just power and serial, maybe network. Things like: it has an SD card, it has buttons that need to be pressed to boot in specific modes, it has jumpers to control the boot mode. It has custom interfaces where I want a logic analyzer to see if my changes to a Linux SPI host controller driver maybe break an SPI protocol that needs to work, so I want to automate a logic analyzer test. We need to automate USB interfaces like sound cards, video capture and output devices and so on. Some of that is supported by TI's VATF, but we didn't use that for other reasons. And I want to have multiple targets in one test case. Maybe I do video streaming, one device captures video, one plays it back, and suddenly I need one test case which controls both of them. Lava can do that, but it has other problems again.

So people say, yeah, that's "not invented here" syndrome, why are we doing this again? I think most of the tools in this testing space are shaped by the individual requirements of their developers, and they probably work very well for them, but they obviously don't cover all the use cases we have. That's not a bad thing; I'm not here to talk those people down or something like that. I'm just reporting, from our perspective, what was missing from those tools and what we did differently, and maybe this is something you can use as well. So I'm going quickly through the other tools we've tried and looked at, and I'll say from our perspective what worked with our requirements and what didn't.

So, Lava. Probably most of you know it; it's developed by Linaro and used by KernelCI and many, many others. It has some sub-projects like lavapdu, which is a separate daemon to control power distribution units, so power to individual boards, and that alone is very nice to have separate from Lava. It has a very nice web interface where you can see which boards work and which don't, control them and so on, and see test output. That's very nice. It has automated health checks, which is very important, because if you have a lab of hundreds of boards, you want to be sure that they work when you test new software.
At least that they work at the level they did when you added them to the lab, and that you get notified if they break. What didn't work for us is that these boards need to be dedicated to Lava. You can't just connect to them via serial and do custom tests. For tests which need to be iterated quickly, you need to take the board out of Lava, put it on the desk, run the test and then put it back into Lava. That's a lot of overhead. If you don't do that, each single test takes minutes or tens of minutes until you get feedback. See the talk tomorrow from the BayLibre guys.

Then there's Fuego. It's used by the CE (Consumer Electronics) Working Group of the Linux Foundation, by the Long Term Support Initiative, by Automotive Grade Linux, by the Civil Infrastructure Project and so on. How much of that is in production, I don't know; their talk probably covers that as well. It consists basically of Jenkins, some integration and automation scripts, and prepared test suites for certain areas; for example, it has the Linux Test Project integrated and so on. But from our perspective the killer is that it tries to build the tests itself, and it expects to have network access to the board, which is not the case for all our customers; some don't have network. And it's actually hard to set up if you already have a Jenkins setup that you like: it basically brings a fully prepared Docker container, so it stays pretty separate from your normal setup.

Then, in the U-Boot git repository, there's test/py, a small testing framework by, I forgot his name, look it up. It's nice. It's basically an automation layer written in Python on top of pytest, the generic Python test framework, and it makes it easy to test U-Boot. It builds U-Boot and only U-Boot, so it's pretty integrated into U-Boot itself and not really usable for anything else. It's not usable as a library, and it's pretty low level: from the test cases you directly connect to the serial console, so there is no concept of having more than one serial console.

Then there's CI-RT's R4D, which we heard about today. It basically makes libvirt capable of controlling real hardware: it can control power and the serial console. That makes it easy to use real hardware from Jenkins via the libvirt abstraction, but other stuff we need is more difficult and doesn't really fit that concept. For example, you can't have more complex use cases where you reboot the system several times, because they just use kexec, which fits their use case but not ours. And you can't really easily have test cases where multiple systems interact.

Then there's tbot from Heiko Schocher, which is also based on Python. It's really nice: you have basically a daemon, you can access serial ports and SSH, and it has a nice reporting interface; from test cases you can just send events and they get aggregated into a report. But again, it includes a build system and it can apply patches, so it does things which are already implemented in OpenEmbedded/Yocto, Buildroot, PTXdist and so on. We actually want to use those real build systems, because we want to test the actual result of our build systems and not use something else for building, which would behave differently. And it builds its own test infrastructure instead of using something like pytest, so you don't get all the nice pytest plugins for reporting, test filtering and so on.

And then there are a lot of project-specific tools which I've not actually tried, just looked at.
There's Autotest by Google, or Chromium OS, which they use for their system testing. There's Avocado, which the Red Hat people use for testing libvirt. There's TI's VATF, I don't know what the acronym stands for, which they use for testing their BSPs. But each of these is basically a test system integrated with its test cases in a single project, so it's not really reusable outside of those projects, and it's again only focused on testing, not the other automation we want to have.

Just a short summary of what most of these projects are missing, whatever their individual shortcomings are. We have a large overhead for running and/or writing a single test in most of them, so we can't iterate really quickly on our development PCs. That makes it painful to write tests, and that hurts test coverage. We have limited control over the target: for example, with Lava I can run tests in Linux user space, but I can't type reboot, watch the system reboot and check variables in the bootloader to see if the watchdog works and so on. That's not really possible in that case. And I can't easily control additional peripherals which I might need for testing, and I can't use those hardware controls for something like a bisect, to automatically test whether, or at which point, a bootloader change introduced a bug. For that I would need to flash an SD card or upload the bootloader via USB.

So, yeah, I wanted to fix that, because those problems keep appearing in daily development. I wanted to use these automation layers for normal development: I want to have the same tests and tools I use in CI environments also on my development PC, so I can develop tests on my PC, move them to CI and they just work, and I don't need to keep iterating in the CI framework. I want to make it easy to use from normal Python code, because other people might have different use cases and should be able to use it easily. I want to make it easy to extend, so if people have different power controllers and so on, that should be easy to add. And I want to reuse existing test frameworks.

So, yeah, that's a lot of work, but I wanted to do it in a way that reuses all the existing stuff. I don't want to have a build system; like I said, there are well-working build systems and I want to test those. I don't want to build a test framework, because they already exist in Python, so I reuse them. I don't want to have a scheduler, because those exist as well: just use Jenkins, or run it from a shell with a cron job, or maybe something else. And I don't want the fixed boot process that most of the other test frameworks have. So I built a library.

The end result should be that testing embedded systems becomes much more similar to testing pure software, or pure Python software. I don't want this library to handle the top-level flow control, because that should be in the test itself, so I can reboot the system or power it off within a single test. And this code should be pretty high level. My idea of what the API should be, ideally (it's obviously not there yet), is that it should be similar to what I would tell a colleague who asked me how to test a feature: power the board on, log in, run this command, check if the output is correct and power it down. That should be the approximate level of detail.

To do that, we have a pretty simple architecture in our tool, labgrid. We have resources, which describe what we can connect to. They don't actually have functionality, just information.
Then we have drivers, which use resources to provide protocols (interfaces, you would call them in Java or something like that). And drivers can use other drivers to provide more abstract functionality. For example, in this case I have a serial resource, a USB serial port or something connected directly to my board, which is controlled by a console driver. The console driver is used either by the bootloader driver, which knows how to control maybe U-Boot, maybe Barebox, or, once I boot into Linux, I switch to the shell driver, which knows how to log into a Linux shell and run commands there. Both of these implement the command protocol, so I can say: run this command, give me the output and the exit status. If I have other bootloaders or other interfaces over which I can run commands, I can implement the same protocol.

To make that easier to use, we have the concept of a target, which is a single board that basically runs a single Linux system. Targets have resources and drivers, and they can have a strategy, which is basically the project-specific knowledge of how to bring this board into a specific state. Because we build embedded Linux systems, they all behave differently: they have different bootloaders, some have passwords, some don't, some use SSH or whatever, and this knowledge needs to live somewhere. That is a special kind of driver called a strategy, which I can tell: move my system into a Linux shell, or move it into the bootloader, or switch it off, or whatever relevant states the board I want to test has. And I can easily add additional drivers and resources to my boards.

This is configured using a simple hierarchical YAML configuration file, which describes a set of targets and, for each target, which resources it has; in this case it has a /dev/ttyUSB0 device as serial port, and on top of that several drivers with configuration information, like what the prompt looks like on the serial console. I can have multiple targets in one of those files, and the expectation is that you have one of those files for each board setup you want to use.

So what can I do with that? In the simplest case, I can write tests with the pytest framework. pytest does the test execution and scheduling, finds the tests, and builds reports in HTML or JUnit XML, so Jenkins understands the test results, and so on. pytest, for those who don't know, has a central concept of fixtures, which are objects provided by the framework for each individual test, for example a database connection when you're testing normal software. In this case we provide fixtures for the protocols and drivers, so each test can say: I need the Linux shell command protocol, I want to run a Linux command and test the output. In this case I test if the hardware clock has the correct frequency: just run hwclock -c, which measures the frequency error against the system clock, and use some asserts to check that it is reasonable. This is basically the single individual unit of a test using labgrid. Does that sound reasonable? No feedback. What does this test catch in practice? Mostly misconfigurations of Linux clock trees, so a wrong divider in the device tree. That is easy to find via this test.

If you have something like Jenkins, you get result pages like this, and nice graphs without any additional work in Jenkins, just by parsing the result XMLs.
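To make the structure concrete, here is a minimal sketch of what such a pytest test can look like. This is not the code from the slides: the fixture wiring, the command and the assertions are illustrative, assuming the labgrid pytest plugin provides a target fixture configured from the YAML environment file.

```python
import pytest

@pytest.fixture
def command(target):
    # 'target' is assumed to come from the labgrid pytest plugin, configured
    # by the YAML environment file; ask it for a driver implementing the
    # command protocol (the Linux shell driver in this setup).
    return target.get_driver("CommandProtocol")

def test_kernel_version(command):
    # Run a command on the board and check its output; the hwclock frequency
    # test from the talk has the same structure.
    stdout, stderr, returncode = command.run("uname -r")
    assert returncode == 0
    assert not stderr
    assert stdout and stdout[0].startswith("4."), "unexpected kernel series"
```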
Then, like I said, I want to use it easily from the command line. So, something like this: I have a tool called labgrid-client, where I pass the information about which board I want to use. I lock it so my colleagues can't confuse the state of the board. I can control I/Os for things like buttons or boot mode pins, enable power, load the bootloader via USB, load the kernel via the Android fastboot protocol, and connect to the console. That makes it easy to use from the shell: I can write one-off shell scripts if I want to automate simple things, and I can use it from something like Lava to connect to my labgrid board farm.

Because in our company these boards are distributed at our desks and in labs in different rooms, and we want to be able to connect to all of them, we have a central component called the coordinator. Each lab PC runs an exporter, which connects to the coordinator and provides information about which resources are available in that rack. For example, we have a system with 16 boards connected: it has a power switch via USB, USB serial converters, different USB hubs, and a configuration that says place one has these resources, power switch, serial console, maybe something else, defined per port. The client program which I showed on the previous slide connects to the coordinator, requests information based on the name of the board I want to control, and can then directly access those systems. We don't actually pass each request through the coordinator, to keep it simple; at the moment we just expect that you have SSH access with your normal user on those systems, so that you can run commands like fastboot or a USB upload or something like that. The coordinator just collects and distributes the information about which boards are available, who is accessing them right now, and what the parameters are to access the individual resources.

Then, like I said, besides testing I want to be able to do simple ad hoc or one-off scripts for things like git bisect. Or, in this case, I had a board where, and I didn't notice it myself until the customer came back and said sometimes this doesn't work, in 1% of the boots one of the network interfaces would just drop the packets which are sent, with no other error messages. So I wanted a test which repeatedly rebooted my board until this error condition occurred and then just gave me a shell, or left the system in that state, so I could debug interactively. In this case I just load the environment definition where the resources and drivers are configured, get a target object, and try 1000 times to run this check, which basically runs a ping and looks at whether the statistics counters increase, good frames sent. If it has not sent a frame, the check breaks off and leaves the board in that state. And I can run this program against any configuration file where the required drivers are defined, so it doesn't depend on having the board at my personal desk or somewhere in the network; it's completely transparent to the actual test code.
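A rough sketch of what such a one-off reproduction script can look like, assuming labgrid's Environment API and a project-specific strategy; the driver lookups, state names, ping target and counter path are illustrative, not the actual script:

```python
from labgrid import Environment

# Load the same YAML environment file the tests use; whether the board sits
# on my desk or in a remote rack is transparent to this script.
env = Environment("my-board.yaml")
target = env.get_target("main")

strategy = target.get_driver("Strategy")        # project-specific strategy
command = target.get_driver("CommandProtocol")  # Linux shell driver

for boot in range(1000):
    # Power cycle and boot all the way into a logged-in Linux shell.
    strategy.transition("off")
    strategy.transition("shell")

    # Send a few pings, then read the interface's TX packet counter.
    command.run("ping -c 5 192.168.23.1")
    stdout = command.run_check("cat /sys/class/net/eth0/statistics/tx_packets")
    if int(stdout[0]) == 0:
        # Error reproduced: stop here and leave the board powered on in this
        # state so it can be debugged interactively.
        print(f"no frames sent after boot {boot}, leaving the board as-is")
        break
```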
Then, I said I want to have something for factory automation. In the simplest case, SoCs like the i.MX6 can be completely booted and configured over a single USB device port: just connect it, and if it does not find any installed bootloader, it requests one over USB. So basically what you do is: you wait for the device to appear on USB, send it the bootloader, wait for the bootloader to appear on USB with something like Android fastboot, use fastboot to flash the whole system, configure the MAC address, configure the serial number and so on, boot into the Linux system, run some tests and show the user that everything is okay. In some cases you have a USB UART as well, and usually you have more than one of those places connected to a single PC in the factory. I did a small implementation of this; it's available on GitHub in the labgrid repository. We basically use these environment files to define multiple targets, and a small Python script defines which steps to take to install such a system. It's also usable as an example of how to use the labgrid library interface to build something for a specific use case, and it's probably something like 500 lines of Python.

Yeah. So the question was how I would control lower-level interfaces like GPIOs, SPI and so on. For each of those interfaces I basically need a labgrid driver which understands how to control that interface or protocol from the host side. Maybe I have an FTDI chip with a library which knows how to send and receive SPI frames; then I would need a labgrid driver which exposes that as a Python interface, and I would bind it to a labgrid target under a specific name so I can use it from this level.

Yeah. I should have a short demo; I think I have some time. By the magic of screen you can see what I see, in this case just a script. Basically what I do here is run a pytest test suite against a system running in QEMU, where we do integration testing for RAUC, which is our open source updating tool. The system now boots in QEMU to a shell, runs some basic information commands and then tries to install an update. After the update is installed it will reboot, check in the bootloader that the configuration for the new image is correct, continue booting into the full, newly installed updated system, test that the correct version is installed, run some additional sanity checks and then finish with those tests.

So I can just show you how these tests look. A little bit too small, sorry. Just about this. So this is just how normal pytest tests look; it consists of two tests. The first one runs a shell command, barebox-state -d, parses some information and runs some asserts to check it. No, it's not a test, it's a helper function for this test, sorry. This test uses the strategies I mentioned earlier to move the system into a specific state, uses the helper function to ask which system is currently booted, then RAUC installs the new system (it's basically an A/B system), moves to the other slot, reboots by going to Barebox and again to the shell, and then asks again which slot it's in and asserts that it's not the same one. Just very basic.
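Very roughly, the structure of that demo test might look like the sketch below. It assumes pytest fixtures for the project strategy and the shell command protocol, like in the earlier sketch; the state names, the barebox-state variable and the RAUC bundle path are illustrative, not the actual test.

```python
def get_booted_slot(command):
    # Helper, not a test: read the bootloader state from Linux user space to
    # find out which slot we booted from (variable name and parsing are
    # illustrative).
    stdout = command.run_check("barebox-state -d")
    for line in stdout:
        if line.startswith("bootstate.booted"):
            return line.split("=")[-1].strip()
    raise ValueError("could not determine booted slot")

def test_update_switches_slot(strategy, command):
    # Boot into the Linux shell and remember which slot (A or B) is active.
    strategy.transition("shell")
    old_slot = get_booted_slot(command)

    # Let RAUC install the update bundle into the inactive slot.
    command.run_check("rauc install /tmp/update.raucb")

    # Reboot via the bootloader back into Linux and check that we are now
    # running from the other slot.
    strategy.transition("barebox")
    strategy.transition("shell")
    assert get_booted_slot(command) != old_slot
```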
So what we currently have working is remote control of boards; this is something we use daily over the network, what I showed with labgrid-client. We use pytest to do regression testing and integration testing, locally and in Jenkins with the reports you saw. We sometimes do ad hoc automation for git bisect or for reproducing sporadic errors with the same configuration, just some hundred lines of Python for each individual check. The auto-installer code is in the git project, and we have some other internal QA tools which use the automation layer to collect information from running systems and do basic QA sanity checks, for example that in a Linux system you don't have any devices without drivers, or any drivers loaded without devices. It's just a sanity check that our kernel and device tree configurations are correct, and it's difficult to do on a static configuration: just boot the system, collect some information by running commands and analyze that.

Obviously that stuff is not finished; we started just a quarter of a year ago. The next actual step is reservation of targets for Jenkins: if I'm working interactively on a system, Jenkins has finished building the newly committed BSP, and I only have one system to test on, then Jenkins should wait for me to be finished with the board instead of just failing because the board is busy. That should be integrated pretty soon. We want to have automatic integration testing, like I showed in the demo, for other projects like Barebox, where we want to run those tests in Jenkins so we can test these complex workflows: boot into Linux, do something to the system, go back to the bootloader, boot into Linux again and so on. Driver priorities are mostly quality-of-life stuff, so you can have not just power cycling but also reset, and if the board configuration doesn't define a reset, it just power cycles instead. Driver preemption will probably take some more work. It's about this: if I'm running a normal Linux command and my system suddenly reboots because the watchdog expires, I want that reported as a Python exception, as an unexpected state change, and I still want labgrid to know which state the board is in now, so that I can continue testing easily without power cycling.

Maybe at this point we can have some discussion and questions. I think we still have 15 minutes or so for questions. I think we have a mic.

Hi. So how do you handle the interactive development scenario with console output at the same time? Is it possible to run the tool and see what is going on on the console at the same time? Yes, but it's not nice. Currently you can switch it to debug mode and you get basically the internal communication between labgrid and the board, but it's not what you would expect on a serial console. To make that nicer, we need to move that to a different kind of output or generate reports or something like that. I don't really have a good idea of how that should be implemented, but it shouldn't be too much work once we have a good idea of how it should look. The good use case would be, like, if you're writing some test, you would like to see how the system reacts. Yeah, basically you can run a single test. I can try to show it to you; if it works, it would be nice. So, pytest, and then I'll increase the verbosity to the max. Ah, sorry, I should type in the correct window. Yeah, now you basically just get the full debug output. It's not nice to look at; that's what I currently use to debug it. You can see it's now booting, systemd messages, and then it will log in and run some shell commands. So I would want to have something which looks like the output you would get on a shell, but maybe mixed with some status messages from labgrid, like which command it is running, but that's not there yet. Maybe you could pass the mic on. Thank you. Just one last question.
I wonder whether Jenkins is really the weapon of choice, because I think it takes a really long time to switch pages and everything. You mentioned, I think, KernelCI, this automated kernel test infrastructure, for visualization and everything. So why don't you use that to visualize the test results? Why Jenkins? KernelCI is basically just an infrastructure to distribute kernel builds and test requests to the labs running Lava. So it's nice for running kernel tests, but more complex tests are not possible with it. And I think the Jenkins reports from JUnit XML are pretty usable. You can also automate that via the Jenkins API and download them into your own visualizations, which we have done for customers. But at the labgrid level you're not tied to Jenkins; it's just the easiest integration you have with pytest. But the best tool you found to visualize was actually Jenkins? The easiest to use, yeah. Okay, thank you. So, at the showcase tomorrow there will also be a demo of labgrid. If you have more questions, you can go there, or otherwise I'll be outside. Thank you.