Hello, welcome everybody. Today I want to talk about automated testing for embedded development and how we at Pengutronix are planning our next generation of board farming.

Just for a beginning: what is Pengutronix? Who is Pengutronix? We are around 30 colleagues based in northern Germany, and we are an embedded Linux consultancy and service provider. We do everything from integration into your embedded Linux environment up to graphics, driver development and stuff like that. And because we're kind of versatile in what we do, we've got customers in basically all industries that use embedded Linux. For the 5.14 merge into the Linux kernel we even made it into the LWN statistics, with just 1% of the changesets, but hey, we're still on the first page.

About me: I'm Chris. I'm a senior hardware developer, so I didn't contribute a single patch to the statistics on the last slide, but my team and I do hardware development for remote controlling embedded Linux devices. We had the feeling that we often need devices that we can't buy, so we started to develop our own a few years ago. So I'm not on the software side, I'm more on the hardware side, and that's basically the reason why my talk is more about hardware today, I guess.

Why do we need to talk about board farming? I mean, I could probably ask a lot of people in this room and they would tell me at least a few standard board farming software stacks that they know or even use, like Fuego, where Tim Bird is really active, LAVA, or labgrid, where we sponsor a lot of development, and there is a lot more software available for board farming. Tim Bird's elinux.org has a good list of software if you want to take a look there. And it's mostly the same on the hardware side: we have had talks from BayLibre about their LAVA lab in a box. They are also doing hardware development for embedded Linux remote controlling.
We have seen Pawel's MuxPi here at ELC, and there are devices like the 3mdeb RTE if you want to remote control devices. And I guess most of us who already do board farming have their own in-house solution for board farming and remote controlling.

So why do we need to talk about it? Well, because I think there isn't a one-size-fits-all solution, and we are all just doing the next step while we're on it. So I think it's really important to talk to each other and see what other people are doing, what problems they are solving and what issues they had on the way.

So, my topics for today: I will start with a short primer on what board farming is and why you may want to do it. Then I will present Pengutronix's current solution for board farming and talk about what is good and what is bad about that solution. Then I want to talk about what we're trying to do next, our next generation of board farming, and I hope in the end there will be a few minutes for discussion.

Okay, a short primer to bring us all up to the same velocity. What do you need to control an embedded Linux device? The most basic thing is that you will probably need a power supply, some kind of power source, maybe a battery, and a serial interface. That can be RS-232 or a UART, and it may even be tunneled via USB, but a serial terminal is really important: you need to talk to your bootloader, you need to talk to your Linux. Additionally, you may need some GPIOs because you want to switch boot modes, for example, or assert the global reset on your device, or USB for serial download of new images, to use Android fastboot, imx-usb-loader, things like that. And depending on your use case and the use case of your device, you may have some other interfaces. You may need to interface with an SD card. You have an Ethernet port for connectivity, CAN buses if you're doing automotive or automation, and HDMI, DSI or camera interfaces if you're doing graphics input or output.
You may have an audio input or output, or some LEDs to show your current state.

And I think most of us developers tend to be a little bit lazy, so if we have to repeat steps over and over, we tend to automate them and find a way to not do all these small steps again, because that introduces errors and isn't that interesting. And at some point you will not just write the small script that does stuff, but you will also introduce some hardware that allows you to, for example, remotely switch the power supply or remotely switch a GPIO. And if you do that, and you don't have only one board on your desk but, well, two or three, then I think you can already call that board farming, because you're doing the most important thing: you're remote controlling your hardware.

So, board farming. The most important thing is that you need to remotely control your devices. You need to have automated control of their state, and once you have that and you don't need to manually interact with them, you can start to work remotely. You don't need to be in your office to control your device, and you can even share your boards with colleagues somewhere else. The next step is just running tests, because once you're there and you have remote control, why not automate your testing and not do all the same steps again and again? And once you're there and you have written your tests, you can also integrate those with your continuous integration, which you may already have, and then it's continuous testing. And once you're there, it's just full-scale board farming.

So I would say board farming is an important foundation for quality assurance for software development on real hardware, and I would add it's even an important foundation for efficient and fast development of software on real hardware.

We at Pengutronix have two main use cases for board farming. One is the interactive use case, where a developer wants to work on a software component but needs to do that on real hardware.
So he's changing something on a build server, in the source code or in the configuration, baking a new image or a new root file system, and then he's trying it out on real hardware, and that is done via automation that is built into the board farm. The other thing he may do is develop a test suite, for those customers where we have a test suite or where we have agreed to have a test suite. That's done interactively, and when the test suite is ready you can start to automatically run these test suites from CI. That's what we do in the automated use case: we take our test suites and let them be executed by our continuous integration without any human interaction.

I would say the interactive use case is the more important one, and it's the one that we do most, because we are usually working on a project basis, and not all our customer projects have automated testing attached to them, but all projects have some real hardware in our labs that we then use to develop software on.

What does such a lab look like? This is a newer one, one of those we built during our home office peak time during the pandemic. It's a 19-inch wide server rack, just without the walls, and we have eight levels where we can put our devices under test, where each level has two slots for two devices under test. There is some common infrastructure that all these slots share, directly attached to these racks, so these racks just connect with Ethernet to our lab network and via a mains socket to the power supply, and that's it. And we've put them on wheels, so you are free to move them around in your whole office, or at least you can just turn them around to get better access to the back side, for example.

So what is the hardware that is in such a lab?
First of all, the common infrastructure. There is usually a power switch in every lab, or sometimes two with a lower port count. If you take a look here, the power switch is hidden behind this superstructure here, but there are all these power cables coming out of it, so that power switch is hidden behind the rack. It's a 24-port power switch in the newer ones, because we want to have some spare sockets in cases where we want to, for example, switch infrastructure on or off, or where a device under test needs more than one socket; that happens from time to time. We are using a 24-port Ethernet switch there. Those are PoE in the newer labs, gigabit only. And there is an RS-232 serial server in the older racks (that's missing in this newer one here), so we can attach up to 16 RS-232 serial ports, all through the network, and then use those remotely. The power switch is also attached to the Ethernet, I forgot to mention that, so we can also remotely switch the power outlets there.

The other thing in every lab is a test server. That is a fanless x86-64 device, one height unit tall, and we use it mainly for two things: one is to attach all the USB devices we have in every lab, in every rack, and the other is to run the software that wants to access those USB devices.

What USB devices do we attach? Well, first of all, in the newer labs there is no serial server, so there are a lot of USB serial ports, for UART on the one hand or RS-232 on the other. But we also have logic analyzers in our labs. We have USB device ports on our devices under test that we use for imx-usb-loader or Android fastboot in the bootloader. So there are a lot of USB devices that sum up in the end; more on that a little later.
In the older labs we attached CAN buses via USB; in the newer labs that has moved to a mini PCIe card that is built directly into the server, so that's one USB device less. Our GPIOs have been provided by 1-Wire boards that are connected to a 1-Wire bus. That's an additional bus in our racks, and these 1-Wire buses are connected to the test server via USB. In the newer labs we are starting to move away from 1-Wire, but a little more on that later.

One more thing we have there is a Wi-Fi access point. That's just a consumer TP-Link device flashed with OpenWrt. We use that to provide a well-known 2.4 and 5 GHz Wi-Fi access point that our devices under test can connect to, and we also have a Bluetooth stick in the USB port of these access points, so we have a known Bluetooth beacon that our devices under test can scan for.

Okay, so much for the theory of how our labs should look. These are photos of a real lab that had been in use for about two years before I took them, and you see it gets kind of messy. There are a lot of cables there that connect to the USB hubs, and those are a little hidden. Actually, let me use my mouse to point: here on the back side is a USB hub. So there are a lot of cables connecting to USB. There are some buses like RS-485 that interconnect multiple places, because that's what the devices under test sometimes need. Sometimes we need a continuous power supply, so there is this multi-socket thingy at the bottom. And yeah, that's the real thing if you don't clean up regularly.
We currently have eight of these racks, totaling 128 slots, that are placed in our server room, in storage rooms and in the offices themselves, so they are all built to be silent. Additionally, we have some 10-ish labs with 4 to 16 slots on different desks. That always depends on what the developers need, because the colleagues doing graphics stack development usually want to have their camera and display on their desk, and the colleagues doing low-level bring-up, like bootloader porting or developing new drivers, sometimes want to poke into a device with an oscilloscope or multimeter, and then they want to have the device on the desk. But the desk slots are just integrated into the overall infrastructure, so there is nothing special there.

For our interactive use case, all our hardware is controlled by labgrid, so we use labgrid as a board abstraction, or as a hardware abstraction, in our lab, and on top of labgrid we use pytest for test case development. For the continuous testing and continuous integration part we mostly use Jenkins at the moment for building images and root file systems, and we are currently evaluating a move to GitLab, just because it's a little more modern and sometimes you have to try something new. And our developers seem to be not that happy with Jenkins. We then execute the test suites that are written in pytest via labgrid on the real hardware, so nothing special there, it's the same infrastructure.

Talking about labgrid: if you want to have a look at what labgrid does, take a look into the documentation at labgrid.readthedocs.io. Or, I've done a talk at FOSDEM this year about the joys and tears of testing embedded devices, where I go a little bit deeper into how you can use labgrid to remotely control a device, how to write tests and how the infrastructure around labgrid works.

Okay, so much for how the infrastructure is built. What's good in the way we've done it?
What works well is that we just have a single pool of hardware and infrastructure that we use for interactive use and for continuous testing. In most cases we only need a single prototype from our customers, and then we can do both interactive work and continuous testing. But it's usually better to have at least a spare or even a second device that's already integrated into our lab, because developers sometimes forget to unlock the labgrid places they've been working on, and then continuous testing can't run at night, and then you've got failed tests for no good reason. So it's always better to have one device that's allocated for testing and one that's allocated for real work, which also allows us to have two colleagues work on the same problem in parallel if we really need to.

Since we share a common infrastructure for continuous testing and for interactive work, everybody can debug failing tests interactively on the device where they actually failed, so we don't need to debug anything in an infrastructure other than the one the tests run in. And every developer can use every board or every slot in every lab; no matter if it's in a rack or on your colleague's desk, it's all the same. That makes moving tasks between colleagues really easy, or getting help from other colleagues really easy.
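The "forgotten lock" situation mentioned above can be modelled in a few lines. The sketch below is a toy model under stated assumptions, not how labgrid implements place locking: `Place`, `PlaceBusy` and `nightly_run` are hypothetical names, and the lock lives in memory only. It shows why a place left locked by a developer blocks the nightly run, and why it can help to report that case separately from a genuine test failure.

```python
"""Toy model of exclusive access to a lab slot ("place").

All names here are hypothetical; this is not labgrid's implementation.
"""
from dataclasses import dataclass
from typing import Optional


class PlaceBusy(Exception):
    """Raised when someone else currently holds the place."""


@dataclass
class Place:
    name: str
    owner: Optional[str] = None  # None means the place is free

    def acquire(self, user: str) -> None:
        """Lock the place for `user`, or raise if another user holds it."""
        if self.owner is not None and self.owner != user:
            raise PlaceBusy(f"{self.name} is locked by {self.owner}")
        self.owner = user

    def release(self, user: str) -> None:
        """Unlock the place, but only if `user` is the current owner."""
        if self.owner == user:
            self.owner = None


def nightly_run(place: Place, ci_user: str = "ci") -> str:
    """Try to run the test suite on a place; report a blocked place as such."""
    try:
        place.acquire(ci_user)
    except PlaceBusy as exc:
        # Not a test failure: the hardware was simply not available.
        return f"BLOCKED: {exc}"
    try:
        return "RAN"  # a real job would execute the test suite here
    finally:
        place.release(ci_user)


if __name__ == "__main__":
    slot = Place("rack1-slot3")
    slot.acquire("alice")        # developer forgets to unlock in the evening
    print(nightly_run(slot))     # CI cannot get the board
    slot.release("alice")
    print(nightly_run(slot))     # now the nightly run goes through
```

With a second, test-only device as suggested in the talk, the CI user would simply never compete with a developer for the same place.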
Another good thing here is that, since we can never rely on the state of a device for continuous testing, we always provision our devices from scratch. Take, for example, KernelCI, which relies on a working pre-flashed bootloader on a device and then uses that bootloader to load the kernel and user space. But we can't rely on the bootloader, because we never know what state it is in when we want to run tests. And it's the same for interactive work: we never know in which state continuous testing has left the board, because you never know what went wrong at night unless you have looked into the results before. So you can even use the provisioning from scratch for interactive use, and that tends to yield more deterministic results for your work.

Another thing that turned out quite well is that our labs are built to be versatile. We do not rely on a specific connector or a specific form factor of our devices under test. We couldn't even do that if we wanted to, because our customers have their own requirements and have to fulfill those, so we are kind of an afterthought there. So our labs have to be versatile, and until now I guess we have placed all but one piece of hardware in one of these racks. The one thing that didn't work out was just too bulky, too large, so it ended up on a shelf next to a developer's desk, but it's just integrated into the lab on the developer's desk, so it's still integrated into our overall board farm.

Okay, what are the problems in our current lab? One problem is that devices are connected with a lot of cables. Think about the primer I gave: you need a power supply, you need a serial port, you may have GPIOs that are connected, you need USB for serial download of new images, you have Ethernet, you have an SD card, you have graphics connections and so on, and I'll just stop here. So you see there are a lot of connections going to a single device under test in a lab, and that tends to be a problem.
Two things here. One is that if you're just working on an adjacent slot in the lab, one above or two down or something like that, you sometimes disconnect things on another slot without noticing. For example, if you're just connecting or reconnecting a new USB port, you may move another USB connection, and that's not that stable. The other is that moving a device to another lab is error-prone, because you have to carefully note down all the connections you disconnect, then take it to the other lab and reconnect it there, and that tends to take longer than you would imagine until it's really running again.

Another set of problems is that parts of our infrastructure are black boxes, for example the serial servers. We had a case where serial servers were missing characters. It happens just when you toggle the RTS line, somewhere in the transition between the bootloader and the kernel, and even with flow control off, some characters after this toggling of the RTS line are missing. That was mitigated by a firmware fix some time later.
We have had power switches in the past that do not support IPv6, so that's always kind of a special case in our lab. We have IPv4 in our lab net, so that's not a real problem, but our IT guys really like IPv6 everywhere. And we had power switches that were just not responding to our commands: they respond to pings and they respond to API calls for switching, but they don't do the real switching, so someone has to go there and replug the power switch. We've just swapped that part with a spare one, and then it worked again. So these are problems that are solvable but not nice. We have a workaround for every problem of this class, but as I said, they are not nice, and we do not have control over these devices.

Those have been the annoying problems; now for the expensive problems: USB. I mean, USB is easy to use and it's widely available. You can find a lot of USB devices that solve problems for you. Just a few out of our lab, beginning from the top left: that's a USB-to-CAN interface, if you need a CAN bus connected to a single device under test. You sometimes need JTAG to, for example, flash a CPU, recover from some deep failures, or do debugging. Logic analyzers, if you want to trace some logic signals. RS-232 and UART interfaces, because, well, we just need them. The USB-SD-Mux, if you want to be able to remotely control the content of an SD card and then boot your device under test from that. And the USB-Mux, another device we have built, which is basically the USB-SD-Mux but for USB sticks. So there are a lot of USB devices in our labs, USB is just everywhere, and I can really feel [unclear] looking at all these USB connections.

I would say USB is a bad idea, but that's always easy to say in hindsight. USB gives us a ton of stability issues. We had to find out that a lot of USB devices have strange bugs, like USB hubs disappearing from the bus at some point without any noticeable problem, and just reconnecting them
brings them back. We have seen USB serial adapters that are still accepting USB addresses but do not respond to transfer requests. And all these USB problems are usually hard to debug, because you're not sitting next to your board; you have to remotely guess what the actual problem is on the other side. These problems often need remote hands, somebody who is really physically at the lab, to recover, because you have to replug a USB device. And these problems tend to affect neighboring slots in the lab, because when one USB hub or a part of a USB hub is gone, there are just a lot of USB devices failing. These problems really cost a lot of time, and thus cost a lot of money and a lot of developers' nerves.

Okay, what are potential solutions for this problem? One thing is that we want to get rid of a good amount of these black boxes in our lab. Especially the power switches and the serial servers are usually a problem; Ethernet switches seem to be more reliable, maybe because they are built in larger quantities and are better tested. And we want to drastically reduce the size of our USB buses, so we want to have fewer USB devices connected to a single USB host, and we want to have fewer ground connections between adjacent slots in a lab, where a USB hub spans multiple slots, and the same for, for example, the serial connections and Ethernet. So that's a problem. And another thing we want to address while we are there is that we want to have fewer cables that run from the infrastructure to the device under test.

Okay, what's our idea to do that? The idea now is that we want to have a single test automation controller that serves one device under test, and we want to build that test automation controller in a way that it serves around the 80% use case. So we do not want to cover everything we find in our labs, but at least the most part of what we find in our labs, with a single device. And we want to build it in a way that is far more robust than the current setup and more reliable in
that. And we want to build it for everyday use, so I think that mostly means it has to be built into a housing and not just be a green PCB you put somewhere, it needs to have ESD protection, it needs to have a software updating concept and stuff like that.

How does it look from the architecture side? Well, as I said, you just take one test automation controller per device under test, and we built these test automation controllers around an ARM SoC. Of course we run embedded Linux on that, plus all the software we need to remotely control the device under test, and that's mostly labgrid and the IOBus. IOBus is a CAN-based bus we use for additional devices we want to attach to our test automation controller, for example additional GPIOs we need for the device under test, or analog measurements and stuff like that.

We connect these test automation controllers with a single Ethernet port, we use PoE for power delivery there, and we connect that to the already existing PoE switches in our labs. And we integrate an Ethernet switch into our test automation controllers, so we can share the same physical cable running from the switch to the test automation controller to also supply our device under test with Ethernet connectivity. We add a few GPIOs to these devices, because you always need a few GPIOs. It has a logic-level UART interface and a concept for how you can adapt that to, for example, RS-232. There's a CAN bus that you can use for CAN and CAN FD connectivity to your device under test, and one CAN bus we can use for the IOBus, so for extending our capabilities. We have integrated a power switch into our test automation controllers; we use that for switching not mains voltage but the lower voltage the device under test actually needs. And we've got some USB ports, host ports and a device port, in our architecture.

How does it look in reality? Well, we already have got some prototypes on our desks, so this is our test automation controller. The bottom board, the lower board, this one here,
is the board with the CPU [unclear] here, and the Ethernet switch is integrated here, and everything we need for the connections to the device under test, those are mostly on the front-facing side of the housing here. The other board is the power switch. That's a mostly empty PCB; it's just, well, everything you need to switch power on and off.

What's our state with that? We currently have the first revision of the hardware on our desks, but we're still actively developing software, so everything we need in our operating system to actually get it working. We are planning to validate our concept and our devices this fall, after software development obviously, and we really want to use these devices to replace infrastructure in our labs, in our racks there. And then we'll see if our concept works out, or what new problems we have bought by changing such large parts of our system.

And I guess that's where we are currently. Thank you for having me, and now I guess we'll switch to the interactive part, with room for questions and hopefully for discussion. I would really like to know what your experiences with board farming are, what you've tried and where you had problems, and what you've tried that worked out quite well. So yeah, see you in the discussion.