Okay, sorry for the slight delay here. I had some issues with my laptop connectivity. But welcome to this session. My name is Tim Bird, and I've been giving talks at ELC for a long time, but today I'm going to talk about testing. I recently became a maintainer of the Fuego test system, and I'm a senior software engineer at Sony Electronics. I feel a little bit bad because it's taken me away from some of my kernel work, but I think that testing is really important.

So here's a basic outline of what I want to talk about today. I think there are some problems in the open source ecosystem around testing. I'm going to talk specifically about three test frameworks. There are lots of different test frameworks, but I chose these three. And really the heart of the talk is the last half, where I'm going to talk about what I consider to be the attributes of a good test. As the maintainer of a test framework, I see lots of tests. Some are really easy to deal with, and some are not so easy. So my hidden agenda, not so hidden I guess, is that I'd like to help you understand, from a test framework standpoint, what makes it easier or harder for someone else to run your tests, to adopt them, to use them, to automate them. So I'm going to provide a whole bunch of tips, and then I've got some pointers to resources. That's the outline, so let's get right into it.

So what are the problems that I see? Well, I don't think there's enough sharing going on. There's an awful lot of test frameworks, and there actually are some tests available: there's the Linux Test Project, there's kselftest, and there are lots of benchmarks out there as standalone programs. Those are all open source. But when companies actually release a product based on Linux, there's a whole lot of testing they do that is not based on open source software. It's based on custom in-house software. And the question that comes to mind is, why isn't more of this stuff shared? Part of it is that for a lot of the custom stuff that companies write, there's no upstream for it, and a lot of companies are not going to go out and create their own open source projects. The other aspect is that a lot of these tests are written to custom in-house test rigs. For example, at Sony, USB is in almost every Sony product, cameras and TVs and everything, so we have a big internal setup to test USB compatibility. Well, that has custom hardware, and even if we were using off-the-shelf hardware, there would be custom setup involved. So that USB compatibility testing, even though we have it automated inside Sony, is not really something we could put out to the world, because it's not based on any standards. In particular, the interface between the DUT, the device under test, and the test system, and the test itself, none of that is standardized. I think those standards are something we're missing, and I'll hopefully be able to circle back to that.

In terms of problems with existing tests, if you look at the tests that are out there in open source, a lot of them have a big learning curve, they generate a lot of false positives, and quite frankly, there are a lot of useless tests. I'm going to talk about each of these. So, the learning curve.
For any particular test that you decide you want to run on your system, the QA developer has to learn how to build it, how to install it, how to run it. Sometimes tests are integrated into things like the Yocto Project; in fact, I think the Yocto Project has a package for LTP, so that one's relatively easy to get up and running. But if you want to run something like cyclictest, or xfstests, or one of the CVE testers, none of that stuff is really readily available. So as a test developer, you have to go learn all this stuff, and then you're going to need to customize it for your environment. There may be certain things you're concerned about. In the case of cyclictest, every board has different characteristics, so what constitutes a pass for you may be a fail for someone else, or vice versa. And then there's how to interpret the results. OK, so you run LTP. It's very, very common when you run the Linux Test Project to get 1,056 passes and 45 fails. And it's like, well, OK, great. What should I do with that? Do I report these fails? Are they real failures? Are they problems with my configuration? You don't know exactly how to interpret the results or what to do with them. What developers need to learn in dealing with these tests is that if you want to report bugs upstream, or find out if these are real bugs, you need to reproduce the results, you need to have third parties reproduce them, maybe a kernel maintainer, you want to have them reproduce the same problem, and you need to report these issues upstream. These are all things you have to learn how to do. It's a big hill to climb.

In terms of false positives, a lot of tests have bad or missing dependencies. The LTP tests often don't do a good job of checking for dependencies, for example. Some of them do, but a lot of tests just fail and you don't know why. I spent about a week chasing one bug where the test was known not to work on a particular kernel version. That wasn't documented anywhere, and the test didn't check the kernel version. So you can get these failures that come out of nowhere, and you have to go figure them out yourself. A lot of tests are too sensitive to the test conditions or the environment; they're not very stable. So on your visualization you get what I call blinky lights, where a test fails and passes and fails and passes. It's intermittent, you can't really pin it down, and those are a real pain to track down. Sometimes it's extra load on the machine, sometimes it's a bad network, bad flash, or some server unavailability in your test environment.

And then there's a whole category of tests that are kind of useless. At one extreme, there are tests that are way too simple, that you just know are not going to fail. You don't need to test the open syscall, probably, because it's being used in a bazillion places. And bazillion is actually the right SI unit for that. So open is not going to fail, but in LTP there's an open syscall tester, and it's like, OK, great. Some test conditions get exercised just by booting the machine: you know the machine is capable of opening a file if you can run your test framework. So you have tests for things that are really unlikely to fail. And some conditions are so rare, so unexpected to fail, that it's really a waste of your test bandwidth to be spending cycles on them.
So it's more cost to execute them than it's worth to find a bug. What are the solutions to these problems? Well, I would like to see a test ecosystem where people can actually build on tests that have been developed by other people. We can use the open source effect to develop a good body of tests and a good body of useful QA materials. So we need tests that are well documented. We want to make them easy to automate, which means handling building and installation automatically. We want tests that are robust, that can handle dependencies and skip problematic tests. And most importantly, I think we want tests that are shareable with others. They need to work in a lot of different scenarios, they need to work on lots of different devices, and it should be easy to customize them, so you shouldn't have a lot of baked-in assumptions in the tests. And I promise I'll get to some actual specifics; these are kind of high-level, cloud-level, 100,000-foot suggestions. I'll get to specific things that I think will help with that.

Before I do that, I'm going to take a little bit of a side tour, and I've got to figure out my timing here. So this session is a 40-minute session. If I end at 12:30, is that right? If you don't tell me, I'm going to go over. OK, so 12:40, I think, so it's a 50-minute session. I just need to pace myself.

So I'm going to talk about three test frameworks: LTP, kselftest, and Fuego. Let's just dive into those right now. LTP is a big umbrella project. It's got a whole bunch of tests in it, and it provides helper functions for setup, results reporting, and cleanup. It was founded a long time ago. If you look at it, it's mostly, not exclusively, but mostly C and POSIX shell tests of kernel and core system functionality. There are no benchmarks in it; it's really functional testing. It has lots of tests, over 3,000 of them, in three broad categories: there's the functional testing, there's the POSIX test suite, which was swallowed into LTP, and then they've got a bunch of real-time tests as well. It's kind of hard to assess the coverage. It's not done in a formal enough way that you can say, yes, we've tested every single item in the spec. Well, maybe the POSIX suite does, but the functional testing is really kind of scattershot. And new syscalls keep showing up in Linux; the statx syscall showed up relatively recently, and you don't know how much of its behavior is tested by LTP. So it's hard to know. You know if LTP shows you a bug, but you don't know what you're missing. LTP historically has had a very heavy focus on testing error conditions, so basically it looks to see: if I pass a string that's too long to the mount command, does it give me the right error? That type of stuff.

It does include a little mini test harness, a really lightweight test harness, so tests can be run individually, or in groups, or in stress configurations. I don't think a lot of people do this; maybe there are people out there doing it that I'm not aware of. There's something called ltp-pan. I have tried to find out what the "pan" stands for; if someone knows, let me know, just out of curiosity. But ltp-pan is a little command-line tool that allows you to run a series of LTP test programs. Each test in LTP is a separate, individually compiled program, and ltp-pan will run a named collection of these. It can run them repeatedly, and it can run a bunch of them in parallel.
And it can run them for a period of time, and you can customize the command-line parameters, so it's pretty flexible. This is, I think, intended to let you do stress testing over a long period of time. The thing about it, though, is that I don't think you're actually going to find that open returns a different errno if you run it a million times versus 100 times. So to some degree, I think things like syzkaller, or some of the other stress testers or fuzzers, do a better job of testing that type of thing. And then finally, there's runltp, which runs groups of tests. There are many, many groups defined, about 80 different groups: syscalls, input, file systems, networking, math, NUMA, all these different groups of tests. So you don't have to test everything at once; you can test defined groups. I'll show a quick example of running a group a little later.

One of the strengths of LTP is its output. It's got a very rigorous output format, at least for the regular functional-type tests. It turns out that each of the three groups I mentioned, the POSIX conformance tests, the real-time tests, and the functional tests, has its own output format. The functional tests, at least, are all regularized, because they've been written to a common framework. So you get a nice schema, a consistent set of text strings indicating the different conditions and the results. And then you get additional metadata when you use their test harness, like the command line that was used, the duration of the test, system times, that type of thing.

So I thought I'd show you really quickly a couple of things about an LTP test. I don't want to dwell on this too long; the slides will be online so you can go look at it. This is from the umount02 test, and this is what its output looks like. You can see that each line, well, most of these lines, I don't know if the laser still works here. Oh, it does a little bit. So there are some information lines as the test is doing test setup, and then these are the actual lines indicating the specific conditions it checked for. So it checked for already mounted, or invalid address, or directory not found. And then it has a summary of the test down here. If you go look at the code, the code for the umount test is actually fairly simple, and I'm not going to show you the code that does the real testing, just some of the stuff surrounding it, to give you a flavor of what it's like to write a test. In this case, we have these SAFE macros that help you do some of your setup in a controlled fashion, and LTP, if something goes wrong, will help you back stuff out and clean up for you automatically. So you write a setup function, you write a cleanup function, and you try to make these mirror images of each other. In fact, I think I say this on the next slide: you clean up in the opposite order of resource allocation, which is fairly standard practice. And there's a whole bunch of helper functions that start with tst_, and there are a lot of them, to handle common operations, like we just saw: setting up for mounting file systems, or for system calls, or creating temporary files, that type of thing. If you look at the test itself, the main body of it, again from part of the same file, there's a set of test cases.
This struct tcase is specific to the umount02.c file, but it just has a list of the descriptions of what it's going to test, an extra parameter here, and then which errno it's expecting to find. And if it finds a particular errno, it reports a TFAIL, using tst_res, that's "test result", which is a library function that gives you that common output format. If it passes, you print out this material; if it fails, you print out some additional material. So verify_umount is the main test routine. In this case, it's called with the subtest case number, and tst_res is used to report results. In terms of the LTP test API, the main thing is this struct tst_test, which has the test ID and a count of the number of subtest cases. And then, by specifying a couple of variables here, needs_root and needs_tmpdir, the LTP system will automatically set those up for you. So in your test program, you don't actually have a main; there's a main somewhere else that's handling some of this automatic setup and teardown for you. And then you specify some function pointers. So it's a little bit object-oriented, but it's in pure C. You just declare your setup function, your cleanup function, and then the actual test routine that's going to get called, in this case in a loop, because you have more than one subtest case. I'll pull all of this together in a little sketch in a minute. And I think I just described everything that's on this page.

Okay, so that was my under-five-minute introduction to LTP. You're now all LTP experts, or at least as much as I am. And there are some good resources. The documentation for LTP is not fabulous, I've got to tell you, but there are tutorials online, and there are obviously lots of code examples. And there's a lightning talk from FOSDEM, just about a month ago, on the introduction and status of LTP that's worth taking a look at. Okay, so, LTP conclusion: it has a lot of support for writing a good test, and it needs more tests to help it stay relevant. Please go out and use it, please add stuff to it, and fix anything you find that's broken. The more people that use it, the more we can create an ecosystem around it. Like a lot of open source projects, it's got a core of maybe about four really key developers and then a peripheral set of developers. The more people that add to it, the more benefit we get as a whole industry. So here are a couple of ideas for projects, if you just have so much idle time that you're wondering what to do on your weekends. Well, you should go work on LTP, of course. That seems obvious.
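Before moving on, here's a bare-bones sketch that pulls together the pieces I just described for umount02. The field and function names are the LTP test library's as I remember them, so double-check against a current LTP tree, and the check itself is just a stand-in:

    #include "tst_test.h"   /* LTP test library: tst_res(), struct tst_test, ... */

    static void setup(void)
    {
            /* allocate resources; mirror this in cleanup() */
    }

    static void cleanup(void)
    {
            /* release resources in the opposite order of setup() */
    }

    static void verify_something(unsigned int n)
    {
            /* stand-in check: report each subtest result with tst_res() */
            if (n == 0)
                    tst_res(TPASS, "subtest %u behaved as expected", n);
            else
                    tst_res(TFAIL, "subtest %u returned the wrong errno", n);
    }

    static struct tst_test test = {
            .tcnt = 2,                /* number of subtest cases */
            .needs_root = 1,          /* LTP checks this before calling us */
            .needs_tmpdir = 1,        /* LTP creates and removes a temp dir */
            .setup = setup,
            .cleanup = cleanup,
            .test = verify_something, /* called once per subtest case */
    };

Notice there's no main() in the test file itself; the library header supplies it and wraps your setup, test, and cleanup functions, which is exactly the part I was describing. And on the invocation side, running one of those groups typically looks something like this (options from memory; check runltp's help output for the real list):

    cd /opt/ltp            # or wherever LTP was installed
    ./runltp -f syscalls   # run the "syscalls" scenario group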
Okay, so moving right along: kselftest. Kselftest is a whole different thing. It's the kernel unit test framework, and it's inside the kernel source tree. It supports both local execution of tests and remote installation, so you can build a tar file that you then go and manually install on another machine or another device. It can cross-compile, just like the kernel can be cross-compiled. And you can select, on the make command line, an individual set of tests to run. Again, it has groups of tests that are in directories; there are about 52 directories and 350 different source files. This is where the kernel developers themselves put their unit tests, or where they're supposed to put their unit tests. Not everybody's on board yet, but it's kind of gathering steam.

This is super convenient if you happen to be a kernel developer, because it's right there in your source tree. If you type make to build your kernel, all you have to do is type make kselftest to test your kernel, if you happen to be on the same machine. There are a couple of extra steps if you're testing a different machine than the one you're on. But kselftest is fairly primitive. It does not provide a harness, it doesn't provide any helpers for setup or cleanup, and it's really ad hoc. Basically, the kernel community has started to gather up their own unit tests that were outside the kernel and has just kind of jammed them together, and they're trying to migrate now towards a common format. Right now, every test looks different, and the output is different. I was going to show an example kselftest, but there's really not a good canonical example. I have my own "size" test in here, but it would be gratuitous for me to show my own test. Each test is different, so there's really no canonical example of the API, because there's really no API. Well, there is a little bit of an API, but each test is written from scratch.

So what's going on lately with kselftest is that they've been trying to convert the output to the TAP format. TAP stands for Test Anything Protocol. It's a very, very simple, one-page specification for how the output of a test should look. Not "should" as in ought, but if you're following the spec, this is what the output is supposed to look like. You can see it's very simple: it's line oriented, and each line starts with either "ok" or "not ok" and then has a test number. There are some helper APIs in kselftest for tests to produce this output rather than their own ad hoc output, and people in the kernel are starting to migrate to this format. So if you use ksft_test_result_pass(), it will output your message in TAP format, which is a good start in the right direction; I'll show a quick sketch of that in a second. And then here are some resources if you want to get started with kselftest.

A couple of tips if you happen to go down this route. Don't assume that you're building or running on the latest version of the kernel; I see this a lot in kselftest. Don't rely on features of the current kernel version. Try to make your test backwards compatible, because, well, I'm preaching to the choir here, we're all embedded developers. Most of you are not running the 4.16 kernel, right? So this is a plea to the people working on the 4.16 kernel: please throw us a bone when we're working back on 3.18 or 4.4 or whatever kernel we happen to have in our products. If you're writing tests, try to check for dependencies at runtime and notify the user if they're not met. If you need to run as the root user, please check for that; don't just silently fail in weird ways. And check the kernel configuration for required config options.
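Since I brought up ksft_test_result_pass(), here's a rough sketch of the shape of a TAP-producing selftest using the helpers from tools/testing/selftests/kselftest.h. The helper names are as I recall them and have shifted a bit between kernel versions, and the checks themselves are just placeholders:

    #include <unistd.h>
    #include "../kselftest.h"   /* ksft_* helpers, relative to the selftest's directory */

    int main(void)
    {
            ksft_print_header();    /* prints the "TAP version 13" line */
            ksft_set_plan(2);       /* prints "1..2" */

            /* placeholder check #1 */
            if (getpid() > 0)
                    ksft_test_result_pass("getpid returns a positive pid\n");
            else
                    ksft_test_result_fail("getpid returned a bogus pid\n");

            /* placeholder check #2, with a skip for a missing dependency */
            if (access("/proc/version", R_OK) == 0)
                    ksft_test_result_pass("/proc/version is readable\n");
            else
                    ksft_test_result_skip("no /proc on this system\n");

            /* real tests usually pick the exit helper based on the pass/fail
               counts; this just exits with a passing status */
            return ksft_exit_pass();
    }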
Okay, so Fuego. This is near and dear to my heart, so you might think I'd spend a whole lot of time on it; I'll spend a medium amount of time on it. Fuego, if you haven't seen it before and don't know what it is, is kind of a host test distribution, plus a bunch of tests and test wrappers, packaged along with a Jenkins interface, all inside a Docker container. It is intrinsically host/target, okay? The enterprise and cloud folks are lucky: they can develop their software and test their software all on their development machines if they want. But we often have a big disparity between our development machine and our products, so we need an environment where we can cross-compile and drive the testing from a more powerful machine than the one we're testing. Here's an analogy I just barely came up with: Fuego is like the Debian of QA software. It's more like a distribution than a test harness. Well, it has distribution-like attributes. There's a bunch of tests in Fuego, and each one has kind of its own little package that tells it how to build, how to install, how to run, that type of thing. Right now we have about 150 different test suites included in Fuego. Some of them are Fuego-specific tests, but most of them are wrappers around existing tests. So it's more like a packaging system for tests than a set of individual tests.

The fuego_test.sh script, which every test in Fuego has, is a wrapper for building, deploying, running, and collecting the results for a test. And then you can also provide a parser, so whatever weird output format the test you're wrapping has, you can collect that information and put it into a standardized output format. We can apply a pass criteria, and the pass criteria can be shared; the way to customize the test can be shared; and that parser allows us to collect the individual test data. I'm not going to go into great detail. This is the architecture diagram. We run a bunch of Fuego scripts on the host inside a Docker container, and we've got a web control interface for starting tests, monitoring the results, and doing visualization. But all of the real action happens over on the right there, on the target board, where the test program runs. We actually build the tests from source, and some people don't like that, but I think it's really important, and when I get to my tips section I'll tell you why. So we build the test from source, put it over on the target, execute it, collect the results back to the host, analyze and parse the results, and then present them in the GUI interface.

So, oh wait, did I go the right direction? Oh, yeah. Like I said, a Fuego test is usually a wrapper around an existing test. Some examples are IOzone, LTP, Bonnie, iperf, Dhrystone, cyclictest: real-time tests, file system tests, networking tests. Those are all existing tests that we just have wrappers for, which allow you to hook them into our system. You can also, if you want, go out and write a new individual test, something you want to test on Linux, and put it into Fuego. Usually, if it's simple enough, you can put it right in the fuego_test.sh script, but a lot of times it's easier and clearer if you take the material that's going to run on the target, put it in a standalone shell script or a standalone native program, and then launch it from Fuego. So a test consists of, like I said, this fuego_test.sh and a parser.py, and then there's a bunch of other files that you may get into writing as you start to customize the test. This is Hello World in Fuego land. It's very, very simple. We specify the tarball where you get the Hello World program from; you can also have a git reference there. And you do a pre-check to make sure that there's a variable defined.
The pre-check function is where you would check for dependencies, and there's a whole bunch of helper routines for checking different dependencies: whether you need to be root, whether you have certain kernel configs, that type of thing. This one is just checking to make sure that a variable is defined. Then the build function is just make; Hello World doesn't need to be configured. Well, the GNU version does, but this one is a very simple Hello World. The deploy function just uses the put command to put the hello program onto the board, and then we run it and collect the results with the report function. So we cd into that directory and run hello, and, this is the first time I've noticed there's a typo in there, pass it that argument we checked for. Then Fuego collects the results for us, and we do some log processing on them: we're looking for the single word "success". This is not your standard Hello World program; it also prints "success" at the end. So it's very simple; you can understand what's going on here, hopefully. I'll show a rough sketch of the whole script in a minute. In terms of output, every test in Fuego produces a run.json file, so the test metadata, the logs, and the results are either referred to from this file or embedded in it. And then we have a result schema that's very much like LTP's in terms of pass, fail, error, or skip.

Okay, so, Fuego advocacy. This is the part where I convince you all to write Fuego tests. Actually, no. Don't write your device-under-test program as a Fuego test. I mean, I don't really care if you write a Fuego test; if you want to, I'm not going to stop you. But if you have a new test, I'd rather you put it in LTP or in kselftest, because Fuego runs both of those, so I get it automatically, and you can use the helper functions and features of LTP or kselftest. That way the whole industry benefits, not just the Fuego community. Maybe I should be more dogmatic about driving people to Fuego, but I'd rather you do something that can be shared with the widest number of people. I benefit if you write it for LTP, so I don't mind saying that.
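Here's roughly what that hello-world fuego_test.sh looks like, reconstructed from memory. The phase function names (test_pre_check, test_build, test_deploy, test_run, test_processing) and the helpers (assert_define, put, report, log_compare) are Fuego's, but the specific variable names and paths here are illustrative, so treat this as a sketch and check the Fuego documentation for the real thing:

    tarball=hello-world-1.0.tgz        # where Fuego gets the source from

    function test_pre_check {
        # dependency check: make sure the test argument variable is defined
        assert_define FUNCTIONAL_HELLO_WORLD_ARG
    }

    function test_build {
        make
    }

    function test_deploy {
        # copy the compiled program over to the target board
        put hello $BOARD_TESTDIR/fuego.$TESTDIR/
    }

    function test_run {
        # run it on the target and capture its output into the test log
        report "cd $BOARD_TESTDIR/fuego.$TESTDIR; ./hello $FUNCTIONAL_HELLO_WORLD_ARG"
    }

    function test_processing {
        # pass if the log contains the single word SUCCESS at least once
        log_compare "$TESTDIR" 1 "SUCCESS" "p"
    }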
If you are writing a multi-node test, though, here's the problem: that's where things start to fall down. There are no standards for writing multi-node tests. Fuego supports host/client operations, so we can do some multi-node stuff. We have tests that use the serial port, assuming you have a serial connection to the target, which a lot of times you do in your board farm configuration. We can do network tests, obviously, since we have network connections to the target. But if you're doing things like testing audio or video output, that requires additional hardware to capture it, or if you're doing USB testing, you need to toggle the USB bus to test connect and disconnect. That requires hardware, and there's no standard interface for that board control. This is something that's really lacking in the industry, and we've actually set up a mailing list, based on discussions we had at Embedded Linux Conference Europe, and we are trying to work through defining some of those standards. And I'm hoping, I put it on this slide so it would kind of commit me to doing it, to put together a board control summit at Plumbers next year to talk through a lot of these issues. There are a lot of different groups that have different systems. LAVA has their own system. There's a group called labgrid. R4 control.

Anyway, we'd like to get together and actually confront this, because multi-node testing is kind of where it's at. If you're going to test drivers, well, the drivers are doing something. You can test file system drivers, because that's a local operation, but most of the other drivers are bus drivers or panel drivers or things like that, controlling hardware that you need to control remotely if you want to test it effectively. Okay, Fuego resources. How am I doing on time? Oh, no. Okay, here's my scorecard. I'll let you look through the scorecard at your leisure and decide which of these you want to use; I already gave you my tips on this. In terms of choosing a framework: if you're doing white-box testing in the Linux kernel, I think you should use kselftest. If you're doing black-box functional testing of kernel behavior, use LTP. If you're doing benchmarks, don't write your own; go out and extend one of the existing ones. There are benchmarks for basically every category of stuff: file system tests, memory tests, network tests. So just go extend one of those, or figure out how to customize it. A lot of these tests have a ton of options, and you just need to figure out which options to pick to do the tests you want to perform. And then if you're doing dual-machine tests, of course, use Fuego. I'm going to get back on my advocacy horse here, because we do support host/target operation and we're working on a board control API.

Okay, so now the second half, which is ostensibly the reason you all came: my tips for writing good tests. There are four broad categories that I'm going to go into in detail: produce good output, make tests universal, avoid false positives, and, for heaven's sake, test something useful. And I will now tell you how to do those. So, the six elements of good test output: you want a test case identifier, you want a description, you want to actually have the result, that's important, and it's nice if you can include the behavior, what you expected versus what you saw. A lot of tests leave that out, right? They just say, oh, that failed. Well, what failed? Now I've got to go read your source to figure it out. And then, if you have a heart in your soul, tell us how to interpret the results. What does it mean if this doesn't work? Should I be worried about it, or is it no big deal?

The other important thing in your output is to distinguish results from errors. If you've done testing before, you know that things can go wrong besides what you were testing for, right? You need to distinguish that, because if the network botches up in your test lab, everything goes red across the board. But that's not actually a failure; that's just an error, okay? So that's the nomenclature: the result of a test is pass or fail; if something has gone horribly wrong that's unrelated to the test, that's an error. All of the test harnesses that I'm familiar with distinguish those, and that's really important, because otherwise it's just confusing. It happens all the time. In fact, my MinnowBoard just went down in the lab on Friday, and boards go down because, I don't know, the flash wears out or the network goes sideways or something. So I had a whole bunch of tests go red, but luckily I could tell the difference between the MinnowBoard being unreachable and open syscalls suddenly returning the wrong error.
Okay, for test output: make the test results machine parsable but human readable. You want both of those attributes. Use unique strings for the test result output, something that isn't going to show up as a normal English word. And we have words for this already: use TPASS, or "ok" and "not ok", which is the TAP convention. But use something standardized. Don't just write "my open syscall went sideways"; that's harder to grep for. Optimally, you want to be able to grep for this stuff, so you want unique strings for the result output. You want a common result schema; again, you don't have to make up your own, there are schemas out there, just use one of the existing ones. Use unique and persistent test case identifiers; I've got a whole slide on that. Use line-based output so you can grep it. And then there's the results exposition: a lot of times you want to explain what's going on with the test. Either put it all before the result line or put it all after the result line, but for heaven's sake, don't sprinkle it before and after the results. That is incredibly obnoxious to parse, and in some cases it's impossible, because the parser has to understand all the possible strings you could be outputting. The general idea is that you want to use your result line as the marker between test cases, so it's a unique enough string that you can identify when a test case ended or started, and use that to obtain the result. This makes the parser so much easier. We've had to write obnoxious parsers. We don't like it. Please help us out.

Test case identifiers: don't just use numbers. This happens a lot: you see tests where the result is "one pass, two pass, three pass, four fail, five pass, six pass". Oh, great, five. Well, that's helpful. You have no idea what it means. A lot of times you'll get some exposition after that. But if you have a test case identifier that is human readable, that's a real string, and that is unique and persistent, then you have what I call a tguid, a test globally unique identifier. Now, what could possibly be valuable about that? Well, the value is that now I can take my test results and compare them to yours on the other side of the planet. So if I want to find out why, hypothetically, well, not hypothetically at all, inotify06 kept rebooting my BeagleBone, then I can use that unique string and say, hey, someone over there with a BeagleBone, what does inotify06 do on yours? And it turned out to be a sub-case of inotify06 that did that. So the idea is to create essentially a namespace for these test identifiers that is global and unique, and then as we share test results with each other, we can actually collect data and find out the answers to our questions without having to run everything ourselves. I don't know if I explained that very well, but this is one of my pet peeves: please make sure that your identifiers are persistent, globally unique, and string-based if you can. I don't like numbers. Numbers don't help your users find problems.
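Here's the kind of output that pulls all of this together: line based, standardized result words, and string-based identifiers that stay stable from run to run. The test names and messages are made up for illustration; the framing is TAP-style:

    TAP version 13
    1..3
    ok 1 net.tcp.connect_ipv4 - connect() returned 0 as expected
    not ok 2 net.tcp.connect_ipv6 - expected 0, got ECONNREFUSED
    ok 3 fs.tmpfile.create # SKIP kernel lacks O_TMPFILE support

Each result line is distinctive enough to act as the marker between test cases, and an identifier like net.tcp.connect_ipv6 means the same thing on my boards as it does on yours.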
Another thing about tests: please make your tests universal. This means, unfortunately, that you probably need to limit the languages used. You see tests written in Java, you see tests written in Python. Well, I can't use those on a lot of machines, right? If I don't have the Python interpreter, I don't have Java. So it really comes down to, and this is what LTP kind of settled on, although LTP has some exceptions, some tests in Python, either a native program or a POSIX shell program. No bashisms. Don't assume device-under-test capabilities, and use minimal resources. I'm going to expand on those.

For a compiled language, almost everybody writes their low-level tests in C. That's the common denominator. I know it's 2018, and you'd think we could get beyond C, but you just know that C is going to be supported on your board. Provide source, not binaries. Do not provide binaries, because we can't put those on other platforms. Make the source cross-compilable, and don't assume the architecture is 32-bit or 64-bit, or anything else about the architecture. Statically link, if possible, to avoid dependencies on libraries.

In terms of shell, use POSIX features only. Do not use bashisms. There's actually a tool in Ubuntu, probably already on your machine, called checkbashisms, and it will do a line-by-line analysis of a shell script and tell you what is not supported by the POSIX shell standard. Then, of course, go get rid of those things. Because on these tiny, tiny devices, what are we all running? We're either running busybox or toybox. We don't have bash; I'm not spending 800K on some bash thing. And if you have another interpreted language, I'm not saying you can't use one, but if you do, provide the VM for it as part of your test. So if you're doing something in Lua, or in Python, maybe you can get away with shipping MicroPython over to the board with your test. But you have to limit yourself in order to make your test as widely applicable to every possible scenario as possible. I actually have a side project where I'm trying to see if I can get Fuego to run on NuttX: I want to run it on a non-Linux POSIX-shell OS and see how many of these tests can run. There are some problems, we have some dependencies on /proc and /sys, but if we can get rid of those, most of this stuff should work.

Use minimal resources, and avoid dependencies on things you don't need. For C programs, you can limit the library calls you use to a POSIX subset. It depends, of course, on what you're testing: if you're testing font stuff, you've got to actually test the font libraries. But OSKit, it turns out, defines a good minimal C library subset. You can avoid the weird corners of the memory allocator and that sort of thing, and assume minimal OS features. The size test that I wrote for kselftest uses two syscalls, which I thought was important because that's the whole idea: on a system that small, you might actually have eliminated most of your syscalls. Well, three, I think, because it used open and read and there was one other one. So it's important not to assume that something is there, in order to make the test runnable in as many places as possible. For shell scripts, the same applies, except instead of APIs we're talking about external commands. Don't go using esoteric, weird external commands. This is my recommended minimum list, the minimum list that Fuego uses; just about any busybox-capable system is going to have these. And limit the use of /proc and /sys, because someday maybe I want to run your test on NuttX or Zephyr or something else.
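A couple of concrete commands along those lines. The checkbashisms tool comes from the devscripts package on Debian and Ubuntu; the toolchain prefix and file names here are just examples:

    # Flag non-POSIX constructs in a test script, line by line
    checkbashisms ./my_test.sh

    # Cross-compile a C test statically, so it has no shared-library
    # dependencies on the target (toolchain prefix is only an example)
    arm-linux-gnueabihf-gcc -static -Os -o my_test my_test.c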
Detect dependencies. When you have dependencies, and you will have dependencies on something or other, it's a good idea to detect them during the test. Don't just fall over because something you expected is missing. That means you have to probe the system and, hopefully, abort early with a message. Don't just exit; it's much nicer to have some message about what's going on or what's missing. And this is the fourth part of the result schema: in this case, don't report pass or fail or error. If you're missing a dependency, the industry standard is to report skip, okay? That's an extra indicator that what happened was different than expected. LTP has a different name for it; they call it TCONF, for configuration error. But basically you're skipping something because what you wanted to test isn't actually available or isn't configured.

Another thing with dependencies: allow the user to specify which tests to run. Sometimes you know there's a whole class of machines, say 32-bit machines, that part of your test isn't going to run on, but the rest of your test cases are valid. Well, let the user select that. Support skip lists, or even better, handle the skipping automatically. But don't let your test case identifiers change numeric values when you skip tests.

Don't assume the capacity or speed of your device under test, so don't hard-code loop counts or sizes. We get a lot of test failures in Fuego because we're running on a wide range of boards, from boards that are only 100 megahertz up to multi-gigahertz boards. And surprisingly, we have a couple of tests I can think of that fail because the board is too fast: it goes so fast that the test assumes something has gone wrong and reports an error. So try to automatically detect your loop sizes, probe the capabilities, and consider doing a calibration run if you need to figure out what kind of machine you're on. As a last resort, all of that being the automatic, user-friendly way to do it, if you just can't get around it, at least let the user specify the parameters on the command line. But test parameters are a pain to deal with; it decreases the usability of your test if you now have to know some command-line option to run it.

In terms of making tests reusable, you want to make tests so they can be used in lots of different situations. The secret here is to parameterize your tests, which I was just talking about, and also to allow for external results criteria. What does that mean? Well, in the case of LTP, where I mentioned you get 1,056 passes and a whole bunch of fails, what will happen is that if you submit that report to your management, your management is going to say, why are these tests failing? We've got to go fix that. And it's like, well, some of those we probably don't need to fix; they're either errors in our test system or the test was just written badly. So when you run these huge suites, you want a way to specify that, basically, you don't care about some of the results. And in the case of benchmarks, every benchmark is going to depend on your board, and on the flash you chose, and on the network hardware you've got on there. So benchmarks have to be customized.
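Going back to dependency detection for a second, here's a minimal sketch of that kind of probing in POSIX shell. The result strings and the test name are made up for illustration, not from any particular framework:

    #!/bin/sh
    # Probe dependencies up front and report SKIP rather than FAIL
    # when they're missing.
    if [ "$(id -u)" -ne 0 ]; then
        echo "SKIP: mytest.mount_tmpfs (requires root)"
        exit 0
    fi

    if [ -r /proc/config.gz ] && ! zcat /proc/config.gz | grep -q '^CONFIG_TMPFS=y'; then
        echo "SKIP: mytest.mount_tmpfs (kernel lacks CONFIG_TMPFS)"
        exit 0
    fi

    # ... the actual test would go here ...
    echo "PASS: mytest.mount_tmpfs"

Anyway, back to benchmarks.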
If you're doing pass or fail on a benchmark, you have to allow the user to specify what the threshold is for success or failure, and Fuego does that through the criteria.json file. Okay, so, parameterizing tests: if you're going to do it, parameters allow people to adapt your tests, and they're usually passed on the command line. Try not to use environment variables. I won't go into the details, because I'm running out of time here, I've got one more minute, but don't use environment variables; use command-line arguments, and document your parameters. Then test automation, which is all kind of obvious: you should try to make a test that's automatable, use standard build tools, and make things deterministic. Okay, test robustness. This is basically the things I just talked about that help make a test robust: checking for dependencies, creating the needed resources at test time, tuning for the device-under-test capabilities, and handling errors gracefully.

And then finally, test something useful. This is the last idea, really, and I promise, well, I was going to promise I'm not going to go over, but that ship has sailed. The idea here is to test behavior that your program relies on, that is, stuff that would break your app if it changed. So if you're relying on some behavior of the kernel, a sysfs file path, or the way a syscall behaves, or the way a library behaves, test that. Don't just blindly go through the specs. This is one of the problems that plagued LTP: they blindly went through the specs and tested just about every errno in there. And if you test behavior that you don't rely on, it just means the kernel now has to adhere to that behavior, even though nobody relies on it. So instead of reading the kernel code, or reading the specs, to figure out what you want to test, you should be reading your own code. See what your code relies on, and then write tests for that. The other major category of useful tests is things that you know broke. Things that have broken in the past are more likely to break in the future, so if you create a regression test every time a fix goes into the kernel, that's something valuable.

Final thing, oh, this is not the final thing. UCLI test, if you've never used it for shell script testing, is a really handy little tool; we use it in Fuego in a couple of places. You'll have to look at the slides to get the pointer. And then, okay, I think this is my final slide. Yes, okay, go back there. So, my advice. Hopefully there have been some useful tips in here; sorry I've gone through this stuff so fast, but if you have questions, you can ask me afterwards. If you're writing new functional tests, please do it in LTP. They have a good test library, the build system gets you cross-compilation for free, they have a consistent output schema, and there are many, many harnesses that already know how to run LTP and visualize the results, including Fuego. If you have an existing test, please publish it. Put it out on GitHub somewhere, and then add a Fuego wrapper for that test, and then anybody who runs Fuego can run it. Fuego can handle some of the dirty work for you: we can automate it, document it, make the results shareable, and provide visualization for it.
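One more concrete sketch, on the parameterization point from a minute ago: documented command-line arguments with sensible defaults, instead of magic environment variables. Everything here, the script name, the parameter names, and the defaults, is made up for illustration:

    #!/bin/sh
    # usage: net_perf_check.sh [iface] [min_mbps]
    #   iface    - network interface to test      (default: eth0)
    #   min_mbps - pass threshold, in Mbits/sec   (default: 10)
    IFACE=${1:-eth0}
    MIN_MBPS=${2:-10}

    echo "INFO: testing $IFACE against a ${MIN_MBPS} Mbps threshold"
    # ... measure throughput on $IFACE here, compare it to $MIN_MBPS,
    #     and print a single greppable PASS/FAIL line with the numbers ...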
Personally, I'd love to see kselftest develop a little bit more, and I'd love to see them actually adopt the LTP test library, so there isn't a schism in the ecosystem. And finally, we need standards for board control. That's actually going to be my focus for the next, well, I don't know how long it'll take, but I'm putting some energy into trying to make that happen. So with that, go forth and test, and please share your tests. I'm sorry we don't have time for questions, but thank you very much for your time.