Thanks for coming to this session about testing a testing software. My name is Rémi Duraffort. I'm a senior software engineer at Linaro. I've been working on open source software for some years now: I was working on VLC before, then on a JavaScript engine, and now I'm working on LAVA itself, which I will present right away.

So, what is LAVA? I will make a small presentation of it, because it will be my target for testing: it is the thing I want to test. LAVA stands for Linaro Automated Validation Architecture. It's a test execution system, so it allows you to test software on real hardware. You give it a kernel, a DTB and a root filesystem, for example, and it will deploy those resources on the target, on a Raspberry Pi, a Juno, a BeagleBone Black, whatever you want. It will boot the board for you and run the tests on it. It's used by some big systems like KernelCI, which regularly tests the Linux kernel: mainline, next, and some stable branches too. All the labs participating in that project are using LAVA, except one. At Linaro, we have a system-level testing project called LKFT, Linux Kernel Functional Testing, which is entirely based on LAVA for the testing part. We use LAVA for power consumption tests, we do some benchmarks with it, and we also have a unique feature called MultiNode, which allows you to test many devices in one job: you can request many devices that will participate in the same job, so you can test client-server setups, many devices working together.

So what is it to test on real hardware without LAVA? Let's say you have a kernel, a DTB and a root filesystem that you want to test on, in this case, a Raspberry Pi 3B, if I remember correctly. You need a way to control the power, you need a way to get the serial output, and in this case we will boot using TFTP and NFS, so you need a TFTP and an NFS server where you will provide the resources. The first thing you do is power on the board: you launch a script that powers it on. You connect to the serial console, with telnet to localhost, for example. Then you see the U-Boot prompt coming, and you have to press enter to interrupt it. Then you interact with the U-Boot shell to send the commands: for example, you request an IP address using dhcp, you set the server IP, you load the resources over TFTP, and then you type the boot command to actually boot the Linux kernel. You watch Linux booting and try to spot errors in the middle. Then you log in with a username and password, you inject the tests that you want to run on the board, you run the tests, you collect the results, and you power off the board. That's a lot of steps, and it's not really fun when you have to do it 10,000 times a day.

So LAVA will do that for you: all the not-really-nice, boring things, LAVA does them for you. Instead of typing each command separately, you describe what you want in a job definition, which is a simple YAML file in which you say: I want to deploy this kernel, this DTB and this root filesystem, you just give URLs; I want to boot with U-Boot, because my board has U-Boot installed, using TFTP plus NFS; and I want to run the LTP test suite from this Git repository on this board.
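To make that concrete, here is a minimal sketch of what such a job definition can look like; the URLs are placeholders, and the exact fields can vary between LAVA versions and device types:

```yaml
device_type: beaglebone-black
job_name: boot and run LTP
timeouts:
  job:
    minutes: 30
  action:
    minutes: 10

actions:
- deploy:
    to: tftp
    kernel:
      url: http://example.com/builds/zImage
    dtb:
      url: http://example.com/builds/am335x-boneblack.dtb
    nfsrootfs:
      url: http://example.com/builds/rootfs.tar.gz
      compression: gz

- boot:
    method: u-boot
    commands: nfs
    prompts:
    - 'root@beaglebone:'

- test:
    definitions:
    - from: git
      repository: https://github.com/Linaro/test-definitions.git
      path: automated/linux/ltp/ltp.yaml
      name: ltp
```

You submit this file to the server over HTTP, through the web UI, the API, or a command-line client such as lavacli, and you get the results back the same way.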
The part of LAVA that accesses the board is called the dispatcher. The dispatcher will connect to the board, control the serial relay, control the power controller, install the resources on the TFTP and NFS server, and send the commands: all the things I explained before, it does for you. Obviously LAVA has been made to be scalable, so you can have many boards per dispatcher, and many dispatchers per instance. Every dispatcher connects to one server, the main server, and the server plus the dispatchers plus the boards make up your LAVA instance, your LAVA lab. As a user, you only interact with the server: over HTTP, you post job definitions and you get results. It's fairly simple as a user.

A word on the roles of the server and the dispatcher, because they are really different. The server is the web interface, it's what the user will use: you have the web UI and the API, so you can, as I said, submit jobs, see the results, get the logs. It does the access control, it does the scheduling, and it also stores the logs that are sent by the dispatchers. It also sends notifications, by email or IRC for example. The dispatcher, on the other side, as I said, is responsible for accessing and controlling the boards. It deploys your resources, it powers the board on and off, it sends the commands; the U-Boot commands I was presenting before, the dispatcher is the one actually sending them. It also parses the logs: when Linux boots, it will try to find errors, and it will report them to you. And it will try to classify errors: when there is an error, is it a bug in LAVA? Is it an infrastructure error, like DHCP not working? Or is it a test actually failing? You have to present the right result to the user. That's the dispatcher's responsibility.

We support many deploy, boot and test methods that I won't explain here, that's not the point. And the same for devices: we support many different device types, that's the actual list, 197 different types of boards. I'm saying that just to point out that if you want to test LAVA itself, you have to test it on 197 different boards, which is difficult.

So why do we want to test a testing software? I agree that testing is not really fun, that's maybe why our room is not the crowded one, but it's really important. Why is it important? Like most software, it has to be reliable, but why especially for a CI system? First, you can have false positives: your CI system has a bug and is crashing, so users send a job expecting to have their test suite run, and the result is a failure, but it's not their fault, it's yours. If that happens a lot, developers will just say "it's not working, don't bother about it anymore", go back to the usual way, and not test much. That's the first possibility. Even worse, you can have false negatives: your CI system appears to be working, but it's actually not running the tests the user asked for, and it's not reporting an error. So a test is failing, but you're not able to report it, because you have a bug somewhere. This is one of the worst-case scenarios, because a CI system is the proof that the test suite is passing for a given project. The release manager will say "I can release now, because I know the tests are passing", and in fact it's wrong, because your CI system is broken.
So at the end, people will just stop using your CI system if you do that. You have to be reliable, that's really important. And obviously it's software, so you will have bugs and you will have regressions. Even more so for LAVA: it's a complex system, so it's easy to make mistakes.

So how do we test LAVA itself? This is the pyramid of tests, for me: the base is static analysis and unit tests, then you can do integration tests, and even better, end-to-end testing. When we develop things, we do some manual testing, we check that it looks okay, and we sometimes run the unit tests; then we send the merge request. We use GitLab, so GitLab CI runs the full static analysis and the unit tests; if that passes, we are sure that at least it seems to be working. Then, every day, we run integration tests thanks to meta-lava, which is what I will present today. And every day we also take the latest master and install it on staging, our pre-production instance, where people are running real jobs. So we are sure that if it's breaking, it will break on real users before reaching production; that's important. We also do federated testing, which I will not present today; if you want more information, I gave a presentation about it last year.

The GitLab CI part is a fairly simple pipeline. We run unit tests for both the server and the dispatcher because, as I said, they are really two different things to test: the dispatcher accesses the hardware, whereas the server is just a classical Django application. We test on Debian 9, 10 and 11. We do that because we sometimes find bugs on specific Debian versions: sometimes it works on 9 but not on 10 and 11, or the reverse, because the Python versions are not the same. We run some static analysis: we use Bandit and Pylint. A funny thing about Pylint: if you use it without any arguments, it seems pretty useless, honestly; if you use the right command line, it's really useful, because it does find real bugs. And we also build the Debian packages, the Docker containers and the documentation.

As I said, testing the server is testing a Django application: you insert test data into the database and you run some tests; it's really easy to test. The dispatcher is way more difficult, because the dispatcher interacts with the boards, so you need a board to test the interaction with the board, you need something to test it against. It also interacts with the master, so we have a protocol between the master and the slave that we have to test too. It's really difficult to test, and I will present how we test it correctly.

So GitLab CI, yes, it's useful: we regularly see merge requests failing because of the CI, so we do find some bugs, but it's absolutely not enough.
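For illustration, a pipeline with this shape could look roughly like the following. This is only a sketch, not LAVA's actual .gitlab-ci.yml; the commands, flags and package names are examples, and dependency installation is omitted for brevity:

```yaml
# Illustrative only: a trimmed pipeline of the same shape as the one
# described above, not LAVA's actual .gitlab-ci.yml.
stages: [analyze, test, build]

pylint:
  stage: analyze
  image: debian:10
  script:
    # Pylint only pays off with a tuned command line; these flags are
    # examples, not the ones the LAVA project uses.
    - pylint --disable=all --enable=E lava_common lava_dispatcher lava_server

bandit:
  stage: analyze
  image: debian:10
  script:
    - bandit -r -ll .

# The same unit tests run on each supported Debian release, since bugs
# sometimes show up on only one Python version.
test-debian-9:
  stage: test
  image: debian:9
  script:
    - python3 -m pytest lava_dispatcher lava_server

test-debian-10:   # ...and likewise for debian:11
  stage: test
  image: debian:10
  script:
    - python3 -m pytest lava_dispatcher lava_server

build:
  stage: build
  image: debian:10
  script:
    - make doc            # documentation
    - debuild -us -uc     # Debian packages (illustrative)
```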
So how do we do the integration tests? That's a project called meta-lava. Going back to what we support: 197 board types, 16 deploy methods, 26 boot methods and 4 test methods. That's a huge number of combinations; if you want to test everything with real hardware, it's just impossible. What we want to test is the full system: the server, the dispatcher and the board interaction. But we don't want any real boards, because that's really costly: for 197 boards you need a really large amount of money and a really large lab, and we want to be fast and cheap. The easiest solution would be board emulation using QEMU, but that's not cheap either: you would need a lot of CPUs. So we have to do what I call system mocking.

So what is system mocking? Going back to the LAVA architecture: this is the full architecture, so let's simplify it a bit and take one server, one dispatcher and one board. We don't want any boards, so just remove the board. Now, how do we make the dispatcher believe that it's actually talking to a board? The server talks to users and dispatchers, so that's easy to have. The dispatcher talks to the server, easy, but also to boards. So we have to make the dispatcher believe that it is actually talking to a board: we need a way to fake the power control, the serial relay, and the TFTP and NFS resources.

The power control is quite easy: it's just a command that the dispatcher runs, so you can replace it with /bin/true or whatever; it's not really important. The serial relay is the difficult part, because it's where the board output appears, the U-Boot prompt and then the Linux kernel, and it's where LAVA sends its commands; you need real interaction there. So we created a project called dummysys that fakes this interaction: when you run dummysys, you have the impression of actually talking to a board, while you're just talking to a Python script. And for TFTP and NFS, we have to check that the dispatcher puts the files in the right place, the files the board wants to see: dummysys mounts the NFS export, downloads over TFTP, and checks that the files are the right ones.

So the dummysys output looks like a real board, and it expects exactly the right commands to be sent; you will see dhcp, setting the server IP, et cetera. If the commands are not right, it fails, which means the CI fails. So you check that LAVA sends the exact set of commands needed to boot that specific board. Same for the TFTP and NFS resources: we do checksums on the files served over TFTP and NFS. Thanks to dummysys, we respect the boundaries: from the dispatcher's side, it looks like there is a board; and on the board's side, dummysys checks that the resources provided by the dispatcher are the right ones.

I believe that's not easy to picture, so let's make a really small demo. I'm calling dummysys with a configuration that fakes a B2260, a board that I sometimes use. When I launch it, you get the same output as if you were connected to a B2260 that is actually booting; you can see that I don't have any board hidden anywhere. This is what you get when you actually talk to that board. I interrupt U-Boot by typing a key, and then I have a shell in which I can type commands. I can type dhcp and I will get an IP address. It looks like it's doing something; it's actually not doing anything. I can prove that it's not a real shell, because if I type something wrong, it just does nothing. This is a relaxed script that I made for the demo, so if I enter an unknown command, it won't crash.
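That scripted-console idea is simple enough to sketch. The following is a hypothetical toy version, not the actual dummysys code: it replays canned board output, with a per-line delay and per-character jitter, and in the strict variant it aborts unless it reads exactly the expected command:

```python
#!/usr/bin/env python3
# Hypothetical toy sketch of the dummysys idea, not its actual code:
# replay a scripted console and abort on any unexpected command.
import random
import sys
import time

SCRIPT = [
    {"print": "U-Boot 2018.01 (Jan 01 2018 - 00:00:00)"},
    {"print": "Hit any key to stop autoboot:  0"},
    {"prompt": "=> ", "expect": "dhcp"},
    {"print": "DHCP client bound to address 192.168.0.42"},
    {"prompt": "=> ", "expect": "boot"},
    {"print": "Starting kernel ..."},
]

def emit(line, delay=0.05, jitter=0.005):
    # A real board is slow: a small delay per line and a small jitter
    # between characters make the output believable.
    time.sleep(delay)
    for char in line:
        sys.stdout.write(char)
        sys.stdout.flush()
        time.sleep(random.uniform(0, jitter))
    sys.stdout.write("\n")

for state in SCRIPT:
    if "print" in state:
        emit(state["print"])
    else:
        # Show the prompt and wait for the exact expected command;
        # anything else means LAVA changed its behaviour, so crash
        # and make the CI run fail loudly.
        sys.stdout.write(state["prompt"])
        sys.stdout.flush()
        command = sys.stdin.readline().strip()
        if command != state["expect"]:
            sys.exit("unexpected command: %r" % command)
```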
The variant I'm using for meta-lava is strict: if it's not exactly the right command, it crashes; the demo one is just for showing you around. So if I want to boot, I just type the boot command; usually it's bootm with the right addresses, but for the demo I made it accept plain boot. And it looks like a Linux kernel booting, which is what you usually get when you boot a Linux kernel up to a shell. Then you run the LAVA runner, and it looks like what LAVA prints when it actually runs things. So from LAVA's perspective, you're actually booting a board, you're actually testing a board; it just works.

The fun thing is that it's really easy. Going back to the beginning, there's a small YAML file behind it: just an array of commands to run, a small finite state machine. You have a command to print a line, with a small delay, because there is usually a small delay when you boot a board, and with a small jitter between every character, like on a real serial line. So this line will be printed; then I print the line "Hit any key to stop autoboot", which can be interrupted: if you enter something, it stops immediately. Then it shows a prompt and waits for a specific command: it just loops on the prompt until you type dhcp. When that's done, you go to the next state, which just prints the lines that you usually see after a dhcp command, et cetera, et cetera. And the rest is all the same. It's pretty simple, and the fun thing is that it works.

And the nice thing about it: this is the full configuration for a Raspberry Pi 3, the one that I'm using in production for testing. I don't remember which commands I have to send, so it just fails: I typed two commands that were not the expected ones, because I don't remember which ones to send, and it just crashed. So if LAVA is not sending the exact set of commands, dummysys crashes, so my CI crashes, and I notice that LAVA changed its behaviour. That's the point.

Thanks to that dummysys software, we created meta-lava: we have a Docker container with a server and a Docker container with a dispatcher, and we patch the dispatcher container to put dummysys inside it, so instead of connecting to a board, it talks to dummysys. Then we just submit jobs to the server and check that the results are the ones we expect. Thanks to that, we test 27 different boards, including boards that I have never seen, because I just go on the Internet looking for LAVA labs and copy-paste their boot logs, since that's the only thing I need. And I'm also able to test failures: if I want a DHCP failure, I just need the text that U-Boot usually prints when there is a DHCP failure, and I add that to my configuration file; it will look like DHCP is failing. So I can check that LAVA actually reports that as an infrastructure error and not a test error; same for bootloader errors and many, many other things. That's really fun, honestly, and really useful: I actually found many bugs just with this.
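To illustrate that failure injection, and with the caveat that the real configuration schema is not shown in this talk, the idea boils down to a state-machine fragment like this (the print/expect keys follow the toy sketch above, not dummysys itself):

```yaml
# Schematic only: after LAVA sends "dhcp", replay the output U-Boot
# prints when no DHCP server answers, then check that LAVA classifies
# the job as an infrastructure error rather than a test failure.
- prompt: "=> "
  expect: "dhcp"
- print: "BOOTP broadcast 1"
- print: "BOOTP broadcast 2"
- print: "BOOTP broadcast 3"
- print: "Retry time exceeded; starting again"
```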
Another thing we can do with system mocking is benchmarks. A question I was asked some time ago: is LAVA able to run 500 jobs in parallel? The easiest way to answer is: you buy 500 boards, you buy one of those big servers, you ask many, many, many friends to come and help you plug everything in, and it will fail. Obviously, out of 500 boards, some will fail, so you have to replace them. It would be a nightmare, honestly, and it won't be reliable if you have to do that every day.

So one way to do it is to mock some part of the system. Again, which part do we have to mock? This is the LAVA architecture; we will simplify it a bit, but this time we look at how LAVA works internally. On the left side you have the server, in the middle the dispatcher, and the board on the right. We said we don't want any boards, so let's remove the boards. We can also remove all the power controllers, because there is nothing left to control. On the server we have many processes, many services. One is lava-master, which does the scheduling and starts the jobs; this one we have to keep, because if we want to check that we are able to run 500 jobs, we have to be able to schedule them. lava-slave, on the dispatcher, is responsible for starting the jobs, so the same: we have to start the jobs somewhere. lava-logs is responsible for receiving the logs that are streamed by the dispatchers, so we have to keep it too, because it might be a point of failure in this case. lava-run is just an intermediate process that connects to the board and does all the small steps of talking to a board, but we don't have any boards; the only thing we care about is the logs that it streams back to lava-logs.

So we just replace lava-run with a mocked version, a small Python script that looks like lava-run. It has the same command line, so when lava-slave starts it, it won't see any difference. It has the same signal handling, so it responds the same way; the same return values; and it sends the logs to lava-logs in the same format and at almost the same speed. So it looks the same, and you can do that with a small Python application that uses hardly any CPU, RAM or I/O: it just loads a small file and more or less cats it back.

Thanks to that, I was able to fake almost 600 jobs in parallel on my laptop. It went up to 300 running jobs, and then there were too many for my laptop to handle, but on a bigger server you can have 600, 700 or more, because the jobs are actually not doing anything. And that's how I was able to show that LAVA was in fact not able to run 500 jobs in parallel: 300 works, 500 does not. That's just another example of what you can do with system mocking.

So, conclusion. System mocking is really fun, honestly. You can test many things, you can do benchmarks, you can fake hardware that you have never seen, because you don't really need it. You just have to look at the boundaries. It's not that difficult, just be creative: look at which parts of the system are not really important for what you're doing, look at the boundaries, and try to mimic them. If you do that, you can mock some parts of the system and test the rest really nicely. That's all. Do you have any questions?