Hello, I'm Alexander and I work as an embedded software developer: I write firmware for microcontrollers, and today I'm going to show you how you can test such firmware much faster by not running it on the real device but integrating it with Python. I already gave a talk in a similar direction last year, called "Writing unit tests for C code in Python", where I used the CFFI library to extract single functions or single modules from your C source code and build a Python module out of them, so that you could load them into your Python process and use all of Python's power to write unit tests. I'm building on this idea today. You might have already seen this hierarchy of tests in another context. With the unit tests last year we were at the bottom level, where we only look at individual modules and individual functions and try to test them. But of course it's also important to test your code in integration with all the other parts, so this year we're going to move up a layer to the integration tests and try to make sure that basically all of our firmware really works. The motivation is that the firmware we write is rather complex: in the end we might have half a megabyte of compiled code on our microcontroller. We've written thousands of test cases for this, and when we run them against the real device it takes several hours for all those tests to complete. As a developer, that's not really what I want, because when I make a change to the software I need to know fast whether this change is good or bad; maybe a quarter of an hour is the upper limit for me. I don't want to wait hours before I can tell. So what we did in the past was to select a subset of those test cases that tried to cover as much as possible, but of course you can't guarantee that it really hits every corner case in your code base, so there might still be errors that slip through. This is what I want to avoid, and how this project
started. I'm first going to show you the basic concept, and then afterwards give you a complete demonstration based on a firmware example, to show you the code that does all of that. If you look at a typical microcontroller application, it might look something like this: you've got a large application code base that's pretty standard C code you could compile for any architecture, but of course you've also got hardware-specific parts. If you've structured your firmware well, you might have a hardware abstraction layer that really interfaces with the hardware and provides a nice, clean C interface to your application. This is what we base the approach on, because we want to make it look like this: we keep the application code and just replace the abstraction layer beneath it with some Python code. The approach will be similar to what I showed last year with the CFFI library. But first, since we are talking about microcontroller firmware that's already written in C, you might wonder why we use Python at all; we could just replace this hardware abstraction layer with a different one for another, faster machine. So why Python? We are at a Python conference here, so I don't need to tell you much about the general advantages Python has over other languages. When you compare it with C code, you can easily see that you need to write less code to achieve the same results, and it's also usually easier to use. For example, our microcontrollers have cryptographic functionality built into hardware: we have an AES peripheral where we can just pass in some data, it does the AES encryption in hardware and returns the result. This is something we have to reimplement in our Python code for this to work. There are libraries in C where you can do that and there are libraries in Python where you can do that, but the Python ones are
usually easier to use and easier to get along with. In the end, Python is also very powerful for this approach: the hardware abstraction layer might contain functions that, for the simulation we are going to build here, all work similarly and don't each need a different implementation. So you can use a single template in Python and let Python generate the code for all those functions you need; you don't need to specify each and every function yourself, and the C code is only needed for the program to compile. This is what I'm going to show you now. The general approach is that we'll collect all of the application's C source code, the whole implementation of the application, and we'll collect all the header files of the hardware abstraction layer, so everything that specifies the interface of the hardware abstraction layer. When we've got both of those parts, we can pass them on to Python, and we'll use this information to generate a Python-loadable module that we can then run from our Python interpreter. Then we have our application running inside a Python process on a normal machine, not on our microcontroller, and since the normal machine is much faster than the microcontroller, hopefully our application will also be much faster and our tests can execute faster. As an example, I unfortunately can't show you our real code, so I looked for a different project, and I chose the MicroPython project, because it's also a very complex project with a lot of code. That way you get a real impression of a real-life application of this approach, and not some artificial example that I just constructed for this talk. You might have already heard about the MicroPython project. If not, the quick explanation is that it's a re-implementation of the Python programming language that can run directly on a microcontroller. It started several years ago, together with a hardware device.
Maybe you've seen it in a previous talk in this room: a little board with a small controller on it and a lot of hardware peripherals that you can access from your Python code more or less directly. It has basically full compatibility with standard CPython 3.5, so they don't provide all the features, but most of what you want to use. First, we'll have a look at the structure of the source code. All the source code is open source; you can find it on GitHub, and if you look at the repository, you'll find a structure that looks like this. There are some files containing documentation and then a lot of folders, and many of those folders contain the code that is specific to one MicroPython port. MicroPython already supports not just a single platform but multiple platforms; there are, for example, even ports for Windows and for Unix systems. But the initial port was this one here, the stm port for an ST-based microcontroller. Other folders, for example the py folder, contain the generic code that can run in every port; the py folder contains the Python interpreter, for example. For this example, I will choose the minimal port: similar to the stm port, but very stripped down in functionality. It just contains the bare essentials. It gives you a Python shell that can run code, but it doesn't give you any further hardware access. For this demonstration, that should be sufficient. If we look at this minimal port, these are all the files contained in there. You see only two C files. The main.c file contains the basic application start-up code that initializes everything. And you see this uart_core.c file at the end: this is the implementation of the hardware abstraction layer for this project. It contains some functions for input and some functions for output, so that we can provide the Python shell. This is what the relevant functions from this file look like.
You've got one function that reads a single character of input and does something with it, and you've got another function that can print strings to standard output. In the case of this minimal port, if you really run it on the pyboard, it just uses UART communication for that, so you see some accesses to the UART registers in this code. If we tried to compile this file for our normal machine, it wouldn't work, because there are no such registers to write to. So these are the functions we want to replace with Python code so that we can execute them. All the rest of the code contained in the minimal port, including what's imported from the py folder, should run on our architecture without any problems. Then there's another project I need to talk about quickly, called pymake. It's a re-implementation of the make utility, and I want to use it in this demonstration to parse the makefiles that MicroPython uses for its build process. For this approach to work, we need to know which source files to integrate into our binary: where do we find the header files, where do we find the source files? Of course, I could just hard-code that in this example, but if you want to use this productively, it makes more sense to keep this information in one place, and the place that was already chosen here is the makefile. So I just want to parse the makefile and extract the relevant information from there, so that I can still keep all the information in this one place and don't have to adapt many places just for this whole process to work. pymake gives me such a makefile parser in Python, so I'll build on that. When we look at the MicroPython makefile, one interesting bit of information in there is the compiler options, for example the include directories: it builds up a list of directories where the include files, the header files, can be found.
And in order to extract that using pymake, I tell pymake just to parse the makefile, without really executing it; it just builds up all the data structures. Afterwards I can ask pymake for the contents of the variable INC, where the include directories are listed. What I get back is not a string but an object, whose representation you can see here. It's actually not bad to get back an object like this instead of the raw contents, because if you look at the beginning, it contains a value with a reference to another variable. I'm not interested in that raw string value; I need this reference resolved to its actual value in order for the process to work. And this is what the expansion object that the last call returns can do: it has a resolve-string method, and that returns the final string value that I'm interested in. So I can hide all this code in a simple function and use it to resolve other variables too. Looking at the cleaned-up example, we can just call this function and get back the string that was declared in the makefile; everything seems to work. We store this value in a variable for later use and start with the real process now: collecting the source code. For that, we change into the MicroPython minimal port directory, so all paths are relative to this directory, and again look at the makefile. There's a variable called SRC_C that lists all the source files. At the beginning you see the two I've already shown you, the main.c file and the uart_core.c file, and then there are references to other files in the lib directory, again a directory that's shared by multiple ports. So we can extract this list of source files using the function we've already created, and you can see here that the last variable again contains a reference.
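The resolution step described above can be illustrated with a small sketch. This deliberately does not use pymake's real API (which hands back an Expansion object with a resolve-string method); it is a toy, stdlib-only version that just shows what "resolving a make variable" means, with a made-up set of variables:

```python
import re

# Toy illustration of resolving $(VAR) references in a make variable's
# value, repeated until no references remain. The variable contents are
# a hypothetical example, not taken from the MicroPython makefile.
def resolve(variables, value):
    ref = re.compile(r"\$\((\w+)\)")
    while True:
        expanded = ref.sub(lambda m: variables.get(m.group(1), ""), value)
        if expanded == value:   # nothing left to expand
            return expanded
        value = expanded

variables = {"TOP": "../..", "BUILD": "build",
             "INC": "-I. -I$(TOP) -I$(BUILD)"}
```

Calling `resolve(variables, variables["INC"])` yields `-I. -I../.. -Ibuild`, which is the kind of final string value the talk extracts from the INC variable.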
The reference is resolved to its actual value. Now, to create the list of source files, we use the function again and convert the result into a set. Then we need another variable from the makefile that I haven't shown you so far: it contains all the source code from the py folder, not as .c files but as object files. We just adapt the names so that they match the file system locations we are interested in and add them to the set. In the end, there's one source file we have to remove from the application: the uart_core.c file I showed you in the beginning, because it isn't really source code of the application; it's the source code of the hardware abstraction layer. We don't need it now, so we remove it here. Then there's one more thing that's special about MicroPython here. If you look again at the paths contained in here, the last one refers to a directory called build, and if you try to find that in the file system, it isn't contained in any of the commits. It's generated during the build process and contains information that MicroPython extracts from its own source code. So we just tell the MicroPython build environment: please build this file for us, so we can compile it into our extension module as well. Then we have a list of all the files, so we can open all those files and collect their contents into one large string that we later pass on to CFFI. Before we do that, we make one more modification: the last line here renames the existing main function to mp_main. The MicroPython port of course assumes that it is the only application running on this machine, so it has its own main function; but when we import it into the Python interpreter, there is already a main function there. With that we have collected all the application source code. Step two is to collect all the hardware abstraction layer header files.
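The collect-and-rename step above can be sketched in a few lines. This is a simplified stand-in for the talk's code: it takes already-read file contents instead of paths, and uses a plain regex for the rename, which is enough here because `main` only appears as the entry point:

```python
import re

# Sketch of the collection step: join every collected .c file into one
# big string and rename the firmware's entry point, so it does not clash
# with the Python interpreter's own main() once the module is loaded.
def combine_sources(sources):
    """sources: mapping of file name -> file contents (already read)."""
    blob = "\n".join(sources.values())
    return re.sub(r"\bint\s+main\b", "int mp_main", blob)
```

The resulting string, with `int main` rewritten to `int mp_main`, is what later gets handed to CFFI as the module's C source.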
And for this minimal port, that's rather easy: there's only one header file we need to include. It declares the two functions I showed you in the beginning, plus some more functions that are not really important. But we can't pass these header contents to CFFI directly, because CFFI's parser for this information doesn't understand everything that the C standard allows; it only understands a subset. For example, it has no idea of preprocessor directives, and it doesn't understand some attribute annotations. So, similar to last year, we add some definitions for the C preprocessor to the contents of the header files. This attribute definition, for example, just tells the C preprocessor to discard all this information: CFFI doesn't need to know about it, and if it's not there, CFFI can't get confused by it. The preprocessor also takes care of everything that's included, of all the #ifdefs and the other directives, and then CFFI can understand the result. The preprocess function that's used here looks like this: it just calls GCC's preprocessor and uses its output for the further steps. You can also see here a reference to the include options variable from the beginning, where we specified all the include directories; of course we have to pass those to the C preprocessor as well. Afterwards, everything is contained in the string we get here. This is now an extract from the string we've produced so far: three function prototypes, and for one of them, the one in the middle that can output a string of arbitrary length, I showed you the implementation. Now we have to tell CFFI: these are functions that C code can call but that we want to implement in Python.
And in order for that, we need to prefix those prototypes with extern "Python+C". Then CFFI knows it needs to generate some glue code to make that work. Again, there's a simple string-based solution you might come up with in the beginning, but depending on how complex your code gets, it's better to use a real parser that understands the C code and can make this modification reliably. This is based on the implementation I showed last year. It uses pycparser, which is also used by CFFI internally, to parse all your C code into a Python data structure; then you can modify that data structure and write it out again. In this case we do that in two functions. The first one is called for every declaration we find in the C source code: whenever we hit a function declaration for a function we haven't seen already, we prefix it with extern "Python+C" and return the complete result; otherwise we just leave it alone. The second function takes care of the function definitions we might hit: there might be inline functions specified in the header files, and of course we don't want to create a Python implementation for something that's already there, so we just remove them from the output as well. We can simply run that on the header content we've collected so far and get back a new string, and if we look at that string, we find the same functions with the prefix, so CFFI should be happy with it. But there's one more modification we need to make, and it's this: we had already renamed the main function to mp_main in the C source code, and since we want to call it later from the Python code, we need to tell CFFI that this function exists and that it should provide a way for Python code to call it. In this case it's a plain function prototype, so CFFI will assume it's an existing C function that we want to call from Python, and not something new to implement.
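The prototype-rewriting step can be sketched with a regex. To be clear, this is the simplistic variant the talk warns about; the real implementation walks a pycparser syntax tree, which survives arbitrary C. This sketch only handles one-line prototypes, but it shows the transformation:

```python
import re

# Match a simple one-line C function prototype ending in ");".
PROTO = re.compile(r"^([A-Za-z_][\w\s\*]*?\w+\s*\([^;{]*\)\s*;)", re.M)

def mark_extern_python(header_text, already_seen=()):
    """Prefix unseen prototypes with extern "Python+C" for CFFI."""
    def prefix(match):
        proto = match.group(1)
        name = re.search(r"(\w+)\s*\(", proto).group(1)
        if name in already_seen:        # existing C function: leave as-is
            return proto
        return 'extern "Python+C" ' + proto
    return PROTO.sub(prefix, header_text)
```

Functions listed in `already_seen` (like the renamed mp_main, which exists in C) keep their plain prototype, so CFFI treats them as callable C functions rather than functions to implement in Python.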
And with that, step two is complete: we have collected all the header contents and can now move on to CFFI. The CFFI build code is only four lines. We first create the FFI object to build our module. We pass in the header content that we collected before, and CFFI generates the Python interface out of this header information. We pass in all the source code we collected, and CFFI will hand that to a compiler to build our extension module, in this case called mpcm. Again we pass in the include directories that we collected in the beginning, and afterwards we tell CFFI to compile all of this into a loadable module. With that, all the steps are completed and we have a loadable module, so now we can run it. To run it, we simply import that module and then define the functions we wanted to replace with Python code. CFFI provides a decorator for that which matches on the function name: if we define a function with the same name as one of those extern "Python+C" functions, CFFI will call it whenever the C code calls the function of that name. This is the implementation that reads a single character from standard input, and this is the implementation that writes out the contents of a string. With that, our implementation is complete; we have everything we need. So I'm going to show you now that this really works. I've prepared a script that contains basically this code. I can run it, and then I'm dropped into a MicroPython shell, and I can execute MicroPython code in here. I have the usual tab completion that MicroPython provides, I can call some of those functions, I can look at the objects; everything seems to work as it should. And in order to demonstrate that this really uses our code, I'm going to modify the output function and tell it to print everything twice. Then you can see that every output we get is there twice.
Everything I type is printed twice, so it really executes our Python implementation of those C-level functions. Okay. Then I want to talk about some of the challenges that you might face and that we faced when we started this. First of all, your code should follow a certain structure in order for this to work easily. If you've just got a single file that contains everything, it's hard to separate the hardware-dependent parts from the general source code. What you really want is a clear separation between the hardware abstraction layer and the application code; then you can just match on the folder, as we did in this example, or use some other mechanism like the makefiles I showed you before. Then there's the problem of namespaces. It's perfectly valid C code to have two files that contain static functions with the same name, but since this example collects all the source code into one large string, everything ends up in the same namespace, so this won't really work. You need something like this, where you prefix every function with, for example, the name of the module, so that you end up with unique names. Another problem is platform-dependent code. I've prepared a small example that looks innocent but contains multiple problems when you try to run it on different architectures. It just calculates a checksum over a structure, and of course the checksum should always be the same, no matter what platform this code runs on, as long as the data in the structure is the same. I'll show you the corrected version right away. First, the data types in the structure: if you just use short or int, there's no specification that defines their exact size, so you should use fixed-width types. Then you might get problems with padding that the compiler inserts into your structure, so we tell it to avoid this padding with the packed attribute.
And last but not least, you need to consider the endianness of your data, so the byte order, if you've got multi-byte values. In the corrected example, I use some standard functions to convert the values to network byte order, which is big-endian, so the structure always contains the same bytes and the checksum really is identical. Another problem you might get is with code that relies on interrupts, because those aren't really supported on this platform. You can get something like them by using threads to achieve some parallel events, but it isn't quite the same. Last but not least, let me talk about the external interface of your code. If we look again at this picture, beneath the hardware abstraction layer in your usual application is the actual hardware, and when we take away the abstraction layer, we also take away the hardware, so you need to replace that with something else. In our case we have a network interface that can be used by our existing test cases: they deliver their input there and get their output back, so a test case doesn't even need to know whether it talks to the real device or to our simulation. Okay, once you've done all of that, you also get some benefits out of it. The first benefit, and the reason we did all this, was the fast execution. I've collected some numbers: all of our test cases executed in roughly five minutes, and if I run the same set of test cases against the real device, it takes one and a half hours. That was already a huge speed-up. In fact, these are the numbers from the first prototype that could execute everything; we didn't invest any more effort in optimizing further, because it was already fast enough for everything we wanted.
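Coming back to the portability example for a moment: the same three guarantees (fixed-width fields, no padding, a defined byte order) are what Python's struct module gives you when you reimplement such a HAL function on the Python side. The field layout below is a made-up example, not the one from the slides:

```python
import struct

# '!' selects network (big-endian) byte order and disables padding;
# B/H/I are fixed-width unsigned fields (1, 2 and 4 bytes), so the
# packed bytes are identical on every platform.
def pack_record(version, length, crc):
    return struct.pack("!BHI", version, length, crc)

def checksum(data):
    # toy additive checksum, just to show the result is platform-independent
    return sum(data) & 0xFF
```

`pack_record(1, 512, 0xDEADBEEF)` always produces the same 7 bytes, so the checksum over them matches what the fixed-up C structure would give, regardless of the host's native endianness.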
Another thing is the diagnostics you can get. There are the warnings that the compiler gives you, or that special linters give you, but there are also dynamic program analysis tools that don't look at your source code, but at your binary code while it's being run, and those can give you more information. One tool that we integrated easily is the address sanitizer. That's just an extra compiler flag; the compiler will add extra code that checks for invalid memory accesses and out-of-bounds accesses, and if it detects something like that, it just aborts at that point. A second tool that we use is a fuzzer called American Fuzzy Lop (AFL), which tries to be a bit more intelligent than other fuzzers by finding new code paths automatically, and you can use it with Python code as well. You use a wrapper provided by AFL to compile the extension module; it's called afl-gcc and internally calls GCC, but in a way that integrates the AFL support. This is all that you need in your code for the AFL support to be present. Then there's another nice tool called python-afl that's actually intended to run pure Python code with this fuzzer, but it also supports this use case. And then there's a small script that reads fuzzer input from standard input and runs it against the application in a loop. We did this with our code for, I don't know, some seven billion executions that fortunately, or unfortunately, didn't find any problems. It doesn't run at the highest speed, but you can use it. And the last benefit is that you gain a certain kind of hardware independence. You can do your development without having access to the real hardware: maybe at the beginning of your project, when the real hardware isn't available yet, or even later on, when the real hardware is just too expensive or you only have a few devices. With this approach you can easily scale and run your tests in parallel on many machines.
And with that, my talk ends. Thank you for your attention. So I think we've got time for maybe one very quick question, if somebody would like to ask something.

Question: Thanks for the talk, first of all. In general, with embedded systems, you have to simulate the outside world: there are some inputs you're waiting on, and you have to simulate them. So did you simulate the outside world in Python as well, and how did you manage that?

Answer: In our case, the outside world is really just a communication channel. We get some input there, we have to process it and generate the correct output. Of course, you could do something like I mentioned in the interrupt example and use threads that periodically simulate some external input, but that wasn't necessary for our use case.