OK, is everybody here? Good, let's begin. Hello everyone. First of all, a few disclaimers.

This is a talk about embedded systems: IoT, the thing without a screen that you forget in your basement for two years because it's monitoring your electrical consumption or something. It's not that the patterns described here won't be valid in other use cases, just that the rationale is not necessarily the same. So you can argue one way or the other about a few of the things that follow. That you can do it this way doesn't necessarily make it a good idea.

Since this evening's general theme revolves around Python, we will do it in Python. You can do this in basically any language you want. You need a big standard lib, but I know for a fact that there are people in this room planning to do it in JS, for example. I don't know if it's a good idea, but... And most of the code presented here is a simplification, because there are limits to the code you can fit on a slide. Everything is on GitHub, right there. You can follow the talk with the source in front of you if you want.

All right. As I said, we are going to talk about a subset of embedded systems. They are not all created the same. Some things have a touch screen, like on your fridge. That's not what we are going to talk about. It's really about stuff that you don't touch: you place it somewhere and you forget it.

When you build this kind of system, you usually pick some existing framework. When you have computational power (I'm not talking about microcontrollers, but for example a Raspberry Pi), you usually go with a distribution like Raspbian, or with the Android stack, or something like that. All of these stacks were conceived around one fundamental idea: it's the user that is in control. Take this stack, for example. Everything you see here is just a way for the user to take decisions and control the hardware.
You have this very big stack with some kernel driver managing your power consumption, for example. Then you have some kind of user space daemon interacting with that driver, then some kind of messaging system, and then a graphical UI where the user can say, "OK, I would like to power it down", or not.

In IoT, we need an autonomous system. This stack, for example, or the Windows stack or the Android stack, can't really be used in a modular way. You can't take one part of it and expect it to work very well. So we usually choose one and put it in there, and it's not pretty when you mix them up. You end up kind of simulating the user: connecting to part of the stack and manipulating the device that way. It's usually complex.

For example, the idea of systemd is to boot as quickly as possible, so it tries to parallelize a lot of things. In the embedded world, that makes things difficult, because it's difficult to debug and reliability is paramount. So you generally end up writing your service files so that they depend on one another, in order to sequence the whole thing. And basically, a little bash script starting everything in the right order is way simpler. You can even control the timing between each step by adding a simple sleep.

So what I would like to show you here is a do-it-yourself kind of stack, tailored to your needs. And I would like to argue that it's less work, it's simpler, and it's more robust than starting from a standard stack and mutating it to fit your requirements.

In an autonomous system, the general architecture is something like: I have some inputs, I need to process them, I need to take a decision and take an action that has an impact on the outside world. Most of the time you also end up streaming data, because a lot of IoT systems are complex sensors. The key point is "by itself". The system should be aware of its state: where am I in my computing sequence, and what should I do next? So you have a lot of states.
So you end up thinking about the main dimensions of your system. For example, if you want to build a device that you put in your car to monitor something, you should check if the engine is running, if the car is moving, something like that. And then you say, OK, it's been 20 minutes since the engine stopped and the car is not moving, so I should probably power down. So the next state should be "shut yourself down".

So how can you manage state? One way is simply to consider that you are in one single state and to deal with, or engineer around, the errors that this generates. For example, if you have a battery in your device, you can take the hypothesis that it is always functional and always has power. Sometimes that makes sense, if you have a nearly constant power supply and the battery is just there for the exceptional case. So you just ignore the case where there is a battery issue, and when there is no power you just crash, and the next time you boot you try to recover from scratch.

But most of the time that's not sufficient. So when you think about the system, you should start by enumerating the different states before you start coding anything. See which transitions you should have. For example, see that when the Wi-Fi is not available you should not run the HTTP client, instead of having the HTTP client running all the time and throwing errors when the Wi-Fi is not available. If you do that, you end up with a more reliable system.

Here we are talking about system state, not the application inside of it, because the application usually has other states. I'm mostly focusing on the system: Wi-Fi, network, which processes to launch, whether I should power down, stuff like that.

So usually you end up doing filtering on your inputs. You probably need a big complex event processor to do the big stuff. That is very interesting, but I won't talk about it. What I want to spend this time on is the platform side, in detail.
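Going back to the car example, here is a minimal sketch of what "enumerate the states first" can look like in Python. The state names, the `CarMonitor` class and the `IDLE_TIMEOUT` constant are made up for illustration; only the 20-minute rule comes from the example above.

```python
from enum import Enum, auto

IDLE_TIMEOUT = 20 * 60  # seconds with engine off and car stopped before powering down


class State(Enum):
    ACTIVE = auto()         # engine running or car moving
    IDLE = auto()           # engine off, car stopped, timer running
    SHUTTING_DOWN = auto()  # terminal state: run the teardown sequence


class CarMonitor:
    """Hypothetical system-level state machine for the in-car device."""

    def __init__(self):
        self.state = State.ACTIVE
        self.idle_since = None

    def update(self, engine_on, moving, now):
        """Feed the latest (filtered) inputs and return the resulting state."""
        if self.state is State.SHUTTING_DOWN:
            return self.state  # no way back once shutdown has started
        if engine_on or moving:
            self.state, self.idle_since = State.ACTIVE, None
        elif self.state is State.ACTIVE:
            # First sample with everything off: start the idle timer.
            self.state, self.idle_since = State.IDLE, now
        elif now - self.idle_since >= IDLE_TIMEOUT:
            self.state = State.SHUTTING_DOWN
        return self.state
```

The point of writing it this way is that every transition is visible in one place, so deciding what to start or stop becomes a lookup on the current state rather than error handling scattered across the applications.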
That means how you run your applications, how you boot, how you shut down, how you manage logs, and stuff like that. Most of the code, as I said, is on GitHub, but I also put the code in a Buildroot layout. Buildroot is a build system that generates a firmware. I forked the Buildroot repository this afternoon for the Raspberry Pi. If you want to try it, you just clone it and type make; it's preconfigured. It will generate a firmware you can play with.

For this talk, I'm going to use one example to go through the different steps: some kind of Wi-Fi scanner. Basically you take your Raspberry Pi and plug a GPS into it. You walk around with it, or put it in your car, or whatever. It will log every new Wi-Fi network that it sees together with a GPS position, and you can send that wherever you want.

So the first thing the system does is boot. On any Linux computer, the first thing to run is the bootloader. In embedded, it's mainly U-Boot. The Raspberry Pi is not like that, but most of the time you have U-Boot. You can also have GRUB. The responsibility of this program is to load the kernel, and maybe the initramfs, and call it. The kernel then initializes the machine, initializes itself, and calls user space. User space is the conceptual space where the user's programs run, as opposed to kernel space, where the scheduler, the memory management and the kernel drivers execute.

The program that the kernel calls just after it has initialized will be our entry point. The kernel can learn what to call in several ways. You can pass it on the command line: you boot your kernel with arguments, one of these arguments is init=, and you just give it the path of the program that you want the kernel to run. You can do this on your own computer. For example, you can stop in the bootloader and boot with init=/bin/bash, and you end up with bash and nothing else. You don't see the usual boot sequence and stuff like that.
Also, when you compile the kernel, you can hard-code which program to call after boot.

Usually, the boot sequence mounts some partitions, and mounts some virtual devices and virtual file systems as well. It scans for devices, loads the right modules and firmware, and does some other setup. In some embedded contexts, you may want to bring up the network immediately for debugging purposes.

As a first step, you need to have some kind of file system. In modern Linux, the file systems are quite complex. If you type mount at any given time, you see a lot of them. Most of the time, you have one real partition, the root FS, mounted on /. In an embedded system, you should really have a root partition that is read-only, because you don't want to modify it. And you probably need to store data, so you can mount a read-write partition on /var; it's the usual pattern.

You need the proc virtual file system, mounted on /proc. You can have a virtual file system on /dev containing all the device entry points. Everything that you see there represents a kernel driver. You can open these files, read and write on them, and basically you are talking directly to the hardware. You also need something called the devpts file system. It's the pseudo-terminal system; I won't go into detail on that. You have sysfs on /sys. /sys is a file system that directly represents kernel objects, so you might want to have that too. And tmpfs is a file system that lives only in RAM. You can use it as a general-purpose file system, except that it won't write anything to disk, only to a space in RAM. Most of the time you use it for /tmp and /run, where some programs store their PID files or locks and stuff like that.

So, in Python, how do you do that? Unfortunately, the Python stdlib doesn't wrap the mount function, so you need to go through the libc. You can use the ctypes module for that.
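To make the mount step concrete, here is a minimal sketch of calling the libc `mount` function through ctypes on Linux. The mount table is illustrative: the `/var` device path, the tmpfs size and the flag choices are assumptions for the example, not taken from the repository.

```python
import ctypes
import os

# dlopen(NULL): the running interpreter already links the C library.
# use_errno=True lets us read errno after a failed call.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                        ctypes.c_ulong, ctypes.c_char_p)
_libc.mount.restype = ctypes.c_int

# Flag values from <sys/mount.h>.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8


def mount(source, target, fstype, flags=0, data=None):
    """Thin wrapper around mount(2) that raises OSError on failure."""
    ret = _libc.mount(source.encode(), target.encode(), fstype.encode(),
                      flags, data.encode() if data else None)
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


# Illustrative boot-time table; order matters (/dev before /dev/pts).
BOOT_MOUNTS = [
    ("proc", "/proc", "proc", MS_NOSUID | MS_NODEV | MS_NOEXEC, None),
    ("sysfs", "/sys", "sysfs", MS_NOSUID | MS_NODEV | MS_NOEXEC, None),
    ("devtmpfs", "/dev", "devtmpfs", MS_NOSUID, None),
    ("devpts", "/dev/pts", "devpts", MS_NOSUID | MS_NOEXEC, None),
    ("tmpfs", "/tmp", "tmpfs", MS_NOSUID | MS_NODEV, "size=16m"),
    ("tmpfs", "/run", "tmpfs", MS_NOSUID | MS_NODEV, "size=4m"),
    ("/dev/mmcblk0p3", "/var", "ext4", 0, None),  # hypothetical data partition
]


def mount_all(table=BOOT_MOUNTS):
    """Mount every entry in order; needs root, so only init should call it."""
    for source, target, fstype, flags, data in table:
        os.makedirs(target, exist_ok=True)
        mount(source, target, fstype, flags, data)
```

Because the table is plain Python, changing the partition scheme is a code edit rather than a configuration format to parse.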
ctypes is a module in Python that can open any shared library and call any function in it. So we can open the standard C library and call the mount function. And there you have your mount function in Python code; it may be a good idea to check for errors. Then you simply mount everything that you want. If you want more data partitions, or a different scheme, you just edit the code. You don't need a configuration file. You can simply do whatever you want.

After that, we talked about loading modules. You should load your modules. You can use modprobe for that. In Python you have a function called os.system that calls other programs. So you load your modules, and you have your low-level setup done. You have a file system correctly set up, you have some modules loaded, and you can do everything else.

If you want to split your init into several files, you need to know that the program the kernel calls first will be PID 1. It's a special PID. It can't die. We will talk about that later. But if you want to split the boot sequence into several parts and chain them together, you can't use os.system or fork at the end of the first part to switch to the second part. You need to use exec. If you use exec, you basically replace the current program with a new one, and the PID stays the same. Because you are PID 1, you can't die. So you need to switch from one program to the next, not fork a new one and let the parent die.

All right. One other role of init is to reap zombies. Obviously, we have an OS, so we can launch programs. Init launches processes, which themselves launch processes, and so on. There is a POSIX rule that says that when a subprocess finishes, its parent should collect its return code. If the parent can't, because it's blocked or bugged or whatever, and doesn't collect the return code of the child process, the child process stays in a state called a zombie.
Over time, zombies consume resources. The file descriptors are not always closed at the right time, and stuff like that. When the parent also dies, the grandparent should reap the child, and if nobody does it, it's PID 1 that does the rest. When that happens, PID 1 gets the responsibility of reaping everything.

How does it work? It's kind of easy. When a child process should be reaped, the kernel sends the parent a signal called SIGCHLD. A signal is a kind of software interrupt: you jump from what you are doing to a signal handler that you registered earlier. In this example, somewhere in the main, you register your handler, and when a child should be reaped, the kernel just calls the reap function there. Then you use a syscall called waitpid, and basically you reap the process.

Sometimes it will fail. For example, if you use the os.system command in Python, os.system will correctly wait for the subprocess itself. So there is no need for reaping, but the kernel sends SIGCHLD anyway. So the reap function will be called although it's not needed, and you will get an exception from waitpid. And you should always use the WNOHANG flag, because without it waitpid is blocking, and that's a bad thing: your init will be blocked until there is something to reap.

So, we have a correct file system, everything is mounted in the right place, we can reap the processes that we launch, and some modules are loaded. Now what? We probably need to launch apps. In Python you basically have three possibilities.

You can use os.system, the command we talked about earlier. It takes the program name and launches it. The problem is that it's a blocking call. While the program runs, the parent doesn't; it just waits for the child to finish its job. So you can't run several things in parallel, and that's not a good thing.

Then you have Popen. Popen, in the subprocess library, launches the process, but it doesn't wait for the process to finish.
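Coming back to reaping for a moment, the SIGCHLD handler described above can be sketched as follows. The function names are made up for the example; the loop is there because several children can exit while a single signal is delivered.

```python
import os
import signal


def reap_children(signum=None, frame=None):
    """Collect every child that has already exited, without ever blocking."""
    reaped = []
    while True:
        try:
            # -1: any child. WNOHANG: return immediately if nothing is ready.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children at all (e.g. os.system() already waited)
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped


def install_reaper():
    """Register the handler; call this once, early in init's main()."""
    signal.signal(signal.SIGCHLD, reap_children)
```

One caveat with this sketch: `waitpid(-1, ...)` also collects children that were started through `subprocess.Popen`, and the `Popen` object can then no longer read the real exit status. If the supervisor cares about exit codes, it may be safer to let it poll its own children and keep the blanket reaper for orphans only.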
With Popen you just launch the process. Also, the API of Popen is quite nice. You can set the input and output, you can communicate with the process, you can poll it to see if it's finished or not, stuff like that. And if the API of Popen doesn't suit your needs, you can write your own Popen with a combination of fork and exec. But that won't be covered here.

So, let's write a process manager with Popen. First, we encapsulate the Popen object in another object that we write ourselves, to manage cases such as the process not starting. Also, the main goal of a process supervisor in IoT is to restart processes when they crash. So we add a check method: if the process is dead, just restart it.

You might see an obvious flaw here. If the process crashes immediately after starting, it's restarted in a loop, and that will consume a lot of resources. You should probably implement some kind of back-off. A back-off is basically this: you take the time when you start the process, you count the seconds until it's dead, and if that's too short, you wait a little bit before restarting it. You should probably count the number of restarts as well, et cetera.

You should maybe add a statistics method on this object, to know the state of the child, how many times you restarted it, and stuff like that. This stats method can return some kind of dictionary containing the metrics. Some of these things make sense for your use case, others don't. Maybe you need other things, like launching several children of the same program in parallel.

In any case, the back-off feature, for example, is very simple in Python. You get the time when you start, you get the time when you see that the process is dead, you compute the delta, and you wait or launch as you want. It's just a few lines of code.

That object was for just one process. On your system, you probably want to launch a collection of processes.
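The single-process wrapper described above might look like the sketch below. The class name, the `min_uptime` and `backoff` parameters and the shape of the `stats()` dictionary are assumptions for illustration. The back-off here is non-blocking (`check()` returns instead of sleeping), which is one possible design for an init that must keep servicing other children.

```python
import subprocess
import time


class ManagedProcess:
    """Wrap subprocess.Popen so a dead child is restarted with a back-off."""

    def __init__(self, name, argv, min_uptime=5.0, backoff=2.0):
        self.name = name
        self.argv = argv
        self.min_uptime = min_uptime  # shorter lives count as "crashed at start"
        self.backoff = backoff        # seconds to wait after such a crash
        self.proc = None
        self.started_at = None
        self.died_at = None
        self.restarts = 0

    def start(self):
        try:
            self.proc = subprocess.Popen(self.argv, stdin=subprocess.DEVNULL)
        except OSError:
            self.proc = None  # missing or non-executable binary; check() retries
        self.started_at = time.monotonic()
        self.died_at = None

    def check(self, now=None):
        """Restart the child if it is dead. Return True if it is running."""
        now = time.monotonic() if now is None else now
        if self.proc is not None and self.proc.poll() is None:
            return True  # still alive, nothing to do
        if self.died_at is None:
            self.died_at = now  # first time we notice the death
        uptime = self.died_at - self.started_at
        if uptime < self.min_uptime and now - self.died_at < self.backoff:
            return False  # died too quickly: hold off before restarting
        self.restarts += 1
        self.start()
        return self.proc is not None

    def stats(self):
        """Metrics as a plain dict, easy to log or inspect from a console."""
        alive = self.proc is not None and self.proc.poll() is None
        return {
            "name": self.name,
            "alive": alive,
            "pid": self.proc.pid if alive else None,
            "restarts": self.restarts,
        }
```

The main loop of init would call `check()` on each managed process periodically; the `now` argument only exists so the timing logic can be exercised without real waiting.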
You probably need another object that holds a collection of these objects. So here is a supervisor object with methods to start a process, stop a process, stop all processes, whatever. Maybe it makes sense for your use case to have a signal method in there, to send a specific signal to a specific child based on its name. Again, very easy to do. It's actually in the source code.

So you have a kind of supervisor system with a nice Python API. You can start processes, check them when you need to, and the system kind of works.

We are in the boot sequence. We have a file system, we have modules, we can start things. So let's start applications. You want to connect to your IoT device, for example, so you probably need SSH. Maybe you can't connect directly, so you may need your IoT device on a VPN; let's start a VPN client. Don't forget to start some kind of system logging daemon, for example klogd, which is a kernel logger. And you want to start your specific business app. In our case, we want to start something that will scan the Wi-Fi. So you have some code like this: you create a supervisor object in the variable sv and then start all your processes.

Again and again, people talk about modularity. You need to separate things, to break down complex things into simple things. If we want to scan Wi-Fi, we probably want to generate a stream of Wi-Fi information over time. So we have a Wi-Fi scanner. It just checks the Wi-Fi status every five seconds, sees if a new Wi-Fi network was detected, and if there is one, sends an event: "Hey, I've got a new Wi-Fi."

You probably want another program that just listens to your GPS chip. Most of the time, these chips talk a language called NMEA 0183, generally over a serial device. So you just open the device with pySerial, parse the output, and generate a position stream. In the source code in the repository, it's a fake GPS, because I don't know the specifics of your setup.
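For a real chip, the "parse the output" step could be sketched like this, using only the standard library. It handles just the GGA sentence type and is an illustration, not the parser from the repository; with pySerial, the input lines would typically come from `readline()` on an open serial port.

```python
def nmea_checksum_ok(sentence):
    """Validate the '*hh' XOR checksum of an NMEA 0183 sentence."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].partition("*")
    computed = 0
    for ch in body:
        computed ^= ord(ch)  # XOR of every character between '$' and '*'
    try:
        return computed == int(given, 16)
    except ValueError:
        return False


def _to_degrees(value, hemisphere):
    """Convert NMEA (d)ddmm.mmmm plus hemisphere letter to decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    decimal = degrees + minutes / 60
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_gga(sentence):
    """Return (lat, lon) in decimal degrees from a GGA sentence, else None."""
    if not nmea_checksum_ok(sentence):
        return None
    fields = sentence.strip()[1:].split("*")[0].split(",")
    if len(fields) < 7 or not fields[0].endswith("GGA"):
        return None  # some other sentence type
    if fields[6] in ("", "0"):
        return None  # fix quality 0: the receiver has no position yet
    try:
        lat = _to_degrees(fields[2], fields[3])
        lon = _to_degrees(fields[4], fields[5])
    except ValueError:
        return None  # empty or malformed coordinate fields
    return lat, lon
```

Returning `None` for anything that isn't a valid fix keeps the position stream clean: the process that merges positions with Wi-Fi events never sees a half-parsed sentence.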
So the fake GPS just generates a position stream fixed on this building. And there should probably be a program that merges these two streams and associates the last position with the last Wi-Fi detected, or something like that.

Which brings us to another decision to take: how everything communicates. So you have the problem of inter-process communication. On a Linux system, the most basic primitives available are the POSIX ones. You have FIFOs, you have signals, you have shared memory, you have POSIX message queues, you can write to the file system, you can exchange things directly through files. Unfortunately, the APIs are quite rough. You don't get a lot with them. For example, knowing the current state of a socket at any given time is kind of a tricky question.

So maybe we should look around and see what the Python library has to offer. Its tools are basically built on the POSIX stuff. The APIs are a little more Pythonic, but they don't provide many more features. And if they fail for some reason, you usually end up with a deadlock, and everything is blocked instead of crashing. Crashing is actually better, because if a process crashes, you can restart it and try to recover. If it's blocked, you first need to detect that it's blocked, and that's not easy.

Another alternative is to use a big third-party system, like some kind of database, or Redis, or stuff like that. Basically, it's taking a building block from a server software stack and bringing it down to an IoT system, so what could possibly go wrong?

When you have to choose, you usually discard everything that doesn't fit your requirements, you end up with several possibilities, and you need to select the best fit. Usually, the main dimension to consider is the simplicity of the API. The simpler your API, the faster you go and the more testing you can add. The richness of your API is important too.
For example, if you need a queue, knowing how many elements are in the queue is a really nice feature. You can't have that with FIFOs and stuff like that. Can the queue grow, or should it be a very small FIFO? And another dimension to take into account is how easy it is to test. Python is really easy to test. But if you depend on a third-party system to test, it kind of destroys that capability.

So why not Redis? For those who don't know, Redis is a NoSQL database which provides a way to manipulate lists, dicts, sets and strings in a global key space. It's a lightweight system and very performant. You can push to the right, for example, then pop from the left, and you basically have a FIFO. So you have your init system at the top supervising all these applications, the system apps on the right, and the specific applications in the middle communicating with each other through Redis queues, in this example. Redis supports transactions, so if you need to increment a counter and push an element to a queue atomically, you can, et cetera.

OK. The "Internet" in Internet of Things usually means another topic: the network. So we need some kind of specific application to monitor the network. You should definitely use a state model to know what to start, what to stop, where to route your packets, and stuff like that. For that, the Python standard lib unfortunately lacks the right API. But fortunately there is a very good lib called pyroute2, with which you can control the low-level network stuff, so you just need to concentrate on the high-level business logic.

Also, for streaming the data, you have access to a lot of libraries, like requests if you want to push over HTTP. Or if you want to stream things in a message-oriented kind of architecture, you can use a client library for MQTT, or Pika for AMQP, depending on what your server needs.
Obviously it's very specific to the kind of setup you have, so there's not much to say about that. What I can say is that in the end, something will fail. So you need to understand what happened.

Again, Python has a nice logging module. You can configure the logging module in a settings file, for example, and import that file everywhere, so you have the same settings for everything. With the Python module you can easily store your logs in /var/log, for example, rotating them with a rotating file handler, and so on.

There is a bit of a catch if you use, for example, a settings module that you import everywhere: you must import it only after the file system is correctly set up, because when you import a logger it will try to open the file. If you haven't mounted your /var, for example, it will crash.

Also, you should really, really log the system metrics. You should log the CPU load and your memory usage at any given time, and send that to the server. But sending the logs themselves directly is another matter. For example, if you know Sentry and you think it's a good idea to stream the logs directly from the device to Sentry: it can be, but in some cases it's not a good idea. If you have a network issue which is generating logs, you basically generate logs that make your network issue worse. So you have a positive feedback loop, and in the end everything crashes again. What you can do, for example, is break that feedback loop by counting the errors and sending the error counts with the periodic metrics. If you see that counter increasing, you connect to the device and retrieve your logs. That's one strategy.

OK. If you connect to the device, you usually use SSH, and after the login SSH usually spawns bash or sh or any shell you want. But it can spawn Python too. You can connect directly to a Python console if you want. There are two things you need to do for that: redefine the default shell in the /etc/passwd file, and authorize the python3 executable as a valid shell, which is
usually not the case by default. If you have a collection of APIs on your device, you can import them. For example, you can just import the stats method from your supervisor and call it, and you know everything about how everything runs. Since it's a valid Python structure, you can easily manipulate it: a for loop, sleep one second, print some stats, and you have an easier way to debug than using watch and ps and stuff like that.

So, we have a system that has booted and runs applications. At some point you need to shut it down. The way it usually works is that the halt command, or the reboot command, or the shutdown command on a desktop system, sends a signal to init, and then init acts on it. For init, the signal to halt is SIGTERM, and to ask init to reboot the machine it's SIGUSR1.

Init needs some kind of teardown script, which basically stops all the apps in a good order. You probably don't want to shut everything down at once. Maybe you want to shut your application apps down first, then send some kind of last message saying "I'm shutting down", and then stop your system apps. When everything is shut off, you should sync your file system, which basically writes everything that is still buffered to the disk. And then you have a syscall to actually halt or reboot the board.

Again, Python doesn't offer a wrapper around the sync syscall, or around the reboot or halt syscall, so just open the C library with ctypes and call them. The halt syscall and the reboot syscall are actually the same one. It's called reboot, and it takes some magic numbers depending on whether you want to halt or reboot.

OK, what about devices? If you need udev, your IoT architecture is probably wrong. If you have, for example, some kind of extension, or maybe some part of the device that you shut down for power saving, you can actually declare everything from the start, and the kernel simply won't use the things that are not there. That's not a problem. If you really, really, really need to do it, you can
implement it by listening to uevents. It's a netlink socket. You open it, and the kernel broadcasts to every uevent listener a structured message with information about which device just got connected, what its vendor ID and product ID are, what class of thing it is. It sends a lot of information. udev usually picks that up, acts on it with the udev rules, and brings the device up. So you just need to consume all these packets with a Python script. If you recognize the device that should appear, you just act on it, and you handle the rest.

Also, if you don't want to use uevents, because netlink is not a great API, and you want to use the old way, there is another option. There is a virtual file in the proc file system. You can just echo a program name into that file, and the kernel will launch that program every time a hotplug event arrives. This program will receive the information via arguments and environment variables.

OK. One other thing about PID 1: if PID 1 exits, the kernel panics. It's not a nice thing. In other processes there is no kernel panic, but errors are still annoying. So you should test your device.

Of course, during development you should do unit testing. But since there are several modules interacting in various ways, unit tests only get you up to a point. At some point you need full integration testing. There is this program called qemu-system, usually qemu-system-arm if you are on ARM. But you need a good description of the machine you are trying to emulate, and it's very slow. So for testing, it's for the worst case: you should use it, but not most of the time.

There is a kind of middle ground between the two. You can use QEMU only for the user space programs; it will still use the host kernel. You chroot into your image. You use unshare, which is a program to separate namespaces, the process ID space and stuff like that. It's basically container technology. You can use this tutorial to reprogram Docker from scratch and use that to test
your ARM image. It's quite fast, so that's a nice thing. But you don't have access to the right hardware, so you need to mock everything.

A few pointers. If you want to mock Wi-Fi, there is a kernel module, called this one or this thing, that can do hardware simulation. You can have virtual Wi-Fi interfaces pop up, and you just use WPS. If you have to mock serial, for example because you have the GPS chip talking over a serial device, you can use a pseudo-terminal. If you have to interact with /sys, you can easily use FUSE from Python to make a mock file system, implementing just the open, read and write syscalls and outputting some kind of default value. If you need to mock /dev, and especially the ioctl calls, there is no easy way.

Some parting thoughts. The main problem with all this is that you can do everything in Python easily, but it's not necessarily a good idea. When you do IoT, you really need to be pragmatic. A little bit of bash is sometimes largely superior to a full-blown program in Python. Sometimes you simply don't need the feature that you want. For example, in the supervisor you saw that I redirected the input, because it's really necessary, but I didn't do the output, so basically it's always printing to the console. Maybe that's not needed. Maybe you just use syslog and not the Python logging module for everything. Maybe you use it only for some specific applications. And in any case, the program strace is your best friend.

One last thing. You're probably thinking during this talk, "Wow, this guy recoded a lot of things, reinventing the wheel", and stuff like that. All of this fake Wi-Fi logger is online. You will see it's less than 1,000 lines of code, and it basically replaces most of the things from boot to halt: running applications, doing stuff. Obviously that's not everything, because it's leveraging libraries: tens of thousands of lines of code, with a little bit of glue. What I presented is the little bit of glue. The main shift is the switch from a bash-based system using a big stack
like systemd and all its ecosystem to a Python-based kind of system, using the standard library and other libraries to do most of the work.

All right, in summary, the key points of this talk were these. In IoT, you should really model your system with finite state machines. Don't consider that you have only one state and that if it fails, you don't care. You can write your own init and supervisor in Python; it's quite easy. The results are usually simpler and easier to debug for special-purpose systems. I'm not talking about general desktop systems. If you want to reuse that system for other projects, you probably want to generalize it at some point, but if you want to do it only once and make it evolve with the product, it's usually a nice way to go. You should really evaluate the possibility of using a high-level system to handle the IPC instead of POSIX primitives. And container technology and emulation are great for testing embedded systems.

So thank you for listening to me. It's done.