So, welcome, and thank you for attending this session. We will be talking about improving embedded system boot time, but actually a word is missing from the title, and that is Android. So I'm mainly focused on the Android boot time issue. A few words about me. I am a physicist, or at least I was a physicist. I worked on the autonomous navigation of lighter-than-air systems for planetary exploration. And since 2009, I've been working on Android embedded. Welcome, guys. So I've been working on the embedded platform since 2009, and I'm currently the CEO of Kynetics, a company based in Santa Clara, California. And one of my favorite hobbies is retro computing. I don't know if any of you is passionate about it, but I like to restore old Atari 2600s, the Sunnyvale "Heavy Sixer" from 1977, so that particular model, and to write some assembly code. I guess it's great to have that control of the machine; in the embedded systems we are used to nowadays there is much more complexity, although a lot of things are easier to do. So today we'll be talking about Android and the optimization of the operating system boot by hibernation. We will take a look at the Android boot sequence first, and at the optimizations you can do on the cold boot using standard techniques. Then we will talk about hibernation, in particular using the kernel feature called software suspend. And we will have some real examples based on the i.MX 8M Mini SoC by NXP, working with a particular board that features this chip. First of all, how many of you are Android platform developers? Okay, a decent number. And how many of you have programmed Android from an application standpoint? Okay.
So as you know, Android is really common. You can use Android in different embedded products, and in my experience I've seen products like embedded devices for medical applications or spectrometers, scientific types of tools, that target Android because they want a fancy UI. One of the things that was really cool about Android back in the day was that anyone could be an Android developer, because the language and the entire SDK were in a sandbox. But that easy way of developing software was, on the other side, a nightmare for working on the platform itself. You are maybe familiar with the boot sequence and why Android takes so much time to boot. Of course, there is some common ground with Linux: we have the boot sequence from the boot ROM of the chip, the bootloader, the kernel, init, and then from init a lot of daemons are launched. This is how Linux actually works, right? And then you have a system ready to go. Android, instead, adds a lot of other components, because Android is a sort of sandbox for developers to write JVM-based programs. And some of the services that are part of the system are developed in Java. So some of them are native programs and others, like the power management, are done in Java. Here is a comparison between Linux and Android. As I said before, up to init it is common ground. But then a couple of things happen on Android that make the platform completely different. And by the way, the operating system is completely different too: it is more of a remote-procedure-call kind of operating system. It's not really like Linux where, if you want to modify some daemon, you have a configuration file. Yes, you have configuration files, the init files specifically, but a lot of things happen in the code.
So you've got to actually look into the code to modify some aspects of the system services, native or Java. This is a very popular picture from Karim Yaghmour, one of the main experts in the field of the Android platform. You can read this picture counterclockwise. You start from the CPU, the bootloader, the kernel and the init system. There is a lot of common ground with Linux, including the native daemons. But what actually kicks in and is different is what we call Zygote and the Android runtime. Because everything is handled by a virtual machine, Zygote is in charge of forking itself using a lazy copy (copy-on-write) mechanism, so we don't run out of memory if we launch hundreds of applications. So every time you want to launch an app, a Java app or a Kotlin app, you are forking Zygote. After that we have an important block, the system server. The system server includes a lot of the services that you actually use on the phone, like the battery manager, and everything there is really close to the hardware. And then, of course, you end up on the launcher. That is where you land every time you power on your Android phone. So in this really complex picture there were several ways of optimizing the boot, and a lot of work has been done since 2011. There was a very popular presentation by Tim Bird, whom I think everyone here knows, right? He was trying to identify what you could do on the system to make this boot sequence a little bit faster. Now that we have introduced the components a little bit, we can read these lines and try to understand what the advantage was. So, starting Zygote as early as possible. That was especially slow if you were working with encrypted partitions. So if you move, well, the data partition can usually be encrypted.
And then if you move the classes that Zygote reads from the data partition to the cache partition, you can save time, because you don't have to decrypt those class files. Parallelize the package manager service: every time you start Android, you scan all the installed packages, so if you run this on multiple threads, you can do it faster. You can split classes by importance: what is needed by the system server, so what is part of the system, and what is related to the apps. Or, because you want to show the GUI immediately, so you want to give the user the gratification of having the launcher in front of them as soon as possible, you can do some prioritization of the system services, like bringing up the window manager immediately. But with all this work, which is a decent amount of work because there are a lot of things you have to fix in your system image, you get something like a 30% boot time reduction. So I want to introduce a use case that, for me at least, was really important. I was also looking at customers, trying to identify how they were using Android. And most of those systems in the embedded space, I'm not talking about phones, are kiosk-type systems. You have just one application, and that application is basically doing everything, talking to other components, of course. But that is where the user lands after the boot. So I want to define what I call, not just me, it's a common term, the single image mode. If you are working in a specific context, what if you create a snapshot of that initial state and, every time you boot the system, you just load that particular state? So the use case is different between laptop computers or phones on one side and embedded systems on the other.
In this case, and we will go into the details, you don't have to hibernate from different states all the time, or let the user hibernate the system from different states. You just have one state where you want to land, let's say the Android launcher, and from there the user can take control. Or you may select another state, like the custom kiosk application already opened, and you want the user to land there after booting the system. So this is the picture. We have a system in a consistent state, meaning a quiet state. We create a single image by hibernation, and we store the image on a swap partition. By the way, we have to create a swap partition on the system. And then we power off. Then we can tell the kernel: every time you boot, just go, load and resume that particular image from the swap partition. So you power on, you load the image from the swap, you restore the image in memory, and you have a consistent final state that is actually the same as the consistent initial state. And you loop: that is the use case your device will be executing all the time. So maybe it's worth recalling some power states that are really common. The first two, suspend to idle and standby, are not really of interest in this case. The more common suspend state that you experience on your Android device is suspend to RAM. When you press the power button, a wake lock is released and the system goes into suspend to RAM. Then we press the button again and the system is back. The system is not off: the memory is, of course, kept powered. The use case we want to achieve, instead, is that we can unplug the power. And for that we have hibernation. The kernel stops all system activities, creates a snapshot image of the memory, everything, and writes it to persistent storage.
That can be an SD card or eMMC. Powering off is not mandatory in general for hibernation. You can hibernate and not power off the device, so you can stop the hibernation flow before actually sending a power-off event to the system. But this is what we want to achieve: we want to power off the entire system. Everything that you do on a particular platform is really dependent on the particular SoC that you are using in your single board computer, in your dev kit, in whatever system you are developing on. So everything is really tied to the lowest level of the system, and of course you have to deal with a lot of things at the kernel level. Right now, power management is divided between two models: what we call the system sleep model, which includes suspend to RAM and suspend to disk, and the runtime power management model, where individual devices are put into low-power states while the system keeps running. If you look at the kernel and the drivers, you will see these two distinct models when they are present. This is done in the device, bus or class driver by implementing functions that take care of what happens when you suspend the system, hibernate the system, and when you reboot and restore everything. So usually, if you're familiar with how drivers are developed, what you see is a lot of callbacks. In particular, in the last line, you see a structure that goes from the platform driver to the power management element of the structure, which points to all the callbacks that are defined and passed to the Linux power manager. In particular, we are using software suspend. Software suspend is a kernel feature that has been part of the power management since kernel 3.8, and it is the default framework that everyone is using nowadays. There are different stages, so when you go through the hibernation flow you have to take care of each of them. First of all, you want to create an image.
So you have a set of callbacks to prepare that state, and then you have the freeze callbacks that stop the system, right? Then you want to save the image, so you need the system again, because you have to write something. You take a picture of the system and then you have to write the image somewhere, and you cannot do that if you are in a deep sleep state. So you unfreeze (thaw) the system again, write the image, and then you call the power off. If you go around the kernel sources, you have to make sure that this workflow is done right. And again, there is no magic sauce. We will be talking a little bit more about this during the presentation, but if you are working with a specific chip vendor, you are probably working with a kernel provided by the chip vendor. Usually this is not the mainline kernel, and usually it doesn't work really well. It depends; they didn't test everything, right? So everything is really platform-specific. There is no magical way to do this once and then apply it everywhere: even from the same vendor you have different chips, so different SoCs with different versions of Android. And we will see how different versions of Android carry different versions of the kernel. Things are different, and then you also have components like the GPU that change, so the code of the GPU driver in kernel space is different from the previous version for the same GPU device. So again, even in the same family, it's not really straightforward. So one thing is to rework your drivers' power management operations, because some devices may not have that done right. It also really depends on the use case, on what you have to bring back from hibernation. So you have to figure out immediately what your initial state is, the state of which you want to take a picture, and which devices are involved that you want to bring back to life. So drivers are implemented in different ways.
There are different ways, different flavors, in which developers write drivers. What really should be the standard way is to implement the power management operations using the power management operations structure that is defined in the kernel. Some drivers still do it the old way and define the suspend and resume operations inside the driver structure, but that is not really standard. Also, if you go around the power management code, I guess in the pm.h header file, you will see that there are a lot of macros you can use to save a lot of typing when you pass the callbacks that you define at the driver level to the power management. So this picture summarizes what we have been talking about. You want to hibernate: you freeze the system and create the image, but then you unfreeze the system and write the image to the swap partition. And we have to figure out how this flow is handled by all the devices that you want to bring back to life. Some of those may be easy, but again, it depends on the platform and on the version of the kernel you're working on. Others may be less easy, especially when the system depends on user space blobs, which are really popular nowadays with chip makers, right? They don't want to give you the secret sauce of some components, so it's not open, and you have to live with this. This is the restore flow. We wake up the system, we freeze it, because we have to start from the point where we are supposed to bring the system back, and then we restore everything in memory. So it's just the reverse of the process we did before. There are also things that you have to do at the Android level. Android has to be put in the consistent state that we were talking about before. So we want to remove unwanted wake locks, and we want to force all the threads to release semaphores; I'm talking about Android semaphores, not Linux semaphores.
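To make the hibernate and restore flows described above a bit more concrete, here is a minimal sketch in Python. It is a toy model, not kernel code: `MockDevice`, `run_phase`, `hibernate`, `restore` and `missing_callbacks` are hypothetical names, the callback set is reduced to freeze, thaw, poweroff and restore, and walking the device list in reverse for the suspend-like phases is a simplification of what the kernel does.

```python
from dataclasses import dataclass

# Reduced callback set for this toy model; the real kernel structure
# has more callbacks (for example the late and noirq variants).
HIBERNATION_CALLBACKS = ("freeze", "thaw", "poweroff", "restore")


@dataclass
class MockDevice:
    """Hypothetical stand-in for a driver and the callbacks it implements."""
    name: str
    callbacks: tuple = HIBERNATION_CALLBACKS


def run_phase(phase, devices, log, reverse=False):
    """Call one phase on every device that implements it, recording the order."""
    ordered = list(reversed(devices)) if reverse else list(devices)
    for dev in ordered:
        if phase in dev.callbacks:
            log.append(f"{phase}:{dev.name}")


def hibernate(devices):
    """Freeze, snapshot, thaw so the image can be written, then power off."""
    log = []
    run_phase("freeze", devices, log, reverse=True)   # quiesce the devices
    log.append("core:create_image")                   # snapshot of memory
    run_phase("thaw", devices, log)                   # storage is needed again
    log.append("core:write_image_to_swap")
    run_phase("poweroff", devices, log, reverse=True)
    log.append("core:power_off")
    return log


def restore(devices):
    """Load the image, freeze the running system, then restore the devices."""
    log = ["core:load_image_from_swap"]
    run_phase("freeze", devices, log, reverse=True)
    log.append("core:jump_to_restored_kernel")
    run_phase("restore", devices, log)
    return log


def missing_callbacks(devices):
    """Map each incomplete device to the callbacks it does not implement."""
    report = {}
    for dev in devices:
        missing = [c for c in HIBERNATION_CALLBACKS if c not in dev.callbacks]
        if missing:
            report[dev.name] = missing
    return report


if __name__ == "__main__":
    # Hypothetical device list: the "gpu" entry lacks poweroff and restore.
    devices = [MockDevice("bus"), MockDevice("gpu", callbacks=("freeze", "thaw"))]
    print(hibernate(devices))
    print(restore(devices))
    print(missing_callbacks(devices))
```

A device that shows up in `missing_callbacks` is the kind of driver whose power management operations would need rework before it can be brought back from hibernation.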
Eventually, when you bring up the GUI, you may have some noise around, because the GPU may have cached something that has not been discarded, and then you get some pixelation or something that doesn't look right. You may want to repaint something at the SurfaceFlinger level, and you may want to look for pending SurfaceFlinger transactions. This goes back to which state I'm taking the picture of, right? Again, it may be easy for certain states, but if you go for a more application-dependent initial state, you may have to take care of this. And of course there is the sync between the Hardware Composer and SurfaceFlinger. The bad news is that at that level, the abstraction layer of the Android user space between the low-level hardware and the user space, there is a glue that may not be open source. It's proprietary; maybe you still have the code and can look into it, or maybe not, and they are just binary blobs. So let's go back to that concept, right? Code once, run everywhere. That's the beauty of having Java running on Android, it's fantastic, but at the system level it's not like that at all. The kernel and the Android user space are tightly coupled, and there is nothing you can do about it. So you have to redo the job every time: try to understand what changed at the driver level and take care of what we have been talking about, the power management of each device. And so we have to rework some features for the particular Android version. For example, Android 9 on the i.MX 8 uses kernel 4.14; Android 8 was using kernel 4.9, right? So the code base may be different, and what is different for sure is the GPU kernel driver. The GPU evolves from product to product, and especially when you change microprocessor family, when you switch from an i.MX 6 to an i.MX 8, even if the GPU may be the same in terms of performance, the code may be quite different.
And some of the components I was mentioning before may be binaries only. So what if something is not working and you need to debug? There is no way you can debug. I remember we were so frustrated at a certain point that we were thinking about having some open platform where even the Hardware Composer or the GPU were completely open, so that we could actually look into the code, debug, and trace what was happening. Because we are talking about tracing and debugging the system, not only putting printk around the code, but also trying to get a picture of what is called in the system and when, so having the big picture before going into the details. These are typical examples of things that are really closed: the GPU, hardware acceleration, the Hardware Composer abstraction layer, and the OpenGL libraries, which are GPU-dependent, of course. So we wanted to do some tests on the newest platform from NXP, in particular the i.MX 8M Mini. The i.MX 8M Mini is right now the cheapest. Today we are doing a lot of development around it, and the Mini is a very popular SoC used by many customers. In particular, we were using the Boundary Devices Nitrogen8M Mini. Nitrogen is a line of single board computers that feature different microprocessors. One of the most popular was the Nitrogen6, which featured the i.MX 6 Quad, and this one seems to be a very successful product too. The sweet thing is that it has eMMC or SD card, so we can use the eMMC, and it has two gigs of RAM. RAM is an interesting point, because when you want to hibernate something, you need RAM. You don't need double the RAM, but you need RAM to prepare: you have a running system and you have to allocate pages to hold the image that is then persisted.
So again, if you have a 512 megabyte system, you may have some problems, and then you have to free memory somewhere else on the system. I guess the CMA is probably something you want to look at immediately to free some memory: many SoC vendors allocate too much CMA memory for DMA, so it is probably worth taking a look. We were using Android 9 with a 4.14 kernel. It was interesting to start with no hibernation image optimization. There was a great talk yesterday by a colleague here about how to make this really performant, to minimize the stress on the NAND, of course, and also to have less data to load. There is something you can do off the shelf from the kernel, which is dropping the caches: you can do that through the drop_caches control in the /proc file system, just drop the caches and see. But we didn't see much improvement. Our image already has 125K pages, where one page is 4096 bytes, and it was pretty much consistent even when changing the drop caches parameter. But we were more focused on bringing back the system, so there is still a lot of work to do there. Again, it requires less work on memory allocation because we have more RAM available, so this is a good board to do some tests with. And we had a boot time of 12 seconds. The important thing to say here is that the image is loaded by the kernel. So you power on the board, U-Boot starts, then the kernel kicks in and the kernel loads the image from the swap partition. With this workflow we reach a time of 12 seconds. This is a video that actually shows it. This is the kernel stage: here we are loading the image from the swap partition, and then the system comes up on the launcher. We found this to be the consistent state we want to start working with. And so we have a time of around 11 to 12 seconds.
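The image size follows directly from the page count mentioned above. Here is a small sketch of the arithmetic, assuming that "125K pages" means 125,000 pages; the function names are made up for illustration and the read rates in the loop are hypothetical placeholders, not measurements from the board.

```python
PAGE_SIZE = 4096  # bytes per page, as stated in the talk


def image_size_bytes(pages, page_size=PAGE_SIZE):
    """Size of a hibernation image that holds `pages` memory pages."""
    return pages * page_size


def load_time_seconds(size_bytes, throughput_bytes_per_s):
    """Time needed just to read the image at a given sustained read rate."""
    return size_bytes / throughput_bytes_per_s


if __name__ == "__main__":
    pages = 125_000  # assumption: "125K" read as 125,000
    size = image_size_bytes(pages)
    print(f"{size} bytes, {size / 1e6:.0f} MB, {size / 2**20:.1f} MiB")

    # Hypothetical storage read rates in MB/s, only to show the scaling.
    for rate_mb_s in (50, 100, 200):
        seconds = load_time_seconds(size, rate_mb_s * 1e6)
        print(f"{rate_mb_s} MB/s -> {seconds:.2f} s to read the image")
```

Under that assumption the image is 512,000,000 bytes, about 488 MiB, so the time spent reading it from storage scales directly with the read rate of the eMMC or SD card.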
Going from 40 seconds to 12 seconds is a great improvement. But yes, you can do something more. What you can do is load the image from U-Boot. Restoring from hibernation is just copying pages into RAM from your swap partition, and this is what U-Boot actually does, right? But there are a lot of things that U-Boot doesn't know. First of all, it doesn't know about software suspend or how to invoke it. It doesn't know the initial address to jump to in order to start loading the image. And on top of that, it doesn't know about something called no-save pages. When you go into suspend, there are some pages that you don't save, because you don't need them. The kernel knows about those pages and U-Boot does not. So you have to do a pre-loading stage where you teach U-Boot what to skip: what is the start and the end address of the no-save pages. So you need some modifications to the kernel code and some modifications to the U-Boot code. One important contribution was made some time ago by Russell Dill from TI. The i.MX 8M is a 64-bit platform, so there is a part in assembly code, the one that takes care of the CPU resume function that has to be passed to U-Boot, that has to be rewritten for 64 bits. But there are a lot of things pointed out by Russell that are really, really helpful to guide you at this stage. So if this works, you can save more time. Right now I don't have a result to show you. We did the kernel version and we are pretty happy with it, because we can already build systems that boot in 12 seconds. We can probably cut another two or three seconds. That is a guess, right, Eric? So it's a guess. It depends, because we did some tests and there were some loading issues: the loading stage was taking too long, so we were losing all the advantage of having U-Boot in the picture.
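The idea of teaching the loader to skip the no-save pages can be pictured with a toy model. This is not U-Boot or kernel code: the function name `restore_pages`, the dictionaries standing in for RAM and for the saved image, and the assumption that the no-save region is a half-open range of page frame numbers are all made up for illustration.

```python
def restore_pages(ram, image, nosave_start, nosave_end):
    """Copy saved pages into ram, leaving the no-save range untouched.

    ram and image map a page frame number to the page contents.
    The no-save region is assumed to be the half-open range
    [nosave_start, nosave_end). Returns the number of pages copied.
    """
    copied = 0
    for pfn, data in image.items():
        if nosave_start <= pfn < nosave_end:
            continue  # the loader must not overwrite these pages
        ram[pfn] = data
        copied += 1
    return copied


if __name__ == "__main__":
    # Hypothetical three-page memory; page 1 is in the no-save range.
    ram = {0: b"boot-0", 1: b"keep-me", 2: b"boot-2"}
    image = {0: b"saved-0", 1: b"stale", 2: b"saved-2"}
    count = restore_pages(ram, image, nosave_start=1, nosave_end=2)
    print(count, ram)
```

The point of the sketch is only that the loader needs the start and end of the range as inputs; how those addresses are exported by the kernel and consumed by the bootloader is exactly the platform-specific work the talk describes.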
So there are still things to clear up and polish in order to actually get an eight second boot, loading everything from the bootloader stage. So this is everything for today. I guess I am on time. Yeah, I am on time. If you have any questions, I'll hand you the microphone, because the session is being recorded, okay? So, any questions? No questions, okay. Have you had to handle secure boot with the U-Boot method, or did you find any way to do that? Okay, so the question was whether I was using secure boot; you're talking about NXP secure boot, right? Okay, so secure boot is a way to trust the bootloader: you sign the bootloader and you store the keys in the fuse bits of the SoC, right? We did not do that because, first of all, it is a use case that is really difficult to debug. If something goes wrong, you lose the SoC. There are ways to do some tests, of course. But theoretically, because you are just patching U-Boot to support some loading, it should be independent from this mechanism. So there should not be any issue; again, the testing may be a little bit harsh, but not impossible at all. Of course, we are doing secure boot right now for some customers, but not for many. Many customers are okay with the Android chain of trust, if you want to call it that, so Android verified boot, SELinux and dm-verity. So Android already has some built-in security features that are sometimes enough to make the system solid. But yes, this is a very important part that we have to test, and it's not documented at all, by the way.