So this is going to be a presentation about techniques you can apply to reduce the size of your Linux system. First, I'm Michael Opdenacker, founder of Free Electrons. I've been interested in topics like size, but also fast boot time, and small size is a requirement for fast booting: the smaller the kernel is, the faster it boots, and there's less data to copy from storage.

Why reduce size? There are various reasons. First, to run on very small IoT systems: we like to use Linux because of all the functionality it provides, which is sometimes useful even there. Some people also run Linux as a bootloader, so that they don't have to recreate drivers for a separate bootloader and can reuse as much as possible from Linux. Booting faster is also one of the use cases. Typically, I've seen people booting their systems on an FPGA, when the SoC doesn't exist yet, and it's really slow when the SoC has to be emulated on an FPGA, so it's nice to have a kernel that is as small as possible; otherwise it takes several minutes, sometimes more, to get to the command line prompt.

Another goal can be to reduce power consumption. There's also the idea of running the whole system in internal RAM, if the CPU has enough of it. That's the case for some of the processors mentioned on the kernel tinification wiki page: a few processors could fit the whole Linux kernel and system inside their internal RAM. External RAM is expensive in terms of power consumption, since it needs to be refreshed if you want to keep its contents. And I added security at the last minute, because it was discussed in the keynote this morning: you can also add some security by reducing the attack surface of your system, for example by not implementing
certain system calls, and by reducing the amount of functionality that could be vulnerable.

The reason for this talk: there's nothing much new in this area, unfortunately, but there hasn't been a talk about size since ELC one and a half years ago, and I wanted to see how things look now, and what progress the various projects may have made. It's also a personal interest: looking at things I hadn't tried yet, like the musl C library, things that have been around for quite a while but that I hadn't played with yet, like Toybox, GCC LTO, experiments with the latest GCC versions, compiling with Clang, and so on. It's just an update to share; there's nothing extraordinary and I don't expect too much, but at least you'll have some figures which may help guide the choices you will make. Also, when I submitted this proposal, I hesitated between a BoF and a talk, so I'd like to have about half of the session dedicated to questions and answers: people talking about what they could do to improve size issues, how they could get involved, and what we could do as a community to make things better. So there's a BoF part in this talk, if we have enough time.

So, essentially, how small can a Linux kernel and system be? When you have something like 2 to 6 MB of RAM, a regular kernel will fit within that size, and you need something like 8 to 16 MB to have enough room for user space: whatever you run in user space, your applications and the allocations they make. Of course, if you have more RAM you get more performance, because you can cache things. I'm talking here about a regular system, without doing anything really hacky to reduce the size. Then for storage, you basically need
between 2 and 4 MB of space to store the kernel, and user space can fit in a few hundred kilobytes, which is really nice: it's easy to have a standalone application that fits within a few hundred kilobytes at most. So if the user space is not too complex, 8 to 16 MB of storage is really more than enough, if you have a dedicated system of course.

So I'm going to give you a list of things you can do to reduce size a little bit. Of course, there's the -Os option of GCC to optimize for size. It's nice because it automatically selects the right optimizations: essentially it's -O2 minus the optimizations that increase code size. It's quite impressive if you look at all the options provided by GCC; if you really want to investigate what GCC can do, there's a very long list of options with very detailed descriptions of the approaches that are taken. It's very interesting.

I also made a quick check of the impact of using a more recent compiler on size. I did that on ARM, on the board that I booted. Between GCC 4.7 and 6.2, not much changes in terms of optimization power: I only saved about 0.4% in size. So don't expect too much from a compiler upgrade; GCC was already doing a pretty good job.

There's also the link-time optimization (LTO) feature, available since GCC 4.7. Essentially, GCC keeps more information about the source in a special format.
That is, when it creates the .o files, it adds extra information in special ELF sections, and this is used at the end, when linking all the object files together, to perform better optimizations: inlining across the various objects, detecting and removing dead code, and things like that. It's something you should try when you build software. Surprisingly, it even works pretty well on single-.c-file programs. I found a pretty nice project for this, smcc's "single-file programs": this person took a few regular programs, like oggenc and a few others, and put all their C files together into a single C file. It's very nice for C compiler benchmarks, because you have just one big .c file and you can see how playing with the GCC options affects it. The oggenc program, for example, fits in a 1.7 MB .c file, so I used it several times to make measurements.

So, if you want to compile with LTO, you just add -flto, in addition to -Os. There's a link in the slides to more details about LTO in GCC. Still on oggenc.c, if you compare builds without and with LTO, you can save about 2.6% on the stripped executable, which is quite good given the size of the executable. That was on x86-64, essentially on my laptop. On ARM it's the same kind of saving, about -2.8% on the stripped executable, so you can shave off a few hundred kilobytes sometimes, if the executable is big, or a few tens of kilobytes otherwise. And don't compare the x86-64 code size with the ARM one, since that's 64-bit code; it's not a fair comparison.
Then I tried Clang versus GCC, on x86-64. I made a test, still with oggenc.c, compiling the same program with GCC and with Clang, and I got a 5% size reduction out of the box, without doing anything special, just passing -Os. Clang tries to be compatible with GCC, so you can pass the same basic options and they just work. So that's -5% with Clang, and then I compared again with GCC LTO: GCC LTO achieves something like what Clang does, just a bit less efficiently, -2.7% here. So it does some of the job that Clang can do. GCC can still win for very small programs, though: on hello.c it's a little smaller, by about 1.2%, which isn't many bytes since it's a small program.

On ARM platforms, you may also wonder whether to use Thumb2, available on the latest processors, or the ARM instruction set. ARM instructions are 32-bit, while Thumb2 instructions are mostly 16-bit. That's more compact, but to express the same things you need more instructions, so of course you don't divide the code size by two. First, to recognize which kind of code you got, you can run arm-linux-objdump -d on the object code to turn it into assembly, and then look at the generated code.
For ARM, you can see that the addresses are multiples of four, and you have 32-bit instructions, while with Thumb you have addresses that are multiples of two, as expected, and 16-bit instructions as well. That's a way to recognize Thumb code; I didn't find any other way than disassembling. I did that because it turned out that my compiler was compiling for Thumb by default: the toolchain you get in Ubuntu compiles for Thumb by default, so you don't necessarily have to think about using Thumb, because it's often built into your compiler, depending on the toolchain you get for your platform. To compile in ARM mode you can use -marm, and -mthumb for Thumb; you can also, of course, tune for a specific CPU and see whether you gain some code size. Thumb, in that case, was about 7% smaller than pure ARM, though when I was compiling for ARM I actually got a mix of ARM and Thumb code. I don't really know why this happens, but that's what I observed with this program: I had some ARM code and some Thumb code inside as well.

The next question is how to get a small kernel. I'm not going to talk about the very old kernels; since 3.18 you can run make tinyconfig, which was contributed by Josh Triplett. Is he around at the conference? No. I looked at what tinyconfig actually is: it's just make allnoconfig plus a config fragment, or whatever the exact name is. If you look at the Makefile, it's allnoconfig plus a few additional settings that reduce size, so it's even smaller than allnoconfig. Look at what it does: it tells GCC to optimize for size, so the code may be slower, but it's smaller; it turns on XZ kernel compression; it enables optimized inlining; and it selects SLOB support. SLOB is one of the low-level slab allocators, and it's a little smaller than SLAB or SLUB in code size.
You just save a few kilobytes of code there, something like 6 or 10 kilobytes, though I didn't check the memory usage, so maybe it's even better: SLOB is supposed to save a few hundred kilobytes of memory in a real-life system, though it doesn't scale well. And on x86 you also get CONFIG_NOHIGHMEM added.

I then checked the kernel size with tinyconfig across versions, and it's not so bad: you'd expect the kernel size to grow up and up and up, but it's actually going down on the ARM platform. I'm looking at the total vmlinux size, assuming this is roughly what you load into memory and what you're going to consume in memory; I didn't look at the compressed size here, though I will show it as well. The full size went down between 4.4 and 4.5; I didn't have time to investigate exactly why, perhaps because some drivers were moved to the proper drivers directory and then became optional. [Audience comment] Yes, that's a good point: it's just a regular build, so the actual runtime footprint is bigger than this, that's true. Thank you. Any idea what could have happened in 4.5 to cause the size reduction? It's the same compiler across all the versions. Anyway.

If you look at x86, it's almost constant: it doesn't grow that much over the course of about twelve versions. It's a little bigger, of course, but that's x86-64, 64-bit, so it's expected to be bigger. You may think that since the kernel tinification project is not so active, the size would grow exponentially; that's not what happens. It's still under control, which means we could even reverse the trend.
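For reference, the tiny.config fragment that make tinyconfig applies on top of allnoconfig is just a handful of lines. This is approximately its content in 4.x-era kernels; check kernel/configs/tiny.config in your own tree, since the exact options vary by version, and the x86 variant additionally sets CONFIG_NOHIGHMEM=y:

```
# kernel/configs/tiny.config (approximate, 4.x-era)
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_KERNEL_XZ=y
CONFIG_OPTIMIZE_INLINING=y
CONFIG_SLOB=y
```

In the kernel Makefile, make tinyconfig is essentially equivalent to running make allnoconfig followed by merging this fragment.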
I made a test because I really wanted to make sure I had a kernel that boots, so I booted a 4.10 kernel on a QEMU-emulated VersatilePB ARM platform. The compressed kernel image was about 400 kilobytes, which is quite good, with XZ compression, and the total stripped size is a little over one megabyte. Then, since QEMU nicely lets you simulate a machine with whatever amount of RAM you want, I went down as low as 3 MB, and at 3 MB the system couldn't boot anymore. So at least it boots with 4 MB; with more aggressive work, I guess 2 or 3 MB are probably achievable. The threshold is somewhere between 3 and 4 MB; I don't think QEMU gives me a way to find out exactly where it is.

That's one of my questions for the discussion: what hardware could we use to play with these things? We're more likely to convince kernel developers to accept code if we're using real hardware than QEMU.

Now, about the state of the kernel tinification project, which was started by Josh Triplett; let's forget about everything that went on before. About one year ago, he had a few patches left in linux-next, nobody had time to take care of them, and therefore they were removed from the tree. But the patches are still available, and we could resurrect them if we wanted. The problem is that, if you follow the discussions, there's skepticism about adding extra kernel configuration options to allow removing more features from the kernel: kernel developers already have to deal with the Kconfig complexity, which is rather big. So is adding more kernel configuration options really the way to go
to allow disabling more things?

This has proved useful in the past; you can see the benefits of the tinification project in how small the kernel can be. Ten years back, I had roughly the same kernel size as I have now, so people like Josh, Matt Mackall and others have done a great job of keeping the size to a minimum: it hasn't grown over time. So it was useful, but is it the way forward? If you look at the discussions, people are more interested in exploring automated ways of detecting unused features and unused code, and then having mechanisms to remove them from the kernel, so that the kernel gets smaller based on what you actually use: you trace your system, run it through a well-defined test scenario, see exactly what you use at runtime, and then remove the code that you don't need, like the system calls or the kernel command line parameters you never access. That's maybe the way to go, according to the kernel developers. There's also a lack of volunteers with time to drive the mainlining effort, anyway; that's one of the problems too.

A few words about LTO for the kernel: it can be an interesting solution. Patches were proposed by Andi Kleen back in 2012. The problem is that, at the time the patches were submitted, they risked creating new, more-difficult-to-investigate problems, so they couldn't be accepted. Linus didn't really trust the toolchains at that time and was really afraid of LTO creating new bugs that would be difficult to investigate. That may be worth trying again.
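To see the kind of cross-object optimization LTO enables, here's a small user-space sketch (hypothetical file names): an extern function that nothing calls survives a normal link into an executable, but with -flto and the linker plugin, GCC can discard it.

```shell
# lib.c defines two functions; only one is ever called.
cat > lib.c <<'EOF'
int used(void)   { return 42; }
int unused(void) { return 7; }   /* defined but never called */
EOF
cat > main.c <<'EOF'
int used(void);
int main(void) { return used() == 42 ? 0 : 1; }
EOF
# Normal build: the whole lib.o, including unused(), is linked in.
gcc -Os -c lib.c main.c && gcc -Os lib.o main.o -o app-plain
# LTO build: the whole-program view lets the compiler drop unused().
gcc -Os -flto -c lib.c main.c && gcc -Os -flto lib.o main.o -o app-lto
nm app-plain | grep ' T unused'                  # still present
nm app-lto   | grep ' T unused' || echo dropped
```

Whether unused() is actually eliminated depends on the toolchain having the LTO linker plugin enabled, which is the default on common Linux distributions.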
Another possibility is to use Clang, but I'll come back to that in one of the following slides.

If you really have a very small system, one way to go is kernel XIP, execution in place, in which case you keep the kernel text in flash: you execute the kernel directly from flash and never copy it to RAM; you just use RAM for data and allocations. It works if you have NOR flash, of course, that is accessible as if it were RAM. And it's the only solution if you have very little RAM, on a Cortex-M3 or something like that. ARM is apparently the only platform supporting it; there were some efforts on x86, but they are not mainline yet. [Audience comment] So, is anybody still using that? Okay, but it's not in the mainline kernel yet, right? Rob asks whether we could emulate NOR flash in QEMU, to let developers play with XIP there. There was a patch for that about ten years back, but it didn't get accepted, unfortunately. It would be awesome to have NOR flash emulation in QEMU.

If you still want to help with kernel tinification and add a few more kernel configuration options, there are several things you can do. One is to look for obj-y statements in the kernel Makefiles: you can spot opportunities for making things configurable. For example, do you really need ptrace support all the time? It takes 14 KB on ARM, counting the size of the .o files. Or reboot support: that would be fun, you may not need to reboot because it's Linux, so why would you reboot? Well, there's a way of rebooting anyway, yes, exactly. Another way is to look at the compile logs; I'm sure you do that all the time, since it takes time to compile Linux, and you can just ask yourself: why am I compiling that stuff? Then you can wonder whether it's really useful in your case, and how difficult it would be to remove it from the kernel build process.
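The obj-y search mentioned above can be sketched like this. To keep it runnable outside a kernel tree, the commands grep a small mock Makefile fragment, modeled loosely on kernel/Makefile; in a real tree you would grep the actual Makefiles (kernel/Makefile, fs/Makefile, and so on).

```shell
# Mock fragment in the style of a kernel Makefile (illustrative content only).
cat > Makefile.sample <<'EOF'
obj-y     = fork.o exec_domain.o panic.o
obj-y     += ptrace.o reboot.o
obj-$(CONFIG_MODULES) += module.o
EOF
# obj-y entries are always built in, with no Kconfig switch -- each one is
# a candidate for a "could this become optional?" discussion:
grep -n '^obj-y' Makefile.sample
```

Objects guarded by obj-$(CONFIG_...) are already configurable, so only the unconditional obj-y lines are interesting here.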
I'm quoting some parts of the kernel tinification wiki here: you can just run nm --size-sort on vmlinux and you'll see the biggest symbols. Those are the low-hanging fruit, the ones that would save the most if you removed them. You can also use bloat-o-meter, from Matt Mackall, which is still available in the scripts directory: it compares two vmlinux files, so it's very easy to use, and it can detect size regressions, functions that have grown in size compared to previous versions.

There's also the LLVMLinux project, which is trying to use Clang to compile the Linux kernel, and that would effectively open the door to nice performance and size optimizations, probably better than what GCC LTO could achieve. Unfortunately, the project doesn't look very active; from the website, at least, it seems stalled. [Audience: the developers are busy with other things, but there has been progress.] Awesome. Could I get the references? Actually, I'll ask for your address and send you a mail, and I'll add them to the slides. Nice, so there's progress, good.

Now I'm moving to user space and things you can do to reduce size there. I compared BusyBox and Toybox, and I'm glad that Rob is here, so feel free to interrupt and add explanations if needed. I built BusyBox and Toybox with the same set of applets, not necessarily the same features, but more or less; I tried to match the features. BusyBox was about 100 kilobytes, and Toybox, as you can see, was about 84 kilobytes for the same functionality, so it's smaller for the same set of features. And if you just want a shell, Toybox is going to be a much better fit. [Rob: the Toybox shell at least works, but I think it's still in the pending directory.]
I'm not sure, but that's where I took it from, effectively. Well, at least you can try it, and contribute if you want. [Audience discussion about other small shells.] I'm just talking about something that can run a few things, like mount, and start up a system, without having to redo all of that in C. Okay, thanks.

So, from what I found, Toybox wins if your goal is to reduce size and have a tiny root filesystem. It's apparently better than BusyBox for a very small root filesystem that just loads a few modules, mounts proc, and a few things like that; it's sufficient for that. BusyBox wins in terms of configurability, at least for the moment: if you want things like color support in ls, or if you have more elaborate needs in some of the applets, BusyBox has them, but it's going to be bigger.

I also made some tests with various C libraries. I used BusyBox as an example of a program you can compile; it's not really a typical program, but it's one of the programs you may compile. BusyBox built statically against musl, with the same configuration as before, supporting 16 commands, is 183,000 bytes. With uClibc-ng it's 210,000 bytes, so roughly 30,000 bytes more, and if you compile against glibc, of course, it's 755,000 bytes, much bigger, but that's expected. So if you're still using uClibc, you should really give musl a try, because it's nice. As I was doing that, I also compared the dynamically linked executable sizes, which should be almost the same in all cases, musl versus uClibc-ng versus glibc; it turns out that the glibc one is a little bigger than the musl and uClibc-ng ones. And if you look at a static hello-world program with recent compilers, again,
there's nothing very new here; it's just to give you an update on what you can achieve today. With GCC 6.3 and musl, it's 7,300 bytes, so a fairly small static executable. Of course you can do much better: you can get a hello program as small as 60 bytes or so if you do a lot of manual tweaking. With the same GCC and uClibc-ng, it's 67,000 bytes, so much bigger, and it's almost 500 kilobytes with glibc, compiled statically, so it's going to be very big. For small executables, musl is also a clear winner here.

Are people using musl in their products, or rather uClibc? Any maturity problems with musl? [Audience feedback] Yes, nice. Oh, that's nice. All right, thanks; I was using crosstool-ng to build the toolchain, but it's good to have another solution as well.

There's also the old sstrip, "super strip", script that still exists: you can get it and compile it very easily. It's just an executable that eliminates a few ELF sections that are not needed at runtime, and you save a few hundred bytes, maybe a few kilobytes. It's not much, but every byte can count. The nice thing about sstrip is that it's platform-independent, unlike strip, which is built for ARM or x86 or another platform: you can just compile sstrip and run it; it doesn't need to be part of your toolchain.

Then you have all those old C libraries which haven't been updated for a while.
So let's assume they are completely stalled.

I also looked at optimizing libraries on the target filesystem. You could use something called mklibs, provided by distributions, but there's no magic in it: it just copies the libraries that are needed by a given set of executables, and that's something I assume build systems like Buildroot or the Yocto Project already take care of, copying only the .so files that are needed for the system. So it's not very interesting, and it's something you can do manually as well. What I was looking for is something like the Library Optimizer from MontaVista, which is still available on SourceForge: it looks at the executables you have in your system and at which parts of the shared libraries they use, and then tries to optimize the libraries by removing the symbols that are not used by your programs. I didn't have time to investigate it, though; is anyone using that approach to reduce size?

Another way, of course, is to have one big static executable that contains just the parts of the libraries you need. If you only need one executable in your system, as your main application, you get that for free. But if your system is a bit more elaborate, with multiple programs, you need shared libraries, and then you might want to reduce the size of those libraries.

To achieve a small, fast system, the best solution is to boot on an initramfs: you boot earlier, because you don't have to initialize a filesystem and a storage driver; your kernel doesn't even need to have one, and you immediately run programs from the file cache, so it's very fast. You can use a single static executable, so no shared libraries, and just include the parts of the libraries that you need. If you have a bigger system, of course, you can use regular compressed file systems, such as SquashFS if you
have block storage, but then we're talking about much bigger systems. There's also JFFS2 for flash: if you have small partitions, UBI/UBIFS is going to have too much overhead, so you should stick to JFFS2. And you could also use block file systems on top of zram, which is a compressed block device in RAM. If you have a tiny amount of RAM you won't use that, but for a system that needs to optimize RAM and still has enough RAM budget, it might be worth it.

Some conclusions before we go into questions. There haven't been recent tinification efforts, apparently, but the kernel size is still decent if you use make tinyconfig, and you can easily boot it on a system with as little as 4 MB of RAM, at least on ARM; it would be worth trying on x86-64, on the Intel boards, too. For compilers, you can use Clang or GCC LTO, hopefully for the kernel as well at some point. There's a new C library worth using, musl, and it's also worth giving Toybox a try, especially if you don't have many scripts in your system and just run a few commands: Toybox is going to help reduce the size. As we've seen, there's still significant room for improvement, in user space and especially kernel space; I'm mostly talking about kernel space here. The question is: can we add more configuration settings to the kernel to make other things removable? It's going to add to the complexity that the kernel developers face, so are they going to accept that? That's the open question.

Now I propose we move to a kind of BoF part. Any recent achievements that I didn't mention? [Audience] Thanks for the LLVM porting effort, that's one thing; I'll add it to the report. Anything else, anyone who has seen something progressing in this area?
Yes, Rob. So was that for 3.13, you said? Ah, 2013, okay: Rob was mentioning some patches to improve tmpfs support, back in 2013. I'll try to add a reference to them, thanks. [Audience] Right, okay, something like "micro Yocto", was that it? Okay. Are there any other resources you're using that I haven't mentioned?

And then the most important question, perhaps: what community-friendly hardware could we use to run the next tinification projects, so that we have better arguments for convincing the kernel developers to integrate the code? Supporting real hardware is going to be more convincing than just QEMU. Any suggestions for a nice community-friendly board we could use? [Audience suggestion] But it has too much RAM, right? You could use it, but then it's artificial; they will tell you "please just use my RAM". All right, so we have no obvious solution there, that's nice to know. Any other candidates for target hardware? [More suggestions] Yes, I should look at that. That's probably okay.

Anything else? What do you think about the kernel tinification effort? I'm not sure we have the right kernel developers in the room to discuss it, but what do you guys think: should we continue the kernel tinification project and revive the patches from Josh? I could help a little, like taking one patch and pushing it from time to time; that doesn't take much time. Is it worth it? [Audience: you'd need a good argument for adding that complexity to the kernel; it's not clear parts are even available for these really small memory footprints; it's an interesting exercise, but I don't know that there's a whole lot of payback.] Still, we're glad this tinification work has been done, because it's useful for us, so I guess we should continue in some way. You mean RAM usage? Yeah, go ahead.
Right, yeah. So Rob says that using less RAM also means better cache efficiency and things like that, so it's worth having less code anyway, unless you have a dedicated system. I think the good thing is that Linux is a general-purpose system: you never know what users are going to do, so let's give them more options; that's what I believe, as Linus said.

A few references here, worth reading; they're more for people reading the slides and wanting to know more about this. For example, there's the document from Tim Bird; it's quite old, but it shows some efforts to analyze kernel usage and try to eliminate things that are not used at runtime, stuff like that. And there are presentations from recent conferences, recent enough given the history of ELC, especially the work of Vitaly Wool on booting Linux on STM32 systems, with quite aggressive solutions for reducing size. If you want to contribute, those presentations are full of ideas that would be worth exploring, and upstreaming if possible; at least you can discuss them with the community and see what response you get, whether it's worth having them mainline or not.

During this week, there are other talks if you're interested in size. There's a tutorial from Rob today about building the simplest possible Linux system, so you can understand how simple a system can be, and Rob is one of the best at making simple things. There's also the talk about optimizing C for microcontrollers from Khem Raj; it's more on the IoT track, but it's interesting too. And there's "GCC/Clang optimizations for embedded Linux" on Thursday, which could be interesting as well.

Any questions? Right on time. Thank you.