Hello, good morning. So it's time to make a start on this. What a lovely audience. If you don't mind, I'll just take a quick selfie of you guys and ladies, because normally nobody turns up for my talks. Well, some people do.
Okay, so we need to talk about systemd. Let's get through this stuff. This is me. I've done some stuff. Let's get past that.
So, yeah, some of you may remember a couple of years ago I did a talk on boot time reduction, boot time optimization, which you can find at that URL there, and it was mostly focused on the bootloader and the kernel parts. When it came to the user space part I kind of bailed out and just put in an init= with whatever program I was running, so when it boots up it just runs, in this case, a Qt demo, whatever. Which started really, really fast. Got to say, that was a good solution, but it doesn't always work out because there may be some other dependencies. So in this talk I really want to look at the things I was missing out in doing it that way.
So this talk, then: initially it was going to have a really hard focus on boot time optimization, but then I discovered that as I delved into this I was having to describe more and more stuff. So it has slid more into an introduction to systemd, with some boot time optimization following that. I'm going to be covering what systemd is and what it does, a bit of an introduction as to how systemd works, then we talk about optimizing boot time, and then there are a couple of topics I'm going to cover at the end before we're all finished.
I've got to say, if anybody here is expecting the real deep dive into systemd, this is actually not that talk. This is the embedded programmer's view of systemd and how the hell do we make this thing work.
So first of all, init daemons. Why do we need init daemons? You've got to have an init daemon, otherwise nothing is ever going to happen.
So the init daemon is the program that's launched immediately after the kernel has booted. Once the kernel has done its initializing of all its internal structures and device drivers or whatever, it execs the very first program, which therefore has process identifier PID 1, and then init starts running. It starts a whole bunch of other daemons, it configures a bunch of stuff through various obscure means, and, sorry, that's the wrong place. And then once it's done the initialization it will sit in the background just waiting for things to happen. Part of what it has to do at this stage is that it is the parent of last resort: when a process dies which has children, those children get reparented to init, for example. It is also monitoring some of those daemons and restarting them as necessary, and it's also killing off zombies and all kinds of other exciting stuff.
So there are several init daemons we may consider. Here I'm looking at three: BusyBox, System V and systemd. BusyBox init is the simplest of the lot. It's part of BusyBox, so there's no real overhead to it; it's just a couple of shell scripts and it kind of works. So for simple embedded systems that is by far the best thing to use. System V init, on the other hand, is a bit more flexible than BusyBox init. It has this concept called runlevels, and you can switch from one runlevel to another, which allows you to start and stop a bunch of daemons when you make that switch. But System V init is kind of slow because it's a bunch of shell scripts, each one of those shell scripts takes time to launch and so on, plus it is 30-odd years old.
So then we have systemd, which is the modern way of doing things, and that's what we're going to talk about, basically. So systemd is not just an init daemon.
It is a way of life. Well, kind of. systemd aims to be a general-purpose system manager, and it does a whole bunch of things. If I were to list every component of systemd it wouldn't fit on the slide, so I've edited things down a little bit. But the things that are important to us are: systemd itself; the journal, journald, which is doing the event logging, kind of a replacement for the old syslog stuff; the login daemon, which is handling logins on terminals and suchlike; and the event daemon, udev, which has been around for a long time but is now integrated as part of systemd. udev is basically managing the /dev directory and also handling hotplug and other kernel events. The network daemon does what it says, it configures network interfaces; the time sync daemon is handy for synchronizing your clock; and the resolve daemon is basically a wrapper around a DNS resolver. So those are the kind of services we want in our embedded systems.
So why systemd? Why go to all this trouble? systemd is kind of reinvented from the ground upwards. It was started in 2010, I think, by Lennart Poettering and Kay Sievers at Red Hat, and they basically started with a clean sheet of paper and said: how should we design a system startup daemon? So we now have explicit dependencies between services, whereas back in the old System V init days the only ordering was by the number of the script you put into the rc-whatever directory. Given that dependency information it can build up a dependency tree at boot time, and then by walking through the tree we can do parallel starts as we walk back up the tree: at each branch node we start everything at that level and then work back up to the root. So theoretically, at least, it should be faster because we're doing things in parallel.
No more shell scripts: systemd doesn't depend on shell scripts.
So that's a definite plus. Plus there are some handy things which are good for embedded. We'll look at these briefly at the end of the presentation, but we have per-daemon resource control, so we can set CPU and memory limits on each program if we wish, and it has built-in support for watchdogs. So we can set a watchdog, and if a daemon stops responding to the watchdog for some reason, it will get restarted. So, all the kind of things that we need in our systems.
So the aim at this point is for me to describe some of the basic concepts behind systemd, and I'm attacking it from the point of view of units, services and targets. I believe if you understand these three things you basically understand systemd. So a unit is a thing; everything is a unit. It's a text file. A service is a particular sort of unit which describes a service, in other words a program that you run. And then a target is a group of services. So as we go down that list we go from the particular to the general.
So let's have a quick look at these things. systemd units: they're just text files and they live in one of these three directories. The nice thing is that when systemd goes looking for a unit or any other configuration file, it scans these directories in this order.
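As a rough illustration of that lookup order, here is a toy Python model, not systemd's actual code. It assumes just three directories (/etc/systemd/system, /run/systemd/system and /lib/systemd/system; a real system searches a few more), and the names `SEARCH_PATH` and `find_unit` are made up for this sketch. The first directory that contains the unit wins, and an empty file or a link to /dev/null is reported as masked, which is how systemd treats a unit that has been switched off that way.

```python
import os
from pathlib import Path

# Three unit directories, highest priority first.
# Assumption: a real systemd search path has more entries than this.
SEARCH_PATH = ("etc/systemd/system", "run/systemd/system", "lib/systemd/system")


def find_unit(root, name):
    """Return (path, masked) for the first directory holding `name`, else None.

    `root` is a scratch directory standing in for "/", so the sketch can be
    tried without touching a real system.
    """
    for rel in SEARCH_PATH:
        candidate = Path(root, rel, name)
        if candidate.is_symlink() or candidate.exists():
            target = os.path.realpath(candidate)
            # An empty file, or a link to /dev/null, switches the unit off.
            empty = os.path.isfile(target) and os.path.getsize(target) == 0
            return candidate, target == "/dev/null" or empty
    return None
```

Calling `find_unit(root, "demo.service")` on a scratch tree returns the copy under etc/ whenever one exists, regardless of what is in lib/.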
So it looks in the /etc/systemd/system directory first, then it looks in the /run directory, and then it looks in the /lib directory. The idea is that the stuff you put into /etc/systemd/system is the local configuration; anything you want to change, you do in that directory. Whereas the stuff that's in the lib directory, /lib/systemd/system, is the system default stuff. If you do nothing in the /etc directory then what is in the lib directory wins. So that means it's very easy to customize stuff: you just put new units into /etc/systemd/system and they will override whatever the system defaults are in the lib directory. So for example, as it says at the bottom, if you want to disable a unit the quick and dirty way to do it is either to create an empty file with the same name in the /etc/systemd/system directory or, if you want to be a bit fancy about it, to create a file that's a link to /dev/null.
So, units. All the things we're going to talk about are going to be units. Every unit begins with a section marked [Unit], and in my example here we have a Description, which is just some human-readable message; we have Documentation, which references a man page in this case; and then we have a dependency, which I'll come to in a moment.
So, the dependencies. The dependencies are the crucial bit, because by reading the dependencies of all the units systemd can build the dependency graph and then it can do its parallel execution thing. In the previous example we have a Requires. Requires is the most common sort of dependency: it says this unit requires some other unit, or units, there can be multiple. That means that when I start this unit it should start up the other units as well. You can also have Wants, which is a weaker form of Requires; in other words, if I have a Wants instead of a Requires, it's not fatal if the wanted unit doesn't start. And then Conflicts is the opposite of Requires: if I have a unit that conflicts with another unit, it means the two cannot coexist at the same time. If I start this unit it will stop the other unit; if the other unit gets started it's going to stop this unit. Okay, we can't have them both.
And then we have another, similar concept, called order. The Requires etc. are the dependencies; the order is the order we'd like things to happen in. They are different concepts, different things. So in this particular unit it says After network.target. That means that when we start this unit up, which is going to be a web server, we're going to start it after, in this case, network.target is started. Which kind of makes sense, because a web server isn't much use without a network. So After is kind of similar to Requires but also different. Just as an example, if I have three services A, B and C, and I say A Requires B and C, then when it processes that Requires statement it can start all three things up in parallel. But if I put in an After statement, After B for example, it means B has to be started before A, so it's going to start, basically, C and B, and then it will start A. Okay, so we're introducing some order into the way things are going to happen. If you don't add in an ordering, then it's just however systemd chooses to do it.
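The difference between Requires and After in the A, B, C example can be sketched with a small Python model. This is only an illustration of the idea, not how systemd schedules jobs internally; the function name `start_waves` and the dictionary layout are invented for the sketch, and it needs Python 3.9 or later for `graphlib`. Requires decides which units get pulled in, while After only adds ordering between units that are being started anyway.

```python
from graphlib import TopologicalSorter


def start_waves(units, target):
    """Group units into waves; everything in one wave may start in parallel.

    `units` maps a unit name to {"requires": [...], "after": [...]}.
    """
    # Step 1: Requires= pulls units into the set of things to start.
    wanted, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in wanted:
            wanted.add(name)
            stack.extend(units[name].get("requires", []))

    # Step 2: After= adds ordering edges, but only between pulled-in units.
    sorter = TopologicalSorter()
    for name in wanted:
        earlier = [u for u in units[name].get("after", []) if u in wanted]
        sorter.add(name, *earlier)

    # Step 3: peel off everything that is ready, wave by wave.
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


units = {
    "a": {"requires": ["b", "c"], "after": ["b"]},
    "b": {},
    "c": {},
}
```

With the After entry in place, `start_waves(units, "a")` puts B and C in the first wave and A in the second. Remove the `"after"` key and all three land in a single wave, which is the "however systemd chooses to do it" case.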
So that's the [Unit] section. The next bit then is a service. A service is a particular sort of unit, and it has a [Unit] section followed by, in this case, a [Service] section. This is again for the lighttpd service. In the [Service] section we basically say what program it is we want to run and any parameters we want to specify there. The key thing here is the ExecStart statement, which gives the name of the program, lighttpd, and the parameters required to run it. In my example here there's also an ExecReload, which says that if we do a reload of this service, we'll do that by sending a kill -HUP to whatever PID it happens to be running with.
And then the third part of the triumvirate is the target. A target is a group of services. Targets are units, they end in .target, and they look something like this. The interesting thing is that initially I thought that if I looked at a target it would have a whole bunch of dependencies on services, so that if I start, for example, the multi-user target, I would expect to see all the services required for multi-user. In fact, when you look at a target such as this one here, the dependencies are just on other targets. So how does that work? Well, I'll come to that in a moment.
Oh, yeah, I've slightly forgotten this bit here. There is a thing called the default target; that's the link shown there. This is the target that's going to be started when you boot up. It's called default.target, and it's a symbolic link, in this case to multi-user.target, which would be the non-graphical login.
So, yes, how do these dependencies with targets actually work? The answer is that it all works by reverse dependencies. We have two more keywords here, RequiredBy and WantedBy. These are called outgoing dependencies. So essentially, within my service, I can say my service is wanted by the multi-user target. So instead of having a pointer from multi-user.target to my service, it actually goes the other way, and then when it starts multi-user.target it will see that a whole bunch of services are wanted by it, so it will start them all up in the right order.
The way this is actually implemented is kind of interesting. It's done with a bunch of symbolic links. If you look in the /etc/systemd/system/multi-user.target.wants directory, this is the list of incoming dependencies, in this case for the target multi-user.target, and there you will see there are symbolic links created by the WantedBys for each one of the services. In this case here it's for something called simpleserver, which is just a demo program. Okay, and if you look further in that directory you'll see the symbolic links for every single one of these things.
And then I need to talk a little bit about systemctl. This is the driver program which allows you to control systemd and make it do different things. You can do a whole bunch of things with systemctl.
This is just a brief list, but we can for example start and stop a unit, for example a service. We can enable a unit. When you enable a unit, this is the point at which it installs that symbolic link we've just been talking about; so if I enable, in this case, simpleserver, that's the point at which it creates that symbolic link. If you are shipping a system which has a bunch of units enabled by default, then essentially your system image will have these links already created in the /lib/systemd/system directory, and if you're using Poky or Buildroot or whatever, that will create the symbolic links for you in the image before you put it onto the target. Disable just deletes that symbolic link, easy enough, and then status tells you what it's doing. get-default tells you what the default target is; there's also a set-default if you want to change that symbolic link for the default target. And then list-dependencies shows you a nice little graph showing how all these dependencies work.
So that's as much as I want to go into right now. Hopefully, despite my slightly garbled description of all this stuff, you've got an idea of how the dependencies and the ordering in systemd allow it to bring things up in a particular order.
So how do we apply this to reducing boot time?
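As a recap of the enable and disable mechanics described above, here is a toy Python model that works on a scratch directory rather than a real system. It is a sketch under assumptions: the unit name `simpleserver.service`, its `ExecStart` path, and the helper names `wanted_by`, `enable` and `disable` are all made up for illustration, and the real `systemctl` does considerably more than this. The point it shows is that WantedBy in the [Install] section turns into a symbolic link inside the target's `.wants` directory.

```python
import configparser
from pathlib import Path

# Hypothetical demo unit; the name and the ExecStart path are assumptions.
UNIT_NAME = "simpleserver.service"
UNIT_TEXT = """\
[Unit]
Description=Simple server (demo)

[Service]
ExecStart=/usr/bin/simpleserver

[Install]
WantedBy=multi-user.target
"""


def wanted_by(unit_text):
    """Return the targets listed under WantedBy= in the [Install] section."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep the case of keys such as WantedBy
    parser.read_string(unit_text)
    return parser.get("Install", "WantedBy", fallback="").split()


def enable(root, name, unit_text):
    """Toy `enable`: one symlink per target, under <target>.wants/."""
    unit_file = Path(root, "lib/systemd/system", name)
    links = []
    for target in wanted_by(unit_text):
        wants_dir = Path(root, "etc/systemd/system", target + ".wants")
        wants_dir.mkdir(parents=True, exist_ok=True)
        link = wants_dir / name
        if not link.is_symlink():
            link.symlink_to(unit_file)
        links.append(link)
    return links


def disable(root, name, unit_text):
    """Toy `disable`: delete the symlinks that `enable` created."""
    for target in wanted_by(unit_text):
        link = Path(root, "etc/systemd/system", target + ".wants", name)
        if link.is_symlink():
            link.unlink()
```

Running `enable(root, UNIT_NAME, UNIT_TEXT)` against a temporary `root` leaves a link at `etc/systemd/system/multi-user.target.wants/simpleserver.service`, and `disable` removes it again.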
I'm defining boot time here as the time from powering on to running the app, the critical app. Typically what you are doing at this point is you have some generic image generated by your favorite build tool, like Buildroot or the Yocto Project, or maybe you're even using a standard off-the-shelf distro like Debian or something. These images are generated to cover a variety of circumstances, possibly different hardware, different configurations or whatever, so they tend to be quite conservative in the things they do, because they have to work in all circumstances. In most cases, then, reducing boot time is taking something that's generic and making it specific to your particular use case.
There are basically two ways you can do this, apart from rewriting the whole thing. The simplest thing is just to leave out stuff you don't really need. If it's running a bunch of daemons you don't need, or if it's configuring some interfaces that you don't require, you can just tell systemd to ignore those things. The other thing that can sometimes be a win is doing things in a different order: sometimes it is a win to be able to start your critical program ahead of stuff that's less critical.
systemd comes with a tool called systemd-analyze, which has a bunch of options to get information about what systemd is up to. This is the main tool I use for optimizing systemd. You can just type systemd-analyze and it gives you a brief, one-line summary of what's been going on. Then you've got systemd-analyze blame, which gives you a list of all the units that it ran to get to boot up, tells you how long each one of those took, and orders them from the longest to the shortest. Which is kind of interesting, but really the key one is the last one on this list here, critical-chain. This takes the critical path from startup to whatever the default target is, and it tells you which units were affecting that path. So really the critical chain is where you want to start: you look at what is taking the time, what the things on the critical path are, and then you start optimizing those things.
So as an example I have some dumps of systemd-analyze which I took on this little PocketBeagle, which I happen to have left over from yesterday, being at the E-ALE, the Embedded Apprentice Linux Engineer thing, down in, yeah, the thing that Behan is running. So that's running a copy of Debian Stretch, and when I run systemd-analyze it tells me this. So it's taking quite a long time to boot the kernel.
I haven't done anything to optimize that, but it's 18 seconds. But then the user space is taking 47 and a bit seconds, so the total boot time is one minute and a bit. So obviously there is some optimization to be done here. If we run blame we see a whole bunch of things. The one that's at the top of the list is called generic board startup, whatever that is. Then we have the MMC block device, which is not actually very interesting, the network service, the cpufreq service and so on. But the interesting thing is this one here, if we look at the critical chain. We can see there that the default target is graphical.target, so it's going to run an X server and some kind of GUI on top of that. That depends on multi-user.target, then we have getty.target, and then we have something called the getty ttyGS0 service, which depends on a device.
So having a quick look at this, just yesterday in fact, and trying to optimize it, the most obvious thing is that ttyGS0 doesn't actually exist, and if you look at the journal log you see that there is a timeout after 40 seconds or something trying to initialize this device. So the quick win is to remove that service.
So, taking my PocketBeagle and making a few changes to the systemd configuration: the first thing I did is I switched from graphical to multi-user as the default target, because there is no display on this device. Then I removed the offending ttyGS0 service. And then I went through and removed a bunch of other services whilst I was at it, which I knew I wasn't using. I'm not using this as a robot controller, so I could remove robot control; there is no Bluetooth hardware on this device; and I don't really need an Apache web server running. So I took all those things out and ran systemd-analyze again.
Yeah, and it's quite good. The kernel time is, well, it's slightly different, but that's just random variation. The important thing is the user space boot time is now 12 seconds. So I've managed to shave 35 seconds off the boot time, which I regard as a win. It's still kind of longer than I would like, and I had intended to spend a bit more time optimizing that, but I kind of ran out of time doing that. But hey. So that's the kind of thing you can do. I guess the key point here is that we have the systemd-analyze command, which allows you to get a list of problem areas, and then you can go through looking at the units, looking at the dependencies, and removing stuff that isn't needed or is in the wrong order.
So that's the main part of the talk. I've got a couple more slides before we're finished. In addition to just the plain init daemon, which is what we've been talking about, systemd comes with a whole bunch of other useful things, and I want to mention just briefly the watchdog and the resource limits, both of which are kind of useful for the embedded use case.
So here's an example of the watchdog you can use in a service. Just looking at the example there, we have WatchdogSec and Restart on-watchdog. Basically the WatchdogSec says that if this service hasn't prodded the watchdog within 30 seconds then we're going to restart the service. So long as the service is responding to the watchdog, that's fine. But if it doesn't respond within 30 seconds, something's gone wrong: systemd will stop that service and then restart it. The other thing you can do is put in a limit so you don't get into some kind of boot loop. In the example here, if we get four boots, sorry, four restarts in five minutes then there is something seriously wrong with the system, and in that case we'll force a system reboot and start over again.
And the other really useful thing, again for the embedded use case, is to be able to set limits on the resources used by a service. In this example I'm just showing two of these things. We have CPUQuota: that's the percentage of CPU time that this service is allowed to use. Okay, so it can't use more than 20% of your CPU bandwidth, which is kind of useful if you've got some real-time stuff going on as well; then you may want to make sure this doesn't take up too much of your bandwidth. And then the other thing is the MemoryMax option. This allows us to say what the memory quota of this service is; in this case it's set to 4M. There's a whole bunch of other stuff you can do with this. The manual page systemd.resource-control tells you a bunch of other things to do with I/O scheduling and suchlike. And if you're interested in how this is actually implemented, it's all done through control groups, or cgroups, but I'm not going to describe that now.
And that is basically it. There you go. So that's a quick run through systemd and the fancy stuff we can do with systemd. Any questions? I have a microphone here. No? Yes? Okay, you have to come and get the microphone. I'm sorry, if I take questions without the microphone then it's not picked up by the recording and then nobody knows what we're talking about.
All right. So, about the watchdog.
Do you know if it's possible to implement it without needing to have specific systemd-related code in your application?
No. I mean, in order for your service to respond to the watchdog prompts, which are going to come from systemd, then, yes, you've got to write some code to do that. It's only a little bit of code.
But systemd-specific? Do you import something, or what is the way?
Okay, I have a further input on this. Well, you either use libsystemd, in which case you have one line of C code that just triggers the watchdog, or, at the end of the day, it's just writing to a socket that is made available by systemd. So you can just check an environment variable which tells you where the socket is, find out what you have to write to that socket, and do it in whatever way you want. It's not very complicated.
Okay, thank you for that.
To answer the question: I assume you want to run some shell script to do the watchdog check, and I've written a small utility called healthdog, you can find it on GitHub, where it will wrap your program and automatically forward the exit code of your health script to systemd. So you don't have to change the target program and recompile it.
What was the name of that on GitHub? healthdog?
Yeah.
Okay, cool. Thanks. Okay, anyone else? If you come to the front then we'll pick you up next time round.
I was wondering, because we've actually had problems because systemd was parallelizing too much, especially in combination with a watchdog: our services were already being stopped because they weren't responding to the watchdog. Have you experienced something like that, that in some cases it's faster to actually not parallelize too much, to put in some boundaries so it can't parallelize too much?
Yeah. I mean, if on boot systemd is starting up a million different services then obviously that's going to consume all your CPU resources. So I guess the answer to that would be to put in the ordering statements, the After and Before and so on, so that things happen in a more leisurely order, should we say.
That's also what we put in there, yes.
Yeah. Behind you.
Thank you. So a lot of embedded systems, up till, I don't know, I guess recently, have been using the BusyBox startup. I think Buildroot shipped with BusyBox startup by default, at least a few years ago; maybe I'm wrong about that. But at what point would you recommend switching to systemd from BusyBox? What would be the criteria you could suggest?
Well, I mean, that's kind of a general-purpose question. I guess the cop-out answer would be: at the point at which BusyBox init stops working for you.
Thank you.
But I think, I mean, if you have a really small system and you have just a few megabytes of RAM and a few megabytes of storage, then obviously you want to slim things down as much as possible. But if you've got a more complicated system with multiple network daemons doing various things, and control programs and so on, then systemd really is an advantage because you have all these extra facilities. I haven't even talked about the journaling yet, which allows you to do system logging in a flexible way and to upload those logs securely to remote servers and so on. So it has a whole bunch of extra facilities which you just don't get if you're using a simple init, either BusyBox init or System V init.
Okay, so it's more about the capabilities of systemd than the boot time advantages it could give you?
Yeah. I mean, if you have a system that's that simple then probably BusyBox init will be as fast as or faster than systemd. So in that particular case your advantage would come from the extra things that systemd can do for you.
Thank you.
Hi. Would it be possible to open slide 25?
I don't know what it... this one?
Yeah. Here, if you see, on the second line there is the device file, and on the fifth line there is the udev trigger service. Did you make any optimization to improve the speed of these?
No, is the simple answer.
Thanks.
Sorry to be so terse on that. Whoops, excuse me, that's indicating time's up. Yeah, so we can talk about this afterwards, but... Do we have time for one more question? Who's controlling this? Go on, then.
Maybe not really a question, but a comment to the gentleman in the gray t-shirt there who had the problem with parallelization. It sounds very familiar, and I think your second line here, even if it's a bit different, actually shows the reason, because many of those embedded systems are very much I/O bound, I/O limited, and systemd, coming from the server world, parallelizes like crazy, which is not nice in our embedded systems. So the addition I would make there is that I would use systemd-analyze plot, which actually shows you which step takes how long, and it shows you whether you have a CPU bottleneck or an I/O bottleneck. In most cases, in my experience, you do not have a CPU bottleneck, you have an I/O bottleneck, because it takes too long to read all the executables and all the libraries from this slow MMC crap. And the answer then is, as you said, look at what takes long and then just add more After statements there to make it not so parallel.
So there are two systemd-analyze things that you didn't mention which I would recommend: the plot, which I mentioned, and the dot. The dot is typically very intimidating, but that also tells you that you have too much stuff going on, so you should probably throw something out. So in systemd-analyze there are many hidden goodies.
Okay, thank you very much. So I think that's the end of the session. Thank you all very much and, yeah, enjoy the rest of the day.
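Following up on the watchdog question from the Q&A: the "check an environment variable, then write to the socket" approach can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for libsystemd. It relies on the `NOTIFY_SOCKET` and `WATCHDOG_USEC` environment variables and the `WATCHDOG=1` message from the sd_notify protocol; the helper names `sd_notify` and `watchdog_interval` are chosen for this example, and the service's unit file is assumed to set WatchdogSec as described in the talk.

```python
import os
import socket


def sd_notify(message):
    """Send one datagram to systemd's notification socket, if there is one."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd, so there is nobody to tell
    if path.startswith("@"):
        path = "\0" + path[1:]  # "@" marks an abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def watchdog_interval():
    """Seconds between pings: half the watchdog timeout, or None if unset."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else None
```

A service's main loop would then call `sd_notify("WATCHDOG=1")` at least once every `watchdog_interval()` seconds, and only while it is genuinely healthy. Pinging at half the timeout leaves some slack for a busy system.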