Okay, so hello, I'm Ederson de Souza. I'm an OS development engineer at Intel. I've been at Intel for about 10 years now, and I've been working with Zephyr for the last one and a half years. Today I'll be talking about Zephyr footprint: where are we, and where are we going? What do we mean by that? So that's the agenda. I'll be talking about footprint, what footprint is and which footprint I'll be talking about during the presentation; some tools that we have available in Zephyr to deal with the footprint issue; some hints that we usually have in the back of our minds, or that you find if you search for ways to reduce footprint; and then, I think the most interesting part, some experiments I have done that try to address some of these issues. Then we'll have some time for questions. So what is footprint? What am I talking about? I'm talking about software footprint, let's be clear. It's the impact of the software on some resource, like flash memory or RAM: how much memory is being used by an application. It can be power consumption as well. All of those are footprints; all of those are resources that the software is consuming. And we usually want to reduce the usage of resources. We want to do that to save money, because we can use a smaller chip or a smaller memory; we can save energy; we can save the planet, depending on how you see it. Basically, resources are scarce, and we want to minimize their usage. That's footprint, and that's what we care about. And to be able to reduce footprint, you need to know what the footprint is to start with, right? Luckily, it's something you can measure, because by definition it's how big the impact is. So you need to be able to measure it.
There are several tools available to measure footprint; depending on the kind of footprint (for energy, for instance) there are different tools. And you want those tools integrated into your development process. You want to be able to catch regressions: if you make a change, you don't want to start using more memory, or more battery, than before. And of course, you need to measure to be able to see improvements. One example is the rom_report tool that is available in Zephyr as one of the targets for west build. It's as simple to use as running `west build -t rom_report`. It will then generate a report. This is an example of the output; I trimmed it a bit so you can see the interesting part. It shows a hierarchical report of the memory usage in your ROM. After the executable is compiled, you can go and check that this function is using this many bytes and that other function is using that many bytes, see what the biggest users of ROM are, and then try to do something about it. It's really nice. I'm not sure everyone knew about it, but it's really handy. When dealing with footprint, we usually want to minimize usage, and there are some common hints about how to do that. You can find them on Google, you can maybe ask ChatGPT, you can ask some friends, and you can definitely ask on the Discord. For instance, if we're talking about ROM, one of the easiest ways of saving space is disabling features and subsystems. If you're not using a given subsystem, there's no need to be compiling it into your application. I think that's the easiest, and probably the biggest, way of saving. You can also try to avoid having holes in your structs. That's another common hint.
Again, Zephyr has a nice tool here: you can use pahole to check for holes in your structs. What do I mean by holes in your structs? You have your struct with all its members. Each one occupies some space in memory, but for performance reasons the compiler will usually align members to their alignment boundaries, so everything can be read as fast as possible. The problem is that this adds some padding, and it's often possible to minimize that padding so you don't use more memory than needed. There are tools that help you figure out when you have these problems, and Zephyr has pahole integrated as a west build target, so you can, again, just use it to get a nice report on the health of your structs and whether you are wasting space there. You could also try different toolchains: usually when a toolchain gets a new version, you get better compilation techniques, and you can save some bytes from your application just by changing toolchain, if that's possible, of course. For other kinds of footprint, like RAM, you could try to limit the number of threads. Each thread has its own stack, so you can save some RAM with fewer threads. But then there's a compromise on how you design your application. It's always a compromise. Take power management: if what you care about is energy consumption, and you can enable power management for the board and the devices you are using, that's perfect. But it will also add code to your application, so you have a trade-off with application size. I can also mention logging; again, this is more on the application-size side. You can reduce the amount of logging you do. In Zephyr it's macro-based, so things get stripped at compile time and you save application size.
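To make the padding point concrete, here's a minimal C sketch of the kind of hole pahole reports (the struct names are invented for illustration): same members, different order, different size on a typical target where `uint32_t` needs 4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Careless ordering: the compiler pads so each member lands on an
 * address matching its alignment requirement. */
struct reading_padded {
    uint8_t  channel;  /* 1 byte, then 3 bytes of padding before 'value' */
    uint32_t value;    /* needs 4-byte alignment */
    uint8_t  flags;    /* 1 byte, then 3 bytes of tail padding */
};                     /* typically 12 bytes */

/* Same members, largest first: the internal hole disappears. */
struct reading_compact {
    uint32_t value;
    uint8_t  channel;
    uint8_t  flags;    /* only 2 bytes of tail padding remain */
};                     /* typically 8 bytes */
```

pahole would flag the hole after `channel`; reordering members is the usual fix.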
And now for the experiments. So that's where we are: the tools that we have and the hints that we can find right now. But what else? Is there anything more exciting coming? Can we think of other approaches beyond what's already available? Being curious about that, I decided to run some experiments. I talked to people, I got some ideas, and I tried to figure out what's going on and what is worth pursuing to reduce footprint. Let's set some expectations about the experiments. You probably noticed that I'm talking a lot about application size. I touched on other kinds of footprint, but application size is what I'm most worried about, so that's basically the focus of my experiments. In these experiments, I was exploring what we can do, not prescribing how. The results should guide us: "oh, this is nice, we probably want that." Some people may look at the way I tested and say, "no, you shouldn't have done it that way." Please do let me know about other ways of achieving what I did in my experiments. I just tried to have some kind of baseline that I could use to get some numbers and see whether we can go further or not. As I went down the rabbit hole, I deviated a bit from trying to actually run things, because my focus was on application size. So things may be broken by the experiments. I don't expect anything to be fundamentally broken; I expect only minor breakage, hopefully. If we want to go in some of these directions, I think that's possible and the issues are solvable. But if you try to replicate an experiment and say, "it's all nice and beautiful, but it doesn't work," that may well be true. And I used some real open-source projects in the tests, because doing this only with toy applications is not interesting. It's kind of boring.
We can see the idea with toy applications, but if we want real results, we need to use some real-world applications. I'm not finding fault with those applications; I'm not trying to imply that I will submit patches to them, or that they are using too much memory, nothing like that. I just wanted some real-world applications, and that's one of the wonders of open source: there are real-world applications out there that you can experiment with. So one of the ideas was about eliminating some function pointers. I'm joking when I say they are "considered harmful," and I don't care here about the usual reason function pointers are considered harmful, the indirect call. It's not about that. The thing is that with function pointers, the compiler cannot see through them, so it cannot eliminate code based on that code not being used. And Zephyr uses lots of function pointers; the APIs are basically based on them. Here's an example from the kscan API: you have a struct with some function pointers inside it, and when the driver is instantiated, it defines the function pointers that do the actions required by the API. The API itself basically calls those function pointers internally: based on the device struct you're using, you get the API associated with that device and make the calls. So the idea here is: could we do that differently? Something like C++ templates, where we'd know at compile time what is actually being used, and the compiler would have a more solid basis to avoid compiling some functions and save some bytes. For kscan, say, only the driver actually being used. Well, maybe. In the C world, since C11, we have the `_Generic` keyword, and you could use that as a kind of switch, dispatching based on the type.
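As a hedged sketch of the contrast being described (this is not the real Zephyr kscan API; all names are invented), here's a function-pointer-style ops struct next to a C11 `_Generic` dispatcher that resolves the call at compile time:

```c
#include <assert.h>

struct fake_device;

/* Zephyr-style ops struct: the API is a table of function pointers. */
struct kscan_api_sketch {
    int (*enable)(const struct fake_device *dev);
};

struct fake_device {
    const struct kscan_api_sketch *api;
};

/* A pretend "npcx" driver fills in the table when it is instantiated. */
static int npcx_enable(const struct fake_device *dev)
{
    (void)dev;
    return 0;
}

static const struct kscan_api_sketch npcx_api = { .enable = npcx_enable };
static const struct fake_device npcx_dev = { .api = &npcx_api };

/* Public API: the call goes through the pointer, so the compiler cannot
 * tell which enable() is reachable and cannot drop any of them. */
static inline int kscan_enable(const struct fake_device *dev)
{
    return dev->api->enable(dev);
}

/* C11 alternative: dispatch on the static type. Here the target is
 * known at compile time, so unused drivers could be elided. */
struct npcx_dev_t { int unused; };

static int npcx_enable_direct(const struct npcx_dev_t *dev)
{
    (void)dev;
    return 0;
}

static const struct npcx_dev_t npcx_dev_direct = { 0 };

#define kscan_enable_generic(dev) _Generic((dev), \
    const struct npcx_dev_t *: npcx_enable_direct)(dev)
```

Both calls do the same thing at runtime; the difference is only in what the optimizer can prove about which driver code is reachable.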
But there's a problem here: you would need to know the concrete type at compile time, while writing your code. And that kind of defeats one of the interesting things we have in Zephyr, the devicetree: you can change the device you are using in one place without having to change code everywhere. So maybe the devicetree could help with that. That's interesting, but it started getting complicated, and I decided not to go down exactly that line of trying to map DTS to some `_Generic` thing, because it sounded a little complicated, at least to start with. But the idea is not lost. We could still do something like static dispatch. So instead of using DTS to generate code that helps with the static dispatching, the idea became to generate code from the Kconfig options that are enabled, and with some regexes, patch some drivers. So what did I do? I basically created a script that goes over all the drivers (actually, over those with metadata saying they support this), goes into those directories, and greps for the instantiation, the point where you define the struct with the function pointers. I try to get those function pointers and create direct calls to them: instead of going through the function pointer, I have a simple direct call to that function, which the API can then use when it is defined. It's ugly, but I think it works for testing. The dispatcher is basically this generated code that gets included; I had to go into the drivers and add these includes, so the generated code ends up inside them. I hook that into the build, and I get code that makes the call without the function pointer.
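Here's a hypothetical sketch of what that generated dispatcher could look like (the real experiment generates it from Kconfig with a Python script; the config symbol and function names below are invented for illustration):

```c
#include <assert.h>

/* Pretend this came from Kconfig; in the experiment, the script reads
 * the generated .config to decide which branch to emit. */
#define CONFIG_KSCAN_NPCX_SKETCH 1

struct fake_dev { int id; };

/* The driver's implementation, normally reachable only through the
 * function pointer stored in the API struct. */
static int npcx_kscan_enable(const struct fake_dev *dev)
{
    (void)dev;
    return 0;
}

static const struct fake_dev the_dev = { .id = 0 };

/* Generated dispatcher, #include'd into the subsystem: with exactly one
 * driver enabled, the API can call it directly, so the optimizer sees
 * the whole call chain and can drop what is unused. */
static inline int kscan_enable_dispatch(const struct fake_dev *dev)
{
#ifdef CONFIG_KSCAN_NPCX_SKETCH
    return npcx_kscan_enable(dev);  /* direct call, visible to the compiler */
#else
    /* fallback would be the usual indirect dev->api->enable(dev) call */
    return -1;
#endif
}
```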
My hope is that the compiler will then be able, in later stages, to avoid adding code that isn't actually used. I'm not showing the script here because it's just a big Python script that does some regexes and generates some code. With that code generated, I was able to test. But as soon as I started, there were some issues. The very first one: I had an ifdef saying, for instance, that the NPCX driver is the one using this static dispatch thing. What if I have more than one driver for the same subsystem, say two GPIO drivers or two kscan drivers being used? Then all bets are off. I would need a way, at runtime, to differentiate which one is being called, and I would have to have a switch doing the dispatching. That actually adds code, and in the very first tests, it added more code than was being saved. So, okay, this is only useful if we have just one driver for a given subsystem. I changed the script to only do this when that is the case. How do I know what's enabled? I basically go into the directory, figure out with some regexes on the CMakeLists.txt which Kconfig option is used for each driver, and check whether that option is actually defined in the .config file inside the build directory. It's hacky, but it works. And when I looked at some real-world projects, having just one driver per subsystem is kind of the expected case. Sometimes there are two, but it doesn't seem too far-fetched to expect just one driver enabled for a given subsystem. With that information, I said, okay, let's try to expand this. So I did it for a few drivers; you can see them on the slide, eSPI and kscan among them. Then, to collect some data, I went to some real open-source projects. I chose these three: ZSWatch, the Intel EC firmware, and ZMK.
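To illustrate why two drivers kill the savings, here's a hedged sketch of the runtime switch you'd need in that case (all names invented): the tag and the switch body are themselves extra code, which in the first tests outweighed what the direct calls saved.

```c
#include <assert.h>

/* Two drivers enabled for the same subsystem: the build can no longer
 * pick one statically, so a runtime tag plus a switch is needed. */
enum kscan_impl { KSCAN_IMPL_A, KSCAN_IMPL_B };

struct fake_kscan_dev { enum kscan_impl impl; };

static int driver_a_enable(const struct fake_kscan_dev *dev)
{
    (void)dev;
    return 1;
}

static int driver_b_enable(const struct fake_kscan_dev *dev)
{
    (void)dev;
    return 2;
}

static const struct fake_kscan_dev dev_a = { .impl = KSCAN_IMPL_A };
static const struct fake_kscan_dev dev_b = { .impl = KSCAN_IMPL_B };

/* The dispatcher itself costs flash compared with a single direct
 * call; this overhead is what erased the savings. */
static int kscan_enable_switch(const struct fake_kscan_dev *dev)
{
    switch (dev->impl) {
    case KSCAN_IMPL_A:
        return driver_a_enable(dev);
    case KSCAN_IMPL_B:
        return driver_b_enable(dev);
    }
    return -1;
}
```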
My goal was just to see whether there's a reduction in application size, so I used the first board described in each project's README and saw how it went. When I did the first tests, I was using the then-current version of Zephyr (this was about a month ago), and those projects are actually on top of some older version of Zephyr. I could, of course, have moved my work on top of the same version they were using, but I'm not all that smart, and instead I rebased each of them on top of the latest Zephyr. Here, things may already go wrong: there may be subtle bugs just because of the rebase. But I still went ahead; I wanted the numbers. And as I said, I don't think any problem here would be fundamental: if I can run things and find bugs, I can try to fix them; I'm hopeful nothing will be really bad. The Zephyr SDK I used was 0.16.1. And the numbers. Okay. When I tried this on ZSWatch, I was able to save about 850 bytes, which is a whopping 0.1% of the project size. So yeah, quite underwhelming. But it's something: 0.1%. For the Intel EC firmware, the results were a bit better: about 1400 bytes, which amounts to roughly 1.5% of the project size. That's a bit more interesting. So I thought, okay, the next application to test is ZMK; that's smaller when you compile it. Of course, something went wrong. I still don't know why, but ZMK actually got bigger. This data comes from rom_report: after I compiled, I got the rom_report, then compiled again with the static-dispatch call trickery and ran rom_report again. And for ZMK, it got bigger. It seems that some code was kind of zombified back to life. I really don't know what's going on with ZMK, but that's what happened.
Okay, I don't know why this code is being brought back. But removing unused code is actually something LTO is famous for, right, link-time dead-code elimination. The compiler adds some metadata to the compiled objects, and the linker, during the linking phase, can inspect this metadata and be more confident about eliminating code that isn't used. In Zephyr, issue numbers are now above 50,000, and there's an issue about LTO with a number in the 2000s. It's kind of old, and there were patches from people using it downstream. So that's kind of nice. What I did was basically add those flags, -flto and fat LTO objects. It's quite easy to do some tests: just add that to the west build. And of course, things won't build at first. You will find that some code is missing, or rather, too much code went missing at the end, like memset. The compiler is free to emit calls to the memset function to set some bytes in memory, and it's not there unless you mark it as used; you tell the compiler those symbols are used. That's not bad. Actually, there's a small series of patches I can send upstream if there's interest. It's just adding a few of those annotations, and then I was able to compile everything. So here are the results. I compiled the results for the static dispatch idea, for just LTO, and for LTO plus static dispatch, because that's what saved the ZMK case. For ZSWatch, LTO tripled the gains: so now, instead of 0.1% of savings, 0.3% of savings, which is still basically nothing. With both LTO and the static dispatch idea (I'm calling it static dispatch, but maybe that's not the right name; it's just avoiding the function pointer, so maybe "direct dispatch"), it gets to 0.4% for ZSWatch. For the Intel EC firmware, the gains were quite expressive: just LTO saved about 11%, and with both LTO and static dispatch, 13.7%. And that's really encouraging.
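The memset situation can be sketched like this (a hedged example, not the actual patch series; the flags in the comment are the common GCC spellings, and the exact set used in the experiment may differ): the compiler may synthesize memset() calls on its own, so a routine the linker sees as unreferenced must be explicitly marked used, or LTO's garbage collection may drop it.

```c
#include <assert.h>
#include <stddef.h>

/* Built with something like:  gcc -flto -ffat-lto-objects ... */

/* Minimal memset-style routine. No call to it appears in the source,
 * yet the compiler is allowed to emit calls to memset() behind your
 * back, so __attribute__((used)) tells GCC/Clang to keep the symbol
 * even when it looks dead at link time. */
__attribute__((used))
void *my_memset(void *dest, int c, size_t n)
{
    unsigned char *p = dest;

    while (n--) {
        *p++ = (unsigned char)c;
    }
    return dest;
}
```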
So that's really cool; I just need to make sure it actually runs. ZMK kept surprising me. The gains with LTO alone were also huge, 13%, but with LTO plus the static dispatch idea from before, I still lost 1.9%. So I'm still puzzled. I don't know what's going on; if someone has an idea, please ping me. And well, that's basically it. Those were the tests I had done. I also tested with different toolchains, but I didn't include that here because it was kind of boring. It does confirm the hint that you can save some bytes with newer toolchains: basically, try different versions of the Zephyr SDK or some other toolchain. So what are the takeaways here? Clearly LTO is interesting; there's real interest in LTO. I think in one of the presentations yesterday there was again some call for LTO. So it's definitely something we should be looking at. And what about the static dispatch idea? I think it's interesting, but I'm not sure the way I have done it is the way to go. It feels too sketchy, but it can save some 1.2% on some projects (not ZMK, for some reason). And I still think the original idea of using the devicetree to see what's actually enabled, and from there generating the code that's actually going to be used, could help. But I'm not sure. Again, I'm open to ideas here, and also open to the conclusion that this doesn't go any further, that this path is just a waste of time. And of course, for both the dispatch idea and LTO, we need to ensure that no subtle bugs are being introduced. In my small, boring tests, things just worked. But with real-world projects, things may not be as clear-cut: we can think it's working and then it's not. There's always some corner case. And that's it. Now it's time for questions. If anyone has questions, thoughts, comments, just ask. Thank you so much.
Does anyone in the audience have any questions they'd like to ask? I don't think we have any right now. I'm going to do one last virtual check. Okay, hold on, we have one. Hold on one second. Yeah, I would just promote the Discord you can use; I think there's no number at the end of Discord handles anymore, but you can always try to find me there. Do you think it's reasonable to automatically detect how many instances of an API exist in a given application build? Because, as you suggest, for some there will only ever be one, like an SPI driver or whatever, but with the sensor subsystem you might have a dozen sensors. So do you only run the static dispatch Python script stuff for the ones you know will only have a single one? Or do you automatically detect it? Do you have any feeling for that? Actually, the script does detect that, so it's possible. It's really a good point about the sensor subsystem; it won't be very useful there. But at least for kscan, it's possible. Basically, in my script, I went over all the drivers that are enabled: it goes into the CMake files and tries to find the zephyr_library_sources_ifdef calls. Some cases are different, and I had to manually hack those into my script. Then I get a list of everything that's enabled. So it's doable. We could at least use that to see whether there's more than one and whether it's actually worth it. And I basically did that: I think for ZMK there's more than one kscan driver, for instance, so I had to disable it there. So yes, that's doable. You can use it to learn whether it's actually worth doing, or whether your board is actually using more than one. Definitely. Just to confirm, is that script running automatically as part of the build process? Or is it some pre-step that you run? I run it as part of the build, but it wasn't automatic in the build.
It's just that I didn't integrate it with the build completely. In my experiment, I basically run the build once, so the Kconfig output is generated; then I run the script to generate the auto-generated code; and then I finish running the build. That's basically how it works: you start the build, stop at some point of the build process, run the script, and then finish the build. Thank you so much. Any other questions? Okay, I think that's it. Thank you so much for joining us. Is there any way people can contact you? Are you on Discord or something else, if anyone has additional questions? Oh, it's on the screen. And yeah, there's this card here. I think you can also go to the event's page; I think there's a way to ask questions there, and I'll be looking at that. Something like the app page from the provider of the videos, which I forgot the name of, but I'm looking at that too. So if someone asks questions there, I'll reply. I think we are good. So thank you so much.