All right, thanks for coming. My name is Kamraj. I work for Comcast on the RDK project, the Reference Design Kit (a very innovative name), which is basically our set-top box software stack, based on Linux. Part of my work involves embedded Linux, but it also extends to other, smaller devices: security cameras, sensors, the things that go in your home for security or otherwise. So today we are going to talk about some C language constructs that are specific to microcontroller programming.

What I'm going to cover is these few items. First, know your tools. Very important, because there are several toolchains, different compilers, and other tools that we use, and they all behave differently, so it's very important to know them. Then data types and sizes on embedded systems. Generally, with general-purpose chips, the processor word length is known; with microcontrollers there are 8-bit, 16-bit, and 32-bit parts, and you need to know which one you have. Then how variable and function qualifiers can help, and what you can do in loops. Then we'll talk about assembly; it's contentious, and many times it depends on what you're doing and which compiler you're using. Then some thoughts on RAM optimizations, and in summary, what to keep in mind when you're programming for microcontrollers.

It's an open session, so if I present something that's wrong, or you feel something can be done better, I'll be very happy to discuss it further. Feel free to jump in anytime with your experiences; I'd love to hear them. What I've done is taken Zephyr as a sample here, and I'm going to cover the GNU compiler, mainly. There are other vendors and other compilers which might offer more options; I'm not going to cover those here as much. It's primarily around these two projects.

So, knowing your toolchains. As I was saying, there are many vendors.
There is the GNU compiler, there is IAR, there is ARM; there are many. And each compiler has either added new features that are not in the standards, or has certain additional options that are very fitting for the target microcontroller it is aimed at. So you have to know what tools you have at hand, and it's very important that you go through the whole documentation: how they represent far pointers, how they represent near pointers. They may differ. If you are in for writing more portable code, it's very important that you can either find alternatives to those features, or use them in such a way that you can disable them or have them as no-ops when you are not using those compilers or those tools.

Compiler switches. What I've done here is a simple table: I took Zephyr's hello world application and compiled it with different optimization levels. -Os is definitely meant for size optimization. Keep in mind that this is GCC in purview; other compilers might have different naming conventions for these -O options. As you can see, if you are really looking for help from GCC to optimize your code for size, you use -Os; if you are looking to optimize for maximum performance, you use -O2 and -O3, or -Ofast. What happens is, as you keep increasing your -O level, the code can start getting a little bit less accurate as well, so you have to know, when you change an optimization level, whether your algorithm can sustain it. And keep in mind that compilers are tools written by other people; they are applications too, and they need help. When you feed them the right kind of information, they give you good results. Most of what we'll cover in this talk is what you can help the compiler with to get the maximum out of it. As you can see, the code size goes up as you increase your optimization levels.
But with a real-world application, you would see the execution get faster as well in some cases. In some cases, actually, optimizing for size improves your performance. It depends on your bus width and how you can utilize the memory bus. For example, if you have a 32-bit memory width and a 16-bit instruction set, it can improve your instruction cache usage, and there are loads that, when compiled as 16-bit, may perform better compared to 32-bit. So execution time really depends on those kinds of constraints.

Now, -Og is a relatively new optimization level. What it does is give you a good debug view during development. It doesn't give you just a raw translation, which would mean highly unoptimized code; instead it applies only the optimizations that preserve a good debugging view. Most of the time, when you enable optimizations and start debugging, everything flips around: you don't have the logic flow, and when you are stepping through the code you lose the context very easily. So if you're a developer who wants some optimization but also a good debugging experience, this is a good option to try.

The other item in your tools portfolio is your linker scripts. Very important: with microcontrollers, you decide where your code is placed, where the code goes. GCC has a very elaborate linker scripting language; you can really define your flash memory layout and how it is loaded, which sections go where, and you can also define symbols and other items wherever you want in your image, for help during execution. If you look in the GNU linker manual, there is a very elaborate syntax for linker scripting. Most of the time we take an existing linker script and then enhance it.
What I found is that it's very interesting and important to understand all the constructs that go in: how you define your segments, what your sections are, what your alignments are, how you define the total lengths of your data sections, where you begin and end them, and how you want to initialize them. All of that goes into your... yes, question.

Right, so the question is: is there a recipe, a template, for writing your linker scripts? Generally, it really depends on your architecture, where you are storing your code and where you are executing from. Many times you store your code in flash, then copy it over and execute from RAM; many times you do execute-in-place and run from flash. Sometimes you have SRAM, sometimes you don't. So you define your data and flash segments accordingly, and of course alignments are very important, where you align things and where your addresses are. Generally, your read-only data and the initial values of your initialized data are what you consider for your flash size, and then how much stack you need, whether you want stacks elsewhere, and where you want to lay out your runtime RAM data all depend on your system. So I wouldn't say there is one recommended way, but the power of linker scripts is that they give you all the tools to define your memory map the way you want it. It's important that you read through all the keywords available to you: how you can define a memory overlay, and the other important aspects of the scripting that you would need during execution of your programs.

The linker map is a good tool to see the output of how you linked your application. Many times we link the application and want to see where the whole thing is lying, which section went where, and the map gives you the whole view of your memory, how it is all laid out.
Very important for a small application where you are either doing some optimizations, or looking for dead code that got linked in, or checking that code is placed where it must be. For example, your init sections might be required at a certain address: are they there? Or simple things like which function is adding a lot of code to my application, when you are optimizing for size or for other reasons. So generating a map during your link step is a very good tool at that point; it can give you a lot of insight into how your application was built. There are also post-processing tools, at least ones that work with the GNU linker, which can take the raw dump the linker puts out and present it in a more human-readable fashion: how much memory is being used, where things are, that kind of visualization. The map also tells you what the linker ignored. Sometimes you spend a lot of time looking for a certain function that was simply thrown away by the linker, and the map gives you that list too: these functions or symbols were found unused, so it decided to throw them away. That information is a very good debugging help; many times you debug through a problem only to find that a certain piece of code was thrown away. So it's a very good tool to visualize how the whole application is laid out in the end.

Then binutils also offers a few tools which are very useful. objdump: one of the features I use very often is -d with -S, which interleaves source with assembly. It takes your final ELF file and interleaves your assembly code with your source code, so I get a good view of what code was generated for a particular line of my C code.
Many times, when you are looking at various optimizations, or chasing wrong code generation, this is very useful, because you can associate your assembly code with your C code. Then there is the size utility: it gives you the size a particular application is going to take in terms of text (which is your code), data, and uninitialized data. This is a good way to keep a tab, as you add new code, that you are not bloating your binary. You can put a watch on this value, maybe add it to the very end of your build system to dump the size, and see how much is being added as you write code. Very important, actually, to regularly manage the size of your code. And readelf gives you a dump of your application as well; it shows you the program headers. It gives similar information to size, but more refined: it tells you what address a segment is allocated at, what physical address, and what flags are applied to it. So these are a few tools that give you more insight while you are developing. They are part of the GNU toolchain; I'm pretty sure the tools you get from other vendors have similar utilities available, so look into the equivalents, or maybe there are better ones.

Now, moving on to the kinds of things you need to keep in mind. Variables: size is important. It's very important that you know the processor word length: if it is a 16-bit processor, your integer is 16-bit; if it is a 32-bit processor, your integer is 32-bit. Usually the best way to go is to use the natural word length that your microcontroller's processor supports. We will have a few examples showing how this can cause inefficiencies.
And globals. Generally, globals have value, in that you can access their state from anywhere, but they also have a cost. The cost is that the compiler cannot assume their state is stable, so it has to always load and store them, which incurs extra loads and stores and can get you very inefficient code. So it's very important to look at whether your function can deal with local data, whether you can achieve what your algorithm needs using locals.

Here is a little example about the lengths I was talking about. This example is from Cortex-M3 code generation. In one function I'm passing integers, and in a similar function I'm passing short integers, and you can see the code that's generated underneath. It's also optimized for size, so it's not raw code; all the relevant optimizations are already applied. But you can see there is an extra instruction being generated, which is doing sign extension. It has to do the sign extension because it sees that we are using a short integer for the arithmetic, and it has to ensure the carry bits are calculated properly. You can avoid that if you know what data you are bringing into the function. Yes, I'm going to talk about this; I have a little slide on that, good point. The question was that there are fast versions of these integers, and we'll cover that too.

There you go: slow and fast integers. Did you see my slides by any chance? So there are these extensions in the newer C standards called fast and least integer types. This is a more portable way of writing non-word-length integers, if you want to. You have a choice there, depending on whether your algorithm needs fast access or can live with slower access.
So the standard provides these types: alongside uint8_t, you have uint_least8_t and uint_fast8_t. This is again what I was mentioning earlier: tell the compiler about your data, tell it about your program's execution, and it will do better work for you. It is C99, actually. So C99, not C11; I'll correct that before I upload. A very good way to optimize your use of integers. Look at them; they're pretty useful, because many times you can afford one or the other, and they come in very handy, and they are in the C standard, so every compiler that claims to be C99 compliant has to implement them.

Sometimes libcs are nosy, though, and provide their own understanding of these defines. So if you are using some libc with your RTOS, look into whether it may be overriding what the compiler provides. I at least saw that happening with Zephyr, and I just wanted to share my experience: I was struggling hard to see why the int types were coming out wrong, and what I found was that the includes, the way they were lined up, were overriding what the compiler was providing as the standard int types. So the compiler might still claim to be C99, but you might have a libc in your program that is overriding that.

Again, portable data types. Many compilers, I know, provide extensions for representing data, and over a period of time C has included into the standards many of the data types that made sense. C99, for example, has uint8_t, uint16_t, uint32_t, and uint64_t. They are very portable representations, so utilize them as much as you can; just don't define your own, because the standard ones will keep you compliant. In the past you might have done it; I know in microcontroller programming you'd have your own header file that you include in all your source code, defining these types based on which compiler it is. You don't need to do that anymore.
Just follow the standard and expect the compiler to provide all those defines for you. stdint.h is what you will include, but there is a header underneath that actually defines the int types; that's a sub-inclusion, and I think I dug too deep into where it actually comes from. If you look at this file in GCC, that's where the definitions are, but in your program you should just include stdint.h. That's a very good point. So that's portability of your data types.

Now the const qualifier. We will have a few examples, and it's very interesting what you will see. Again, by qualifying your variables and your function parameters, you are providing additional information to the compiler on which it can act. When you say const, you are telling the compiler that this data is not modified. That can act as a hint that lets it apply more aggressive optimizations, and it can do a much better job of reasoning about what a variable you pass to a function is supposed to do. In code generation the compiler is very pessimistic; it has to generate code for all cases, so if there is one chance of an optimization going wrong, it will not use it unless it is sure it always works. By using const you are giving it more room to play. If you use const variables, the compiler can also rematerialize them: for predefined constants, it can reconstruct the value during execution, so it doesn't have to incur a load from memory. And if the constant is stored in slower media like flash, accessing it costs a lot more. So use const when you can.
So here is an example where you can see it's pretty much the same function, and all I've done is define the globals as const in one case. You can see the compiler has rematerialized them in the generated code; it's not doing any loads from flash. Without const, even though your constants are predefined, it still has to go and load them from flash, and if such code is in a loop, you can see how much impact that has on your execution path. In the first example it is loading the value from a memory address, then adding, multiplying, sign extending, and returning; in the second it is reconstructing the constant, sign extending, and returning. Much faster code.

Now, const and volatile. Do you think we can have a const volatile variable? Anybody? Yes? No? Can we? Yes. Any examples someone can think of? Yeah, there you go, very good: the example is a hardware status register. It is read-only from your side, but the hardware can change it underneath you.

Global variables, as we were also discussing earlier: here we declared an external integer x, and in the code generated for the function below, every time it loads x from memory, stores it back, then loads it again and stores it back. This is the impact of globals that you will generally see throughout your code. It doesn't matter which architecture you are on; these are general problems. The next example just illustrates global versus local usage. It's the same function: in the first version, with the global, there are three loads and stores before calling the print function, but in the local version, where you transfer the value into a local, the compiler knows it's a local variable whose state cannot change outside the function, so it can just load it into a register and pass it on to the call. So keep that in mind when you are designing your routines: can you live with locals? In many cases it also helps, when you are operating in a loop and you really need a global, to transfer that global value into a local (if it is not modified elsewhere), operate on it, and store it back at the end yourself. Essentially you are performing the optimization yourself, and you are helping the compiler rather than asking it to prove the transformation safe.

Static variables. What I see in static variables is, again, that you are making a statement about scope: the variable is only available to that particular compilation unit. What that enables, and I mentioned spatial locality here, is very important. When you are linking your program, the linker knows these variables all come from the same module, so it places them one after another, or at least it knows the map. And when they are placed together, most probably you are accessing them together too, because you are in the same function or nearby. So the compiler can generate code that uses a base address plus an offset to address the different static variables. It can perform that optimization with statics; if they are globals, it has to assume they can be anywhere in memory, and these optimizations cannot be performed.

Static functions. I know we often debate macros versus static functions, but one advantage you get from static functions is that you let the compiler decide when to inline. Many times the compiler knows more than us and can do a better job of inlining than we can, so we should give it the chance rather than deciding ourselves what to inline. It knows the instruction lengths, it knows how many cycles things take, it knows all the delays, so it can do calculations on total execution cost that would take us a long time by hand. My recommendation is always to give the compiler a chance first, and if it fails, then you kick in and help it. The other thing static functions give you is debugging: when you write functions instead of macros, you can debug them better. Compilers have improved; even GCC can do macro debugging if you enable the extreme level of debug info, but then you end up with much bigger debugging data that your debugger has to deal with. Static functions are pretty lightweight by comparison: you don't have to enable that extensive DWARF macro debug information just to see into your macros. The other thing is that the compiler knows at compile time where your static function is going to be laid out, so unless you are using whole-program optimization and the like, the location can be pinned. If the function is called and not inlined, the compiler can still optimize the jumps: it can use a short direct jump, so it doesn't have to go through a veneer or an indirect jump. So it even helps in creating a better calling sequence.

Volatile. With volatile you are telling the compiler: please don't do anything clever with this variable, it is special. The compiler then doesn't optimize your accesses away or transform them. A real-world example I can give you: you have an 8-bit hardware register, so you have defined it as a uint8_t, and there are four of them in a line, and your code reads all four. The compiler sees they are one after another in the address space and can coalesce all four accesses into a single 4-byte load, rather than using LDRB (load byte) or STRB (store byte) four times. But that's wrong, because these are registers; you want to access them one after another, byte by byte. So it's very important to qualify that kind of data with volatile, where you are telling the compiler to stay away from optimizing it in any way. There are certain compilers, proprietary ones I know of, that have extensions to further qualify volatile variables, to place them here or there; they give those kinds of hints, but they are all non-standard. If you are using such a compiler and such extensions, be sure they are only in effect when that particular compiler is used. Staying with standard volatile makes you more portable across toolchains and even architectures, because a different architecture may come with a different toolchain and you would be in a fix. This helps you port your applications quickly.

Array subscript versus pointer access. Again, moving on to how you represent your data, here is another example I tried: essentially the same code, except in one case I use pointers to access the data, and in the other I just use the array as such. What you can see is that the compiler's understanding is different even though the code looks the same. With the pointer accesses on global data, it is loading the pointer and storing the pointer; with plain array subscripts, it doesn't have to do that extra pointer work. The whole idea is to watch out for such use cases: look at the assembly the compiler is generating, because in many cases the result depends on how you wrote the access. For example, in Zephyr I was using optimize-for-size as my default level; a different compiler, or a different level, may apply the subscript-to-pointer conversion optimizations in one case and not in another. So look out for that: either enable the relevant switch explicitly if you need it, or understand that the conversion is not being done. In this case, because I defined a pointer, the compiler was not able to prove it could optimize it away; with more aggressive optimization enabled, it probably would have identified that. Yes, pointer aliasing, correct. You can use the restrict qualifier: you basically tell the compiler not to worry about aliasing, that you know these pointers don't alias. Then you won't see the same issue; you may still have a load from the parameter, but you won't see this problem. It's important to understand what the compiler generated; that's what happened to me.

Loop increment versus decrement. This is actually applicable everywhere. When you are incrementing a loop counter, you have to test it against your end value; when you are decrementing, you are waiting for it to reach zero, and architectures provide instructions to test against zero. So when you count down, the compiler can take advantage of those: here it is doing a SUBS, which sets the flags, and the instruction below is a branch-if-not-equal-to-zero, so it is able to fuse the check and the branch together. On the increment side, you will see it has to make an explicit compare against the value 100 to set the flags, and then branch. So counting down can save you an instruction.

There is also post- versus pre-decrement: whether you should use --x or x--. Both versions of the algorithm do the same thing; each is going to print the value 10 times. But if you look at the generated code, the pre-decrement version comes out better, because the compiler can apply the operation first and then use that value throughout the loop body; in the other case, it also has to keep the old value around, because the operation is applied after the value is used.

Function parameters. Very important, and it really depends on the ABI. If you look at the ARM ABI, which is what I was using here on Cortex-M3, it is documented extensively, and all architectures that have a common ABI across tools document this: how many registers can be used for parameter passing, and how parameters are passed. Read through that for whatever microcontroller you are using. ARM has a very strong ABI and all tools follow it nowadays. If a call needs more registers than are available for its parameters, it is going to use the stack, which is expensive. So see how you can write your function signatures to utilize the registers given for parameter passing in an efficient way.

One thing to know is that alignment also matters, and I'll show a little of that here. On ARM, R0 to R3 are the four parameter registers you have. In my first function I take an int, a long long, and an int; in the second, an int, an int, and a long long. In effect you need four registers either way, but the problem with the first example is that the long long, being 8 bytes, must go in an aligned register pair, so one register goes empty because of the alignment, and in the end you see the third parameter spilling onto the stack. Ordering your parameters with these rules in mind can help you a lot.

Inline assembly. Use it when you have to. In many cases the compiler may not be able to generate the instructions you want; say you are accessing a particular co-processor or something similar. Inline assembly lets you insert those instructions into your normal C program, and it lets GCC take care of the data-flow analysis, so it can take that code and integrate it into a C function. Intrinsics are one good example, and special instructions, if you have any. I have given a link here to the GCC inline assembly syntax; it's quite cryptic, but well explained, so read through it and see how to use it efficiently. It has qualifiers that let you define which registers are inputs, which are outputs, and what the constraints on them are. I have seen other compilers have inline assembly too, but their syntax varies. It's actually the most common thing you will see break when you port a program from one compiler to another: your inline assembly just doesn't work.

Optimizing for RAM. Use smaller data types; we talked about that. Smaller types use less memory, which helps on a RAM-constrained system. Use compressed structures; packed structs are supported by pretty much all compilers, so use them, or reorganize your data structures so you don't have much padding in between, where you can. In many cases you can't, because it's a network format or an IP header that you can't do much about. Know about your local variables and use them as much as you can; but, for example, if you use alloca, you will see in the code that the memory is not released until you return from the function, even though you think you are using it locally. Be aware of those kinds of details: you are still using that memory until the end of the function. If you need to release memory earlier than that, you have to use malloc and free instead. Constant merging and reconstruction: on RISC it is very important that the compiler can reconstruct constants, as we saw in the earlier example; that helps quite a lot. And check your stack and heap usage: see whether you have extra allocation there, make sure you are not hitting a limit, and then you can trim how much stack and heap you allocate for your app, so you can get by with less RAM.

Help the compiler out. Throughout this talk I've been saying it: the compiler doesn't have a magic crystal ball. It operates on what you give it, and it makes worst-case assumptions. Pointer aliasing is one good example: if it sees there is a chance for pointers to alias, it assumes they will alias, so it gives you the worst-case scenario and slower code. Same with global data: if you use a lot of globals, it knows they are not immutable, so on every access it is going to load and store. A do-while can be better than a for loop, because the termination check is performed at the end, so you are not paying for the initial test. And you can use the compiler-provided annotations to help it: function attributes, variable attributes. You can tell the compiler a lot about your code to help it give you the best output. There are also intrinsic functions you can use for optimizing your code; but intrinsics are compiler specific, so watch out for those, and be mindful that a different compiler may have a different convention or not provide them at all.

Stay away from separate debug and release modes, because you want to develop the code you will run in production, period. If you want consistency, see how much debugging you can afford and how much optimization you can afford, and over time use the same code generation for debugging and production; that's the way to go. Find out the details of your system architecture: bus widths, memory types, flash sizes, and latencies. Very important. And profile your code before you optimize anything; most of the time we jump to a solution and it's wrong, so use tools as much as you can, because they give you a really good picture of what your app is doing. Utilize the tools, don't fight them; most of the time there is a reason why they are doing what they are doing. Help them to help you; if you help them, they will help you back. And avoid assembly if you can; write everything in C.

That's pretty much what I had. Any questions? We are almost out of time, but there are a few. Yes, const: when you know your data is not changing, it's good to use it. It's always good, because you are telling the compiler that this is constant data and it doesn't have to always reload it from memory; if it can reconstruct the value, it will. There was another question... yes, that's -Og. One more last question: yes, I think it's always good to come out clear about aliasing yourself. The best scenario I've seen is where you don't leave the compiler guessing, and you tell it clearly that you are not aliasing. So thank you very much; it's been a pleasure.