Here's the Communication Congress, and now I will start translating the herald. It is a great pleasure to present Matthias Koch, who is going to talk about Forth on microcontrollers. Please give him a round of applause.

Hello. The topic is compiler optimizations for Forth on microcontrollers. Who of you knows Forth? That's about half of the people, so I had better explain what Forth is. Forth is a language that is stack-based and uses reverse Polish notation: first you push the operands onto the stack, and then you specify the operator. There is one stack for the values and operands, and a second stack for the return addresses of the functions.

The compiler itself is very simple. You tokenize the input and look at each token. If it is a known word, the word is compiled or called as a function; otherwise, if it is recognized as a number, it is pushed onto the stack as a number. The nice thing about this language is that it is very small and can easily be installed on a microcontroller. Because of this, it is relatively easy to test your programs with this very small compiler instead of directly programming everything, and this makes it a lot easier to find mistakes in your programs. I did not invent this language myself.
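The reverse Polish evaluation and the token loop described above can be illustrated with a small Python sketch. It is only an illustration, not the speaker's compiler: the names `make_words` and `interpret` and the tiny word set are assumptions made for this example, and it only interprets, it does not compile.

```python
def make_words():
    """Return a tiny, hypothetical dictionary of known words."""
    def add(stack):
        stack.append(stack.pop() + stack.pop())

    def mul(stack):
        stack.append(stack.pop() * stack.pop())

    def sub(stack):
        b = stack.pop()          # the top element is the second operand
        a = stack.pop()
        stack.append(a - b)

    def dup(stack):
        stack.append(stack[-1])

    return {"+": add, "*": mul, "-": sub, "dup": dup}


def interpret(source, words=None):
    """Tokenize on whitespace; run known words, push numbers, reject the rest."""
    words = words if words is not None else make_words()
    stack = []
    for token in source.split():
        if token in words:
            words[token](stack)              # known word: execute it
        else:
            try:
                stack.append(int(token))     # otherwise try it as a number
            except ValueError:
                raise ValueError(f"unknown word: {token}") from None
    return stack


print(interpret("2 3 + 4 *"))   # operands first, then the operator
```

Here `2 3 + 4 *` leaves `[20]` on the stack: 2 and 3 are pushed, `+` replaces them with 5, 4 is pushed, and `*` multiplies.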
Forth has already existed for several decades, but I have written a compiler for it, which is not so uncommon, because writing Forth compilers is not difficult.

It's rather unusual to introduce yourself, but: I'm a physicist, and I'm currently doing my PhD in laser spectroscopy in algae. The name of my compiler may be unusual. It targets the MSP430 and supports all MSP430 LaunchPads. I'll talk about the FPGA later.

First, the classical architectures, the ones that are usually implemented. You have a virtual machine that works on a list of pointers: you take a pointer, and you either get another list of pointers or you end up at an assembler primitive. This is very simple, which makes it easy to look for errors in the compiler, and you can do some unusual things during compilation. Very old systems could even route the pointers indirectly through another table so that you could change them later on: if you had a definition very deep in the system, you could still change it afterwards. It is also very easy to decompile code that is implemented in this old-fashioned way: you just take the object code, disassemble it, change the source code and compile it again. The optimizations I'm going to present destroy this aspect, of course, since here you generate machine code and no longer have a one-to-one correspondence between the input and the output.

Forth has always had the problem that it was a little slow, and of course my optimizations make this better. In compiler construction theory this is all pretty old news; it has all been done ages ago, but it is hard to implement on small targets. There are different optimizations, such as tail calls, constant folding, inlining and some others. Not all of them are available for all architectures, but I've listed the architectures they are available for.

Now let's talk about tail calls. You can make a tail call if you have a function
call at the end of another function: then you can skip the return and the return address on the stack and just jump, which shortens the path when returning from the function.

There's also constant folding, which does calculations on numbers at compile time, so that the final executable contains just a number and no calculation. Here we have an example where you put 42 on the stack, followed by some more numbers and operations, and in the end you can already execute this code at compile time. In this case it's pretty obvious that you can use it, and with the help of other optimizations it may be possible to create situations where constant folding suddenly becomes possible.

I would like to illustrate how a classical implementation of the interpreter usually works. You start with a tokenizer and try to find each token in the dictionary to see whether it is already defined. If the token is recognized, you have to decide whether the code should be compiled or not, and if it should be compiled, it is compiled. (Sorry, I lost track.) You need this for control structures, because you can have jumps inside them. If you don't find the token in the dictionary, it is treated as a number: depending on the mode it is compiled or put on the stack, and if it is not a valid number, it is an error.

To add constant folding to this, you don't have to change a lot. It's just important that you don't compile the constants right away, but rather collect them, and if you have an operation that produces constant output from constant input, you can apply it immediately. So at the very beginning you now have to take note of the current state of the stack, in order to know how many constants you have available. If you don't have to do constant folding, you can just throw that away. In the
compile mode, you check whether the operator would produce constant output and whether it has enough constant input values. Obviously, unary operators take only one constant, binary operators take two, and so on. If you don't have enough yet, you just leave them there and look again later, when you know more; at every point you know how many are necessary. If the definition is not usable at this point, you have to compile the constants you already have and just do it the classical way. But you can also change the classical call command: you can add immediate checks, which means the function can have special cases. Numbers always stay on the stack, so they can be used for constant folding later. It's important to know that when you add a number, a marker is already put on the stack. If the token can't be interpreted as a number, it's an error, obviously.

In general, it is possible to achieve this on every architecture, independent of the way the Forth works. I have seen that my colleague Matthias Trute has also implemented this on AVR.

The next thing is inlining, which the compiler of course also does. Instead of inserting a call command, you insert the body of the called definition into the calling code, so you basically just insert the code in place.

Then there's opcoding.
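As an aside before the next optimization, the constant folding procedure described above (collect constants instead of compiling them right away, fold as soon as an operator has enough constant inputs, otherwise compile the pending constants the classical way) can be sketched in Python. This is an illustration under assumptions, not the speaker's implementation: the `FOLDABLE` table, the function `compile_words` and the `('lit', n)` / `('call', name)` output format are made up for this sketch, and any token that is neither foldable nor a number is simply treated as a call to an ordinary word.

```python
import operator

# Hypothetical table of foldable words: name -> (number of inputs, function).
FOLDABLE = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "negate": (1, operator.neg),
}


def compile_words(source):
    """Compile a token string into a list of ('lit', n) and ('call', name)."""
    code = []       # instructions emitted so far
    pending = []    # constants collected but not yet compiled

    def flush():
        # Compile the collected constants the classical way, as literals.
        code.extend(("lit", n) for n in pending)
        pending.clear()

    for token in source.split():
        if token in FOLDABLE:
            arity, fn = FOLDABLE[token]
            if len(pending) >= arity:
                args = pending[-arity:]
                del pending[-arity:]
                pending.append(fn(*args))    # folded at compile time
                continue
            flush()                          # not enough constants: no folding
            code.append(("call", token))
        else:
            try:
                pending.append(int(token))   # a number: collect it for later
            except ValueError:
                flush()                      # assumed to be an ordinary word
                code.append(("call", token))
    flush()
    return code


print(compile_words("6 7 * emit"))
print(compile_words("x 1 +"))
```

The first call prints `[('lit', 42), ('call', 'emit')]`, because both inputs of `*` are constants. The second prints `[('call', 'x'), ('lit', 1), ('call', '+')]`: only one constant is available when `+` is reached, so the constant is compiled as a literal and `+` stays a normal call.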
I don't know what opcoding is usually called. An opcode is basically a piece of assembly code, and a constant can go directly into the opcode instead of being put onto the stack first. With the MSP430, at least, this always works nicely.

There's also the register allocator. Usually, when you do calculations in Forth, the values end up on the stack, but it's more efficient to use registers, which are simply faster than the stack: the stack is usually in memory, and you always have to retrieve the values from memory. If you can create a shortcut, it's a lot faster, and you also get shorter commands. It's important that this is all transparent for the programmer: the compiler needs to be able to fall back to the original meaning of the code, without the registers, because programmers like to use all kinds of tricks.

The essential thing for the register allocator is to know which element is in which place. During compilation, the stack model has to be taken into consideration, and it has to be known which stack element is in which register, if at all. For intermediate results the question is whether we still have registers: if there are more registers available, we can use them, and if not, we have to write the intermediate values out to the stack. You can implement this in a small, space-efficient way. Normally register allocators use algorithms that try to be as efficient as possible. I use a simple solution: whenever there's a branch in the code, I don't use the registers, because it's too complicated, so I just stop there. I'd like to show you some examples first.
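The compile-time stack model behind such a register allocator might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the class `StackModel`, its method names, the register names and the pseudo-instruction strings are hypothetical, the canonical state is simplified to "every element is in memory", and at least two registers are assumed. It is not the allocator from the talk.

```python
class StackModel:
    """Compile-time record of where each stack element currently lives."""

    def __init__(self, registers=("r0", "r1", "r2")):
        self.free = list(registers)   # registers holding no stack element
        self.slots = []               # bottom..top: a register name or "mem"
        self.code = []                # emitted pseudo-instructions

    def _alloc(self):
        if not self.free:
            # Out of registers: write the deepest register-held element to
            # memory, so that elements in memory always stay at the bottom.
            for i, where in enumerate(self.slots):
                if where != "mem":
                    self.code.append(f"push {where}")
                    self.slots[i] = "mem"
                    self.free.append(where)
                    break
        return self.free.pop(0)

    def _pop_to_register(self):
        # An empty model means the value comes from the caller's memory stack.
        where = self.slots.pop() if self.slots else "mem"
        if where == "mem":
            where = self._alloc()
            self.code.append(f"pop {where}")
        return where

    def literal(self, value):
        reg = self._alloc()
        self.code.append(f"mov {reg}, #{value}")
        self.slots.append(reg)

    def binary(self, op):
        b = self._pop_to_register()   # top of stack
        a = self._pop_to_register()   # second element
        self.code.append(f"{op} {a}, {a}, {b}")
        self.free.append(b)
        self.slots.append(a)          # the result stays in a register

    def flush(self):
        # Before a branch or at the end: restore the canonical state.
        for i, where in enumerate(self.slots):
            if where != "mem":
                self.code.append(f"push {where}")
                self.slots[i] = "mem"
                self.free.append(where)


model = StackModel(registers=("r0", "r1"))
for n in (1, 2, 3):
    model.literal(n)
model.binary("add")
model.binary("add")
model.flush()
print("\n".join(model.code))
```

With only two registers, the third literal forces the deepest element out to memory (`push r0`), and the second `add` has to fetch it back (`pop r0`). The final `flush` mirrors the simple strategy from the talk of giving up on registers at a branch.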
You need to know that my work is based on many other nice things, which I will also present. For example, these two guys created the things this talk is based on.

First of all we have constant folding, so here's an example. This is from the Ledcomm implementation with the LED; something about the cathode, a diode and a coffee machine, but I'm not sure how to translate this. (Okay, sorry, this was too fast. He was saying that you could implement different programs for your coffee machine and make the LED signal error codes.) So he defines a word called shine, which uses anode and cathode as outputs. If you disassemble it, this is what happens: anode and cathode were constants that were folded into the hex constant 11. Now we use a jump instead of a call; this is the tail call optimization, which you see at the IO part. You can also see the inlining at the point where the code was inserted directly, and you have another tail call at the very end.

As for constant folding, there was a very simple example at the very beginning: during compilation, the compiler puts in the constant instead of the opcodes. The plus is in the processor, so you can combine it with the opcode for the return. You could say a lot about this kind of processor. It's simple, about 200 lines of Verilog, and it's interesting to take a look at it if you're interested in that kind of stuff.

The MSP430 is a processor that has a lot of different ways of addressing memory. With the tail call there's a bit of trickiness in how you address this, so I didn't add tail calls here. So here are some examples: you define constants, and at the very end you again target the LEDs for output. This is an initialization for the LaunchPad. This is what it looks like when you compile it.
Okay, you have constant folding again, the commands are put in directly via inlining, and the parameters are taken into the opcodes. What you get as output is already what you would write as assembler code. The last instruction is the return.

Mecrisp-Stellaris is a direct port of Mecrisp, and the Stellaris LaunchPad was its first target. It's almost identical to the MSP430 version, at least with respect to optimizations, but it has a register allocator, which I want to show you now.

Here's a more complicated example. This is the Gray code: if you count up, only one bit changes at a time. You can see that there is no movement on the stack anymore; the top stack element is already in a register. The intermediate values that were calculated are not written to the stack, and the shift command has a different register as its target here, which then becomes the top of the stack. So you don't need the stack for this. The point is that you try to use the registers as much as possible.

This is a more sophisticated example. Here you have variables that should be incremented, I think. You have this address, which is loaded; then you load another address, and then you write it back. So you don't have any stack movements.

So who's curious now and wants to get started? All MSP430 LaunchPads and many ARM Cortex chips are now supported, and whoever uses a different system can also use Forth.
There is Forth for the PC, for AVR, for PIC, for the Z80: there are many different Forth dialects and implementations. The reason is that Forth is very simple, and many people just write their own compiler because it's fun. Even though some people will hate me for this, you should just do it, because you will get a deeper understanding of the language. Both sides have good arguments, and I think it makes sense to just write your own Forth compiler.

Okay, some more examples, because I still have some time. This one tries to get random numbers from the analog-to-digital converter: it reads data from the analog input and adds it to an intermediate result. This is how it is compiled. You see the loop. You push a zero onto the stack. Let's have a look again: there was already a zero at the beginning, then you do a shift, which was inserted, and then you have the constant 10. Unfortunately, there's only one push command, so you have to use a different combination of stack pointer and something else, consecutively. Then you increment the loop counter, and unless the loop terminates, you jump back. Now we have seen everything at once, like in a real example.

This is a bigger example. It is the bit-exponential function, which is like an exponential function, but on bits, and you can see different things here, such as what happens when you have control structures in between. In the beginning you check whether the number has a certain size, and you check a certain register. Once you have control structures and it is not clear which branch you need, the number of course has to be saved. If the condition is true, you do some pushing, loading and unloading. In the else branch there's more work to do: you have the comparison whether the value is greater than or equal to 16, and if that is the case, you shift... wow, so here's some more work to do. You can already see that register one and
register three are used: you don't push the values onto the stack, but keep them in registers. This is not easy to understand. What you can also see here is that on the ARM Cortex you can't put a constant into the AND command, but you can use a shift command with a constant.

In the end you have to tidy up. So far this has been no problem, because the last element was always returned, but now you have all these shifts and moves, and that is why the top element of the stack is no longer the element that is in the top-of-stack register. Now the stack is in a state that does not correspond to the canonical stack model, so you have to bring the stack back into a canonical state. But you can already see that you can do all these operations without the stack. However, this compiler does not look ahead and is therefore limited in its capabilities; it doesn't really take control structures and branching into account.

So I've shown you all the examples, and I wish you a happy new year. I would be happy to hear from you via email; my email address is on the slides.