Hello everyone. I am Aditya. I work as a software engineer at Facebook, mostly on software engineering and performance optimization. In recent years, the code size of a binary has become critical, especially on embedded systems and mobile devices. As users we want more features, so developers keep adding features to mobile applications and other constrained devices. More features mean more code; the code keeps piling up, and that results in increased download size, slow startup time, and so on. So as performance engineers, or as people who work on the engineering of a code base, we need to take care of the code size of our applications. Here we will discuss commonly known techniques that can be used to reduce code size. Most of these you can apply right away using compiler features or standard software engineering techniques.

Today I will discuss four methodologies of code size reduction: first, compiler optimizations; second, C++ library optimizations (C++ in particular, because templates and similar features cause a lot of code bloat); third, source code optimizations; and fourth, getting insight into the software, introspecting the code base to see where we can improve code size.

Let's start with code size optimization flags. There are many, but I'll discuss some popular ones that are easy to understand. The first is the compiler flag -Os, well known to many people. If you are not already using it, you can simply change your existing optimization flag to -Os. It is supported by both GCC and LLVM. What it does is: it does not compromise speed for code size.
-Os will only reduce code size when it is possible without sacrificing speed. Of course, this is all static analysis in the compiler, so there can be some performance degradation or improvement depending on the workload and the code base. LLVM has an additional optimization level, -Oz, which optimizes aggressively for code size: even if the runtime performance of the application is compromised, it will still perform an optimization if it saves code size. This is very helpful in applications whose size keeps growing while most of the code base is rarely exercised; optimizing for code size makes a lot of sense in those cases. Those are the popular ones.

The third flag is -fno-function-sections. When we compile C/C++ programs with GCC or LLVM, there are situations where every function is placed in its own section. This doesn't happen all the time, but it can, and it is very helpful for debugging and profiling. When we deploy code to production, however, it is not code-size friendly; it increases code size quite a bit. So if by mistake, or because of some build system rule, you have -ffunction-sections enabled (the flag that says "put every function in a separate section"), you want to get rid of it by adding -fno-function-sections.

Similarly, the fourth one is -fno-unroll-loops. Loop unrolling is a well-known optimization: it increases code size but improves performance quite a bit on many workloads. Since we are talking about code size, we do not want a lot of loops unrolled; if we add -fno-unroll-loops, the compiler will not unroll any loop. Now, the thing is, if your application has different components, some of them are performance critical.
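One way to reconcile the two, which I'll sketch here, is to build the whole project for size but pin individual hot functions to a speed-optimized level. This is a minimal illustration, not a recommendation: the `optimize` attribute is a GCC extension (guarded here so the file still compiles elsewhere), and with Clang you would instead split hot code into its own file built with -O2.

```cpp
#include <cassert>

// Build the whole translation unit for size (e.g. g++ -Oz), but ask the
// compiler to keep this one hot routine speed-optimized. The optimize
// attribute is GCC-only, so it is guarded; under Clang the function simply
// compiles at the translation unit's level.
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize("O2")))
#endif
long triangular(long n) {
    // stand-in for a performance-critical inner loop (DSP, camera, ...)
    long s = 0;
    for (long i = 1; i <= n; ++i)
        s += i;
    return s;
}
```

The point is that size and speed flags do not have to be all-or-nothing across a binary.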
Say some code implements a DSP algorithm or a camera algorithm; it needs to be fast even if you want the app to be small. Some careful analysis is required there.

Okay, the fifth one is -fno-exceptions. In C++ applications, exceptions are enabled by default: we have try and catch blocks, and many C++ standard library functions have exceptions enabled by default. The compiler therefore has to emit runtime code to handle a thrown exception, which is the abstraction penalty of exceptions, and that adds a lot of code size. We can add this flag, but it is not trivial: if you have throw statements in your code base, compilation with -fno-exceptions will simply fail. Some software restructuring is required as well; you can partition the code base into parts that can throw and parts that cannot, and then take advantage of the flag. It is slightly non-trivial, but it will save you a lot of code size if you have a decent amount of C++ code. If it is purely C code, it won't help.

The next one is -fno-rtti. This is also C++ specific, so let me explain it a little. In C++ we have runtime polymorphism, where we can use dynamic_cast to cast from a base class to a specific derived class. In some cases this requires runtime type information: the compiler emits extra data about the type so that the dynamic_cast is precise. The compiler does not know how many times we use dynamic_cast, or on which types, so it just adds RTTI for many classes, as long as they have virtual functions.
And that just increases the code size. If we pass -fno-rtti, we save a lot of code size, but some dynamic_casts will no longer be legal; the compiler will give you errors in those cases, so some amount of work is required there. Okay.

The seventh one is -finline-limit=N in GCC; in LLVM the corresponding option is -mllvm -inline-threshold=N. Function inlining is again a very common optimization. When one function calls another, we have two functions: the caller and the callee. The compiler may inline the callee into the caller, which helps performance because a branch is avoided and some of the function setup and restore code can simply be deleted. Inlining is one of the most commonly applied compiler optimizations. But it can increase code size quite a bit: when a function is inlined, the existing callee usually still remains, because there can be other callers, so you now have duplicate code in many places. It improves performance, but it hurts code size. What we can do is limit the size of functions that can be inlined, and this flag is meant for that: if we do not want very large functions inlined, we can limit the number of instructions by changing N. You need to play with this flag a little to find the sweet spot, because if you set the limit to zero or one, some very small number, inlining essentially stops and performance can suffer as well. So some experimentation is required.

The last one in this group is -fno-jump-tables. Most programming languages support switch statements.
A switch statement can be emitted in assembly in different ways; there are many algorithms. One of them uses a jump table. The most trivial lowering is a chain of if-else statements; you can imagine a switch statement as a collection of if-else tests. To make it more efficient, there are well-known algorithms, and the jump table is one of them. In my experience a jump table increases code size; it might seem counterintuitive in some cases, but it does. So if you add -fno-jump-tables, you might see a reduction in code size.

Okay, next are some compiler flags that are specific to GCC. Some of them are architecture specific, but some will help you tune quite a bit. These are not very well known, so I'll discuss them separately. The first one is -mcall-prologues. As far as I could find, it is only supported for AVR targets; I don't know whether it is supported for x86, but it could be implemented quite easily in the compiler, so if you have a compiler team, please ask them, and it will save a lot of code size. What this flag does: if you look at the call frame layout of any function in assembly or machine code, there are a few instructions at the beginning that save the callee-saved registers. There is a calling convention when we emit the binary, a protocol that says some registers are to be saved by the callee and some by the caller. In the frame setup at the beginning of the function, called the prologue, this sequence of instructions is almost identical across many functions.
So instead of emitting the same duplicate code over and over in every function, we can have one standard routine and just call it; it does the frame setup for us, saving the registers the function intends to use throughout its lifetime. At the end, a matching routine restores those registers so the caller can use them as intended; this is the ABI, the contract between caller and callee. The epilogue of the function is the same situation: a set of instructions duplicated across many functions. By having just two shared routines, we can get rid of all the frame setup and teardown code in all the functions. It does reduce code size, but not in a very small application; the application should be decently large. If you are suffering from code size, this is a flag to try; otherwise it may not be worth it. Also, it is not supported for all architectures, so you will want to check.

The second GCC flag I wanted to discuss is -mint8. It makes the compiler assume that int has only eight bits. That does not make sense in many cases, but if you know the integer values used in an application will never exceed eight bits, or you write the code carefully so that this holds, it is possible: instead of the usual 32 bits you use only eight, and that saves a lot of code size as well. It is slightly risky, I agree, but if you are desperate for code size, it is an option.

The third one is -msave-restore. It does the same thing as -mcall-prologues, but is supported only for RISC-V targets. Same idea: at the beginning of a function it calls a shared frame setup routine, and at the end it calls a frame teardown routine. It will save code size.
I'm still surprised that there are two different flags for such similar things; I don't know why they did that.

The fourth one is -freorder-blocks-algorithm. When the compiler analyzes a function, the function can be viewed as a set of basic blocks. A basic block is a set of instructions that execute in sequence; at the end of a basic block there can be a branch, but all the instructions inside the block execute in sequence. So we can think of a function as a graph, and most compiler optimizations work on this notion of a graph, the control flow graph. When we emit assembly code, the basic blocks have to be laid out in some specific order, and changing that order can have a code size overhead. Think of an if-else statement: if you do not have an else branch, there is a fall-through, so we don't need two branches; we can have a single branch and let the default case fall through. If you generalize this idea across a control flow graph, reordering basic blocks can incur code size overhead, so changing the basic block reordering algorithm can improve code size.

Now, I should correct a mistake here: I listed some --param options under this bullet, like inline-min-speedup and max-inline-insns-single, but those affect inlining, not block reordering. -freorder-blocks-algorithm has only two algorithms, "simple" and another whose name I forgot; they change the layout of the control flow graph. The --param options are for inlining; I apologize for that. They are a separate bullet point.
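Alongside compiler-wide thresholds, the inlining decision can also be pinned per function in source, which is often the easier knob when only a few functions matter. A hedged sketch (the attributes below are GCC/Clang extensions, not standard C++):

```cpp
#include <cassert>

// Never duplicate this at call sites: one copy in the binary, every caller
// pays a call instruction. Good for size when the body is large.
__attribute__((noinline))
int expensive_format(int value) {
    // stand-in for a big routine we do not want copied everywhere
    return value * 31 + 7;
}

// Always duplicate this at call sites: the body is so small that inlining
// it is cheaper (in size and speed) than the call sequence would be.
__attribute__((always_inline)) inline
int tiny_add(int a, int b) {
    return a + b;
}

int use_both(int x) {
    return tiny_add(expensive_format(x), 1);
}
```

This gives per-function control regardless of where -finline-limit or -inline-threshold is set globally.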
So, on inlining: GCC allows us more control over it, and that control comes from passing --param followed by the options I listed. The first is inline-min-speedup. We specify a percentage, say 20% or 70%, and it changes the static analysis algorithm within the compiler; it affects the inlining heuristic itself. The second is max-inline-insns-single; the default is 400, and if you lower it to, say, 300, you get less inlining, and less inlining means a reduction in code size as well. Similarly there is max-grow-copy-bb-insns: the compiler will not copy basic blocks that have more than eight instructions, things like that. These are the knobs for controlling the inlining algorithm in GCC.

Next, some compiler optimizations that are not widely used but can also have an impact on code size; some are in GCC and some in LLVM. The first one is -flto. LTO is link time optimization. When we write C/C++ code, one .c file, including all its header files, is treated as one translation unit. In a project you have several source files; the compiler compiles each .c or .cpp file individually and then combines them during the linking process. At link time, all these modules are combined to produce a binary. The problem with this traditional approach is that we have no visibility across modules, across translation units, so we lose a lot of optimization opportunities, for code size as well as performance. By telling the compiler to use link time optimization, we give it visibility across translation units.
Think of LTO as copy-pasting all your .c files into one giant file and running the compiler on that single file: it has visibility across many different functions, so most compiler optimization algorithms benefit quite a bit. It reduces code size as well, so it can be helpful.

The other flag is -flto=thin, an LLVM-specific compiler flag. ThinLTO is slightly less effective than full LTO, but very close, and it has much faster compile time. In traditional link time optimization, when we pass -flto and all the object files are effectively merged into one, compile time increases quite a bit because the memory footprint grows dramatically. Many of the sophisticated compiler algorithms are quadratic, or even exponential with a very small constant; register allocation and similar passes have quadratic or more complex behavior. Putting all the modules together grows the memory footprint, and that slows down compilation. ThinLTO fixes this problem by sharing only the information that is relevant for link time optimization. For example, register allocation does not need to know about other functions; it is purely local to a function, and there are many other optimizations that are specific to a function and need no link time information. On the other hand, optimizations like inlining, cross-module devirtualization, and a few more are relevant across translation units, so only those need information shared across translation units. You can find the resources on ThinLTO online; it reduces compile time dramatically while still giving performance numbers very close to full link time optimization, and the same holds for code size.
So ThinLTO gives you a reduction in code size without hurting compile time as badly as full -flto.

The third one is identical code folding, one of the more aggressive code size optimizations. It is quite common for many functions to share pieces of code; in C++ especially, even when engineers are careful, templates cause code duplication across many translation units. Compilers can help there: they can analyze functions, look for functions that are identical, and merge them. We just deduplicate those functions, fix up the branches and the calls, and get a big reduction in code size. In GCC the flag is -fipa-icf (ICF means identical code folding); in LLVM it is -fmerge-functions.

Recently there was an optimization called merge similar functions. It does not require completely identical structure: if functions have slight differences, we can still merge them by parameterizing those differences appropriately. This optimization is used in some well-known industrial compilers, but it is not supported in trunk LLVM. I have a patch, at the link pasted there, that enables merge similar functions under ThinLTO, so you get code deduplication across translation units. It can save quite a bit of code size; if you're curious, I encourage you to try it.

The next optimization, in LLVM, is called GVN hoist. Imagine an if statement where the then branch and the else branch contain similar instructions. We can hoist those common instructions into the parent block and save code size. It helps performance as well, and it reduces register pressure. I implemented this optimization a few years ago in LLVM; it is not enabled by default.
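Conceptually, what GVN hoist does on LLVM IR can be pictured in source form. This is a hand-done before/after sketch of the transformation, not the pass itself:

```cpp
// Before: both branches compute x * x, so that multiply is emitted twice.
int scale_before(int x, bool up) {
    if (up)
        return (x * x + 1) * 2;   // x * x appears here...
    else
        return (x * x - 1) * 2;   // ...and here
}

// After "hoisting" by hand: the common computation moves above the branch
// and is emitted once. GVNHoist performs this kind of rewrite automatically
// on the IR, shrinking code and reducing register pressure.
int scale_after(int x, bool up) {
    int sq = x * x;               // hoisted common instruction
    int t  = up ? sq + 1 : sq - 1;
    return t * 2;
}
```

Both functions compute the same values; the second form just carries one copy of the shared work.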
If you pass -mllvm -enable-gvn-hoist, you can take advantage of the GVN hoist code size optimization. It is not as aggressive as the others, but it may still give you one or two percent code size reduction.

Similarly, there is an optimization called GVN sink, the opposite of GVN hoist: it sinks common instructions into the common post-dominator. If the then and else branches contain an identical instruction, you can sink it into the common successor of those two basic blocks. It also gives a slight reduction in code size, so you can try that.

The sixth one is the machine outliner. What it does is the opposite of inlining, which is why it is called outlining: if a few instructions are commonly found across many functions, we can outline that sequence into a separate function and call it instead, saving code size. If we do this for only one function, it will not reduce code size; it actually adds the overhead of a call. But if a certain instruction sequence is common, then once we outline it from one function, it is quite likely that the same sequence exists in other functions too, so we get commonality of code, and that is how code size goes down. This is in LLVM; I think it is not enabled by default, so you have to add the flag -mllvm -enable-machine-outliner to get the benefit. It gives quite a bit of code size reduction. However, I think it is only supported for ARM64; it might be supported for RISC-V, but I don't know, because that work was going on very recently, so you have to check.

The last one is hot/cold splitting. It is basically a performance optimization, but it can give code size reduction as well.
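The hot/cold idea can also be hinted at from source. Here is a minimal sketch using `__builtin_expect` and the `cold` attribute (GCC/Clang extensions); the optimizer may then place the rarely-run path away from the hot code, which is the same layout effect the hot/cold splitting pass aims for:

```cpp
#include <cassert>
#include <string>

// Rarely executed diagnostics: marked cold and noinline so the bulky error
// handling is kept out of the hot path entirely.
__attribute__((cold, noinline))
int handle_error(const std::string& what) {
    // elaborate error reporting would live here
    return static_cast<int>(what.size());
}

int process(int value) {
    // "value < 0 is unlikely": the compiler lays out the hot path first
    if (__builtin_expect(value < 0, 0))
        return handle_error("negative input");
    return value * 2;   // compact hot path
}
```

The compiler can often infer coldness on its own (exception paths, calls to abort, profile data), but explicit hints like these make the intent obvious.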
It is very similar to the outliner, but it is supported for all architectures, because this optimization runs in the middle end of the compiler. I implemented it mostly with performance in mind, but imagine the same sequence of instructions found in many functions: then we get code deduplication as well. If you make hot/cold splitting and merge functions (from the previous slide) work together in the right way, you can see a good code size reduction.

All right, C++ library optimizations. Many C++ libraries are widely used, especially the standard library, libc++ or libstdc++. They contain a lot of templated code, and we use them quite aggressively across C++ code bases. They cause a lot of code duplication, and the problem with code duplication is that it hurts code size quite a bit. You can actually compile libc++ yourself, as a custom build, and ship it with your code base; that can genuinely improve code size. Other C++ libraries, like Boost, suffer from similar problems because they are templated, header-only libraries: when we use such a function in our code base and the compiler inlines it, we get a lot of code duplication as well as size growth from the inlining itself. A nice way to avoid this disadvantage is explicit template instantiation. When we instantiate a template explicitly, that is the only copy that will be used, and we can additionally mark those functions noinline; that way inlining does not happen, and at link time only the one definition from the explicit instantiation is used. It might be slightly tricky to implement, though, and there is some maintenance overhead, because then you are maintaining your own build of the C++ standard library. But it can be worth the code size; it gives a decent reduction.

Okay, source code optimizations.
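Before moving on, here is a single-file sketch of the explicit template instantiation idea just described. In a real project the `extern template` declaration lives in the header and the instantiation in exactly one .cpp file; the function name here is just an illustration:

```cpp
// A templated function that would normally be instantiated (and duplicated)
// in every translation unit that uses it with a given type.
template <typename T>
T clamp_to(T v, T lo, T hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Explicit instantiation: this line goes in exactly one .cpp file, emitting
// the one and only int copy. Every other translation unit would see
//     extern template int clamp_to<int>(int, int, int);
// in the header, suppressing its own instantiation and linking against this
// single definition instead.
template int clamp_to<int>(int, int, int);
```

Combined with noinline on the instantiated functions, this keeps one copy of the template body in the whole binary.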
When a code base has been around for a decent period of time, developers keep putting code into it; new developers come and add features based on market studies. As we keep adding features, the source code starts to bloat. It is hard to find people who care about deleting code, but very easy to find people who want to add it. The problem shows up in the code base itself: you see a lot of dead code. In C++ it is a habit for many developers to write code in header files; they will put the entire definition in the header even when it is not templated code. When we put a function definition in a header file, it just gets copied over and over into every translation unit that includes it, and in a big code base with some widely used header files, that can produce an insane amount of code size bloat. Moving function definitions from header files to .cpp source files helps; it reduces code size quite a bit.

Some of these things are not even intentional; they are just the abstraction penalty of C++. Imagine a class where you have not declared or defined the constructor: the compiler defines it for you and copies the construction code everywhere. The same goes for the destructor and operator overloads, and the same thing happens with template functions inside the class. With the newer C++ standards there are more rules, like the rule of three and the rule of five: if you define a couple of the special members, the compiler will generate code for the other constructors, like the move constructor, or the destructor. They cause code bloat without you even knowing it; you have to introspect quite a bit to find it.
A nice way to handle this is to declare the constructor in your class definition and define it in a .cpp file. Even if all you want is the default definition, you can define it in the .cpp file, and it will save you code size. Surprisingly, it gives decent code size gains, and I have used this very recently.

Another source code optimization is to use a cheaper data structure. I'm showing a very counterintuitive example here, just to illustrate how surprising things can be in C++. We use std::vector, std::deque, std::unordered_map, and std::unordered_set quite a bit; people rarely use std::list these days. But the code size footprint of std::vector is larger than that of std::list. Why do we use vector? For performance. But if performance is not the most important thing, we don't have to use std::vector. So choosing a data structure can have a huge impact on code size as well, and these things are not widely known, because we assume all the time that vector is always better than list, and likewise unordered_map versus map. I have some numbers here from a very small test program. At the top we have std::map versus std::unordered_map. It is a very small file: I declare a map, assign a value, and return it, just to prevent the compiler from optimizing everything away, and I compile for code size with clang -Oz (-Oz optimizes for code size aggressively). With std::map the binary is approximately 14 kilobytes; with std::unordered_map it is approximately 15 kilobytes. I'm not saying you should start using std::map all the time for code size; this is one specific example, meant to give the intuition that what appears to be true may not always be the case. Unless you investigate deeply into the code base, it is not wise to rely only on popular wisdom.
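For reference, here is a reconstruction of the kind of test program described; the exact source was only shown on a slide, so the names are mine. Compiling one variant with std::map and one with std::unordered_map via `clang++ -Oz` and comparing binary sizes is how numbers like the ~14 KB versus ~15 KB above would be obtained:

```cpp
#include <map>
#include <string>

// Tiny size-test program: build a map, read a value back, and return it so
// the compiler cannot optimize the container away entirely. To compare code
// size, swap std::map for std::unordered_map, rebuild each variant with
//     clang++ -Oz size_test.cpp -o size_test
// and compare the binaries with `size` or `wc -c`.
int lookup_demo() {
    std::map<std::string, int> m;
    m["answer"] = 42;
    return m["answer"];
}
```

The interesting output is not the program's return value but the size of the two binaries.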
Sometimes, based on the demands of the code base and the workload, things require different software engineering methodologies. Similarly for std::list: 13 kilobytes versus 14.3 kilobytes for std::vector. These results can be quite surprising.

All right, getting insight into the code base using compiler techniques. In a code base with a lot of bloat, where several engineers work on the application over a long period of time, features go through many iterations; old features die out, but the code remains. There will be a lot of dead code, consciously or unconsciously, because of the software development process. What we can do is find out which functions are actually used in production. There are compiler flags like -finstrument-functions and -fpatchable-function-entry that we can use to collect data about which functions get executed: they let you add a function call at the beginning of every function, and you define that called function yourself; imagine it as a very simple counter. So whenever any function is called, you find out that it was called. If you collect this data across a large user base, or a large number of test cases, you get an idea of which functions are frequently used and which are never used at all. Now, a function that was never called is not necessarily dead forever; think of error handling code, which you don't want to delete. But you still learn which functions have the lowest probability of being called, and then we can deploy engineers to find out whether they are actually dead.
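The counting idea can be sketched with the hooks that GCC and Clang call when -finstrument-functions is enabled. This is a minimal sketch: the hook names and signatures are the documented `__cyg_profile_*` ones, the hooks must themselves be exempt from instrumentation, and in a real build the compiler inserts the calls at every function entry and exit (here nothing calls them unless the flag is used):

```cpp
#include <atomic>

// Global entry counter: lock-free, cheap enough to leave on in production.
static std::atomic<unsigned long> g_entries{0};

// Called by compiler-inserted code at every function entry when the program
// is built with -finstrument-functions. The attribute prevents the hook from
// instrumenting (and thus recursively calling) itself.
extern "C" __attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* fn, void* call_site) {
    (void)fn; (void)call_site;          // a real profiler would record fn
    g_entries.fetch_add(1, std::memory_order_relaxed);
}

extern "C" __attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* fn, void* call_site) {
    (void)fn; (void)call_site;          // exit events unused in this sketch
}

unsigned long entries_seen() { return g_entries.load(); }
```

A production version would key the counter by the `fn` address (e.g. into a fixed-size table) so you learn *which* functions ran, not just how many calls happened.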
Using these methodologies, we greatly increase the probability of finding dead functions: instead of inspecting a million functions, you investigate only a thousand or ten thousand. That is a huge saving in engineering cost while still getting most of the code size reduction. I recently implemented something similar in LLVM, called function entry instrumentation. It is very cheap and lock-free; you can deploy it in production. It only collects data when a function is called, so the performance overhead is negligible, and there are facilities to enable or disable it for specific functions. If you are using LLVM for your code base, try it out; I hope it gives you useful data to reduce code size.

One more thing we can do, once we have found the less-used parts of the code base: imagine a feature that very few people use but that we still want to keep. We can collect the set of less-used features and put them in a separate shared library. That will not reduce the code size of the program, but it will reduce the working set: when the program loads, it does not need to load the shared library; it loads it only in the specific cases when those features are exercised. This helps reduce launch time.

Another, very risky, approach is binary compression. You keep some functions, like the main function and a few others, as they are, and compress the rest of the binary using a well-known compression library such as zlib (I found one such tool recently). In the main function, while the program is loading, you decompress the binary and load the program on demand. This can reduce code size quite dramatically, but again, there is maintenance overhead.
Debugging becomes a nightmare: if the program crashes in deployment, there are tricky engineering problems to solve. But if you are that desperate, this is also one of the approaches; it can reduce code size by 20 to 30 percent, which is a big win, but it is risky and it can reduce performance a little as well. That's all for now; there are some references here, and now I will take questions.

The first question: what are the security implications of -fno-function-sections? I'm not aware of any security implications; I'm not a security expert. I imagine the compiler does not do anything special here to help or hurt security. If the compiler has emitted a separate section for a specific function, I don't know whether there are hacking tricks around that; I'm not in that domain at all, so I'm sorry. But as far as I can say, the two should behave similarly.

Second question: is there a good process to get to the right inline limit? It seems it would require a lot of trials to get right. Yes, it requires trial and error, but you don't have to iterate endlessly; you can do something like binary search. Say the inline threshold is 300: start with 200 and see. With simple divide and conquer you get to the right number in a matter of a day or two, depending on build time. If your binary takes two days to build, then I'm sorry, but algorithmically it converges very quickly; in four or five iterations you will reach the sweet spot.

Next: any idea of the percentage of code size reduction achieved with a custom standard library for C++? I'm wondering how worthwhile it is to maintain an optimized copy of it. It depends: if you are using a lot of C++, if the code base is entirely C++, it could be 5 to 10%.
If you have a lot of ninja C++ programmers who write templates all the time, it can go higher, too. So it depends on the code base, but I can promise 5% quite easily. The next question: -fipa-icf is enabled by default at -O2 and -Os, is that right? I wonder which of those flags needs to be added at -O0 if they are not. So at -O0 you don't want to enable extra optimization flags; only a very minimal number of them are active at -O0. -O0 has worse performance and worse code size; it is not meant for deployment, it is meant for debugging purposes, faster build times, things like that. As for C++ standard compliance: yes, some functions have to be inlined if the language demands it, so they will be inlined at -O0 too, and things like move semantics will work at -O0 as well; you don't need compiler magic for them to work. And it is not as if move semantics was never there before: compilers were doing a lot of what move semantics does (not all of it, but most) before it came into the language itself. The compiler knows how to do those things. So these flags are not very meaningful at -O0. Next: I have mostly used PGO for performance improvement, but I wonder if the compiler can reduce size based on runtime information. I have never done that experiment, but yes, I encourage you to try it out. It will help you a lot if you have a larger code base where many developers have been writing code forever; you will find an insane amount of dead code. I would encourage you to try that instrumentation. Next: can unused functions be removed even with -fno-function-sections? Yes, the linker can do that; not for all of them, but for many. If a function is called indirectly, it will not be removed; it depends on the visibility of the function. If you try -flto, many functions are internalized.
So their visibility is changed from external to local to the link unit, and then the linker can remove many of those dead functions. I am sure that with -ffunction-sections more of them can be removed, but even without it, some can be removed. Let's see what the next question is. Firefox gives a warning about the libzlz website. Yeah, I am not recommending that website; I just found it online. I think it has a fairly low footprint during decoding, but it is not my product and I don't know any of the people working on it. Please try it at your own risk. "I'm loving this session, thank you very much." Okay, I have a few minutes, so I can share a few more things. If you want to find out more about this, I would encourage you to read the manual pages; it is a very nice thing to do. Run man gcc (LLVM also has nice documentation, and GCC's is very elaborate) and simply search for "size"; you will find a lot of very useful information. It is a good read, and you will get a lot of insight into how the compiler works and what you can do there. There might be other optimizations I have missed here; it is quite possible. You can try them out, or you can ask on the mailing lists. There are compiler developers there who, like me, like to help everyone. Both LLVM and GCC have a lot of active developers on their mailing lists and chat threads, so you can ask any question there if you need help with code size; they will be very happy to help you. It is open source, and you can read through the mailing list archives as well; I am sure there have been many deep discussions about code size going on there over a long period of time, so there may be useful things to read there.
As for me, you can reach out to me on Twitter or something and I can help there as well. If there are no more questions... let's see, yes, there is one more: I can understand that optimizing for better code size can affect performance, but can it break the functionality? No, compilers are not meant to break functionality. If there is a bug in the compiler, which is very rare, it can happen, but semantically it should not happen at all. You should get the same behavior, because that is required by the standard in both C and C++. It is called the as-if rule: the observable behavior must be the same. That is a very crude way to put it, but you get the idea. Thank you very much, everyone. I thoroughly enjoyed this session, and I want to thank the Open Source Summit folks for giving me this opportunity. Thank you.