Say, hey, the compiler should be a magic black box: we put the source code in, we get something out, end of story. And it should be fast, it should be cheap, it should be free. Yeah, and the same for my car: I would like to drive it anywhere and consume one gallon of gas per week. But that's not possible. If we want to analyze the source code more deeply, trying to find security issues, the compiler needs CPU time, and that doesn't come cheap; we need to spend more time. Of course, the more analysis we do on our code, the longer it can take. Can we optimize that? Yes. Is the GCC and GNU community open to receiving those patches? Absolutely. But it's a work in progress. The automated tools definitely do not support every programming language for now, and this is important. You can buy a static analysis tool for C code, or for whatever language you prefer, and put it in your pipeline. Nobody prevents you from doing that, and there are very cool projects out there, but many of them charge money. What we try to embrace here in the community is the -fanalyzer tool, because it's free: literally, you can just download the GCC compiler, add that extra flag, and see what it can catch. There are some cases, and we're trying to fix them, where it reports failures that in reality are false positives, or misses real issues as false negatives. It's definitely an area of imperfection that we're trying to improve, but hey, with feedback from everyone, that's what we're trying to achieve. So let's start. By the way, the last three releases, GCC 10, GCC 11, and GCC 12 (GCC 12 was released in May 2022, the other two in 2020 and 2021), have been the releases shipping the -fanalyzer tool, and every year David Malcolm especially has been adding more and more to this feature.
The one released this year, this May, is the analyzer's use-of-uninitialized-value warning. So let's go with this simple code that I wrote for this presentation. It definitely has bugs; it's horrible. I know, it could be the worst C code you'll see in your life, for sure. But one of the things we see there is that it has an uninitialized variable x, and then we try to print it without ever assigning anything, right? So when we compile with the -fanalyzer flag, GCC tells you: hey, there is a region created on the stack here, and then a use of uninitialized value 'x' here. This refers to CWE-457. When you go to the internet and search for that, it will tell you what kinds of issues are possible and why uninitialized values matter. Like we said, in some languages, like C and C++, stack variables are not initialized by default, so they can contain junk data, right? And that junk data could be used by attackers to control or read those contents. The problem is that the attacker can effectively pre-initialize the variable, because your code isn't initializing it, with whatever they want: by grooming the stack, by an overflow — there are multiple ways to attack uninitialized data. And where does it hit in practice? When you go to the CWE-457 page, it lists the CVE numbers that are affected and the projects that have been bitten by this kind of bug. So not initializing these values is a real security issue, and thanks to -fanalyzer we can easily detect it in our C code, even in code as basic as that example. Now let's go with something a bit more elaborate. Does anybody see what could be wrong over here? Yes? They might not be what? Okay, that's one thing, yeah, sure.
That might not be valid, right. Anything else? Okay, great. What about line nine? Yeah, from that perspective, it could be. Now, when we compile with -fanalyzer, we see there is a CWE-457 — a number you can forget in five minutes, but the point is that it immediately tells us: use of uninitialized value, for both, here. Sure, the logic could be wrong and my code could suck, but imagine this in real production code. The point is that it helps with this; I didn't want to put up a hundred lines of C that really make sense, just something simple that shows the kind of issue we can catch here. And again, why is it important to catch this? Because there are CVEs with this exact error. Buffer overflows can often be used to execute arbitrary code — we have seen a presentation about that before, right? With a simple overflow we can end up with return-oriented-programming security bugs, and they are awful. I remember doing an example a few years ago with a simple ROP-style attack: it was possible to call a function that was never supposed to be reachable, just by finding the function's address in memory. It's horrible, and very simple to pull off. It's usually outside the scope of the programmer's implicit security policy. A heap overflow is also an overflow condition — like the example we provide here, where the buffer being written was allocated in the heap portion of memory, meaning it was allocated with a routine such as malloc. So that's a real security issue, and the goal with GCC 12 is to detect those kinds of things in the source code. Even in applications that do not explicitly use function pointers, we can still see these kinds of errors at runtime, right?
Well, how can I use this new feature? Because the constraint we always have is: my operating system, my distribution, does not provide GCC 12. How can I start to test and experiment with this new feature? Well, it's a few very simple steps, and I replicated them before the conference — not as a live demo, but as an example. So: download GCC 12 from GNU; there are plenty of mirrors, and you can just wget the tarball. I made this simple script that you can use for whatever purpose you need, and it's basically very dumb: you create a build directory and you just run configure. The flags specified there are the ones I consider the most basic. For example, with --enable-languages I only enable C, forgetting about C++, Fortran, or anything else, just for the sake of this demo. I disable multilib and multiarch because my machine is an x86, so I only build for that, and it reduces the compilation time of the compiler quite a lot. Trust me, it takes a long time to compile for multiple architectures, and if you only have an x86, or an Arm, or whatever architecture you're testing, you just need to compile for that one. Then make and make install will create the binaries that are necessary, so you will have things like the C and C++ compilers, gcc itself, gcov for coverage, and so on. Those are the only three steps. Once you have that, you can say: hey, gcc --version — and voilà, there you are, GCC 12. But if we come back to the first slide, we need more than the compiler, right? We need the assembler, the linker, the loader, and so on. So let's go through that journey: the other project you need to download is binutils, and we will get to that in a few slides.
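A sketch of the build steps just described, under the assumption of a GNU mirror tarball and an illustrative install prefix (the talk's actual script isn't reproduced in the transcript):

```shell
#!/bin/sh
set -e
PREFIX="$HOME/opt/gcc-12"

# 1. Fetch and unpack the GCC 12 tarball from a GNU mirror.
wget https://ftp.gnu.org/gnu/gcc/gcc-12.1.0/gcc-12.1.0.tar.xz
tar xf gcc-12.1.0.tar.xz
cd gcc-12.1.0
./contrib/download_prerequisites     # GMP, MPFR, MPC

# 2. Configure out-of-tree: C only, no multilib (x86-64 host only),
#    which cuts the build time considerably.
mkdir build && cd build
../configure --prefix="$PREFIX" \
             --enable-languages=c \
             --disable-multilib

# 3. Build and install.
make -j"$(nproc)"
make install

"$PREFIX/bin/gcc" --version          # should report 12.x
```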
Another flag released this year for the compiler is -ftrivial-auto-var-init. Using exactly the same example, you can compile your code and every uninitialized variable will be set to a fixed pattern, or to zero; you choose when you compile. Why is that important? Because with that, instead of having random garbage, random junk, in the binary you generate, it will always hold the same pattern, or all zeros, right? So it can help immediately when you're running your QA tests on top of your code: you can detect and say, hey, I was definitely not expecting that horrible value, I was expecting something different — well, that's because your uninitialized variable was there. And from the security perspective, by setting this flag to pattern, your variable will not be left uninitialized: it will be initialized with that specific pattern, not with random garbage in that region — or with zeros, whatever you prefer. And yeah, sure, thank you. [Question:] So let's say we set this flag to either zero or to pattern. Does that initialization occur once, or does it occur every time that stack variable gets created? [Answer:] When you build your binary with it, no matter how many times you run it, the variable will hold the same contents; the compiler effectively sets int x to that pattern in the generated code. [Question:] So it's generating clearing instructions on every invocation of that function? [Answer:] Yes. Thank you. Any more questions? Okay, thank you so much for the question. Here is another example, actually, that involves an incorrect free of a pointer — undefined behavior — and the same kind of thing will be caught when we try this.
When we compile this application with -fanalyzer, it detects the problem and says: hey, there is a region created on the stack here, and here is the value you free — and it includes the link to the CWE, so you can go and say, okay, compiler, you're telling me there's a security issue under this number. And when we check, it says the application calls free() on a pointer to memory that was not allocated on the heap. There is potential for arbitrary code execution with the privileges of the vulnerable program, and so on. Again, the idea is that we definitely want to enhance this, so that people use it more and more and improve the security of their code. These are a few small examples of what has been enabled in the compiler for security, and it's free: you just need to enable the -fanalyzer flag and it will do its best. By the way, before moving to the next topic: what happened with the kernel? David Malcolm wrote a very good blog post at Red Hat, highly recommended reading, about using this flag on the kernel. Yes, it increased the kernel's compile time by at least 50%, so it costs quite a bit of time, but there are patches headed upstream to extend the analysis of the kernel even into assembly language. So it's a work in progress — a little bit of a commercial, but stay tuned, because hopefully GCC 13 will bring more security-catching improvements, even for the kernel at that level, which is going to be awesome. Now let's go on to performance, and vectorized floating point. Victoria mentioned floating point earlier, and there's a really good quote that says the only way we can achieve performance is to interact with the hardware, right? And I remember a funny story about that.
I was at a Python community conference where somebody was presenting on performance optimization for Python, and somebody asked: hey, what if I have already optimized the library? What if I have already optimized everything from the software perspective? And the answer was: then you need to go down to the transistors, because our job finishes where the hardware begins. To be honest, I don't think that's true. As software developers, there are layers of software development that need to marry with hardware development. A new hardware feature needs to be used by the software — in this case by the compiler and toolchain — in order to deliver that feature to users, and it needs to be transparent at some level. Users of OpenStack, PyTorch, and so on need that free layer: hey, I just want to use the library, and whatever happens underneath that uses the architecture's specific new instructions is out of my scope. For me that's fine, but somebody needs to do that work, and the compiler especially needs to provide that layer. So the first thing we're going to talk about is the vectorized floating-point support released in GCC 12. Let's first discuss what vectorization is, because there is also a new GCC 12 vectorization feature that applies to every single architecture — it doesn't matter whether it's Arm or x86. So, what is vectorization? Assume some basic code that does almost nothing, very dummy. Without vectorization, the hardware behaves like this: you add a1 plus b1 into c1, and the rest of the register sits empty, unused, because you compiled to put one integer per register. If the register is 128 bits and your integer is 32 bits, the other three sections of the register are empty. So yeah, it's definitely a waste of resources.
What vectorization does — and vectorization applies to multiple architectures, not just x86; Arm has it too — is say: okay, why don't we pack in multiple data elements and, using a single instruction, SIMD, single instruction multiple data, perform the same operation on all of the data in, let's say, one cycle (it could be more) — as long as, and this is important, there is no dependency between the elements. What we cannot do, for example, is an operation where one lane needs the result of another: if this element needs the result of that operation, it breaks the scheme, because you first have to perform that operation to get the result and put it into the other part of the register, so you cannot run them in parallel. So: single instruction, multiple data — okay. Operations with dependencies — not so okay. The evolution of vectorization, at least on x86, has gone this way: we have had the XMM registers, spanning bits 0 to 127; the YMM registers, bits 0 to 255; and the ZMM registers, bits 0 to 511. So yeah, the size has been increasing more and more. And last year we presented the new tile registers coming in the next generation of servers — which, by the way, will be the first two-dimensional registers; you can configure your tile registers, but that's a different story. My analogy for this is a stroller: if you have three babies, it's much simpler to buy a big stroller and take all three to the park at once than to take one baby at a time, which consumes much more time. I know this now, with my two kids. Vectorization is now enabled at the -O2 optimization level by default. It used to be there, then it wasn't for some time, but it's back on at -O2. I don't know the reason; I didn't follow that discussion on the mailing list, to be honest.
But talking with the GCC maintainers, they said: yeah, let's put that in -O2. And what is -O2? Because the topic of this talk is discovering optimizations in the toolchain in 2022. GCC provides multiple optimization levels: -O0, -O1, -O2, and -O3. Let's discuss those. There are also -Os for size, -Ofast, and others, but let's stick to these first ones. -O0, the default if you pass nothing, minimizes compilation time: it just produces a simple, functional binary, and that's it. With -O1, the compiler tries to reduce code size — that's its most important job — and execution time improves somewhat, but it does not perform any optimization that greatly increases compilation time. -O2 enables all supported optimizations that do not involve a space-speed trade-off, including things like the vectorization we just saw, but it will not do the things -O3 does — like, for example, -floop-interchange. Let's look at the difference between -O2 and -O3, which is very interesting. -O2 now gives you the kind of vectorization we saw over here, so it will improve the speed of a program with that kind of code. However, there are other flags, like -floop-interchange. Take the same example we had before, but imagine now that these loop bounds are different — it's not actually quite the same code; here the matrix is 256 by 256. Yeah. Question? [Question:] With optimization, when you keep increasing the optimization level, it takes longer to compile, yes.
[Question continues:] That's not a big deal sometimes, but the other thing I'm curious about: if it detects a compilation error that didn't occur at the initial optimization level, is that pretty much a guarantee that your code is broken, or just an artifact of the optimization process? [Answer:] Let me see if I get the question. There is one mandatory rule that we, as compiler developers, keep in front of us: do not change the behavior of the code. In terms of phases, we perform the lexical analysis, the syntax analysis, the semantic analysis, we generate intermediate code, and performance optimization comes at the very end. The performance optimization doesn't change behavior and doesn't detect bugs; bug detection happens in the first three phases, and performance optimization just rearranges the code to make it faster or smaller. Did I answer that? Yeah? Okay, thanks. Do you want to add a comment? No? Okay, perfect. So let's go back to this example. From the software perspective, we enabled vectorization, perfect: we're now using the full register, no longer wasting resources — I don't leave one kid at home while I take the other to the park. But there is one blocker there from the hardware perspective, and it's the cache. If you change the order of the loops in this section, you can pull more useful memory into your cache and make fewer trips out to memory. Maybe this example isn't the best one, but there are cases where, by changing the order of the loops, the memory accesses drop and your software performs much better. Here's the thing, though: when you change the order of the loops, you have to make sure the change is permitted by the logic of the source code the developer wrote. And those are the kinds of optimizations that happen at -O3.
There are plenty of blog posts and comments out on the internet saying -O3 is evil or dangerous and so on. I mean, fine, everyone is free to believe what they want. My experience is that -O3 is very well engineered to sustain these kinds of transformations, and -floop-interchange is a very good example of one that can improve performance — I don't have an exact number for how much, because it depends on the platform. But that's the nature of going from -O2 to -O3: in one you use vectorization; in the other you also go after things like memory access patterns. It's much more aggressive. Now, GCC 12 also provides support for a new ISA extension, AVX512-FP16, which — coming back here — targets these, the ZMM registers, the third evolution of vectorization on x86. And the important thing about vectorization is that it doesn't only mean increasing the size of the registers; it also means new generations of instructions for specific use cases, and this release is a good example. Before, we had the ZMM registers, 512 bits long, perfect. Now, what if we as engineers create a single hardware instruction that performs floating-point operations — FP16 operations — on those registers? And what kind of operation can we do? Why don't we multiply complex numbers? Yes, I know: what? So yes, this thing actually performs operations on complex numbers. It can also do subtraction, fused multiply-add, and operations on conjugate numbers at that level. It's very surprising: in a single instruction — you can write it in assembly, I'll get to that part, or you can write it in C — you get a complex multiply or a complex-conjugate multiply. They are useful, as you might imagine, for the discrete Fourier transform and the inverse discrete Fourier transform. Where is that used?
Well, I'm not an engineer in that field, but I did a little research, and they are very, very useful for telecommunications and so on — again, not my field of expertise, for sure. And you can do it in C. One of the fears people have is: oh, how am I going to use that new instruction? Do I have to open my code and write start-assembly, end-assembly blocks? No, it's not necessary. For x86 there is a header of intrinsics, immintrin.h, that works with GCC and Clang and everything; it's completely open in that sense. So you can say: okay, I'm going to use this intrinsic function and pass it the values I want. The only thing that's different here is that instead of passing an integer or a float or whatever, these are register-to-register operations only. So you first need memory-to-register operations to get your data into the registers, and then the operation can be performed. The ISA is limited to register-to-register, and we can imagine why: doing complex and conjugate operations straight from memory, folding the loads and stores in as micro-ops, would be rather more complicated, so the design keeps it simple and defines the operation purely on registers. In the Intel Intrinsics Guide web page you can find the description of what each of these functions performs, and, again, the description of the instruction. It works not only on complex numbers but also on conjugated ones, right? The conjugate of a complex number is formed by negating its imaginary component, and a common operation is to multiply a complex number by the conjugate of another, so that is supported too; you can mix the two worlds at the same time. And here's an example in C.
So you could say: I'm going to do this operation on these numbers, and it can be written just like that. It needs GCC 12, by the way: try this on any older version of GCC and it will not work, because the compiler won't be able to recognize that the instruction can be used. So, okay, this is the test code I have; I use -O2 and I also pass the new target flag for this extension — this other one, for the vector length, is not strictly necessary, so that's fine. And in the object code you'll see: hey, this is the exact instruction I was trying to get. But we cannot execute object code on top of our operating system as-is; we need an assembler and a linker. Where do we get those? You go to the binutils open-source project, download it from there, and compile it — yes, sorry, a question? [Question:] Regarding the AVX-512 extensions, does GCC now take the down-clocking into account and avoid using those instructions when they probably won't be helpful overall? [Answer:] There is some work under discussion regarding the down-clocking; in terms of performance there are pros and cons. I have seen some patches and discussion, but I don't know what the final solution in the community will be. Okay, back over here — thanks for the question. So, the script is again very simple: it just builds binutils with some specific flags that I chose myself and that have been useful; it's the personal script I use in my daily job. At the end you will have the assembler, you will have objdump, and the other tools that are part of the toolchain and useful for us, like the linker and so on — binutils. So in the end you have all the tools you need, and you can say: hey, from where I installed, I can get the assembler and the linker to generate my binary.
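A sketch of the binutils build just described, with an illustrative version and install prefix (the talk's actual script and flags aren't reproduced in the transcript):

```shell
#!/bin/sh
set -e
PREFIX="$HOME/opt/binutils"

# Fetch and unpack a binutils release from a GNU mirror.
wget https://ftp.gnu.org/gnu/binutils/binutils-2.38.tar.xz
tar xf binutils-2.38.tar.xz
cd binutils-2.38

# Out-of-tree build, then install.
mkdir build && cd build
../configure --prefix="$PREFIX"
make -j"$(nproc)"
make install

# Yields as (assembler), ld (linker), objdump, and friends.
ls "$PREFIX/bin"
```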
And when you check the objdump of the binary you generated, it shows the opcodes, the assembly instructions, and the memory addresses where they will sit in the text section of your binary. As a summary: compilers and toolchains, in my opinion, play an important role in our community. Helping to create and build new open-source software, from the kernel to the libraries, is part of our main goal as compiler and toolchain developers. So we ask the community to embrace the new versions, because sometimes there's a little bit of fear: I don't want to test a new compiler. But the work we do is like the telescope. An old telescope is beautiful, it works, you can see the moon — personally, for example, my wife gave me a telescope for the house, and yeah, I can see the moon, it's beautiful. But then somebody creates new tools, much more complex to use and much more complex to build, and with them it's possible to discover new journeys, new paths. Discovering the unknown potential that we have in our software is one of the main tasks we have in the toolchain community. Thank you, thank you so much. Any questions? Time for two. Yeah? No? Okay, well, I appreciate your time. Thank you so much.