So, hello and welcome, everyone. I will be talking about compiler features you might not know exist, or might not know are helpful. A little bit of background about me: I'm working at Red Hat as a software engineer, slightly over four years now, and I'm working on libvirt. You might wonder why I'm giving a presentation about compilers when I'm working on libvirt, and you might think that I'm not an expert. To that I say: you are right, I'm not. I'm just trying to convey some information. This is an area that's pretty interesting to me, and I would like to show you what we can do with it, and maybe show you something that you will use and that will help you in the future.

So let's start with a first question. I'm not sure if you're familiar with this code, if you've ever seen anything like it. Oh, I forgot to tell you: if you ask a good question, there are three lovely DEF CON scarves, and I'll give you one; you can come for it after the talk. So the first question from me is: what can you do with this code? What would you do if you stumbled upon a hello world example for some language? What's the first thing you do? Yeah, sure, compile it. So if you compile it, how much time will it take? One second, two seconds? And how much time will it take if you need to tweak the compilation, if you need to do something on your own? Say the compiler does not do something you want to do with this code, so you have to write part of it yourself. If you don't know how to use the compilers that are available, you have to write your own, and that would take me months, maybe weeks for some of you. But what I want to show you here is that custom compilation of C code is pretty easy. I mean, easy in a pretty specific way, but maybe easier than you might think. So what can you expect in this presentation?
I'll be talking mainly about GCC and Clang. I'll show you what GCC plugins are, how they work and what you can do with them, and there will be an example. Then I'll switch to Clang and LLVM for a little bit, and I'll show you various tools out there that you might not know about either. You should not expect a basic introduction on how to run `gcc hello.c`; I'm not going to explain that. This is also not going to be very language agnostic: I'm mainly going to talk about C-like languages, because that's what Clang is about. And of course you can expect the other things that happen in technical presentations, right? Something will stop working. If you already know compiler internals inside out, this presentation is probably not for you, but feel free to stick around; maybe you'll learn something new.

So, to the point. You have a simple hello world example and you want to compile it to a binary. What are the phases the compiler will go through? Anyone? Yeah, you can generalize it a little bit. Yeah, still, generalize it a bit more. So the first thing, because there's an include and stuff like that, is that you preprocess it. After that you parse whatever code is there: lexical analysis and syntax analysis. Then you get an abstract syntax tree or something similar, and you can do the first optimization passes on it. Then you compile it. Even though it's the same word as for the compilation as a whole, "compilation" here refers to transforming the abstract syntax tree into assembly language.
Sorry for that. Then you assemble it into machine code and link it. And of course I did not mention a bunch of other stuff: you can do several optimization passes, and so on. From the user's point of view, as we said, it's basically just running a command: `clang hello.c`, `gcc -o hello hello.c`, or whatever. What if you want to change something? You want to change how the compiler behaves, you want to see what the abstract syntax tree looks like, you want to do some checks there. For some of these things there are options that are generic enough. For example, there's an option to stop after preprocessing. Does anyone know the argument for that? Yes, `-E`. Great, good audience; I don't have that many scarves here. Cool. How about stopping after compilation, so you can see the assembly code that was generated? Right, `-S`. But the options are pretty limiting. What if you want to do something more? You might ask what more you would want, right? Well, you can enforce your project's coding style. You do not do that in the compiler itself, because then everyone who uses the compiler would have to obey your coding style, and there are some people out there who do not think your coding style is the perfect one, even though you know it is, right? What else? Type integrity.
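To try the phases and the stop-early flags on something concrete, here is a minimal sketch: a tiny translation unit with the GCC command for each phase listed in its header comment. `-E`, `-S`, `-c` and `-v` are standard GCC options (Clang accepts them too); the file name `square.c` and the `SQUARE` macro are made up for illustration, and the macro and `assert()` are there only so the `-E` output visibly expands something.

```c
/* square.c: a hypothetical translation unit for trying each phase.
 *
 *   gcc -E square.c -o square.i   # preprocess only: #include and macros expanded
 *   gcc -S square.c -o square.s   # stop after compilation: assembly text
 *   gcc -c square.c -o square.o   # stop after assembling: object file
 *   gcc -v -c square.c            # also print each tool GCC runs, with arguments
 *
 * Linking, the last phase, needs a main() from another file, e.g.
 *   gcc main.o square.o -o prog
 */
#include <assert.h>

/* In square.i this macro is gone, replaced by its expansion. */
#define SQUARE(x) ((x) * (x))

int square(int x) {
    /* Keep x * x within range, assuming a 32-bit int. */
    assert(x >= -46340 && x <= 46340);
    return SQUARE(x);
}
```

Comparing `square.i` with `square.c` shows what the preprocessor did before the parser ever saw the code.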
I'm going to show an example of type integrity, because it's hard to explain without code. There are some things that you, or someone else, might eventually screw up, and you can catch them even though the compiler did not see that there's a problem. I'm also mentioning custom code generation. Of course the compiler generates a bunch of code, but I'm talking about custom code generation. For example, you can do function overloading in C, where you generate different code based on what the function parameters are. Or, if you have a structure and you want a function that does a deep copy of it, for every single structure there (deep copy meaning all the pointed-to data gets newly allocated, and so on), you cannot do that with just some parameters. Code completion I'm mentioning partially because Clang has support for it: you can run the clang binary with some parameters and some code, tell it where you are, and it will spit out all the completion options. I'm not sure about the exact option name. It's used from text editors and so on.

The type integrity example I want to mention is this. Let's say we have a header file with a structure. Based on what you are compiling your project with, some part of the structure is there if you're compiling with, let's say, libsomething, and it's not there if you're not. Then you have a file with the same file name, which actually holds my functions (let's say I just forgot to rename it), and it has a function that does something on that structure.
Also, based on whether you're linking with that library or not, that function accesses a member which may or may not be there. Then you have main, which includes those two files (and yet again, I have a lot of them), creates the structure, initializes it empty, and then runs do_something on it. Can you see a problem with that? Yes. What I wanted to show you here is... my hands are shaking, let's try this. It usually looks fine in this case, right? But it's not, and that's the problem. All right, so for example here, the file that gets included will have everything based on config.h. If you include it here like this, and I forgot to switch those two lines (it's handwritten, because I wanted the smallest example possible), the first thing to get included is my structure, and then the function. So if you create a structure, it will not have this pointer in it. Then, if you call do_something on the structure, it tries to access a member that's out of the bounds of this structure, and you get a segfault, right? Yes, this should be a C file. I wanted to do both header files and C files, and I forgot: this should be a C file, and there should also be a header file which includes config.h. The point here is that you can have two files, and each of them sees its own version of the same structure.
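Because the slide's example is handwritten, here is a minimal, compilable sketch of the same failure under some assumptions: the header is called `thing.h`, the optional member is guarded by a macro `WITH_LIBFOO` that `config.h` would define, and the two views of the structure are written out side by side under different names so they fit in one file (in the real bug both are spelled `struct thing`). It shows why the access goes out of bounds: the offset the functions file reads from is at or past the end of the object `main.c` created.

```c
/* Hypothetical thing.h, as in the slide's example:
 *
 *     struct thing {
 *         int id;
 *     #ifdef WITH_LIBFOO          (defined by config.h)
 *         void *foo_handle;
 *     #endif
 *     };
 *
 * A file that includes config.h first sees the first layout below; a file
 * that includes thing.h before config.h sees the second one.
 */
#include <stddef.h>

struct thing_with_config {       /* the view in the functions file */
    int id;
    void *foo_handle;
};

struct thing_without_config {    /* the view in main.c (wrong include order) */
    int id;
};

/* main.c allocates an object of this size ... */
size_t size_seen_by_main(void) {
    return sizeof(struct thing_without_config);
}

/* ... while do_something() reads foo_handle at this offset. */
size_t offset_used_by_function(void) {
    return offsetof(struct thing_with_config, foo_handle);
}

/* Nonzero when the read starts at or past the end of main.c's object. */
int access_is_out_of_bounds(void) {
    return offset_used_by_function() >= size_seen_by_main();
}
```

Nothing here fails to compile; each file is internally consistent, which is exactly why this kind of bug is hard to debug.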
They have a structure with the same name, and you can pass a pointer to it from one to the other, but from the two files it will look like two different structures with different sizes. Did I explain that correctly? Sorry for that. A friend of mine told me that I should show you type enforcement, or type integrity checks, so I tried to put it together as fast as possible, because explaining this without code would take much longer. You may say it's very rare. Yes, it is very rare, and it should probably not happen. It is that "should not happen" type of problem, and debugging it is not that easy. GCC will fire a warning if you run it with `-flto`, that is, with link-time optimization. I'm not sure about the details, but it does some checks on structures and will basically find out that it's not the same structure when it's accessed from those two places.

So, I mentioned GCC; a little bit of an overview. GCC, the GNU Compiler Collection, has a bunch of tools in it. It can compile a lot of languages: for example there's `gcc`, which is the whole package, `g++` for C++ compilation, and it can compile Go, Java and other code. It also has a tool chain: those are basically the tools GCC runs in order to go through the phases I mentioned earlier. For example, `cpp` does the preprocessing, `as` does the assembly, `ld` is the linker, `ar` the archiver, and so on. You can turn on verbose compilation with `-v`, and GCC will show you each of those tools being run, with its parameters, and a lot of other info. So, the compiler phases in GCC; let's just go through them quickly.
There are language-based passes, which, as I said, depend on whether it's g++, the Go front end, and so on. The language gets parsed into an abstract syntax tree called GENERIC. It's not entirely generic, since not every front end uses it directly, but it is the generic tree: the abstract syntax tree representation you can work with. Some language-specific trees and other information get passed along with the GENERIC tree to other layers, but that part is language dependent. Once you have a GENERIC tree, whatever the language was, you can run the other passes on it. One of them is called gimplification, which is basically transforming the GENERIC tree into a three-address representation. Does anyone know why it's called GIMPLE? I first thought it meant "generic implementation", but it's probably not: it's based on an earlier language called SIMPLE, which did not have goto and some other things, so I think of it as "SIMPLE with goto". And of course there are other passes, Tree SSA, RTL; those are various optimization phases and so on, and then the assembly. We don't need that in detail.

Let's switch to Clang and LLVM. Clang, for C-like languages, is a binary built on a set of libraries: libclang, libFormat, LibTooling and others. There are also binaries like clang-format, which does something like `indent`, a little more intelligently, and clang-tidy, which can fix some stuff for you, and so on. I might have that on the next slide, so I'll wait with it. And then you have LLVM, the Low Level Virtual Machine. It has its own tool chain and its own language, the LLVM IR, which I can't really say is similar to anything else; let's not go into details. There are a bunch of binaries you can use for assembling and disassembling the bitcode.
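A small function to try that tool chain on, as a sketch: the file name `sum.c` is an assumption, while `-emit-llvm` (with `-S` for readable IR or `-c` for bitcode), `llvm-dis`, `llvm-as`, `llvm-link` and `lli` are the standard Clang/LLVM tools for this. A loop is used so the unoptimized IR shows a few basic blocks and branches rather than a single straight line.

```c
/* sum.c: a hypothetical input for looking at LLVM IR and bitcode.
 *
 *   clang -S -emit-llvm sum.c -o sum.ll   # human-readable LLVM IR
 *   clang -c -emit-llvm sum.c -o sum.bc   # LLVM bitcode
 *   llvm-dis sum.bc -o sum-dis.ll         # bitcode -> readable IR
 *   llvm-as sum.ll -o sum-as.bc           # readable IR -> bitcode
 *
 * lli interprets bitcode directly, but it needs a main() to run, so link
 * this with a file that has one first (for example with llvm-link).
 * Without -emit-llvm, clang produces an ordinary object file or executable.
 */
#include <assert.h>

/* Returns 0 + 1 + ... + n. At -O0 the loop appears in the IR as several
 * basic blocks joined by branches. */
int sum_to(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

At higher optimization levels LLVM may replace the loop entirely, so `-O0` is the easier starting point for reading the IR.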
You can compile code into that bitcode and you can interpret the bitcode, so to some people it might look similar to Java, but this is not used by default when you run `clang file.c`. The clicker stopped working. No? Okay. So, if you recall the previous slide with the GCC phases, here Clang does the language-specific parsing. If you want to use LLVM, you pass `-emit-llvm`; otherwise Clang goes through the assembler and the system linker and you get a normal binary. With `-emit-llvm` you get the LLVM language of the Low Level Virtual Machine, which is what the bitcode contains, and as I mentioned there's a bitcode interpreter, so it can work like... I'm hesitant to say "like Java", because I don't want you to throw oranges at me.

So, a quick recap: what did we learn? We learned what GCC is (you probably knew that, right?) and roughly how it works, and the same for Clang and LLVM. What are the similarities? Of course there are similarities: you have to do similar things with the code if you want a similar output, right? But the similarity I'm referring to here is that there's a split between a language-specific and a language-agnostic part. As for the differences: every part looks different and works differently. So what do those compilers expose that you can use to tap into the power they already have, so you don't have to reinvent the wheel? GCC has a plugin API: you can write a plugin for GCC and interact with what GCC does. Clang can do the same.
Way, way differently, though. Then there's libclang. As I said, Clang is the binary, and it comes with its libraries. Clang uses them, but you can use them as well. The nice thing is that Clang is basically just a relatively small binary on top of those libraries, which do most of the hard work, and they are exposed to you so you can use them; libclang is the stable C interface to them. And there are other libraries, like LibTooling for refactoring the code, libFormat for reformatting, and so on.

So, a really, really short introduction to GCC plugins. A GCC plugin is just a shared module with a `plugin_init` function in it. GCC calls `dlopen` on it and calls `plugin_init`, and you can then register callbacks for different events. An event may be that something is happening, or it may just be GCC asking you for something; we'll get to that. You use it like this: you run GCC with `-fplugin=/path/to/plugin.so`, and `-fplugin-arg-<plugin>-<key>=<value>` creates an argument for you that you can use in the plugin. Of course you can pass more of those.

I'm not sure you will be able to read this, but it will be in the slides, and I can show you afterwards if we get to the demos. What you see here is basically this: you include `gcc-plugin.h`. You need to define `plugin_is_GPL_compatible`, otherwise the plugin will not load. The symbol has to be there; it doesn't matter what it is or whether you use it, GCC just checks that it exists. Then there's the `plugin_init` function: you get the arguments from the command line and some information about the GCC version, and you can register a callback for some events. I chose `PLUGIN_FINISH`, just to say goodbye after the compilation finishes. There are a lot of things in those structures that you can use, but I don't think we need them now.
The plugin compilation is pretty straightforward. `-shared` is enough to create a shared library, and this part here shows GCC where to include the header files from: if you run the GCC you want to use the plugin with, passing `-print-file-name=plugin`, it prints the path to the plugin directory, whose `include` subdirectory has all the header files. Then, to use the plugin, as I said, you run GCC with `-fplugin`. I compiled the same thing here with and without it, just to see the difference: the only difference is this parameter, and this is the output I got.

A little bit of info about what we can react to. There are the events, as I said; I used `PLUGIN_FINISH`. There's the `PLUGIN_INFO` event which, as I said, is not even something happening: GCC is asking you for information about your plugin, so it can output it into the verbose logs. Another event is `PLUGIN_PASS_MANAGER_SETUP`, where you set up the passes you want to react to. So don't mistake events for passes.
The passes are the ones we went through earlier, and there are way too many events to cover them all today. Then there are the passes themselves: GIMPLE passes, RTL passes and even more. That's all to be found in the documentation.

Let's switch to Clang. libclang is, as I said, a library used by Clang itself. A good thing is that it's a public library, and another good thing is that it has a stable API. Clang also has a plugin API, so you can write a plugin, but it may not work with future versions, whereas libclang has a stable API. However, it has very sparse documentation. I read somewhere that the developers themselves see the source code as the documentation; you get automatically generated Doxygen and that's it. There's one presentation from an LLVM conference that shows very nicely how you can parse something with it. But what's nice is that there are Python bindings. And what can you do with Python bindings? For example, in IPython you can introspect the code: if you don't know what a function does, you put a question mark after it and you get the docstring, which is easier and faster than searching through the documentation.

So I chose to show you an example in Python. What you always need to do, after you import `clang.cindex`, is create an index, and that index is able to parse a file. You can give it parameters; usually you pass the parameters you got from the command line. Parsing creates a translation unit, and the translation unit has something called a cursor in it. I think of the cursor basically as a pointer into the abstract syntax tree. Then you just call a function, of course.
Here we call it on this cursor, and that function basically takes the node the cursor points to and checks which file it comes from. If the file name is not the same as the one we parsed, we return, because we only want to output information about that file. Then we check, for example, whether it's a structure declaration, and if it is, we output the line, the column and the name of the structure. That's it. And of course you have to traverse through all the children. It's just a few lines, and you get, for example, all the structures. So that's just a quick example.

Some interesting plugins I found out there: there's DragonEgg, a plugin for GCC that outputs LLVM code, so you can produce LLVM bitcode using GCC. That means you can use GCC as a front end to LLVM for all the languages GCC can parse. I don't think it's finished yet, but the plan is to support all languages. There's the GCC Python plugin, which I did not mention earlier for a reason: since GCC switched from being written in C to C++ (I'm not sure how it was used before), the GCC Python plugin is quite hard to use, at least for me. I did not dive into the details, but it's basically a Python interpreter compiled as a plugin for GCC.
You load that plugin and give it the path to your Python script as an argument; it executes that script, and from the script you can react to all those passes. What I forgot to mention about passes is that you can not only check what they are doing, but also replace them with your own code and generate something else. There's DXR, from Mozilla if I'm not mistaken, which is basically a code search and navigation tool for large code bases, for example Firefox. Another plugin I wanted to mention is a GCC plugin, I think in some examples or extras collection, that does function overloading in C: you're not using C++, only C, but you have function overloading.

So now we can get to the demos. We've still got a few minutes, so I can show you some of the things we went through pretty quickly; we have about seven minutes. Is this readable? I mentioned the clang-format and clang-tidy binaries, so let's start with those. I have some ugly code here, and as you can see there are a few problems. The indentation isn't right, of course; that's the easiest one. And this initialization syntax has been obsolete for decades. If I copy it here and run clang-format on it, it outputs it quite nicely. You can specify a lot of parameters and change a lot of things; I just wanted to show you that it's easy to use. You can also call clang-tidy, which will not only tell you what the problems are (here, an old-style field designator for x and y), but if you run it with `-fix`, it will apply those fixes, which means it fixes the code for you. There are a lot of these: whenever Clang is able to provide a fix, it's called a fix-it. Basically, it looks like this.
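A minimal sketch of that before-and-after, assuming a `struct point` with `x` and `y` like the one the demo script reported; the demo file's actual initializer isn't in the transcript, so the values are made up. The obsolete GNU form is kept in a comment, since it is the thing being fixed.

```c
struct point {
    int x;
    int y;
};

/* Before the fix-it (obsolete GNU syntax):
 *
 *     struct point p = { x: 1, y: 2 };
 *
 * Clang warns about each "x:" as a use of the GNU old-style field
 * designator extension, and the attached fix-it rewrites it into the
 * C99 designated-initializer form used below. */
struct point make_point(int x, int y) {
    struct point p = { .x = x, .y = y };
    return p;
}
```

Both forms initialize the same members; the fix-it only changes the spelling, which is why a tool can apply it mechanically.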
There's a problem, and this is a fix-it for it, so clang-tidy can fix it for you. What else? I've got the script here that I showed. I can't see the left column here, I'm not sure whether you can, but it's basically what we went through. Let's use it on that ugly C file. One thing I did not mention: I'm not sure whether the libclang Python bindings are available for Python 3; I only have them for Python 2, so if you cannot import the module, make sure you're running Python 2. Because there was just one structure, you get this: "Structure point is declared on line 6 column 8". If we were to delete this check here, the one that skips files other than the one we passed in, you would see that it outputs a lot of things: everything that gets preprocessed and included.

To show the GCC plugin: as I said, this is how I compile it. I don't think I need to show you what the plugin looks like, right? It's the same thing. One thing I forgot to mention: this is how it works, but if you put only the file name in there, without a path, GCC will not be able to load the plugin, because it is not in this directory. `-print-file-name=plugin` outputs the path where you need to install your plugins for GCC to find them without a relative or absolute path.

Well, I don't have much time left, so let's get to this... where did I put it? That was just a few pretty fast demos, because I want to show you the documentation links and tell you something else. There's more info: the slides will be posted on the DEF CON website. The first link is to the GCC internals documentation. The second one is, as I said, the automatically generated libclang Doxygen documentation, without any how-to. I also want to mention Eli Bendersky's website.
I'm sorry if I'm pronouncing his name incorrectly; I don't know him personally, but he posts a lot of very interesting information there, including valuable material about libclang. The last link is to the GCC Python plugin, and then there's some information about me, if you want to contact me. As I said, I'm not an expert in this; I just want more people to be interested in it, and I want to continue this work to create something usable.

Quick recap: we learned how compilers can help us. What can we do now? We are able to write a GCC plugin, and we know where to find the documentation, so we can do some stuff with C code. We know how to use the libclang library and some other tools, so you might implement syntax checkers for your project or use them for refactoring. You know where to find more info, and most importantly, you've got my contact, which tells you who not to blame if it doesn't work for you, because that's not my fault.

So I want to take you back to that first code again. Whoa, it misbehaves. Let's look at the code again. What can we do with it now? We can do the same thing, but we also know a bunch of other stuff we can do with it: we can change it at compile time, we can list information about it, and so on, and it will not take much time. We have the documentation and we know the tools. So what I wanted to leave you with is my idea from the first slide: custom compilation of C and C++ is easy. And that's it, thank you. If you have any questions, feel free to ask me.

[Audience question] Oh, is it about the Python plugin? For that you can basically use a debugger. Of course you can use a debugger, like GDB, for GCC itself. But if you write a plugin, it's quite easy: just register all the passes and everything.
Then you can just step, see what is where, step again, see what is where. I have not tried it, but I think it's okay. Yeah, sure.