Okay, welcome. Good morning, everybody. I'm Stephan Bergmann from Red Hat, and this morning I'll talk about how to write Clang plugins to have more fun and more profit with a code base. I'm not sure everybody in the audience even knows what Clang is, let alone a Clang plugin, so if there's anybody who does not know, I'll start off more slowly. Has anybody already written a Clang plugin? Half of you. Great. Björn asked me, why not give a talk on this? Björn is not yet in the audience, but meanwhile he has written at least one himself, so he probably knows by now how to do that. Great.
I'll start off with a little example, right out of our code base from a couple of weeks ago, to give a reason to actually write a plugin. This is some C++ code. Anybody in the audience who does not know C++? Oh, it's a technician. Is the recording going on? I don't know. Okay, we don't know.
So yeah, this is some arcane kind of C++ code, and it has a bug. It is a destructor, as you'll see, of a class that holds some stream and wants to reset it to its original position when it goes out of scope, so that the stream it's operating on is back at its initial position. So what it does in this destructor is seek back to the original position. If anything goes wrong with the seeking, because your disk has disappeared or the network is down or whatever, it tries to catch whatever error occurs, because it doesn't want to throw out of the destructor, and throwing out of destructors is very bad. Yeah, shame on the person who wrote that, because, as you all of course see, this code has a bug. It didn't use a normal try/catch block. It used a function try block, where the try is written directly after the function declaration and there's no real function body around it. What this kind of construct does, for all those who don't know it yet and have never seen it before: the try covers all the stuff the destructor does, including destructing the base-class sub-objects and the member variables.
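For readers without the slide, here is a minimal sketch of the kind of destructor being described, next to the ordinary try block inside the body for comparison. It is not the actual LibreOffice code: `seekBack`, `FunctionTryGuard`, `PlainTryGuard` and `destructorThrows` are hypothetical names, and the first destructor is marked `noexcept(false)` only so that the rethrow can be observed rather than terminating the program.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the seek that fails in the talk's example.
static void seekBack() { throw std::runtime_error("seek failed"); }

// Function try block on a destructor: control reaches the end of the
// handler, and for a destructor that implicitly rethrows the exception.
// noexcept(false) is an assumption made here so the rethrow is observable.
struct FunctionTryGuard {
    ~FunctionTryGuard() noexcept(false) try {
        seekBack();
    } catch (...) {
    }
};

// Ordinary try block inside the body: the exception really is swallowed.
struct PlainTryGuard {
    ~PlainTryGuard() {
        try {
            seekBack();
        } catch (...) {
        }
    }
};

// Returns true if destroying a T lets an exception escape the destructor.
template <typename T> bool destructorThrows() {
    try {
        T guard; // destroyed at the end of this block
    } catch (...) {
        return true;
    }
    return false;
}
```

Here `destructorThrows<FunctionTryGuard>()` returns true and `destructorThrows<PlainTryGuard>()` returns false. With the default implicit `noexcept` on destructors, the rethrow would end in `std::terminate` instead of a catchable exception.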
If anything in all of that throws an exception, it goes into the catch block. So it will catch exceptions from the body as well. But at the end of that catch block it rethrows your exception. So it is different from this other form, the ordinary try block inside the body, which runs the body, catches the exception, and only then destructs all the base classes and member variables. That is what the guy wanted to write. The other thing is what he actually wrote, which rethrew the exception on a failure to seek back and then caused the program to abort.
And when I found that, I thought: that's one incident of this, maybe we have more of those. But is there a way to grep for that? You can grep for try, you can grep for catch, you can grep for the tilde of the destructor, but you won't find any of this with these simple techniques.
So, Clang to the rescue. With Clang it's easy to pattern match, to find things, in the abstract syntax tree that is derived from the actual syntax you saw there. The compiler takes your text, your character-by-character stuff, and transforms it into an abstract syntax tree that still represents the structure of your code. It drops the nitty-gritty details of what characters were written at that place, but it still contains the essence. So the idea is to write a plugin that just hits on these cases of using function try blocks with destructors. This one, with an empty handler, is even more telling that it was a mistake, but all kinds of function try blocks on destructors are kind of always bad, because what you should actually do is not do this stuff in the destructor in the first place. So we just find all of the destructor function try blocks with the Clang plugin that we are about to write in a moment.
This talk was announced as bring-your-own-computer. I didn't know that it would only last 30 minutes, so it's probably not realistic to expect that you all pop out your laptops now and start typing.
But we do have a hackfest tomorrow, so if any of you is still interested in this stuff by then, I'm glad to help anybody get set up to actually write your plugin of choice.
So what you need to get this working is, of course, a Clang compiler. Most of the time when you develop for LibreOffice, on Linux at least, it will be with the GCC from the distro. So you need to set up some things on your machine to get a Clang working. Nowadays on most distros it's easy to just install Clang. You also have to install some additional include files, which export the functionality of the Clang innards to your plugin and which your plugin is built against, so you often need to install some additional packages. Then, when you build LibreOffice, you just need to set two variables in autogen.input that tell it to use Clang instead of GCC. In non-debug builds we don't build the plugins by default, so if you want to play it safe and do a non-debug build, which developers never do anyway, you need to explicitly enable the compiler plugins as well.
If you happen to build your own Clang, which is also worthwhile, because their trunk moves as fast as we in LibreOffice do and brings in new features all the time, and sometimes new bugs, which you can then report to them, you can enable assertions in the Clang code. That can be an advantage if the code you write in your plugin is not actually working right: there are lots of asserts in the Clang code proper that will then fire, so you learn earlier that your plugin doesn't work as it should.
So the basics of a plugin is just some .cxx file that you drop into one directory in our code tree, the compilerplugins/clang directory. There are already many of these there, so you can take inspiration from looking at any of them. And there's kind of a boilerplate structure that all these plugins have.
You define one class for your plugin, which needs to derive from two other classes: from this RecursiveASTVisitor that I'll speak about in a moment, and from our own plugin helper class that Luboš Luňák wrote, which does the plumbing to get this plugin into the framework of all the plugins that run when you then compile. It needs a constructor to get some data from our plugin instantiation stuff, and it needs a run function that starts everything off. And this here is already some optimization: it checks whether this is actually a C++ compilation going on, and not a C or Objective-C one. Function try blocks are C++ specific, so it wouldn't make any sense to run our plugin on C code. So we speed up the compilation a tiny little bit by not running this needlessly on C code. What run then does is call some magic TraverseDecl, which actually starts off this RecursiveASTVisitor stuff.
And what the RecursiveASTVisitor does: you have this syntax tree where every kind of structure in your program is represented by a node. You walk down this tree node by node, and for every different kind of node there's a different function that can get called. For example, if you have A plus B in your code, there'll be a node for the plus expression, and it's of a certain expression class subtype. So you have, for example, a visit function for the binary plus expression. Or you have a VisitCallExpr if you do a function call, or one for a constructor call, or for a throw. There's a VisitCXXThrowExpr, or a VisitIfStmt, and the same for all the declarations in the code. If you call a function, the function that you actually call is declared somewhere, so there's a FunctionDecl, a function declaration node, in your syntax tree. All these tiny little things get visited one by one. So how do you learn what kinds of visitors there actually are, what kinds of nodes there are that will get visited?
The best way is to look into the Clang include files, because they list all of these nodes. The different kinds of nodes are represented as different C++ classes in the compiler innards, and you can just look into these header files to get an idea of what the names of these classes are and what's available to actually visit. They do differentiate between the plain C subset and the stuff that is C++ specific. So some of the expressions, for example, are only in the CXX include files: throw, for example, lives in ExprCXX.h, because throw is specific to C++.
Now yeah, there are many things we could visit, but what should we visit? Again, compiler to the rescue. Clang has a nifty feature to not generate code when you ask it to compile, but to output this abstract syntax tree. It does that in a somewhat arcane manner, with lots of detail, but among all this detail is the stuff that we're interested in. So you create a very simple C++ source file, just one line, so that you don't get lost. What it contains is a struct declaration with the definition of a destructor that has this arcane function try block. And what falls out of asking Clang to dump the abstract syntax tree is some blah blah, and then a CXXDestructorDecl, aha, that's this part here, and directly underneath it, as its only child in the tree, is the CXXTryStmt. Normally you would expect the function definition to be followed by a compound statement, which is the terminology for a block with curly braces around it. So normally every function definition has a block following it, but in this special case the block is hidden inside the try statement. So aha, we now know we need to visit a CXXTryStmt directly inside a CXXDestructorDecl. Because there are other try statements, used for the normal try/catch blocks everywhere in the code, which we don't want to catch, we first need to make sure we're inside a destructor decl.
So we don't actually add a visitor on the CXXTryStmt. We first write a visitor that will fire on, that will visit, the destructor declaration. And then there's again some boilerplate that you need to write into every visitor function. The first thing is to check whether this declaration is at a code location that is not interesting for us, because it is from a standard header, for example. There's so much dirt in these standard headers, for better or worse, and we don't want to warn about any of that, or even look into any of that. So we have some helper functions to find out that this declaration is from somewhere we don't want to look, and then we just bail out early. The other important thing: you always need to return true from these functions, because as soon as you return false from any of them, the whole traversal of the tree stops immediately, and the rest of the program is not even looked at. That's an easy mistake to make. Always return true in these functions.
The next thing is this: the AST visitor doesn't use virtual functions, for speed reasons. It uses the curiously recurring template pattern, where you pass in the class itself. What that does behind the scenes is a very, very fast way of dispatching to all the functions that you write. The downside is that if you make a mistake in a function name, that will not be flagged by the compiler. What will happen is that your function just never gets called, if you use a name that doesn't match anything. So a good idea, when you start to write a visitor, is to make it fire on every occurrence. If you then run it over the code, chances are that you catch many of these, because there are, for example, many destructor declarations in our code. If you get many warnings, then you know, aha, my function is working. If you get none, that's typically a sign that you've made a mistake with the declaration of your visitor.
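The two pitfalls just described, a misspelled visit function that silently never fires and a `false` return that stops the traversal, can be reproduced in a toy model. This is not Clang's RecursiveASTVisitor: `ToyVisitor`, `Node`, `Kind` and the three visitor structs are hypothetical names that only imitate the pattern of passing the derived class in as a template argument.

```cpp
#include <string>
#include <vector>

// Toy node kinds, stand-ins for AST node classes (not Clang's).
enum class Kind { Destructor, Call };
struct Node { Kind kind; std::string name; };

// Toy base class: Derived is passed in as a template argument and the
// dispatch is a static_cast, not a virtual call.
template <typename Derived> class ToyVisitor {
public:
    // Defaults do nothing; Derived hides them by declaring the same name.
    bool VisitDestructor(const Node&) { return true; }
    bool VisitCall(const Node&) { return true; }

    // Walks all nodes; stops as soon as any visit function returns false.
    bool Traverse(const std::vector<Node>& nodes) {
        Derived& self = static_cast<Derived&>(*this);
        for (const Node& n : nodes) {
            bool keepGoing = n.kind == Kind::Destructor
                ? self.VisitDestructor(n) : self.VisitCall(n);
            if (!keepGoing) return false;
        }
        return true;
    }
};

// Correct name: fires on every destructor node.
struct CountDestructors : ToyVisitor<CountDestructors> {
    int found = 0;
    bool VisitDestructor(const Node&) { ++found; return true; }
};

// Misspelled name: compiles without complaint, but is never called.
struct Misspelled : ToyVisitor<Misspelled> {
    int found = 0;
    bool VisitDestructer(const Node&) { ++found; return true; }
};

// Returns false: the traversal ends after the first destructor node.
struct StopsEarly : ToyVisitor<StopsEarly> {
    int found = 0;
    bool VisitDestructor(const Node&) { ++found; return false; }
};
```

Run over a list with two destructor nodes, `CountDestructors` ends with `found == 2`, `Misspelled` with `found == 0`, and `StopsEarly` with `found == 1`, which is why firing on every occurrence first is a cheap sanity check.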
So what we do: for every destructor that we don't ignore due to its location, we report a warning, "destructor found", and we give it some information about the location where it is. So let's run that live. I did a build before, and I have now added just this state of the plugin. For every occurrence of a destructor it now yells "destructor found" and gives you a nice caret at the start of the destructor function name, and then some wiggly underline for the whole range. There is this nice feature of... that was the wrong key. That's the nice feature of giving it one location, the main location of the declaration you're interested in, and then also passing in the complete range of the declaration. That tells the reporting engine to annotate your source code nicely with this caret and the wiggly lines. So that's a nice feature, which GCC then copied, legally, to be on par again.
So now we know our plugin works, rudimentarily. Now we want to get it to actually do what we wanted. What was it? We wanted to find a CXXTryStmt directly nested inside a destructor decl. So we now need to look into its body. If this one doesn't have a body, because it's only a declaration or because it's deleted, then we can bail out early as well. So we have this one line. But if it has a body, we can get its body. And all these little things about what you can call on a destructor decl, that it has these functions like doesThisDeclarationHaveABody and getBody and whatnot, you find all of that in the Clang header files. That's the source to always go back to: read through these include files and find out what the features of all these Clang-internal classes representing your nodes are. The body needs to be one statement only: either this try statement, or the compound statement for the usual case.
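The body check described here can be modelled without Clang. The sketch below is a toy under stated assumptions: `Stmt`, `CompoundStmt`, `TryStmt`, `DestructorDecl`, `dynCast` and `hasFunctionTryBlock` are hypothetical names, and `dynCast` is a hand-written helper that decides on a kind tag stored in the node and yields a null pointer on a mismatch.

```cpp
// Toy statement hierarchy with an explicit kind tag (not Clang's classes).
struct Stmt {
    enum class Kind { Compound, Try };
    explicit Stmt(Kind k) : kind(k) {}
    const Kind kind;
};

struct CompoundStmt : Stmt {
    CompoundStmt() : Stmt(Kind::Compound) {}
    static bool classof(const Stmt* s) { return s->kind == Kind::Compound; }
};

struct TryStmt : Stmt {
    TryStmt() : Stmt(Kind::Try) {}
    static bool classof(const Stmt* s) { return s->kind == Kind::Try; }
};

// Hypothetical helper: returns s as a To if the tag matches, else null.
template <typename To> const To* dynCast(const Stmt* s) {
    return (s != nullptr && To::classof(s)) ? static_cast<const To*>(s) : nullptr;
}

// Toy destructor declaration; a null body models "declaration only"
// or a deleted destructor.
struct DestructorDecl {
    const Stmt* body;
};

// True only when the body is directly a try statement.
bool hasFunctionTryBlock(const DestructorDecl& d) {
    if (d.body == nullptr) return false; // no body: bail out early
    return dynCast<TryStmt>(d.body) != nullptr;
}
```

A destructor whose body is a `TryStmt` is reported, one whose body is a `CompoundStmt` or missing is not, which is the same three-way split the plugin has to make.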
Another little detail is that LLVM and Clang don't use the standard RTTI or dynamic_cast stuff from C++, because it's too slow for them, because they have many dynamic-dispatch kinds of things going on. So they have their own tag-based stuff that is much faster, just dispatching on an enum-like thing. So you don't have dynamic_cast but their dyn_cast. You take the body and check whether it is a try statement: if it is, dyn_cast gives you a non-null pointer, otherwise it gives you a null pointer. So if what comes out of there is not a null pointer, you know, aha, I'm in a destructor decl and directly underneath there's this try statement. So yeah, this is an occurrence of a function try block, and in this case we yell. We even improve the report a little bit by no longer reporting the location of the destructor itself but the location of this nested try statement. But still, in all cases we return true, so that we don't prematurely stop walking.
And yeah, to actually try it out you need to revert that one fix, because, as it turned out in the end, that was the one and only use of this arcane feature in our code base. But there was one, and I think that's always the case: whatever odd Clang plugin you think up to try and analyze code, turn to the LibreOffice code base and there's always at least one case that you'll find with it, to demonstrate that your plugin is working. So if you revert that fix, and the case with that problem was in writerperfect, and you then remake writerperfect, that one will be found. And that's it.
No, you don't. That's another very great... Sorry, the question was: for C and C++, with all these macros, all this preprocessing going on, is that lost by the time you reach this abstract syntax tree walking? And it is, in a sense, but it is also not, in a sense. The syntax tree of course is unfolded. I'm not sure if there was anything more specific you wanted to ask?
Yeah, that's the problem. What happens is that all the macros of course get unfolded for the syntax tree, but for every bit that falls out of these macros, every character that then goes into the tree, the characters that make up the resulting syntax tree, Clang records where that character came from. So these locations that are shown, that are used to beautify the reports, do know whether they come from a macro definition or from a macro expansion where you pass something into a macro as an argument, and all the way down, since that one can then again be passed on as a macro argument and stuff like that. So it can get very nested and then very ugly, but in the end this information is always there for every character that actually ends up as some kind of node. If you have these marker-only macros, then yeah, it gets ugly. If you know where they are, you can match on that: if you always have a marker after, say, the closing parenthesis of a function definition, or after the closing parenthesis of the declaration, then you could match on that and then ask Clang to get you the characters that follow in the source code. You might be able to get that, but that's a very tricky area. That's also because the language and the preprocessor don't make it easy to regain that. I'm not sure there's a better approach. For these kinds of things a purely text-oriented, source-text-oriented approach might be better than going...
Yesterday a mini-HDMI-to-HDMI adapter that belongs to this nice building accidentally disappeared. So if you guys could check that you don't accidentally have something extra in your bag from yesterday, that would be nice, because we really need to recover it.