I'm Stephan Bergmann from Red Hat, and I enjoy kind of misusing the LibreOffice code base as a test bed for cool compiler features. That's what I'll talk about today. Just to get one misconception out of the way right at the start: the compiler wants to be your friend. Even if it throws incomprehensible 15-line error messages at you, what it actually wants to tell you is: I love you, I want to hug you, maybe hug you to death. But that's what it actually wants to tell you. So you have to treat the compiler the right way, and then you have a good, fruitful relationship with it. C++ compilers, mainly the ones relevant to the LibreOffice code base, have changed quite a bit over the last years, for various reasons. The C++ standard itself is evolving and getting better. And there are several healthy, different compilers that each try to be the best one in the field, so there is some healthy competition going on, between GCC and Clang for example, ever since Clang started to appear on the scene. That brings better error messages, for example, but also better diagnostics that go beyond simple error messages and warnings, and ways to plug new features into these compilers: new checks that you write yourself, specific to your project at hand, testing things that are not generic but rather specific to LibreOffice, for example. And there's also the dynamic side of things: the compilers introduce features that do checks at runtime and find even more things than are feasible to find through static analysis at build and link time. For both kinds of checks, static and dynamic, besides the compilers themselves there are also various standalone tools that offer such features, with different feature sets, and some of them are also built on top of the compilers. So I'll split this talk roughly into two parts, static and dynamic.
I'll start off with kind of a success story that we had in the LibreOffice code base, leveraging the tools that are available to us: the override feature. In C++, you can have virtual functions that override other virtual functions declared in base classes. Originally, C++ gave you no clue, for a given function declaration in a class, whether it was actually intended to override some other one or whether it introduced a genuinely new one, maybe with the same name as another one but with a different set of parameters. The LibreOffice code base has really huge and also very broad class hierarchies. If you look at one function declaration somewhere in such a hierarchy, you have no idea whether it is feasible to, for example, change one of the parameters or its type, or whether you would then break all the code, because you would introduce a new virtual function that no longer overrides its parent and is no longer overridden by its subclasses. So we always stayed clear of cleaning up things like that, and rather tried to work around them than dare introduce a regression there. Then along comes C++11, which has the feature that you can optionally mark an overriding virtual function with the keyword override. If at compile time the function then does not actually override something from the base class, because you mistyped one of the parameters, for example, and actually introduced an overload instead of an override, the compiler produces an error. This is a rather easy thing to implement, so no wonder most of the compilers around today do understand that feature, mostly; the Microsoft compiler has some ideas of its own about how the feature should work, but I'll come back to that. But the feature is optional. You're not forced to write it on your function. So as long as it's optional, we're back at square one.
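A minimal sketch of what override buys you, assuming a C++11 compiler. The class names Shape and Square are made up for illustration; the commented-out declaration shows the kind of mistyped parameter that the keyword turns from a silently introduced overload into a compile error.

```cpp
#include <string>

// Hypothetical base class standing in for one level of a deep hierarchy.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
    virtual double area(double scale) const { return 0.0 * scale; }
};

struct Square : Shape {
    explicit Square(double side) : side_(side) {}

    // 'override' states the intent: this must replace Shape::name.
    std::string name() const override { return "square"; }

    // Correct override: same parameter types and const-qualification.
    double area(double scale) const override { return side_ * side_ * scale; }

    // A mistyped parameter such as
    //     double area(float scale) const override;
    // is rejected by the compiler. Without 'override' it would silently
    // declare a new overload instead of overriding Shape::area.

private:
    double side_;
};

// Calls through the base class, so it goes through virtual dispatch.
double scaledArea(const Shape& s, double scale) { return s.area(scale); }
```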
We don't know whether it is missing in some place, so we still have no idea whether we can change the function that we're staring at. So I wrote a plugin for the Clang compiler that complains, that is, produces an error, if an overriding virtual function is not marked with override. We cannot use the real override keyword directly; we have to hide it behind a macro, SAL_OVERRIDE, for the few compilers that still don't understand it, which is mostly the very old GCC on the very old Mac baseline, which we'll drop anyway. Kevin? I just wonder, would it make sense to have one more kind of keyword that says: I am a method that is meant to be overridden? The use case is that I have a function with five parameters, and of course I don't add SAL_OVERRIDE to that one, because it's the original declaration. So if we knew that all the functions that are supposed to be overridden are actually backed by overriders, that would help us get these cases out of the way. If that one needs to be overridden, then you can mark it as pure. Sure, sure, but in some cases it makes sense not to make it pure, because an empty implementation makes more sense. Yeah, but you can make a function pure and also have an implementation for it. Oh, OK, sorry. So that case is covered. But there are other cases. There is also a final keyword, and then there was interest in also having the negation of that one: final means you cannot override a function, and not-final would be the same as needs-to-be-overridden. I'm not sure, but I think there once was a proposal, after this went into the standard, to also have the negations, written with an exclamation mark: not-override and not-final. I'm not sure what became of that proposal and whether it actually made any sense, but somebody must have had similar thoughts. But I guess it would be covered by making the function pure.
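The point from the discussion, that a pure virtual function can still have an implementation, can be sketched like this. Listener, LoggingListener and QuietListener are hypothetical names. The definition has to be written outside the class body, since a declaration ending in = 0 cannot also carry a body in the class.

```cpp
#include <string>

// Pure virtual: every concrete subclass is forced to override onEvent.
struct Listener {
    virtual ~Listener() = default;
    virtual std::string onEvent(int id) = 0;
};

// Out-of-class definition of the pure virtual function: a default
// ("empty") implementation that overriders may call explicitly.
std::string Listener::onEvent(int /*id*/) { return ""; }

struct LoggingListener : Listener {
    std::string onEvent(int id) override {
        // Reuse the base behavior, then add to it.
        return Listener::onEvent(id) + "event " + std::to_string(id);
    }
};

struct QuietListener : Listener {
    // Opting into the default still requires an explicit override.
    std::string onEvent(int id) override { return Listener::onEvent(id); }
};
```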
But if we find uses for something like that, we could of course write a plugin that checks for functions decorated with some macro that we invent. Actually, the pure version of our case is what I was interested in. OK, we can talk about it later. OK. Yeah? Would it make sense, instead of writing SAL_OVERRIDE in some places, to just write override, and if the compiler doesn't understand override, to define it as a macro with empty contents? Yeah, that would have been possible as well, but it's always kind of cleaner to make sure it's obvious that you're actually using a macro, because macros are confusing already. So better be very clear about what you're doing there and not try to be too clever. But yes, it could have been done that way as well. So, with that plugin in place: the plugin was written so that it also rewrote all the places that were missing the annotation, because that was a really huge patch that nobody would have wanted to write by hand. It touched virtually every file in the code base and added SAL_OVERRIDE in various places. With that out of the way, we had a status quo where every overriding virtual function was marked with SAL_OVERRIDE. Newly introduced code would bark if you forgot it. And if you look at one of the functions and wonder whether it overrides or not, you now just need to look at the declaration and see whether it has SAL_OVERRIDE next to it, make informed decisions based on that information, and actually change things that you wouldn't have dared change in the past. So, some more information on these Clang plugins. There was already a lightning talk on them earlier, and Luboš actually did much of the work: Luboš started the original framework that we're using there.
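A sketch of the macro approach, assuming a much simpler feature check than a real build would use. MY_OVERRIDE is a hypothetical name standing in for SAL_OVERRIDE, and keying it on __cplusplus alone is an assumption (MSVC, for one, reports an old __cplusplus value by default). On old compilers the annotation simply expands to nothing, so the code still compiles, just without the check.

```cpp
// Hypothetical SAL_OVERRIDE-style macro; not LibreOffice's real definition.
#if __cplusplus >= 201103L
#define MY_OVERRIDE override
#else
#define MY_OVERRIDE /* old compiler: the annotation compiles to nothing */
#endif

struct Base {
    virtual ~Base() {}
    virtual int value() const { return 1; }
};

struct Derived : Base {
    // With a C++11 compiler this is checked exactly like 'override'.
    int value() const MY_OVERRIDE { return 2; }
};

// Virtual call through the base class.
int readValue(const Base& b) { return b.value(); }
```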
We have many plugins by now, and writing such a plugin is really easy, because all you need to do is copy an existing one and place it in the one plugin directory; the next time you build, it is built automatically and included in the compiler, so all the source files that you then compile get that check. One of the features that made Clang popular is that this interface for writing plugins is rather easy to use. Compare it, for example, to the GCC one, which has been out there much longer but always had a reputation for being very hard to write plugins for. So this one has a learning curve, of course, but it shouldn't be too hard, and especially with the ones we already have as examples, it shouldn't be too hard for each of you to write one more plugin. Think of something that you want to find out in the code. Examples of what we have: for all the SAL_INFO and SAL_WARN stuff that needs a string tag saying what area of the code it belongs to, there's a plugin that checks that all these tags are consistent and not mistyped, and that not everybody invents new tags but instead looks at which tags are already in use in the surrounding code, so that there is some consistency there. Then there are plugins that help in changing from the old sal_Bool typedef to the C++ bool in places where it can be used, and that find bad mixtures of sal_Bool and integer literals, for example. Other cases, like replacing string arguments passed by value with const references, and lots more like that: about 20 the last time I looked. Once we get too many, we may slow down the compiler too much, because each of these plugins is in effect a full traversal of the syntax tree. So when we have too many of them, the compiler might get slow, and we'll need to think about optimizing the framework or throwing away the ones that are outdated, moving them to the attic or something like that.
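To illustrate why mixing sal_Bool with integer literals is worth a plugin, here is a small self-contained sketch. LegacyBool, LegacyTrue and LegacyFalse are local stand-ins assumed to mirror sal_Bool, sal_True and sal_False (an unsigned char typedef plus two constants); this is plain C++, not the Clang plugin API.

```cpp
// Local stand-ins mirroring sal_Bool / sal_True / sal_False (assumption:
// an unsigned char typedef plus two constants). Names are illustrative.
typedef unsigned char LegacyBool;
const LegacyBool LegacyTrue = 1;
const LegacyBool LegacyFalse = 0;

// Legacy-style code that stores an integer in the flag, e.g. a count.
// Any value other than 0 or 1 is "true-ish" but not equal to LegacyTrue.
LegacyBool legacyFlagFromCount(int count) {
    return static_cast<LegacyBool>(count);
}

// Buggy pattern: comparing against LegacyTrue misses values like 2.
bool legacyIsSetBuggy(LegacyBool b) { return b == LegacyTrue; }

// Correct pattern: test for non-zero. A real C++ bool normalizes on
// conversion, which is why switching to bool removes this class of bug.
bool toBool(LegacyBool b) { return b != LegacyFalse; }
```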
For now, though, try it out. The next step up the ladder: as I said for the SAL_OVERRIDE thing, you don't want to add that by hand in all these places. The cool thing is that these plugins can also magically rewrite your code while they operate on it. This is trickier to write, get right and get working, especially in the face of macros, where you may want to change occurrences not only in arguments to macros but also in definitions of macros, or you may not want to do that. And the case that of course will not be covered is where parts of the thing you want to rewrite come from different macro arguments that are assembled into something, so that there's no single place in the code that you could change to get the same effect. You can even run multiple rewriters in parallel. I did that with the SAL_OVERRIDE changes, and it gives you a really spooky feeling when you start your compile and then watch it change the source files. And when you compile again, it even still works and didn't produce garbage. There is one other approach from some people in France: Coccinelle. It looks very promising. The idea there is to declare the patterns you want to rewrite in the code in the form of patches. For example, if we wanted to get rid of all superfluous parentheses around return arguments, you could write a patch that looks like a typical diff and then run that standalone tool on the code. But last time I tried, at least, it didn't understand the more involved C++ syntax, so it didn't produce useful results there. I think they initially aimed at the C of the Linux kernel, and it's actually used there. Maybe it will get to a state where it can be useful for us too, because it's a nice idea, if you think about it, to specify in a very high-level way the patterns you want to rewrite.
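A deliberately naive sketch of the superfluous-parentheses rewrite, working on a single statement as a string. It is not how a Clang rewriting plugin or a Coccinelle semantic patch operates; both work on parsed code and would have to deal with macros, comments and string literals, which this hypothetical helper, stripReturnParens, ignores. It does show the one check any such rewrite needs: the opening parenthesis must match the final closing one.

```cpp
#include <cstddef>
#include <string>

// Naive, text-level "drop superfluous parentheses around a return
// argument" rewrite for a single statement. Hypothetical helper.
std::string stripReturnParens(const std::string& stmt) {
    const std::string prefix = "return (";
    const std::string suffix = ");";
    if (stmt.size() < prefix.size() + suffix.size() ||
        stmt.compare(0, prefix.size(), prefix) != 0 ||
        stmt.compare(stmt.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return stmt;  // not of the form "return (...);"
    }
    const std::size_t open = prefix.size() - 1;             // index of '('
    const std::size_t close = stmt.size() - suffix.size();  // final ')'
    if (close == open + 1) {
        return stmt;  // "return ();" has nothing to unwrap
    }
    // Superfluous only if the opening parenthesis matches the final
    // closing one, i.e. the nesting depth never returns to zero earlier.
    int depth = 0;
    for (std::size_t i = open; i < close; ++i) {
        if (stmt[i] == '(') {
            ++depth;
        } else if (stmt[i] == ')') {
            --depth;
        }
        if (depth == 0) {
            return stmt;  // e.g. "return (a) + (b);" must stay unchanged
        }
    }
    return "return " + stmt.substr(open + 1, close - open - 1) + ";";
}
```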
As I said, there are also standalone tools out there, quite a lot of them, some of which people in the community actively work with, and it turns out that they all find not completely overlapping sets of issues in the source code that are nice to clean up. The thing is, comparable to how we cleaned up all the compiler warnings in the source code, which was a heroic effort to get the whole code base warning-free, you need these heroic efforts to make these tools actually effective: if they still regularly generate thousands of warnings, you won't notice the three new ones caused by your new code. So to make these tools effective, you need a clean slate to start from at some point. For example, for Coverity, as Caolán said earlier, we now have very few remaining old warnings, so we can start to use it now. There's also cppcheck; Julien Nabet regularly produces patches that fix things it finds. There's the Clang static analyzer; I personally never looked into that, but it's also useful. The next thing on our plate is C++11. As I said, the override feature is one example of useful stuff that we could actually use in our code base, but not all of these features are of the kind that you can hide behind macros for the compilers that don't understand them. So what we plan to do is bump our baseline requirements to something where all the compilers we use have a sufficiently large subset of C++11 available. For the TDF baseline Linux builds on CentOS 5, that means the system GCC would be too old, but there's some tooling to compile there with a newer compiler, with the missing parts of the C++ runtime statically linked into the resulting executables and libraries, so that you can still run them on an old baseline machine.
So that's the solution for that one. For Mac, we'll just drop the old pre-10.8 machines and use an Apple Clang there that is new enough. The poor one is the Microsoft compiler: even MSVC 12 has a number of C++11 features, but there are still some it does not support, for example deleted functions, which is a shame because that's not really something that should be too difficult, at least the easy parts of it. It does not have variadic templates; it would be good to have functions that forward all their arguments, arbitrary numbers of arguments, to some other function, as we have some kinds of wrapper functions where we could make good use of that. And for this override feature, it has a very funny bug: if you also include the inline keyword in your function declaration, it complains that you cannot combine inline and override, which is total nonsense, but that's one of the things where you have to tweak your code. The other thing we'll need to think about is what to do with the URE, the UNO interface for client code. If some external developer wants to write an extension, and wants to write it on an old baseline machine, for example so that it is guaranteed to work with all Linux LibreOffice instances, then if we made use of C++11 features in the URE headers, we would force that person to update to a specific compiler, too. So one idea is that, as a first step, we leave all the headers that make up the URE out of pimping them up with C++11 features. Maybe we should just start, in the coming weeks, using C++11 features in the meat of the code above the URE, and then see which tinderboxes break because they are still on too-old compilers, and update those.
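For reference, here is what the two missing features look like in code, as a sketch with hypothetical names: callWrapped is a variadic template that perfect-forwards any number of arguments to a callable (the kind of wrapper function mentioned above), and Resource uses deleted functions to forbid copying, replacing the older idiom of declaring private, never-defined copy members.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Forwarding wrapper: accepts any number of arguments of any types and
// perfect-forwards them. Without variadic templates this needs one
// overload per arity.
template <typename F, typename... Args>
auto callWrapped(F&& f, Args&&... args)
    -> decltype(std::forward<F>(f)(std::forward<Args>(args)...)) {
    // A real wrapper might log, lock or translate exceptions here.
    return std::forward<F>(f)(std::forward<Args>(args)...);
}

// Deleted functions: copying is forbidden at compile time.
class Resource {
public:
    explicit Resource(std::string name) : name_(std::move(name)) {}
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

static_assert(!std::is_copy_constructible<Resource>::value,
              "Resource must not be copyable");

int add3(int a, int b, int c) { return a + b + c; }
```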
So our idea is, for the LibreOffice 4.4 version, to kiss goodbye to the old compilers that are not capable enough and go ahead with making use of C++11 features like, for example, lambdas, which are a useful feature for cleaning up lots of places in the code. And when you want to do actual C++11 development: Scott Meyers is active as well, having written a new book on all the shitty corner cases that are still completely arcane and broken even in the new standard. So I recommend reading that before writing any new line of code. Now some dynamic features. There is one cool dynamic feature that grew over the last years; it first started out as an addition to Clang, coming from Google, and then also got added to GCC. So it is available, in more or less complete or evolving states, in both of these compilers at least. These are the sanitizers: at compile time, the code is instrumented with checks that then tell you at runtime if you do something illegal. There's one sanitizer for addressing issues, like going out of bounds when you write or read beyond an array, be it a static array, a heap array or a stack array; it can even detect uses of stack variables after the function has returned. So when you return a pointer to a stack variable and then modify it afterwards, it can find that out. It uses a serious amount of shadow memory to track which memory is in what state, so it requires quite some RAM. But if you run make check with all of the tests, it is completely clean by now, so it passes. There is also a leak detector included there, and that one does not pass yet. Moggi did some work on cleaning up some of the leaks, but there are still lots left that people can pick up and look into.
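A sketch of the kinds of bugs AddressSanitizer reports, with hypothetical function names. Such code would be built with -fsanitize=address, which both Clang and GCC accept. The buggy stack-use-after-return version appears only in a comment, since running it is undefined behavior; the live code is the safe alternative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stack-use-after-return, the pattern AddressSanitizer can report. Kept in
// a comment because calling it and using the result is undefined behavior:
//
//   const char* makeLabelBuggy(int n) {
//       char buf[16];
//       std::snprintf(buf, sizeof buf, "item %d", n);
//       return buf;  // pointer into a stack frame that is about to die
//   }
//
// Safe alternative: return an owning value instead of a pointer.
std::string makeLabel(int n) { return "item " + std::to_string(n); }

// Out-of-bounds access is the other classic ASan report. Checking the
// index first keeps the read in range; fallback is returned otherwise.
int getOr(const std::vector<int>& v, std::size_t i, int fallback) {
    return i < v.size() ? v[i] : fallback;
}
```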
Another of these sanitizers is the undefined behavior sanitizer, which tries to track down all the places where the standard says that some operation is undefined behavior. Like signed integer overflow, which often happens when you compute hash sums for hash sets and the like, where lots of the functions use signed instead of unsigned integers. So I went into many of those places, also in the external code, and added simple fixes to switch to unsigned, for which overflow is well defined. Another thing it finds is when you call a function through a pointer to a function of a different type, where the argument types don't match up. I found lots of cases there. Most of them were harmless: some function was supposed to receive a void pointer but was declared to receive a Foo pointer, and the fix was to take the void pointer and static-cast it to a Foo pointer inside, which is safe because it would always be called with a Foo pointer anyway. The one cool thing it does is catch all the invalid downcasts, where you do a static cast from Foo down to Bar, but at runtime what you have is not a Bar but only a Foo. That's something a dynamic cast would catch, but dynamic cast is much more defensive, so many places that think they know what kind of object they have, and that it is safe to cast down, use a static cast or a C-style cast, and this sanitizer finds all the places where we make the wrong assumptions. I'm currently going through all the CppUnit tests and trying to clean that up, but there are so many cases, error reports, in the code that this is really hard labor: of about 180, I have only covered around 150 by now.
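Two of the undefined-behavior issues described above, as a sketch with hypothetical names. hashString accumulates in unsigned arithmetic, where wraparound is well defined; the same loop on a signed int is what -fsanitize=undefined would flag on overflow. Foo and Bar mirror the downcast example: dynamic_cast checks the dynamic type, whereas a static_cast to Bar on an object that is really just a Foo is the invalid downcast that Clang's -fsanitize=vptr check reports at runtime. The multiplier 31 is just a common string-hash choice, not taken from LibreOffice.

```cpp
#include <cstdint>
#include <string>

// Hash accumulation in unsigned arithmetic: overflow wraps modulo 2^32,
// which is well defined, unlike overflow of a signed int accumulator.
std::uint32_t hashString(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = h * 31u + c;
    }
    return h;
}

struct Foo {
    virtual ~Foo() = default;
};

struct Bar : Foo {
    int extra = 42;
};

// static_cast<Bar*>(p) would trust the caller; if *p is only a Foo, using
// the result is undefined behavior. dynamic_cast yields nullptr instead.
int extraOrDefault(Foo* p, int fallback) {
    if (Bar* b = dynamic_cast<Bar*>(p)) {
        return b->extra;
    }
    return fallback;
}
```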
There are also some technical problems involved, like making sure that the RTTI that is necessary for tracking down these wrong casts is available despite our visibility hiding of symbols in dynamic libraries, and it looks like we need to sprinkle a lot of the code with a new macro that I introduced, so that when you're compiling with this checker, symbols are emitted with higher visibility than in the normal case. There was also a problem with running the unit tests with the sanitizers and Clang because of missing symbols, but with the help of Mikael at the hack night I finally got that one fixed. So that's one more case where it's easier to run these tests out of the box now. And yeah, that's about it. Time's up, so no questions. Thank you. Do you have questions? Sure. Do you know whether MSVC 2013 has more complete C++11 support? It has somewhat more, but it's also not complete. What I based this on is the Clang people: they also decided to move forward to C++11, and they settled on certain Clang and GCC versions and MSVC 12. So I thought that would be a sweet spot. I outsourced, so to speak, the decision-making and hoped that they had made good decisions. There are also some feature matrices on some Apache website where they claim, for example, that MSVC 12 does support variadic templates, but it doesn't. And I saw in the Clang code that they also have this problem, that they still cannot use that feature even after their switch to C++11. So I assume that the 2013 one isn't there yet either.
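A sketch of the visibility problem, assuming GCC or Clang. MY_SANITIZER_BUILD and MY_DLLPUBLIC_RTTI are hypothetical names, not the macro actually introduced in LibreOffice. The idea is that in sanitizer builds, classes whose dynamic type gets checked are given default visibility, so their RTTI symbols are exported even when the rest of the library is compiled with -fvisibility=hidden; in normal builds the macro expands to nothing.

```cpp
// Hypothetical: MY_SANITIZER_BUILD would be set by the build system only
// for sanitizer builds; it is not a real LibreOffice or compiler macro.
#if defined(MY_SANITIZER_BUILD) && defined(__GNUC__)  // GCC and Clang
// Default visibility keeps the type's typeinfo and vtable symbols
// exported even under -fvisibility=hidden.
#define MY_DLLPUBLIC_RTTI __attribute__((visibility("default")))
#else
#define MY_DLLPUBLIC_RTTI
#endif

// Polymorphic classes whose dynamic type may be checked across library
// boundaries, by dynamic_cast or by the sanitizer's vptr check.
struct MY_DLLPUBLIC_RTTI Node {
    virtual ~Node() = default;
};

struct MY_DLLPUBLIC_RTTI TextNode : Node {
    int length = 0;
};

// Returns the text length if the node really is a TextNode, else -1.
int textLengthOrMinusOne(Node& n) {
    TextNode* t = dynamic_cast<TextNode*>(&n);
    return t ? t->length : -1;
}
```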