Hi everybody. Sanitizers: who doesn't know what this talk is going to be about? Everybody knows, that's good. What I want to talk about in this next half hour, with dramatically fewer than 320 slides, is the idea that we can build LibreOffice with various sanitizers enabled, and what benefit that brings to the project, but also to each one of you individually who actually tries to do that. So who of you did try to build with sanitizers enabled? Encouraging. By the end of the talk, I want you all to be excited to try it out.
The idea of these sanitizers, I think, started with Google quite a number of years ago. They first implemented it for Clang, and then that also got ported to GCC. So the experience there should be similar, and you can try it with either of those. On Windows, the Microsoft compiler does not have that, but at least the Linux ones do. And I also only do it on Linux, so I'm not sure how good the out-of-the-box experience would be on Mac, for example. I never actually tried that, or not much of it.
So there are various sanitizers that you can enable at compile time. The idea is always that you tell the compiler, when it compiles the code, to introduce extra checks into the code that will then at run time find interesting things about your program, for example where it violates the C or C++ standards regarding undefined behavior. That's one of the two classic things that these sanitizers check for: anything that, when it is executed at run time, would invoke undefined behavior. But that also means that whenever you want to try these out, you first need to compile the whole project with these extra compiler switches so that your code is instrumented. That's different from, say, Valgrind, where you can just quickly debug something after the fact. If you look at the undefined behavior sanitizer, there is quite a large range of things that it finds.
It ranges from very benign things, for example one that occurs quite often in the code and where we need some trivial fix just to silence a sanitizer warning that isn't that interesting in itself. There's a memcpy going on, you have two pointers and a size, and often enough, in the code leading up to that memcpy, if the size is zero then one of these pointers might well be a null pointer from your logic. That doesn't hurt, because if you're copying zero bytes then it doesn't matter that the source or the target is null. But the C standard says that functions like memcpy must never be called with a null pointer. So the sanitizer dutifully tells you, if you run into such a case, that you're violating the non-null argument specification that is spelled out for these functions. The easy fix there is always that if the size is zero, you just skip the memcpy, because you're not doing anything anyway, and then it doesn't matter that you would have passed a null pointer.
And the compilers are always trying to exploit every kind of undefined behavior. You need to think of it like this: you're trying to write your code as best you can, and there's the compiler waiting on the side for you to write anything that invokes undefined behavior, and then it grabs all your code from you. It can reason that if you run into code with undefined behavior, your program is exploding anyway, so it can remove all the stuff that can't happen, because if it did happen, it would run into the undefined behavior. So always watch out for anything that could invoke undefined behavior, because your compiler is only waiting to act on it, throw all your code in the bin, and do nothing in the end. So even for trivial things like this one, it's always good to clean the code up.
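The skip-on-zero fix described above can be sketched in C++ roughly as follows. This is only an illustration, not LibreOffice code: the helper name copyBytes and the usage function exampleCopy are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a LibreOffice function): copy n bytes, but never
// hand memcpy a null pointer when there is nothing to copy.
inline void copyBytes(void* dest, const void* src, std::size_t n) {
    if (n == 0) {
        return; // nothing to do, so dest and src may legitimately be null
    }
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, n);
}

// Usage sketch: the zero-length call is fine even with null pointers.
inline void exampleCopy() {
    char target[3] = {'a', 'b', 'c'};
    const char source[2] = {'x', 'y'};
    copyBytes(nullptr, nullptr, 0);           // skipped, memcpy is never reached
    copyBytes(target, source, sizeof source); // copies two bytes
    assert(target[0] == 'x' && target[1] == 'y' && target[2] == 'c');
}
```

Putting the check into one small wrapper is just one way to do it; an inline "if (size != 0)" at the call site has the same effect.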
And then, as I said, there's a large spectrum of undefined behavior. On the other end, maybe, are the cases where you have signed integers of fixed size that tend to overflow in certain situations. For example, in Writer we have these fly frames that are flying around somewhere (Bjorn might know that better than me). They need to be anchored in some way, but sometimes they need to get out of the way of the layout algorithms, so they are moved far away to some very large position. That very large position used to be LONG_MAX, and the values that you're computing with are of type long. But if you then start to move these around a bit further, the computations might fall off the edge, overflow, and cause undefined behavior. So what we did there, a few years ago at least, was to define that far-away position they get moved to as the maximum 32-bit integer value instead. On 64-bit, at least, we are still computing with long, and long is 64 bits on 64-bit Linux, so these values are still very, very large, but not so large that they would overflow 64-bit computations. On 32-bit they would still overflow, but at least for the machines of the future we don't get into these issues anymore.
Any other ideas of undefined things that might be caught by these? Unsigned integers, for example, don't cause undefined behavior when they overflow; that's defined to wrap around. There is an additional flag you can enable to also find overflow of unsigned integers, but that of course would give you many false positives, so it isn't that effective to check for. And then there are other cases, like shifting by too large an amount, and other undefined behaviors.
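To illustrate the headroom argument above, here is a minimal C++ sketch. The names kFarAway, checkedAdd and exampleFarAway are hypothetical and chosen for this example only; the sketch assumes 64-bit arithmetic, as on 64-bit Linux.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the "far away" position: the largest 32-bit
// value, stored in a 64-bit type so that there is plenty of headroom.
constexpr std::int64_t kFarAway = std::numeric_limits<std::int32_t>::max();

// Adds a and b without ever evaluating an overflowing signed expression.
// Returns false (leaving result untouched) if the sum would not fit.
inline bool checkedAdd(std::int64_t a, std::int64_t b, std::int64_t& result) {
    constexpr std::int64_t maxValue = std::numeric_limits<std::int64_t>::max();
    constexpr std::int64_t minValue = std::numeric_limits<std::int64_t>::min();
    if (b > 0 && a > maxValue - b) {
        return false; // a + b would exceed the maximum
    }
    if (b < 0 && a < minValue - b) {
        return false; // a + b would fall below the minimum
    }
    result = a + b;
    return true;
}

// Usage sketch: moving a far-away frame a bit further is fine in 64 bits,
// while the same move starting from the 64-bit maximum would overflow.
inline void exampleFarAway() {
    std::int64_t moved = 0;
    assert(checkedAdd(kFarAway, 1000, moved));
    assert(moved == kFarAway + 1000);
    assert(!checkedAdd(std::numeric_limits<std::int64_t>::max(), 1000, moved));
}
```

The explicit range check is one option among several; the point here is only that a 32-bit maximum leaves room for further additions in a 64-bit type, whereas the type's own maximum does not.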
If you have a function pointer that you reinterpret_cast to some other function pointer type with different arguments, and you then do a call through it, it will warn about that. Or floating-point division by zero, or integer division by zero. Or when you try to cram a too-large floating-point value into an integer: among the integer types you get truncation, but if you want to cram a large floating-point value into an integer, that is not allowed and is undefined. And there are a number of these as well, especially in areas like the newer drawinglayer code that operates on floating-point XY positions, but the older tools stuff still uses integer variables, and at the borders where these two come together there are occasionally cases where we get warnings.
Next up is the address sanitizer, which warns about something quite different. It warns about any case where you're trying to access memory that you're not supposed to access. You have an array of some size and you're addressing out of bounds of that array; or you have some heap memory that has already been deleted, probably in some other thread, and now you're accessing it again; or stack-use-after-return, where you have some pointer or reference to a local variable on some stack frame, and you return that, and later on you pass it somewhere else that dereferences it. Just as I did these slides I ran into one of these: in the sidebar you can move the slides up and down, and if you moved them too far away you ran into a stack-use-after-return, because somebody the other day had changed some boost::bind to a lambda function and then passed in the coordinates as a reference, where it was a reference to some other place on the heap... on the stack, sorry.
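The lambda problem described above can be sketched in C++ like this. It is not the actual sidebar code: Point, makeSum and exampleUse are hypothetical names, and the problematic variant appears only as a comment so that the example itself stays well defined.

```cpp
#include <cassert>
#include <functional>

struct Point {
    int x;
    int y;
};

// Problematic pattern, shown only as a comment: the lambda keeps a reference
// to a local variable and is called after the enclosing function returned.
//
//   std::function<int()> makeSumDangling(int x, int y) {
//       Point p{x, y};
//       return [&p] { return p.x + p.y; }; // p is gone once we return
//   }

// Safe variant: capture the coordinates by value, so the callable owns a copy.
inline std::function<int()> makeSum(int x, int y) {
    Point p{x, y};
    return [p] { return p.x + p.y; };
}

// Usage sketch: the callable is invoked long after makeSum has returned.
inline void exampleUse() {
    std::function<int()> sum = makeSum(3, 4);
    assert(sum() == 7);
}
```

A bind-style call typically copies its bound arguments, which is one way such a bug can appear when the call is mechanically rewritten as a lambda with a by-reference capture.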
So this one especially is very similar to Valgrind in what it can find, but how it finds it is completely different. What Valgrind does is: you have your compiled binary, and Valgrind interprets that binary and tries to reconstruct where you are addressing memory and whether that is a good or a bad access. With the address sanitizer, what you do instead is instrument your code up front, so that for every access to memory that happens in the code, whenever the compiler writes out a load or store instruction, it adds code around that instruction to check whether the access to that address would be valid, from what it knows about how your memory is laid out.
What this doesn't find, compared to Valgrind, is uses of uninitialized memory. When you read from a variable that you haven't assigned to, it doesn't detect that. There is another sanitizer for that, the memory sanitizer, that is supposed to find all the uses of uninitialized memory, but the drawback with that one is that you would need to compile your complete software stack with it. With the others you get away with just compiling your own code; if libraries like the C library are not instrumented, then of course you don't find issues in that code, but it doesn't hurt, you still find all the issues in your own code. With the memory sanitizer you would also need the system libraries to be instrumented, and that's a big drawback for getting it to work productively, so I never tried that out in earnest. There is supposed to be a glibc library version that is available, but that's where it starts to get difficult to get these things set up.
But in some places you still get a benefit from the sanitizers, because if you have an uninitialized value, it is often neither zero nor minus one but some random value, and the address sanitizer also initializes any memory to preset values, like this 160 for example. So if a Boolean value is read that is
uninitialized, then the value that is actually read out of the byte is 160, and that can't be a valid value for a Boolean, so the undefined behavior sanitizer kicks in and says this is an invalid value for a Boolean. It also does that for enums: if you have an enum with just three values, then it knows that only zero to three are valid values for it. So in some places you still get the benefit indirectly, with the sanitizers telling you about an effect of accessing uninitialized memory.
One more sanitizer that's out there is the thread sanitizer, which is supposed to detect data races. But if you unleash that on our code base, just like with Helgrind, which does the same in the Valgrind suite, you find lots of them, and nobody has bothered to clean these up. Often it's harmless reports, for example that you have different code paths that can lock mutexes in different orders. That can of course lead to deadlocks, but cleaning up all of those that don't lead to deadlocks in practice would be quite some work. So lots of work would need to be spent up front to get this working. It would be great if we had that working fine, so that we could find new issues there, but it would need somebody to step in and do the up-front work.
What you can also check with the address sanitizer is leaked memory. When the process terminates, it gives you a long list of things that leaked. Again, getting this list down to zero would mean lots of work, so luckily there's a runtime option you can set for the sanitizer that tells it not to emit all this leak information. We do have leaks, but leaks at exit are mostly harmless, so we don't bother that much about them. And all these sanitizers and all their options are very well documented at that URL there, if you want to try them out.
So, performance of this thing. As I said, compared to Valgrind: Valgrind does the analysis of the code and the instrumentation,
the interpreting of the code, at runtime, and is known to be quite slow. These sanitizers are reasonably fast. For example, it took me a while to get this presentation up at the start, but now that it's up it works reasonably well; it doesn't take that long to switch slides. I tried to edit the slides with this build, and that was a bit slow, the cursor movement for example. I think it's worse in Impress than in Writer, so it must be something in the edit engine that we use in Impress and Calc that slows things down there. So there's something like a constant multiplier that affects the execution speed; on the sanitizer documentation pages they talk about something like a two-times slowdown. It always depends on your workload.
What I often do, when I have some bug to reproduce or something to try out, is use the sanitized LibreOffice on my Linux machine. Most of the time it works fine, and often, before you get to the actual bug from the bug report, you run into another sanitizer report. So it's a double win: you fix two bugs for the price of one in these cases. I also run make check on that box. make check spawns lots of LibreOffice processes to test the various code paths, and that works; it doesn't run out of memory or anything. ASan needs shadow memory to keep track of what memory areas you have in use, so it reserves terabytes of memory but doesn't actually touch them, and with Linux overcommit it all pans out fine. On a 16-gigabyte machine it works reasonably well to do an eightfold make check, eight processes in parallel.
And how do we use it? One of the tinderboxes does, I think by now daily, builds that most of the time run green, but sometimes run into issues and sometimes run into random check failures. We do have some checks that are apparently picky about timings, and because we are running rather slowly in this case, some of the notorious tests fail more often on these than on
other machines. But it's a success story overall; we do find issues in freshly written code with these.
The fuzzers, of course, are also very happy about the sanitizers, because a fuzzer needs to decide: I have some input here, does it exhibit good or bad behavior in the program under test? One notion of bad behavior is just crashing, segfaulting. Another good indicator of bad behavior is if the code runs into undefined behavior, or if there are some address operations that don't lead to crashes but are otherwise bad and are found by ASan. So the fuzzers use these.
What I started the other day is to take all these bug documents that we have, which are also used for other testing, and pipe them all through, for example, the --convert-to pdf functionality of the office. So I take each of these thousands of documents, one by one, pass them through convert-to-PDF, and see if there are any issues with them. And I find quite a few, so I'm not yet finished even with the first round of piping all the test documents, or bug documents, through there.
And of course, as I said, developer dogfooding. I use it myself quite often for what I would call my office use. I'm not much of an office user, and when I wrote a slideshow like this one I stumbled into issues, because I had never tried that before. But when I reproduce a bug, for example, I very often use a sanitized build, and I would encourage you all to try this as well.
There are some stumbling blocks to trying it yourself. As I said, Clang and GCC: I only use Clang, and I'm not sure how well it works with GCC, but I guess it's about the same situation there. Not yet, but in a few days, I hope we'll require Clang 9, which is the latest Clang version and is not yet out. They are at release candidate 4 by now; they wanted to be out at the end of August. The good thing about Clang 9 is that it contains some improvements that make things very much easier on our end. The problem is that the undefined behavior sanitizer needs to have access to
the runtime type information, the RTTI, for many types, the RTTI symbols. And there are two schools of thought on how to compare type information, RTTIs, and the way that Clang does it needs more access to these symbols from outside of the library. So we needed to compile with the -fvisibility-ms-compat flag, but that has problems in other places. Now in Clang 9 we fixed Clang to also emulate the GCC way of comparing RTTI pointers, and that no longer needs that ms-compat thing, and no longer needs lots of the #ifdef code that we still have in our code base. I wanted to wait until Clang 9 is officially out, and by that time I'll probably throw away that old code. So if you're not living on trunk... but I think most of the people who are using sanitizers or the loplugin anyway are living on Clang trunk, so they don't experience that much. But it might be a stumbling block for a short duration, until all the distros pick up Clang 9 anyway.
And of course our documentation is not that optimal, because, as always, when it's a thing that only a few people care about, nobody bothers to write documentation. What you need to do to try it out is set the CC and CXX variables that you put into your autogen.input, and just add whatever sanitizers you want. That sets some flags internally in configure, which are maybe strangely named. So whenever you run into one of these disable or enable runtime optimizations checks in the code or in configure, that's where we check whether the CC or CXX variables have sanitizers enabled. And then in some places in the code, for example when we start Java in the process, we tell it not to do JITting, because the JITted code, like for Valgrind, is hard for the sanitizers to analyze and would probably cause issues.
There's also another issue, with undefined symbols. Whenever you compile a library with the sanitizers enabled, it calls into a sanitizer
helper library, which is linked into the executable. The individual libraries are not linked against it; the executable must be. So if you use sanitizers for your daily work and have them enabled in your daily builds, and you add some functionality to a library that needs a third library that you didn't link against, your sanitized build will not tell you. Only when you then upload that to Gerrit will the other compilers, or linkers, start to complain that you have an undefined reference, because we need to disable these undefined reference checks due to how the sanitizers work.
One thing that used to not work, and thanks to Noel pointing it out now works, is that you can build your code not only with --enable-debug or --enable-dbgutil but also with those disabled, so with full optimization of the code. Then of course debugging is less effective, because your code gets restructured by the compiler, but there was one undefined behavior sanitizer check that complained about more things when running in optimized mode. And you need to set two environment variables. What I forgot there is to tell it not to do the leak detection, as I showed on a previous slide. So there are some more things to set up, and that's it.
We're running out of time, so now let me do this: I added a little Easter egg in here. I added a dummy integer division by zero, and I trigger it when I click on the exit-from-the-slideshow thing, and then it shows us... so the bottom part here says: runtime error, division by zero. So this is proof that the slowness at the start of the presentation was because this is a sanitized build actually doing its work. Thank you.