 Okay, I'm sorry if anybody came here to I've still got the wrong title up there. I put the wrong abstract in in Conference notes This is actually going to be an update on my last talk which was about indeed about global analysis across a large code base So what I'm going to be talking about largely today is the things I've been busy with in clang plugins since the last time I was here to give you some context I largely write clang plugins for LibreOffice and and that means that I build little pieces of code plug into the clang compiler and make use of the clang AST APIs and they enable us to implement various verification checks and To do some large-scale rewriting across our code base so that these plugins have in the past enabled us to clean up things like the fact that we've we started our life with about seven different string classes and We've done pretty well. We're down to about three or four and It's enabled us to clean up all sorts of little bits and pieces along the way And it's enabled us to move to more modern usage of C++ My current status some people will be happy to hear that I have in fact run out of ideas My pipeline is rather empty which means that there's not a lot of large-scale changes Coming coming down my pipeline the the flattened plug-in which is there to reduce the indentation largely of code by reorganizing Methods so that they do early exits still has a fair number of stuff coming But I don't have any new plugins coming. I Track clang upstream on a fairly regular basis that occasionally the plugins need tweaking in order to accommodate mild changes in clang clang's actually become really stable and Plug-in APIs are actually really nice to use. They're pretty reliable these days occasionally I still run into issues with source locations being slightly out Ranges of source being being a little bit iffy, but 95% of the clang stuff these days if something goes wrong, it's typically my fault Okay unique better is a plug-in. I've just largely finished Pushing changes for The point of unique putter was to find places in our code base where we could productively use the standard unique pointer smart pointer class The point of using unique putter is that it makes obvious something that is otherwise just implicit in your code The fact that a given class or a given variable is the exclusive owner of another object So my plug-in started out using my favorite onion peeling strategy Onion peeling means I look for very simple approaches and then I run the plug-in across the entire code base So I look for very specific patterns. So my plug-ins thought it out looking for patterns that look pretty much like that We class clearly is deleting a member field that it owns itself in its destructor And I ran that across our entire code base of 10 million lines look for stuff fixed it And then I extended the pattern matching and I've been gradually making the pattern matching more extensive I've looked for more more different patterns of owning and deleting pointers and As I as I've as I've kind of cleared out one set of changes in my in our code base I've then made the pattern matching a little bit more flexible I've got to the stage now where the pattern matching in fact is so flexible that It points up a lot of false positives, which means my plug-in now has an awful lot of manual exclusions But I consider that a worthwhile trade-off I don't aim to make these plugins Completely robust they're not meant to be absolutely perfect plug-ins That is just in my opinion largely a fool's errand C++ just has Fastly too many corner cases if you want to make a plug-in like this that's looking for issues completely robust You will find yourself on a hiding to nothing exploring all kinds of weird Subparts of the C++ AST and macros and all sorts of other weird stuff So I don't aim to do that. I just aim to get stuff. It's useful for see for Libre office that being said these plugins are there the source codes if anyone wants to So try and extend them or upstream them or do something else with them. You're welcome to I'm always happy to Give assistance But I have deliberately not tried to upstream these to LLVM because I know that They will inevitably come back to me and say but what if and what if and what if and they're right It generates false positives, but we're happy with that So we just keep trucking Method cycles is a plug-in that I that I fairly recently wrote and I'm very proud of this one Took me a long time to get to the point in time where I understood the claim AST And I understood C++ well enough to be able to write it what it does is it constructs graph over our entire code base of Of method calls So basically I have a map of pairs from source method to destination method will actually from source function to destination function And I remap it messages lightly So if you call a virtual method it walks up the walks up the hierarchy to potentially multiple pairs and Tags all of those with calls and then we generate an entire graph This takes two or three times longer than my normal runs do it generates 40 odd gigabytes worth of log file data And then takes roughly half an hour of processing with the Python script to spread out results It only found about seven things because I have other plugins that look for unused code as well But it found seven things that my that my unused code plug-in was unable to find because they were cycles the methods were calling them each other in a nice little loop much like the two loops up there and Consequently the dead code has been had been present in our code base for a long time And it hadn't been found. So I was very happy with that one Unused fields is a global analysis plug-in I've been extending it lately to look for fields that are unused in the sense of they're not generating any useful work So for example an example up there. We've got a standard vector. We're pushing information into it But other than that we're doing nothing with it My normal analysis don't find this because you are in fact we are in fact calling a non-const method on it So we are actually Modifying the object But other than that we're doing nothing useful to it surprisingly There are quite a few Incidents of this in our code base, which I suppose is not a surprise when you've got 10 million lines of old code You find a lot of stuff. So I started with this idea and I extended it to a variety of other calls on various STL collection classes and Funnels fun as a bunch of dead code single well fields is I've extended that one as well to it used to be only for Fields that we declared on classes now. It looks for any statically allocated any study of values, it looks for Kind of primitive type values. I ints char pointers those sorts of things It also looks for our string classes and it looks for places where those things are only ever assigned a single value So in which case it's not really mutable data. You can either be declared const typically or it can be inlined into the relevant modules other way, it's I'm effectively extracting Latent information and making it concrete It versus float was not my idea at all a guy called Mike again ski Very very interestingly pointed out that code where you are comparing an integer and a floating point value can never be true And it's generally indicative of somebody making Mistake in this particular case. He found that mistake down and I believe it was our drawing ML import code And so we then ran that using a clang plugin. That was an interesting case We wrote the plug-in and then we had to park it It was using an API called evaluate as float inside clang and at the time evaluators float would throw its hands up in the Air and crash on about half of the classes in Libre office, so we parked it and six months later I tried it again and learned the whole clang was fixed So that works quite nicely only found a few places, but it's nice to see the clang is moving so fast I used inam constants I initially wrote this to look for inam constants where you've got an inam and One of the constant one or more of the constants is not touched at all. That's relatively straightforward case So I cleaned out all of the the dead cases where they made sense to do so again, there's an idea from Mike again ski and then we started looking for places where I Had an idea in my head that there was a read write Analogy here as in some cases you read information in some cases you write information And if you've got right only information or read only information, that's generally indicative of a some kind of dead code But it took quite a long time for me to find a mental model in that I could use to analyze the AST So in this context right means we generate and store new information So if you've got right only code you're storing an inam value into stuff But you never ever look for it Read means that we just check for the information We never but we never store it Now as it turns out right is better to find than read Which means I've actually managed to use the right analysis to find useful dead code read analysis, however It largely spits out too many false positives mostly because of things like costs where you have an information You have a you load a file and you cast the data type in the file into an inam So you're not you're not really the cost is acting as a As a as a as a as a right, which means you've got a read write field. So it doesn't work very well, but The the analysis produced nice stuff in the end redundant functional cost is a Is an extension of another another another plug-in that Where we're looking for normal redundant costs In this case we often had code that was accidentally Constructing a new OU string From an existing OU string and then passing it into a method in the method in this case was taking a const ref So there was no point in doing that. So the plug-in looks for looks for cases like that Collapse if was looking for odd code where we had Two very small if conditions and they could profitably profitably be combined into a single if condition I Ran this myself and it's not on by default because it's quite hard to find a balance and I didn't want to annoy people Static there looks for things that can be declared constant static ie they can be in which case they're stored in the read-only part of a segment when The code is compiled and they can be usefully shared between different processes Const field was a failure. It was a good idea The idea was that we can declare fields const and thereby mark those fields as As being effectively safe for multiple threads to access However, when you mark fields as const it tends to interact with other C++ features For example, you end up with implicitly deleted copy and movement assignment operators Even though you have non-deleted copy and move constructors So it tends to produce rather weird outcomes here and in the code So that one got abandoned, which is a which is a bit of a pity, but You know you try some stuff and sometimes things don't work out simplified construct was Effectively a cut-down version of a of a clang tidy Check there's an already an existing clang tidy checker does something quite similar, but it tends to be a lot more aggressive So I implemented just the simplest case Where we have things like unique bitter and other things that take no pointer and don't need they are they have default constructors So there's no need to do that We have a class with OU string buffer, which is a a mutable string class internally manages memory dynamically and As you can see people who are often calling the code It's got an APN method to add stuff to it But if you if you're doing it if you're adding two strings together and then a pending it to the string buff You're not really using the string buffer It's way it's the way it's meant to be used you're constructing an unnecessary temporary So the plug-in goes looking for places like that and in fact, I think I did a rewriter here So it automatically rewrote a bunch of that code So we then don't bother. We don't need to construct the temporary and also we're we're losing a bunch of temporary creations Should return bull actually we had a surprising number of Methods floating around in writer which appear to predate that the presence of bull as a C++ data type And consequently those methods were returning into a long and they were only ever returning one or zero And they would be used as bull methods So the plug-in looked for methods that only ever returned the constants one or zero and we converted those to return bull So it's no more more obvious what those methods are actually being used for I then took my unused fields analysis and I extended it to unused variables So I looked for the same similar kinds of things In that case we have a sequence, which is a collection class and it contains Strings a string class and we are modifying it but other than that we're not doing anything with it So that's effectively dead code. So my unused variables more plug-in goes looking for a bunch of patterns like that In our code and spits out complaints like that Then we have a tools rectangle. We have a class in the tools namespace called rectangle and point in size And those had a bunch those had methods that were returning Non-const references, which makes the code a little bit awkward to read It can be particularly when those that when we have code like that You have a point dot X and then you adding to it which actually modifies the object But it's not very obvious that you're you're modifying the object itself rather than you know Modifying a temporary or anything and particularly when it was buried inside other inside other statements It could be quite hard to read so I wrote a Plug-in that not only finds these cases it automatically rewrites them. So we ran that across the code base to Make the code easier to read then I spotted an interesting error case where somebody was Somebody passed a pointer into a parameter that was actually a Boolean and C++ does a silent conversion for you It does a it does a pointer not equal to null conversion for you And that was and it was clearly not what was intended the type system doesn't pick this up So I wrote a plug-in that goes looking for places where you are passing Pointers and other similar auto-converted things Into bull parameters and it's but side of warning cell call was an annotation we had that modifies the linkage it modifies Whether a method is is internally linked into a library whether it's externally visible and we were using it in a lot of places That it didn't make sense. This was mostly leftover code So we went through it and and eliminated a bunch of it and that is what I've been up to any questions Yes The question is about using these plugins on on other on other code bases I think they'd be fine. There'd be a nice extension to the plank tidy set of stuff There's about 110 of them now They probably need some tweaking to run on your code base, but I'm happy to help out if you want to If you want to ask me questions or do stuff Yes, sir No, we see plus-plus does short circuit evaluation. We Lubos used to do that. Does he still do it? Anybody remember? We did do it somebody somebody was doing it at one point. I'm not sure if they're still doing it. Yeah Yes, sir McLoche has upstreamed one of them the macro check that he wrote other than that. No, I haven't upstreamed any of these The ones that are the ones that are most proud of all the ones that are least able to be upstreamed because they are Yeah, yeah Yes, they're part of the LibreOffice repository so anybody's welcome to grab them Yes, sir They're all in if you look in the LibreOffice get tree. They're under compiler plugins. Yes, sir Sorry. No, I have not. Thank you very much