Hello. Hello, everyone. We have Michael Meeks with us, and he'll be sharing his thoughts on working with ancient code bases like LibreOffice. Over to Michael. Hey, thank you. I would like the lapel mic if I can have it. Cool. So let me do the slides. Ooh. And, you know, you can't make these things up, can you? Password required. I think this is VLC behind the scenes, doing something awesome. Yeah. Okay, perfect. This is what I'm going to talk about, so I won't tell you what that says. I'll just do it. So: restructuring a giant code base. We have a unique and exciting problem in LibreOffice, which is that we have a 30-year-old code base, and it's gigantic. It's eight million lines, well, depending how you count them. It's quite big. And the great thing is that it's got the strata of lots of different problems inside it. Hello, hello. Oh, this is very intimate. Thank you. Yes, I'll put that in here. That's great. So there's lots of different things going on in there. LibreOffice, you may not know, grew out of the object-oriented mania of 30 years ago. People discovered object orientation, and they were overwhelmed with their own ingenuity and cleverness. They were convinced that this was going to solve all programming problems; it was the silver bullet. And so Marco Börries created an object-oriented graphical toolkit as a young teen in his parents' garage. So you need your parents to have a garage. They're increasingly less popular these days, but it's a vital part of innovation. That and the water cooler. So buy one on Amazon, you know. So he created an object-oriented toolkit, and it was reasonably successful. He grew a company, but communicating with the customers was a problem. They didn't grasp how silver-bullet-y object-oriented toolkits were. And so they created some demo applications to show off the raw power of the approach.
And the customers were much more interested in the demo applications than the toolkit, as I understand it. And so, unfortunately, many of the original design decisions were laid down on the basis that this was a throwaway application, something just dashed off. Whereas if you go and listen to the Microsoft Distinguished Engineer talk about the brilliant people that wrote Microsoft Office, he says, you know, the piece table design is still what we're using today, 30 years later; it was absolutely brilliant. To some degree, we're cursing the bad design decisions and getting rid of them. So there's a lot of restructuring that's been needed, and quite a lot of it going on. So when you try and restructure a big code base, the first real choice you need to make is: what language should we use? Particularly pertinent today. And then: what platform should we target? The quote at the bottom is Larry Ellison, and if you haven't heard his "What the hell is cloud computing?" rant, you really should. It's extremely instructional and very funny, and probably obsolete. But anyway, my tip one (I have relatively few patronizing tips) is that there is actually, I think, still no silver bullet. So if you meet a silver-bullet salesperson, you might want to try filing the thing a bit and seeing if it's silver all the way down, for a start. Here are some silver bullets I've seen. Object-oriented programming, of course, a seminal moment. Java lets you develop ten times faster. Garbage collection is absolutely amazing and is going to solve all memory lifecycle problems. C#, interestingly, also helps you develop ten times faster. I don't know if that's ten times faster than Java; maybe it's only twice as fast. But there are some good things there. The syntactic sugar of C# is infecting other people.
They start to think that programming shouldn't be a long and laborious process where you type unnecessarily large statements and feel good because you can parse them mentally. Maybe we could have simple stuff that we can understand. Vala, of course, is, as everyone knows, a modern language with, yes, all the features you'd expect. A wonderful preprocessor-based, well, I don't know, I can't take the mickey out of Vala. But what I would say is that back in the day I worked on GNOME and I used Evolution; I was privileged to be part of the team for a while. And Vala was created, and a whole load of people arrived and decided they were going to rewrite Evolution in Vala. The only difference being the language, you know: still GNOME, still GTK, still whatever. But what we need to do is rewrite the work of a team of 50 people over years in this new language. It didn't work, as far as I'm aware. There's probably some software still there. But anyway, it goes on and on, this kind of craziness. And probably my personal favorite is another great Larry Ellison quote, which is that they would quickly build their version of the spreadsheet using JavaFX. According to The Register, anyway. Now Larry made a fatal mistake, or someone on the Java side made a big mistake here, which was to use a name that can actually be Googled, so you can actually see the trend. Most people, when they create a new language (apart from Haskell, of course) use a name that you can hide very easily. Ruby, you know, or Rust. Go being a particularly awesome example of this: the trend line is just completely flat over all time, it shows you squat. So JavaFX is just a terrible name; don't use a name like that. And the reason is the Gartner hype cycle. Who's seen the Gartner hype cycle?
I mean, it's been fed to me my whole life, and I've never believed it, right? I feel these guys are complete idiots, management consultants trying to sell something to me. Like, why would I buy that? Anyway: the Google trend for JavaFX. And you see there's this peak, or maybe two peaks, marketing really getting their act together. This is the recommendation to use JavaFX, you know, just before the trough of total gloom and the utter suffering, and then the "oh dear, how did we end up with this technology? What are we going to do about it? How can we possibly make it actually work?" and so on. But it's nice to see that there is actually some slight factual basis behind some of the things that we're told. One I particularly like, and you can watch the talk, is from Igor Zaika, who's a Distinguished Engineer at Microsoft: "I'm no language bigot. Choose whatever you like, but just don't switch it every few weeks." Yeah, 30 years from C to C++, you know, actually a language that can be evolved. And since I wrestle with a giant, awful (well, increasingly good) code base, I'd like to try and persuade you that existing languages are things that can be evolved and improved. Yes, it's really painful. Yes, you have to go to standards bodies. But actually writing entirely new languages is also quite painful. Well, there are lots of talks at conferences about them. So, yes, I think these are the typical risks. I'm not a language bigot; choose what you like. The other particularly sad thing, as you come to refactor your code base, is that language choice is a huge driver of duplication. I see perfectly well-meaning people, whom I love dearly, who think that it's important to rewrite something totally in this new thing.
And I've seen this again and again: with Java, with C#, with... I won't go any further. And if you look at web technologies, there's a whole plethora of duplicate web technologies that to me appear indistinguishable, and the only real difference between them is that one is in PHP, one is in Ruby, one is in, ah, whatever. There's a great stable of these, right? And you think: guys, why can't we...? Anyway, perhaps that's because I'm ignorant. The problem is, of course, that LibreOffice over a long time has adopted a number of these cargo-culted languages. One of my favorites is the JavaScript scripting framework we have, written in Java, called Rhino. Because, as you know, Java is ten times more productive than anything else, so it's very easy to write a JavaScript interpreter in it, and then it sits around in the code. We're slowly trying to migrate away from Java. Not because we dislike Java, it's a perfectly good language, but it's not actually everywhere, which is the slight irony of the write-once-run-everywhere message: actually it's write once, and then try and install a certain version of this thing on all your Windows hardware forever. So there's some good work, actually funded by TDF and our donors, to get rid of HSQLDB, which is the last sort of blocking thing for that. So: language choices you need to make. Then there are some API choices you need to make, too. We're particularly blessed in LibreOffice by being a cross-platform app that runs on any platform you might like to think about, including, I was thrilled to see, Haiku the other day. So, you know, the BeOS client; luckily I know nothing about that. But these are some of the major APIs here.
So the Windows rendering API that we actually use to render LibreOffice on nearly 200 million machines is the GDI API, which came out with Windows 1. It was initially a 16-bit API, and it seems to have made the transition to 32-bit and now 64-bit quite cleanly, which is slightly baffling, isn't it? It has features like not having any alpha transparency, unless you have separate alpha buffers that were tacked onto the API later, and this sort of thing. We also use DirectWrite, from Windows 7, which finally provided a way to actually choose which font, which glyphs you're actually getting. Because GDI, rather like fontconfig, will substitute fonts underneath you in a completely unpredictable way; you have no idea what happens. And there's loads of code in a few of your favorite browsers where they try and work out what happened by rendering to a metafile and then manually parsing the metafile to try and work out which font was used for which glyph. There's some exciting stuff there. So DirectWrite finally does it sort of right, but still doesn't work very well in lots of places, ten years on from its creation. So one of the things I like to point out is that, as you restructure, you need to choose the latest, greatest API, but will it really be there tomorrow? It's very difficult to know, isn't it? In the same time, of course, the Linux world has managed to go through quite a number of toolkits, several of which we support concurrently. People also like to tell me off for wrapping stuff. Like, why do you wrap platform APIs? Why would anyone do that dumb thing? We have a perfectly good API, so why do you put this wrapper around it? That must be evil. Well, the why is relatively obvious: not only the cross-platform aspect, but also the fact that it keeps changing. So I think we're getting to Qt 5, from 2012; I think there's still some work going on to make that work. Is that right, Thorsten? Somewhere?
Yeah, yeah. So anyway, the platform churns far faster than you can update a multi-million-line application to use it, which is slightly distressing, despite all of the work that goes into actually improving this stuff. Of course, there's a whole form-factor thing: what should you be targeting? There's a constant evolution of trendy platforms. The audio-assistant one is particularly fun; we're looking forward to seeing that. And then, of course, the hardware people are busily at work trying to overcome the fundamental problems of physics and scaling by making processors much wider, and frustratingly insecure, by improving their IPC speculatively. We also have a wonderful thing in recent times which actually gives us hard numbers: crash reporting and statistics. Markus has produced, and produces, lovely, lovely data. So of the two-million-plus crashes that we analyzed: we're getting more threads. Like, it's real; we need to actually do something about threading. Although the gradient is not very quick, and this is a year of data. But you can see the one-CPU people. I don't know where you find a one-core CPU, but there are people that manage to do it, at the 3% level. And this is probably not Raspberry Pis either, because Linux distributions compile this stuff out; this is mostly Windows users. But yes, anyway, there's lots of threads coming along. And for the cool kids, the 8-core machines are growing; this is clearly the future coming here. Sadly, these are cores, not threads; in many cases there's no doubling from hyperthreading. So let's talk about threading. One of the things we need to do is try to use all this hardware that's on the die but not actually being used. So we've done a whole lot of work there, and we do it in a way that is extremely compartmentalized. So it's nice...
So there are two approaches we have to threading. One is a very generic UNO-based approach, UNO being our COM-like component model, and in theory it's highly thread-safe and so on. But the reality is that extremely granular APIs, which are useful for scripting and understandable by individuals rather than teams of people, turn out to be really, really bad for performance and really hard to implement, just because threading is really, really hard. You really want to message-pass if you can, but your API doesn't look nice if you message-pass. So that kind of threading is there, but it's still sort of "one plus epsilon" threads; that was Taras Glek's nice phrase, from Mozilla. We do a lot of threading, we do a lot of locking, but we don't get any parallelism. All of the downsides and none of the upsides. So it actually turns out to be much more effective to find one piece of code that's at least reasonably separable, like you could scoop it out of a patient, and put some threading around it. So, for example, image scaling: for various reasons we still do CPU image scaling for high-fidelity something-or-others in lots of places, and it's a beautiful piece of parallelism. It's completely safe; well, you can convince yourself, after reading it for some hours, that it's probably going to be safe. And you would think that's funny: the XML parsing thing that I'll show you in a minute we also read very carefully, thought was safe, and shipped for years, and then, of course, you find the threading bug in it, which is slightly worrying. And the rendering rework: Armin has done a fantastic job of turning all of his clever primitive decomposition, CPU software-rendering stuff into something that parallelizes as well.
So with that, if you have a 3D something intersected with something else in the Draw application, it works quickly; a complicated chart, I guess, will really benefit from that, too. The calc core I'll talk about in a minute, but again, we're desperately trying to find a piece you can chop out and wave your hands and persuade yourself is safe in the middle of all that. XML parsing I particularly like, because people are interested in XML parsing, particularly in how it can possibly be that slow. And I think you have to go back and blame the authors of the spec, really. I don't know; JSON has a lot going for it. But either way, people come up to me and say: I have this brilliant new idea for parsing XML that's going to be so fast. We're going to memory-map the file, we're going to use pointers to something-or-other, we're going to copy it into buffers and do very little work. That's really interesting; that will be a lot quicker. The problem they have is that I have a constant-time XML parser. It parses any size of XML file in constant time. Can you beat that? It's kind of a nasty trick, really. But you saw the CPU graph, right? There were all these threads sitting around there not doing anything. I just cheat: I just shove one of those in at the bottom. So, effectively, we have a SAX API. A SAX API is incredibly inefficient; it's really not designed well. The worst thing you can do for a CPU's i-cache and d-cache is to do a very little bit of work that then creates lots of other work that then comes back and does another little bit of the same sort of work. And so there's an unzipping bit, there's a parsing bit, there's a tokenizing bit, there are all these things. And then it disappears off into la-la land into the calc core filter, and it comes back again, and it's a disaster.
So it turns out that even if you don't have two threads, doing a whole load of parsing, storing the results, and then emitting all the events that you got from it is actually good news. So this is nice. The fixed cost is only really true if your consumption is slower than your parsing; obviously, you have to balance these things. But actually, we try and move some degree of work between these two threads dynamically, depending on who's more or less busy, which is quite fun. So there's some set-up time as you parse your first buffer full of events, which is the constant. What else? So we've been trying to thread calc, as I mentioned, and there are some heroes: Tullo, Chris, and Dennis Francis have been doing some work for me on this. So we have this single-threaded interpreter that calculates every single cell in your spreadsheet; every formula is stack-based, reverse Polish, you know, it's great. And so we're trying to cut this thing out; it seems like it might be something reasonably separable. And then you discover that actually the token array, which describes the formula, is being mutated as you iterate over it: the iteration index in the ancient code is stored as a member variable of the token array. So, you know, two threads... yeah, not very nice. And of course, macros have access via the UNO API to the whole model; they can delete the formula that you're calculating while you're doing it. That's great. And then some of these formulae actually mutate the dependency structures, which are document-wide; if you parallelize this, you again have major fun. And then there's a whole lot of other stuff that's mutated in the document, and caches that are built. But after a while, hopefully, you can chop enough muck out of this that you can shrink the domain enough to, again, persuade yourself that this might actually be safe to do.
We put a whole load of asserts in there. So, in my old age, having been a type-safety bigot, I'm turning into a thread-safety, run-time-checking, Python-loose-typing person. Luckily, we have some very large data sets, so actually loading and calculating some of these documents gives some nice assertion failures. We're still working on rooting those out. But it's quite fun, and it actually gives you some reasonably good results, too. If you want a nice 16-core machine, let me advertise AMD, who produce some very nice ones, and at a reasonable cost, too. Look at that. And they funded a chunk of this work. So this is my lame Linux laptop, and the Windows one just happens to be a much faster machine, as you see; that's why it's only taking half the time. So what can I say? It gets quicker, but not 16 times quicker. In particular, hyperthreading is not nearly as good as you might hope, particularly, obviously, under heavy floating-point loads. So what I'd really like to see is some language support. What I need is some innovative people who love thread safety and love improving languages to go and persuade the C++ standards committee that having some kind of really restricted subset that does a lot more checking would be good. The committee are still trying to study the problem of having built-in primitives like threads and mutexes and things like that, but hopefully we can get them past this and into the fertile ground of language innovation and include all this good stuff. I don't know; if you're crazy, come and see me later, I can sell you some swampland as well. Good. So there's a whole lot more to do there. And of course, at the same time, we're trying to clean the code up so people can get into it. One of the great things here is the German comment translation. These are the people who actually finally got the curve down. You see this? I'm very proud of this bit here, you know?
There are lots of people that start projects, but this was looking like an exponential decay to me, and I was getting pretty frustrated by here. And then, bang, these people turned up. Brilliant: Jens Carl and Johnny M. They said, right, we're going to finally fix this. Of course, there is a single German comment left. Well, actually, there are probably some we can't detect easily. You know, just for old times' sake. So you can see some particularly fruity examples of how not to do it. Yes, this is string horror. The code base is very old, and this is how we used to create strings, with rtl::OUString and RTL_CONSTASCII_USTRINGPARAM. In the modern world of magic templates, you know, you can not only compile very, very slowly, but you can make the code look nice. And human time is much more valuable than compile time, perhaps. We also have pretty iterators in this world. It turns out that, virtuous as you feel having typed out this great long beast of an iterator declaration, managed to get it compiled, and understood the error message when it doesn't compile, it's a heck of a lot simpler to just use a range-based loop. And so there's a large-scale code clean-up thing here, shrinking lines and making everything good. And a huge tool there, and I say this with some degree of sadness, because I'm an FSF lover and, Richard Stallman, I have a soft spot for the man, but GCC is well on the way out, and Clang is just doing awesome stuff. So we have a whole lot of Clang plugins doing automatic rewriting, effectively extending C++ for our use case in some cases, but also generically, with some really, really nice tools. I mean, the unused-field checker is something that everyone should use, right? It looks at your giant corpus of code and goes: eh, you realize you've got this field, that field, this thing, and no one's actually using them?
You might want to consider actually removing those unused fields. Default parameters, unnecessarily virtual methods, de-virtualization and so on; just trying to clean stuff up. Those are the ones that have been most used in the last few weeks, and there are whole loads of changes going in there, and also some sort of aesthetic ones, too: making this function look less ugly by de-confusing its logic in various ways, which is quite cool. So I'm a fan of that. And then cleaning up other stuff. So I mentioned GDI; Apple, you know, have the Core Text API for shaping text; on Windows we were using Uniscribe directly. All of these APIs, of course, are changing slowly. And we thought, wouldn't it be nice if we could use a single cross-platform free-software thing that we can look inside the box of, and that lays text out the same everywhere, shapes it the same. So thankfully, due to our wonderful donors and The Document Foundation's funding, Khaled Hosny, who's a wonderful Egyptian chap, has basically moved us all to use HarfBuzz on every platform, which is brilliant. And we would love, I would love, at least, to use FreeType everywhere as well, to render pixels, because then you're 100% sure the pixels are identical everywhere. The slight problem with this: on Windows, in order to print, you need to use GDI. It may not be obvious to you, but a lot of applications are just drawing on a very big version of the screen; it's all in pixels, what comes out of your PostScript printer inside Windows. And if you want a native platform print dialog, and you don't want to send lots of pixels to the printer, you're kind of doomed to use GDI. Or GDI+, or Direct2D, or... Or you could generate the PDF equivalent that Microsoft forked, which is called XPS; I think there's some kind of hook where you can shove that in and still use the native platform thing.
So it's kind of a bit irritating when you're a WYSIWYG word processor and it's important to you that what comes out of the printer is actually what's on the screen. It's actually a very small use case, people actually producing paper, still, and the web browsers don't have this problem: what comes out of the printer bears very little relation to what's shown on the screen. But we really care that those glyphs are really there, and that's a bit of a downer. It would be great to come up with a solution that works on Windows. Drawing primitives, too: we're increasingly just moving to new and cleaner rendering models that work really well, and again, some great work there. There's an internal tension between having widgets that are native platform widgets and having code that's easy to port and works everywhere. And of course, we have both. So that's very important, and there are various ways of achieving both. We have our built-in UI toolkit, which is now actually used again for Collabora Online, LibreOffice Online. And you can see the full glory of what widgets looked like in the 90s, with their hand-drawn bevels and that sort of thing. We're getting almost back to that with the whole flat-UI look; actually, you know, it's not too different. But to make things look pretty, and to make the black-on-black theme that everyone wants actually work, we sort of capture foreign widgets from different toolkits and torture them into making the pixels we want. And it's not great: we miss a whole load of animations and other stuff. So we've tried to tease pieces out of that: the native menu bars and file dialogs (I mean, a native file selector is just a must; obviously, that's been there forever) and tooltips. So really, Caolán is a hero here. We've finished converting all of our dialogs to Glade XML, so we actually have a file format that's reasonably editable, and we have an editor for that now, which is useful.
Previously, it was all done in twips, and the twips were scaled. So we knew that German is a longer language, so you should stretch everything horizontally, all of the positions of everything, by, ah, 50%, something like that. But if it's Chinese, we know it's a much shorter language, so we should shrink all of the positions by, say, half, so that it looks nice. And it turns out that actually doing proper layouts is better. So... And you laugh. And you laugh. But I talked to the VCL (Visual Class Library) people, and they said: yes, we made a deliberate decision not to do automatic layout, because a human being can always lay out the dialog better. I'm not kidding. But whether he can lay it out better in 156 languages is, yeah... It turns out he never has the time. So better to do an automatic job. Anyway, there are nearly 1,000 of these UI files. Another thing people say to me is: why don't you do a Firefox, you know, strip down to the bare essentials, or a Chrome, which is perhaps even more extreme, remove everything, right? And part of the reason is that we have nearly 1,000 dialogs in the product, and there are lots of complex features in each of those; there are many tabs, and just the surface that you need to actually control what your document is doing and how it lays out is staggeringly vast. I tried to grasp how many UI files there were in the whole of GNOME, all of these things on my disk, from the rest of the desktop, in total, and it gave me a suspiciously smaller number, of the order of about 200. If you can repeat that on your laptop later, I'd be pleased to know. It's hard to grasp the scale and bigness of this thing. And the good news is that Caolán's now done some fantastic work making it actually load these UI files natively. So if I can get my VLC to show you, I can show you the demo of his work here. So this is LibreOffice, if you haven't seen it. And this is a dialog.
And this dialog is actually using GTK. So it's a GTK dialog. You can see the fading-out things here; you can see, as you move, it's got some degree of beautiful transitions. This thing on the right here is actually a VCL-rendered preview, because that's kind of hard to... you can't port that to GTK, it's just tons of work, so we embed those. And there's lots of weird... I'll play it again, because it's obviously too cool. And if you're appreciating the goodness of the pixels in the background and the intrinsic value here: it should be really good, it should work beautifully, and not like a squatting alien in those corners where... I mean, I'm not prejudiced against aliens, I would point out. I'm an alien lover. But, yeah, here we go. So there you go; some great work there, over quite a while. I'm really impressed. Of course, that leaves us in a wxWidgets sort of situation, where in some cases we're using these native things. There's really horrible theming stuff; the focus handling around things like page breaks is knackered; this thing is not working nicely. So there are actual improvements to make there as well. Other things we're doing to try and clean up are things like the security work that's going on. So, OSS-Fuzz is rocking my world, and my mailbox, as it spams me regularly. And many other people's. And again, Caolán here is just a hero for fixing endless things. So there's a dozen-core cluster, perhaps more, that Google is providing to run AFL under ASan. Well, if you have a large C or C++ anything that compiles to binaries that can be executed by AFL, you really, really need to be plugged into this. It's just unbelievably good. And people still come to me and talk about manual auditing. Generous, lovely, wonderful, wonderful people. And I try and persuade them that, actually, we all know that robots are going to drive our cars. It's inevitable. Probably into walls, initially.
But in the end it'll all be good, and walls will get softer, and, you know, it'll all be fine. One of the easiest things to see is stuff like the Clang plugins that are starting to do automatically a whole lot of code clean-up tasks that engineers used to do by hand; that's only going to grow. But one of the perhaps most tedious tasks in the world is auditing: following the consequences of a value being slightly too large through the code, and the exact side effects of that. Or following the life cycle of this or that index into some other thing and where it's dereferenced, and following all of those back through the code to the file, or to untrusted data. There are people that still do this, and they drink coffee, I hope, lots of it, and they prop their eyelids open with matches, and they do find things. However, the problem is that when they've gone away and heavily audited, say, the random number generation scheme in OpenSSL on Debian (and okay, there is still a role for manual auditing, as you can see), the next week someone comes along and commits a patch that looks perfectly innocuous and breaks exactly the code path they carefully audited. And the great thing is that OSS-Fuzz starts to show us these things within a few days, simply because it's there, it's got this huge corpus, and it's updating very regularly and following master. And it's got genetic algorithms and AI and back-propagation; it's just an amazing thing. It sits there and finds the problems before they can escape, which is just fantastic. So the benefit of hooking it up and, you know, consuming an entire North Pole's worth of electricity somewhere else is that you have security in the longer run, which is great.
Of course, Coverity is another great static checking tool, and we're grateful to Coverity Scan, and our defect score is still lower than that of the Linux kernel. And I'm not claiming that that means we're more secure than the Linux kernel, Samba or, you know, whatever else you happen to be running, but it's nice. You've got to tease these people, haven't you? So we do all this rampant and rabid refactoring and re-fixing and cleanups and all this sort of stuff that there's no commercial justification for, you might think; except that if you don't do them, your code will become like a relic, and not a very holy one either. So one of the things we need to stop this becoming a real problem is unit testing. And you'll see that the graph almost looks like it's going upwards; towards 6.0 there are even more unit tests. So, I'm a lover of unit tests, and we need more and more of these, and this is why there's a great credit list: these people did more than 20 commits to unit tests in the last year. So there are my heroes, look at that. Fix each bug just once; don't continually fix it. You know, there are plenty of jobs for life in LibreOffice without needing to re-fix the same bug. And then, of course, you know, refactors, as we all know, just cause regressions and don't do any good for anyone. So how do the stats look? Well, we track our bugs very carefully, and the QA team is awesome. Xisco is here, I think? Yeah, waving over there. So we tag regressions very nicely, and so we look at these, and over the last year we have 33,000 commits and our open regression count has gone up by 142, which sounds bad; I'll talk about it in a minute. But that's effectively one lingering regression for 0.4% of commits, roughly one for every 230 commits. I think this is a reasonable ratio. I mean, it's not for free; there's a lot of work going in there.
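In that spirit, a regression test is just a bug report that never needs re-filing. LibreOffice's own tests use CppUnit; this stand-alone sketch with plain asserts, and an invented `trim` helper, shows the pattern of pinning a fixed bug down so it can only be fixed once:

```cpp
#include <cassert>
#include <string>

// Invented helper standing in for real code: imagine trim() once had a
// bug where only *leading* whitespace was stripped.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\n";
    size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return ""; // all-whitespace input
    size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// "Fix each bug just once": the failing case from the bug report becomes
// a permanent test, so the regression cannot quietly return.
static void testTrimStripsTrailingWhitespace()
{
    assert(trim("  hello  ") == "hello"); // the once-broken case
    assert(trim("\t \n") == "");          // degenerate input
    assert(trim("x") == "x");             // nothing to strip
}
```

In a real CppUnit suite the test function would be registered in a fixture and run by `make check`; the shape of the test is the same.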
There's something like 2,000 closed regressions in that time that we've created and killed, so many of those didn't escape. Thanks, thanks again to the QA team that tested these before they hit the masses. Of course, there's always one or two embarrassments in the early series, but how serious are they? Well, of course, we also try and triage the most irritating of these and make sure that they're addressed really quickly and don't escape. I'm just going to persuade you that some bug fixes rob Peter to pay Paul. So one of our problems is, you know, that some of our bugs are of the form: I click on the file, and then I go and make a coffee, and when I come back it's opened. You know, and we're doing some great work, but often we have to do a trade-off: well, let's make this case much quicker, but unfortunately this minority case will get slower. And this is a regression minefield, you know; this minority case that you think has one extremely vocal user, who files a regression saying, you know, everything got 100 times faster for 90% of people, but my 2% use case got half as quick. What are you doing? You know, like, you idiot. And so you get these regression bugs and, of course, you could revert the patch, but the problem is you shipped it now, so now you get 98% of the people going: what? It got 100 times slower, what are you doing? You know, and so to some degree there's an eternal nature to regressions, and that's my excuse for it going up slightly, anyway. I think there are other causes for that too, and we should probably fix more regressions, but at least in theory it sounds plausible. What else? Okay, so Online: an entirely different platform. The difference between having your software on the client side, rendering to a single desktop, versus the server side is quite noticeable in terms of how we structure our code and what we optimize for. And if you haven't seen it, LibreOffice Online, Collabora Online, is just awesome.
You should, you should again play with it and, you know, tell your friends. But the optimizations are completely different. So on the PC we're trying to avoid loading stuff off the disk, because disks used to spin and they were slow. And memory is precious, so why fill it full of stuff you don't need? So there's all of this componentization, splitting into smaller pieces. But online we hate that, because, well, it'll become apparent. We actually dlopen all of these things and we link them on day one. So we're like: yeah, let's take a 150 megabyte, 200 megabyte hit, straight out, and multiple seconds. If it takes 10 seconds to start, no problem. Let's load all of the dictionaries and all of the hyphenation patterns as we start. Because having done this, we can then fork our children to have this wonderful copy-on-write thing. They don't need to load any dictionaries, any code; they don't need to do any linking. It's all just there; they just use it. So we deliberately go wasting memory in our pre-initialization, left and right, and we waste CPU time. And we should really waste more CPU time, and more memory; we have some good ideas of ways to make it bigger, and so on. And if you use Java, you know, EJBs or something, Java startup time is heavily focused on the server side of this, which is perhaps why it's not succeeding on the desktop. So, having done all of this wonderful copy-on-write sharing work, the problem is that then we fork, and our children, if we're not careful, go and then touch those pages. Which is really bad, because every touch means you have to actually physically duplicate the page. And, depressingly often, having touched it, you then put it back to exactly what it was. So I wrote a little tool; it could be really useful, actually, for things like kdeinit, if it's still in use.
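The preload-then-fork trade-off just described can be sketched in a few lines of POSIX C++. The sizes and function names here are invented, but the shape is the point: pay all the startup cost once in the parent, then let every forked child share those pages copy-on-write:

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Stand-in for many megabytes of read-only startup data: dictionaries,
// hyphenation patterns, linked libraries, and so on.
static std::vector<char> loadDictionaries()
{
    return std::vector<char>(16 * 1024 * 1024, 'd');
}

// A child that only *reads* the preloaded data keeps sharing the
// parent's physical pages; writing to them would dirty (duplicate) them.
static int serveOneDocument(const std::vector<char>& dict)
{
    return dict[42] == 'd' ? 0 : 1;
}

// Preload once, fork N workers, reap them; returns the failure count.
static int runPrefork(int nChildren)
{
    std::vector<char> dict = loadDictionaries(); // expensive, done once
    for (int i = 0; i < nChildren; ++i) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(serveOneDocument(dict)); // child: no loading, no linking
    }
    int failures = 0, status = 0;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ++failures;
    return failures;
}
```

Each child here does no loading and no linking; the operating system only copies a page physically when somebody writes to it, which is exactly why untouched, read-only startup data is essentially free per child.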
Web servers too. It starts to look at your memory as compared to your parent's memory: which pages the operating system is sharing, and which pages are actually different, which is actually a different set. So there are actually quite a number of pages that were once touched and are now the same again. This is just a little example of its output. So this is a string that says MathML 2.0; I don't know why it says that, I didn't really care. But you'll notice that there was a three at the beginning of it, and the parent had a one. So as we forked and did something, we incremented the reference count, and in doing so we wasted a 4K page. Now, one 4K page I can cope with, but when there are lots of them, this is not good. So actually creating the tool helped us save some significant chunks of memory. We staticize all these strings, so the reference counts aren't mutated now. We have a different startup allocator that handles this, which is kind of nice. And so it can give you the statistics saying how many of these pages are dirtied. What I'd really like is to integrate it with Massif or something like that, so that we can say: this non-string bit of memory that I can't understand, that looks very repetitive and weird, was actually allocated by this stuff. Which would be kind of nice. Perhaps there's a tool for that already; Julian will tell me in a minute. The other thing about memory use is that you find some interesting things in the code. So we use Cairo to render fonts on the server. The comments are particularly encouraging when you're trying to find why there's 40 meg of stuff hanging around for a relatively simple document: yeah, these numbers are arbitrary, we've never done any measurements. Well, we did some measurements, and it turns out that having, for example, a fixed number of glyphs that you cache, independent of their size, is not necessarily a good thing.
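A cache budgeted by bytes rather than by entry count avoids exactly that trap. This is a hypothetical sketch, with names and budget invented and nothing taken from Cairo's actual code, of an LRU glyph cache that evicts on a byte budget, so a handful of enormous glyphs can never pin tens of megabytes:

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

struct GlyphBitmap { std::vector<unsigned char> pixels; };

// LRU cache bounded by total *bytes* of rendered glyph data, not by a
// fixed glyph count: one 96-point glyph costs proportionally more of
// the budget than a hundred 10-point ones.
class GlyphCache {
public:
    explicit GlyphCache(size_t byteBudget) : mBudget(byteBudget) {}

    // Assumes the key is not already cached (kept simple for the sketch).
    void put(uint64_t key, GlyphBitmap bmp)
    {
        mBytes += bmp.pixels.size();
        mLru.push_front(key);
        mMap[key] = {std::move(bmp), mLru.begin()};
        while (mBytes > mBudget && !mLru.empty()) { // evict oldest first
            auto it = mMap.find(mLru.back());
            mBytes -= it->second.bmp.pixels.size();
            mMap.erase(it);
            mLru.pop_back();
        }
    }

    const GlyphBitmap* get(uint64_t key)
    {
        auto it = mMap.find(key);
        if (it == mMap.end())
            return nullptr;
        mLru.splice(mLru.begin(), mLru, it->second.pos); // mark recent
        return &it->second.bmp;
    }

    size_t bytes() const { return mBytes; }

private:
    struct Entry { GlyphBitmap bmp; std::list<uint64_t>::iterator pos; };
    size_t mBudget, mBytes = 0;
    std::list<uint64_t> mLru;
    std::unordered_map<uint64_t, Entry> mMap;
};
```

Keyed on a single `uint64_t` here for brevity; a real cache would key on (font, size, glyph id) and account for row stride and pixel format in the byte cost as well.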
So if you work on Cairo or Pixman, maybe you'll get some patches from us soon. But possibly we actually need to take advantage of this and pre-render a load of glyphs in common fonts, in common sizes; you know, Arial 12 point, or the equivalent. So that then we have all of these things ready; we don't actually need to allocate any memory, it's just a blitting experience. We've been tunneling dialogues, which is particularly encouraging. I've said a lot of things that could be interpreted as LibreOffice has problems, but actually there are huge strengths. Doing this online stuff, Kendy and his team have been enabling this; Pranav Kant is the guy working on making it collaborative. We've been pushing these dialogues through to the client. So it's a bit like VNC, but it's collaborative. So you can then have simultaneous selection of text; you can have multiple people popping up copies of this dialogue. One can change the border, one can change the color, one can edit the size of the text, commit it, and it all works. And that's pretty nice. And actually the work to make it all work is ongoing and requires auditing and so on; but to a degree it just works out of the box, because of the existing good design decisions underneath. There's some work there to try and make modal dialogues not pile up a huge stack of pain. So, I'm told I only have like 30 seconds before questions come. So I'll just bash through some of the stuff in 6.0. One of the things I'm excited about is, well, the EPUB 3 export is pretty nice. Mail merge from spreadsheets was always a bad idea; you should use a proper database for your contacts. It's an even worse idea from tables in Writer. But we unfortunately have a customer that stores their address book as a word-processor table and then wants to mail merge from it. Bless them. And they can now. And I do bless them; we need more customers. So OpenPGP is again really cool: signing and encrypting documents there. So, you know, some fine stuff.
Better filters, better round-tripping. The EMF+ stuff is really nice; there are, I think, improved previews. Better ergonomics, an improved Notebookbar. We dropped support for Windows XP and Vista. Ha ha, hooray. At last. You know, I mentioned that these APIs have a long tail, and we have angry people, particularly in Russia, who complain about Windows XP. Which is strange. And of course loads of stuff in Online as well, just a whole slew of features there. The Android version is coming on nicely; pictures are being inserted, presentation modes. There's some exciting stuff from Jan Iversen and others on iOS, which is coming, and probably we'll see that later. But what you should do is you should think: this is an awesome project. I could get involved. I could do something. I could make a difference here. You know, we've got dozens of other big projects in the world that are duplicates of other ones, but LibreOffice stands alone. It's the open source word-processing project, you know, out there. And, you know, come and make it better. There's something for everyone. There are links you can click. There's fun stuff. And one of the nice things about it, as compared to writing a small Python script, is that the problem domain is so large you cannot fit it in your head. So the techniques you learn working on LibreOffice are the big grown-up engineering techniques that you're going to need. And once you've grasped them and you're confident with them, you can work on anything. You can do anything. It's like a Superman training school, you know. I didn't learn anything, but I tell you, it's just amazing. So I've got a paper on this, and you can read it; it's quite fun. It's about getting a good structured approach to solving big problems, in a way that you break them down and actually make progress. Cool. So, yeah, we need smart people. But what we really need... we live in a world of AI and technology, and, you know, increasingly this is automating everything we do.
Documents should be no different. So what we really need is an AI of incredible subtlety. And I hope some of you have seen some of the traditional neural networks and data sets and training that can see what you're doing and suggest cool new things to make it easier. So, yeah, that would be absolutely brilliant if we could go there. So, conclusions. Well, there you go. Forced platform change really sucks, and it's tough on us. When I see that little hack to clean up GTK, to remove the threading support that was unfortunately broadcast as API, I feel sad inside, you know. But on the other hand, I know that GTK3 will never die, at least not for the next 15 years. So anyway: restructuring, refactoring, it's fun and it's risky. It is risky to some degree, and today it may seem hard to justify, but tomorrow you'll wish you'd done it. Stopping doing it is also extremely risky. And we seem to be surviving the volume of change, improving and growing, which is cool. Online is awesome. Free your data, share it with people, control it; don't give it to a large, evil or not evil, corporation. Just keep control of it. And, yeah, people are always welcome. So thanks so much for supporting LibreOffice and listening to my talk. Very good. We have a few minutes for questions. Are there any questions? Do you have any unit tests for the user interface, like emulating the user clicking, typing text and checking if the results are correct, and if so, what do you use for such tests? That is a fantastic question. So we have a chap in QA, Rob, who's come exactly to be trained on that. We have a hackfest later. Markus, with the red shirt over there, waving a hand right now, wrote the infrastructure to do that for The Document Foundation. It's built into our toolkit, because it really needs to be, and, yeah, we do have some degree of those tests, and you can write them in Python, I believe, and they're quite simple to do. Great way to contribute.
If you find a bug and you don't want it to come back, write a test for us. Makes sense? You said you had some issues with threading, as everyone does. There are nice frameworks and libraries in C++ now to ease development, like HPX, Boost fibers. Did you look at those? Do you plan to use one? Yeah, so do we plan to use the new C++ features? So yes, definitely, for threading I suppose so. We have a threading abstraction, of course, that's old, and we're trying to move away from that to more standard C++. There are benefits to that; it's quicker in some ways. I think there are a number of things to be said about contended mutexes and critical sections and standardization and Windows not working well and other things, but why not? Come and get involved. Make our code look beautiful. That would be cool. Anyone else? Oh, gentleman over here. Hi. I work for a government department that's recently adopted Office 365 for all our staff. Are we locked in forever, or is there some way you can help us move in the future? So, when you say you've adopted Office 365, what does that actually mean in practice? Because if you're using the online editing functionality of Office 365, you rapidly discover it's less functional than Collabora Online, which is only two years old. And it's not the tool that you use; what you tend to use is the PC version. And actually, if you use Office.com and this thing, you discover that for almost anything you do there's an advert: you can't really do that here, try it in your PC version. So I think often the use of Office 365 is as a data store which you shove documents into, which are in a non-standard but popular format we understand to some degree, and then a PC version of an office suite. As Larry Ellison said: if, you know, orange is the new black and cloud is the new whatever, then we'll make cloud press releases and branding. But I don't know.
How tied are you to the actual cloud web-service piece, or is it really still documents? I don't know. Yeah, yeah. So be scared, but perhaps not incredibly scared, and consider using Nextcloud, ownCloud, Pydio, Seafile, any of the, you know, open source alternatives that can now, in many cases, actually do better. Actually, Office 365 is a subscription service. Right. And as you stop paying for it, you basically have to remove your software from your PC. Sure. And it's a standard way to get Office running, but basically you are not tied forever; you will eventually decide it's not worth paying for, and it's actually very costly. Yep, yep. Yeah, as you point out. So I think there's a little truth in that. Anyone else? Now, the gentleman at the front here. It's a big project. How do you balance new contributions versus the code quality, to keep it maintainable? I mean, it's a balance that any project leader has to make. What do you do? Yeah, absolutely. So just to be clear, I don't lead the project; I merely participate to a degree. We have a sort of collective leadership thing, an engineering steering committee. So that's a first slight correction. But, I mean, it's a difficult balance. We are not blessed with a vast number of competent new people who want to contribute, so you will be welcomed. You know, don't worry about that. But in terms of trying to steer people away from obviously dangerous things with very little benefit: occasionally we do have people that seem to insist on wanting to do just cosmetic changes of almost no benefit, and that can be distressing. But the good news is that if people don't want to review it, it just piles up in Gerrit. Yeah, yeah. It's a resourcing issue, isn't it? How much time can you spend trying to bring new people on and get them into the code? We like to think we're a bit of a tar baby: people, you know, can't escape after a while. Even when they say they do, they stick to it.
And there's so much low-hanging fruit. It's easy to read ten files and find lots of things to do. So, yeah, do have a go. I think that's my time. Thanks very much. All right. Thank you, everyone. Michael Meeks.