Let me put this on. Thank you, thank you for sticking around, very good of you. Here we go: exploiting concurrency. It sounds bad, doesn't it? But it's good, you know. So, as we all know, processors aren't getting any quicker clock-wise; they've plateaued at about this speed. And there are lots of clever tricks to try and improve IPC, you know, deep branch prediction, superscalar architectures and so on, but they only really go so far. There are no huge wins in IPC coming. The real win, I'd argue, is coming from all of these other CPUs that are sitting on your die, each of which can run multiple threads. One of our sponsors, look at this gratuitous commercial plug, AMD, they're releasing this thing called Ryzen rather soon now, which I think is going to set the benchmark for the number of cores you expect. Sixteen threads is going to be the new normal this year. So it's fantastic to see Armin's work there: our drawing layers are going to do all sorts of wonderful stuff much more quickly. But the punchline is that unless you're using the other fifteen processors, you're wasting ninety-plus percent of the silicon real estate that someone has very carefully etched and put on your device. And that's super sad, right? So we need to do something about that.

So we have started; we're trying. We actually use quite a lot of threads. When you start LibreOffice, if you attach a debugger to it, particularly on Windows, you think: brilliant, we're already using six threads and we're not doing anything. Unfortunately, most of those threads are literally not doing anything. The main thread is where everything happens, pretty much, and then we have a whole lot of random threads. There's one for our custom memory allocator of dubious benefit; it has its own thread and does very little.
There's a factory thread that's just listening for an accept on a pipe for arguments: if you launch another instance, it hands the arguments over; otherwise it sits there doing nothing. There's an update check thread, which is basically a sleep. I don't know quite why we need to run a sleep in a thread, but we do. We have the GIO thread, which I have no idea what it does; this is under Linux. And a PulseAudio thread; similarly, we often don't play audio, but it's there just in case, which is useful. Of course, on Windows the situation is significantly worse. We use, I think, real-time timers on Windows, and Windows' timer APIs are sufficiently bad that the way they do this is by creating a whole pool of threads that presumably do something really dumb. And GDI, of course, does this as well. So it's normal to have twenty or so threads in a LibreOffice on Windows, and no concurrency at all, practically. That's the punchline: none of them are really doing anything. Still, if you're not using the app, you're not doing anything either, so it's not actually that bad.

We do have some threads that make some sense. I think Stephan's config manager stuff has a nice thread that spawns off and writes the configuration in the background when it's needed, which is kind of cool. And increasingly some of these are quite short-lived, so they actually start and end cleanly, which is nice; there's quite a lot of work being done to get life cycles right there. We now have a thread pool that even works. Amazing. Writing thread pools is quite hard, and we've done some work to improve our concurrency primitives so they're much harder to screw up with. So there are some things that we do do, not just the drawing layer, which is fantastic; I added it to the bottom because I didn't know about it until I saw the previous talk.
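To make "a thread pool that even works" concrete, here is a minimal sketch of the classic fixed-size worker pool: a mutex-and-condition-variable protected job queue, with the destructor draining remaining jobs before joining. The class and method names are illustrative, not LibreOffice's actual thread pool API.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool sketch (illustrative, not the real API).
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> g(m_);
            done_ = true;               // no new jobs after this
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> g(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // done_ set and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                          // run outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};
```

The subtle parts, as the talk hints, are exactly the ones that are easy to screw up: the predicate on the wait (so spurious wakeups are harmless) and the drain-then-join shutdown.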
But if you have huge high-fidelity images you want to scale, we do that across as many threads as we can steal. Then there's zipping for ODF: we did a whole lot of profiling work to find out why it's slow to save files, and it turns out that a lot of the time, deflate is quite expensive when zipping stuff. So we spent a lot of time threading and optimizing that, and then it turned out we were trying to zip images. If you zip a JPEG, it gets bigger, and you just throw the result away at the end; so you might as well not bother zipping, and the images are actually most of our files. So although we still have threading there, the real big win is not doing dumb stuff, which is often the way. Our XML parsing I'll talk a bit about later; we do some threading there. But that, as far as I recall, is where the threads are used.

So why thread? Well, there are a lot of reasons to thread, but the ones I like are these. The CPU has a whole lot of resources we often don't think about, and they're actually reasonably limited. Things like your branch predictor: your branch predictor is great, but if you run much too complicated a piece of code, suddenly your performance falls off a cliff. There's a limited resource there, and the fewer branches you take, the better a job it can do. Similarly, your cache is a very limited size. So if you can partition your work into do this bit, then this bit, then this bit, each bit can work much more efficiently; you can win even on a machine with fewer threads, or a single-threaded one. And the more code the processor has to decode and turn into its own magic world of micro-ops and try to cache, the less effective that cache is. So the ideal is to have lots of very small tight loops, each doing one thing and handing work on, handing work on like this. That really requires a different way of programming.
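Going back to the zipping point for a second: the "don't deflate already-compressed images" fix boils down to a tiny policy choice when writing each stream into the package. This is an illustrative sketch (the function and enum names are made up, not the actual package code), keying off the stream name:

```cpp
#include <cassert>
#include <string>

// Pick a zip entry's compression method from its stream name.
// JPEG/PNG payloads are already compressed, so deflating them wastes
// CPU and usually grows the entry; store them raw instead.
enum class ZipMethod { Stored, Deflated };

ZipMethod pickMethod(const std::string& name) {
    auto endsWith = [&](const char* ext) {
        std::string e(ext);
        return name.size() >= e.size()
            && name.compare(name.size() - e.size(), e.size(), e) == 0;
    };
    if (endsWith(".jpg") || endsWith(".jpeg") || endsWith(".png"))
        return ZipMethod::Stored;   // don't burn cycles re-compressing
    return ZipMethod::Deflated;     // XML streams compress very well
}
```

Since the images dominate the file size, skipping them cuts most of the deflate work regardless of how many threads you throw at it, which is the "not doing dumb stuff" win.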
XML parsing I'll talk about in a minute. But really, this whole message-passing thing, doing a little bit of work, then another little bit, then another, and passing these messages along, leads to a very nice, safe world. The only problem is that our code is really not designed to do that at all. So that's the ideal. However, technology change provides opportunity, and by using these threads we can actually whip the competition in some interesting ways. We've done that already: we've made it possible to load certain Excel sheets faster than Excel, using their file format, which is something I think we can build on and expand to lots of other areas. And of course, there are plenty of well-known problems with threading.

So I'd like to talk about this just briefly, to give a flavor for why this is cool. We can parse an arbitrary-sized XML file and emit SAX events in constant time. There's a caveat, a star at the end. But consider this: people always talk about their XML parsers and how mine's faster than yours, and it does these clever memory tricks, and it doesn't allocate, and so on. Well, we forget all that. We just use a standard XML parser, but we run it in a thread, and then we emit the SAX events in the main thread. So the constant time is parsing the first bit of the file; after that, you're just emitting events. And you would hope that the dominant cost is consuming the XML and doing something with it. So effectively you get free parsing, provided your consumption is slower and you have a free processor somewhere; and having a free processor is the easy bit, right? We've been doing this for OOXML import from a long time ago, from something like 5.0, and Calc has been threading concurrent loading of sheets as well. With occasional data loss, as you see here.
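The shape of that trick, parse on a worker thread, emit SAX events on the main thread, is a plain producer-consumer pipeline. Here is a minimal sketch under simplifying assumptions: `Event` stands in for a real token record, and the "parsing" is just walking a token list; none of these names are the actual LibreOffice classes.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// A stand-in for a real SAX token record.
struct Event { std::string name; };

// Thread-safe queue; nullopt marks end of stream.
class EventQueue {
public:
    void push(std::optional<Event> e) {
        { std::lock_guard<std::mutex> g(m_); q_.push(std::move(e)); }
        cv_.notify_one();
    }
    std::optional<Event> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        auto e = std::move(q_.front());
        q_.pop();
        return e;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<Event>> q_;
};

// "Parse" on a worker thread, consume events on the calling thread.
std::vector<std::string> parseThreaded(const std::vector<std::string>& tokens) {
    EventQueue q;
    std::thread producer([&] {
        for (const auto& t : tokens) q.push(Event{t});  // the expensive bit
        q.push(std::nullopt);                           // signal completion
    });
    std::vector<std::string> seen;
    while (auto e = q.pop()) seen.push_back(e->name);   // emit "SAX events"
    producer.join();
    return seen;
}
```

The caveat from the talk shows up directly: the consumer only runs ahead "for free" if consuming an event costs more than producing one, otherwise the main thread just blocks on the queue.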
That was recently fixed by Kohei. We were convinced the bug was in Calc initially, then we were convinced it was in the SAX parsing thing. We spent a long time, and in the end we found it was in the package, the unzip thing right at the bottom, which has a UNO API, so we were convinced it was thread-safe. And if by safe you mean not crashing, it certainly didn't crash, which was good. On the other hand, it has a stream pointer, and the stream API is seek and then read; with multiple threads reading from the underlying zip file, you could interleave the seek and the read under heavy load, and yes, you end up with rubbish coming out of your zip file as a set of XML. Luckily that's now fixed, but there we go.

For the ODF formats, we just pushed some stuff for 5.4 that will start to thread the ODF import. It's actually just one stream coming in there, but again, it splits the parsing, tokenizing and unzipping from the consuming of the events. Thanks to Mohammed Abdul-Azim, who did some of this with Google Summer of Code, switching us to the XFastParser. So there is a hope, as we incrementally move there, that ODF import will speed up significantly. And if not, maybe it will stay the same speed but use more energy, who knows.

So where else can we win? This is the speculative bit of the talk, where you're all inspired to throw tomatoes and get involved with doing fun stuff. XML parsing can still be improved, I think. Unzipping actually takes some percentage of our import time, something like 15%, maybe only 10, I forget. So we can put that in another thread, and then you have tokenizing and so on. Unfortunately, XML parsing is really expensive, unbelievably expensive: it's namespaces, it's funny attributes, it's entities; just the basic inner loop of that parser is nasty.
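The seek-then-read bug is worth spelling out, because it is the classic "each call is safe, the sequence is not" race: thread A seeks, thread B seeks, thread A reads, and A gets B's bytes. A sketch of the fix is to make seek-plus-read one atomic operation under a single lock. This wrapper is illustrative, not the actual package code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// A shared stream with one position pointer, like the zip package's
// underlying file. Exposing separate seek() and read() lets two threads
// interleave them and read garbage; readAt() holds one lock across both
// steps so the position can't change between the seek and the read.
class SharedStream {
public:
    explicit SharedStream(std::vector<unsigned char> data)
        : data_(std::move(data)) {}

    std::vector<unsigned char> readAt(std::size_t pos, std::size_t len) {
        std::lock_guard<std::mutex> g(m_);
        pos_ = pos;                                       // "seek"
        std::size_t end = std::min(pos_ + len, data_.size());
        std::vector<unsigned char> out(data_.begin() + pos_,
                                       data_.begin() + end);
        pos_ = end;                                       // "read" advances
        return out;
    }
private:
    std::mutex m_;
    std::size_t pos_ = 0;
    std::vector<unsigned char> data_;
};
```

This is also why "has a UNO API, therefore thread-safe" was a trap: each individual call can be internally locked and still leave a stateful two-call protocol racy.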
And it's checking that there are no duplicate namespaces; it's worse than you might think. So quite possibly we want some kind of look-ahead parser that jumps ahead in your file and starts pre-tokenizing it. Maybe, I don't know. XML parsing is something a lot of people do, so perhaps we could come up with a better approach. Unfortunately, you can't simply tokenize early, because namespaces can be redefined partway through, but you can have a go. And we get some pretty big files: it's reasonably normal, if you have a large spreadsheet, to have something like 100 megabytes of XML you need to parse. Small XML files are great, but big, highly repetitive ones are pretty bad. So probably we'll need to do something there.

Handling images is particularly slow. I mentioned earlier that they're the dominant memory object we deal with, and they're getting worse: people's cameras are getting bigger and higher resolution. A hundred pages of text is maybe five megabytes of memory, but one image is easily that, right? I don't know how many hundreds of pages you write, but you probably put images in, because they're worth a thousand words. Very high-resolution words, I'd say. So image handling is a bit of a mess, but we can fix that, and we have opaque image handles. Our history has gone in sort of revolutions: the very ancient hardware we had, with opaque image handles, is not dissimilar from the very modern hardware we have; it's just that in between times it went via the CPU, with CPU-side images. So we should be able to queue our image loading: make the image load appear immediate, and only when you actually access the data or render it do we take the hit. Then we can queue all these things: load our whole toolbar in parallel, decompress all those PNGs, allocate the memory, do all the good stuff. So that would be a pretty nice isolated change.
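The opaque-handle idea can be sketched with a small class: constructing the handle returns immediately and kicks off decoding in the background, and only the first real access to the pixels waits for it. Everything here is illustrative; `decode()` is a stand-in for real PNG/JPEG decompression, not a LibreOffice API.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Opaque image handle: "loading" just records the source and starts a
// background decode; pixel data is only waited for on first access, so
// a toolbar's worth of icons can all decode in parallel.
class ImageHandle {
public:
    explicit ImageHandle(std::string path)
        : pixels_(std::async(std::launch::async, &ImageHandle::decode,
                             std::move(path))) {}

    // First access blocks until the background decode finishes; later
    // accesses return the cached result.
    const std::vector<int>& data() {
        if (!cached_) { cache_ = pixels_.get(); cached_ = true; }
        return cache_;
    }
private:
    static std::vector<int> decode(std::string path) {
        // Pretend decode: one "pixel" per character of the path.
        return std::vector<int>(path.size(), 255);
    }
    std::future<std::vector<int>> pixels_;
    std::vector<int> cache_;
    bool cached_ = false;
};
```

Creating all the handles up front and touching the data later is what lets the decompression overlap, rather than serializing every icon load on the main thread.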
It would be quite helpful for rich presentations with lots of images, like mine, you notice? Lots of images. Anyway, VCL windows. On Windows we have this disaster area, because Windows message queues are thread-affine. If you create a window in a thread, it knows which thread it was created in, and you have to destroy it in that thread; you have to ask for GDI resources in that thread, and so on. Which is bad, because we have a sort of thread-safe widget scripting API that can spawn threads, create windows, end threads... it's really not nice. And so we have all of these events that we try to push across into the main thread to do stuff. There's just an absolute bunch of horrors here, and Michael Stahl spends lots of time fixing deadlocks and hangs in tests, and you have these horrible uncontrolled re-entrancy hazards: every time you wait on a condition, you've got to be prepared not to fully block that thread, because you might need to service one of these horrible windowing operations that comes in. It screws up your threading abstraction; it's a nightmare, basically. So wouldn't it be nice to have a thread that actually did something once in a while, namely creating windows? Move all of this out into one place, so you know that when you send a message you're not going to deadlock or block or do anything stupid.

And while you're there, another thing we could be doing is our rendering in another thread. Now, we have a metafile, but of course the metafile is also a file format, which is a bit of a disaster area, and it's also a limited subset of what VCL does. So Quickie actually has a patch to try and refactor this into two pieces, and it's potentially possible; we've done some research, and we could split this out and do our rendering in a separate thread.
Then you can start doing your document layout, just measuring here, and actually rendering in the other thread. That needs a lot of work around immutable structures, safe things you can share between threads, but it might work. Having a rendering thread would really help with OpenGL, which doesn't like threads despite the advertising: if you use OpenGL, you'll know it's just terrible and has all this implicit thread-local state, so it would be really nice to have all of the rendering in one place with one context.

We've also always struggled with things like progress bars. If you want to render a progress bar in your one-threaded application, you need to spin the main loop, so you have this yield that goes on while you're doing something: you yield, the progress bar moves, and hopefully nothing else comes in that's really bad at that point. We live on optimism. But wouldn't it be nice to have a progress bar that lived in another thread and carried on rendering and updating itself sensibly while you carried on doing what you were doing? That would remove another horrible uncontrolled re-entrancy hazard.

What else? There are smaller pieces we can do, various tools. Rust is the new trendy language, right? We want to add safe concurrency, so why not throw away the whole language, or most of it, and start writing a new compiler and stuff.
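The progress-bar idea above can be sketched in a few lines: the worker only bumps an atomic counter and never touches the main loop, while a dedicated painter thread polls the counter and repaints. The "repaint" here is just storing the value; class and method names are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// A progress bar that repaints itself from its own thread, instead of
// the worker yielding into the main loop to let it update.
class ProgressBar {
public:
    ProgressBar() : painter_([this] {
        while (!done_.load()) {
            lastPainted_.store(progress_.load());   // "repaint"
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        lastPainted_.store(progress_.load());       // final repaint
    }) {}
    ~ProgressBar() { finish(); }

    void report(int pct) { progress_.store(pct); }  // called by the worker
    void finish() {
        done_ = true;
        if (painter_.joinable()) painter_.join();
    }
    int painted() const { return lastPainted_.load(); }
private:
    std::atomic<int> progress_{0};
    std::atomic<int> lastPainted_{0};
    std::atomic<bool> done_{false};
    std::thread painter_;               // declared last: starts after atomics
};
```

The point is the decoupling: the worker's `report()` can never re-enter the main loop or deadlock on it, because painting happens entirely on the other thread.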