So, actually, I'm not sure if I've talked about Plasma in the past, at least Plasma specifically. Well, let me introduce myself first. My name is Aleix Pol. My description is quite different from before; I mean, besides organizing sprints, I do other things with my life. That's my email. I'm on the KDE e.V. board. I've been developing KDE for over 10 years. I put "over 10 years" here, but it's actually 12. I'm starting to feel old. And I'm employed by Blue Systems. Actually, my realization that I'm starting to feel old was that not only have I been a KDE developer for 10 years, but for more than half of that I've been working for Blue Systems, which is a weird milestone in my life. Anyhow, I develop things. And the project I'm going to talk about here is something I've been doing for Blue Systems specifically, although I do KDE stuff on my own time as well.

All right. First of all, I'm going to talk about what I did for Blue Systems, but it's not only me who has been working on this. Actually, not only at Blue Systems and not only in KDE: there are several people who have been worried about performance in the past. It's a recurring subject for software engineers, I guess. I'm going to talk a bit about where we come from and the tools I've been using, because telling you about a success story gives a nice warm feeling, but it's maybe not that interesting. So we'll be talking about the tools we used, in case you want to optimize something. And we will end by talking a bit about the stage we're at now and where to go from here.

For me, I didn't really look at Plasma startup ever. Like I said a bit earlier, or maybe hinted at, I wasn't that much of a Plasma developer until this device came into, well, I would say my hands, but actually it would be my boss's hands.
They talked about it before in the other talk, basically because they're coworkers, so they had the same problem. And also, it's a very interesting product, right? Like I said, it's a very cheap laptop, and we have a lot of laptop users. In general, you develop on, I mean, this laptop of mine is probably 2,000 euros. I cannot extrapolate every experience I have with my laptop to what other users are going to have, right? For example, I'm not hearing it that much anymore, but some time ago you would get people coming with a spinning disk in their system and saying that something takes a really long time to start. And yeah, my laptop cannot reproduce that. And I wouldn't, I don't know, get a super old computer to do that. Maybe that would have been the nice thing to do.

In any case, the big problem with this device was that IO was super slow. Actually, it was even slower for us because, since we needed to iterate on the device, we were testing mostly from the SD card, which wasn't that much slower than the internal disk of the device, because the internal disk is also really slow. But it was quite slow, super slow. Definitely much slower than any other laptop. So the improvements from the profiling we did were probably seen mostly on the IO part. Actually, that other laptop was, or is, dual core with several gigabytes of RAM already. So it's decent hardware. It's just that it took a long trip to get things from the operating system and back, right?

One thing we started looking at, for example, was KPackage. KPackage, which you will know Plasma has been using, is a framework that we use for having small plugins that people contribute. All Plasmoids, for example, come from KPackage, but it's not only that. There are several pieces of our architecture that use it.
And for example, we found in this case that we were reading some of the metadata files several times. And like I said, reading from this disk was super slow, so cutting down on that was a huge advantage. On this laptop, or the laptop I had back then, reading a file again is obviously slower too, right? But it's very different whether it's a slowdown of 1 millisecond or 100 milliseconds when your laptop is doing everything else super quickly. What we did there was make sure that we don't read too often. We also don't write too often, because when we write we have to read again, since these are desktop files.

We have another problem with KPackage, which is that it doesn't depend on KConfig because it's a tier one package. In KConfig we have the good desktop file parser. KPackage relies on a bad desktop file parser we have in another framework, which is slightly slower. So we actually started converting these metadata files into JSON, which is what we're using for the rest of the KDE plugins that we have all over KDE. So if you see KPackages installing both the desktop files and the JSON files right now, it's because of that. And just make sure that you're installing the metadata JSON file for your packages, because it will help loading a lot, although it's not technically necessary. We fall back to the desktop file if it's not available.

Something else we started looking at was startkde. startkde is a script, or was a script, actually, from, well, back in March 1999, which is long ago. And it grew over the years, slowly but surely. When we started looking at this, it wasn't only super old. Somebody had also forked it at some point, because we started working on Wayland, and adding ifs, especially in Bash, was boring, so it was much easier to duplicate all of the code. So that's what we ended up finding. Here, I moved something. I'm missing something in my slides.
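As a rough illustration of the KPackage metadata point above (read each metadata file once, prefer the JSON file, fall back to the desktop file), here is a minimal Python sketch. It is not KPackage's code: the function name `load_metadata` is hypothetical, the file names `metadata.json` and `metadata.desktop` are assumptions for the example, and the `[Desktop Entry]` section is the usual desktop-file group.

```python
import configparser
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_metadata(package_dir: str) -> dict:
    """Read a package's metadata once; repeated calls are served from memory.

    Hypothetical helper for illustration, not part of any KDE API.
    Callers share the cached dict, so they should treat it as read-only.
    """
    root = Path(package_dir)
    json_file = root / "metadata.json"  # assumed file name
    if json_file.exists():
        # Preferred path: one JSON file, parsed in one go.
        return json.loads(json_file.read_text(encoding="utf-8"))

    # Fallback path: the older desktop-file format.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written in the file
    parser.read(root / "metadata.desktop", encoding="utf-8")
    if not parser.has_section("Desktop Entry"):
        return {}
    return dict(parser["Desktop Entry"])
```

The cache is what removes the repeated reads; the JSON-first lookup only decides which parser runs on the first read. A real implementation would also need some way to invalidate the cache when a package is updated, which this sketch leaves out.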
Ah, it was... all right, well, whatever. Yeah, spoiler alert, right? All right, let me see if I can... no. Oh, for fuck's sake. Well, I fished out the URL where somebody started it in SVN. Actually, I think it's not even the first version. It's just the first version I could find, because it was coming from CVS, even longer ago. Can I add a new tab? Oh, it's on the wrong screen. Well, this is the first startkde I could find. Can you read anything?

What it was doing back then was some KControl init, starting an audio server, starting a window manager sound thing, WM sound. I don't know what that would be. Then it had a sleep, which was forked because there's an ampersand at the end, but it was still a sleep. And it was launching KFM, which was the KDE file manager, which is what we used back then for what Plasma Shell does right now, and then kpanel, which was a different process. You can see it there. Well, that's a script that evolved into a weird Frankenstein's monster over, well, 20 years essentially, and it even got forked.

Basically, my task back when we were doing the Pinebook ended up being this: it was a super long script, setting a lot of environment variables, launching processes, checking the result of each launched process. Obviously, since it's Bash, for each of these calls it needs to first start the other process and then wait for it. One would think that's fast, and actually it is fast, but it's not as fast as just sending something to the processor like we do in C++, for example, right? It needs to load a bunch of libraries and make sure that things happen. And it was doing a lot of that. What we did that time was add a bunch of ampersands, for example, on things that we needed to make sure were launched at some point, but where we didn't need to know whether they had succeeded or not.
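The difference between waiting for every launched process and just firing it off (the ampersand trick described above) can be sketched as follows. This is an illustration in Python rather than the actual startkde changes, and the helper names and the stand-in "service" command are hypothetical.

```python
import subprocess
import sys


def run_and_wait(commands):
    """Sequential style: each command must finish before the next one starts."""
    return [subprocess.run(cmd, check=False).returncode for cmd in commands]


def launch_in_background(commands):
    """Fire-and-forget style, similar to appending '&' in a shell script."""
    return [subprocess.Popen(cmd) for cmd in commands]


if __name__ == "__main__":
    # Hypothetical stand-in for a session service: it just sleeps briefly.
    fake_service = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    procs = launch_in_background([fake_service] * 3)
    # The launcher is free to carry on here. We only wait at the very end
    # so that the example does not leave child processes behind.
    exit_codes = [p.wait() for p in procs]
    print(exit_codes)
```

With `run_and_wait` the total time is roughly the sum of all the commands; with `launch_in_background` they overlap, at the price of not knowing right away whether any of them failed, which is exactly the trade-off described in the talk.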
Then we also looked at Clazy. I don't know if it's a tool you've looked at, but it's super useful for making sure that we're making the best use of Qt. It has a bunch of checks that will tell you whether something could be done better. It's a Clang plugin, so it uses all of the knowledge that Clang has about our source code, which is actually much more than we have ourselves. It gives you some warnings and hints, and it even has some functionality called fix-its, which basically means that it modifies your code with code that is theoretically better, and then you get to decide whether you commit that or revert it. We did a bunch of improvements there.

We did a lot of, like I was saying before, making sure that we don't read too many files. One thing we do, which is very powerful in KConfig, is that every time you open a config file it will fall back to the kdeglobals file, which may be installed by your system provider or by yourself. There are some global values in there that affect the values the user is getting. This makes a lot of sense in many cases, and it's very powerful, but there are other cases where we're just using KConfig to read a specific file and we don't need to check globals or anything else. Every time we load the kdeglobals file, we have to read an actual file from the file system, parse the values and put them in a map. This is cheap, but it's not free. So if we can save that, well, that was quite useful, especially on a device where IO, and disk IO specifically, was slow, like I said.

As you can imagine, there were a lot of experiments where we could say "this improves things a lot", but then they didn't end up in production.
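To make the cost of the global fallback concrete, here is a toy Python model of a config reader that can optionally layer a globals file underneath the requested file. It is only a model: `open_config` and the `reads` counter are hypothetical names, and the INI files stand in for KDE config files. In KConfig itself the equivalent choice is made through open flags such as `KConfig::NoGlobals` or `KConfig::SimpleConfig`.

```python
import configparser
from pathlib import Path

# Counts how many files were read, only to make the extra I/O visible.
reads = {"count": 0}


def _read_into(parser, path):
    reads["count"] += 1
    parser.read_string(Path(path).read_text(encoding="utf-8"))


def open_config(path, globals_path=None):
    """Open `path`; if `globals_path` is given, load it first as a fallback layer."""
    parser = configparser.ConfigParser(interpolation=None)
    if globals_path is not None:
        _read_into(parser, globals_path)  # one extra file read on every open
    _read_into(parser, path)  # values from the specific file win
    return parser
```

Opening a file with the globals layer costs two reads, opening it without costs one. That difference is small on a fast disk and noticeable when it is repeated by every process on slow storage.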
One thing that I tried, and that we never ended up merging, was having the kdeglobals file always loaded in the background in KConfig. We could have proved that it was better, but then we would have had to make sure that the cached information was up to date all the time, and this had some complexity which would have meant more code and possibly things not working. So we didn't go there. Maybe we will want to go there. But in general, if you see these kinds of things, consider whether it's worth it or not and, well, do it.

Another experiment we did was this. Like I said, our KPackage files, so Plasmoids, are essentially a little folder with a bunch of files inside. What you do when you're using these formats is basically read several of the files inside: you will have the metadata, but then usually some QML file or JavaScript or whatever. And something that we found improved performance in this specific case was putting those packages in separate RCC files. For those of you who don't know RCC files, they're kind of like a zip file that is more or less transparently integrated with Qt, so you can tell Qt to load this RCC file and then use it transparently. This was an improvement, and at the moment you can tell KPackage, when you create a package, to create an RCC file instead of just installing all of the files. It's something that you have to opt in to, because I think it had some kind of regression on some platforms, but on the Pinebook, for example, it was definitely better.

This whole Pinebook thing happened, I think, in late 2017, maybe 2019. Like I said, I'm getting old, so what do I know?
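The idea behind the RCC experiment (one container file instead of a folder of small files) can be sketched by analogy with a zip archive, as the talk itself does. The following Python sketch uses the standard `zipfile` module and is not Qt's RCC format or tooling; `bundle_package` and `read_all` are hypothetical helper names.

```python
import zipfile
from pathlib import Path


def bundle_package(package_dir, bundle_path):
    """Pack every file under package_dir into one uncompressed archive.

    bundle_path is assumed to live outside package_dir.
    """
    root = Path(package_dir)
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_STORED) as bundle:
        for file in sorted(root.rglob("*")):
            if file.is_file():
                bundle.write(file, file.relative_to(root).as_posix())


def read_all(bundle_path):
    """Open the bundle once and return {relative_path: bytes} for every member."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return {name: bundle.read(name) for name in bundle.namelist()}
```

Loading the package then means opening one file on disk instead of one per QML, JavaScript or metadata file. Whether that is a win depends on the platform, which matches the remark above that the RCC option is opt-in.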
But in May 2019, we had a whole new effort to improve Plasma startup, coming from a whole different context. It wasn't because of Pine or anything like that. There's a company interested in Plasma who said "this is too slow and I don't like it; here's money, Mr. Blue Systems", and Mr. Blue Systems said "yeah, sure". And since I had been working on it before, well, I started working on startup again.

First of all, this time, what I did was create this script. If you're curious, you can take a look at it. Basically, it compiles a tool that comes from the systemd people, systemd-bootchart, which tracks the whole startup chronogram and presents it to you in an SVG file. It does a couple of other things, like installing a session file that you can pick from SDDM afterwards to start with this thing. It will help you profile the Plasma startup: it will give you a beautiful SVG file, which I will show you in a second, with how much time was spent doing what. And actually, yeah, we're going to look at the first differences we have there. Here, oh no, in here, and it's on the wrong screen. Oh, I have three of them now.

So, one of the first interesting things with it: the script itself was just for me to see what it looked like on my laptops, but nobody would trust me if I said "there's an improvement on my computer". So we installed it on Neon's CI so that, whenever there's a new ISO or something, we would generate one of these graphs. This one is from May 8th, no, 23rd; well, on the URL it says 23. What you can see here, for example, is that there's a big spike here. Most of the CPU usage is at the beginning, of course. Actually, not at the very beginning: there's some kind of idle time where the computer is just starting to, well, see the light, but nothing is really happening.
And here below, we can see all of the processes. One of the limitations of this, for example, is that ksplashqml is using a spinning thing that is not really supported by the graphics driver of the VM running on the Neon CI. So basically it's rendering the spinning thing in software and it shows a lot of CPU. On any laptop running a normal graphics driver, you won't get so much CPU usage from the spinner. But other than that, it's quite accurate. We also see these weird things happening. This is coming from Ubuntu somehow. But in general, the plan was: let's see if we can squeeze this whole thing a bit towards the start.

A bit after that, we changed it. For example, you can see that this was moved a bit later, well, quite a lot later. This is the Discover Notifier looking at the database to see whether there are updates on the system or not. We moved it later because it wasn't something that we needed right away; the user can wait to find out whether they need system updates. We can also see that here it's doing things a bit earlier. I will show you in a bit why that was. But in general, you can see that having nice tooling gives us some kind of edge in knowing whether the improvements we've been working on are useful or whether we are wasting our time.

One of the most important tools we use is D-Bus. You will hear about it a lot. One of the ways of visualizing D-Bus traffic is the Bustle tool. Installing it from your distro is a bit complex, because it's also a bit exotic, but there's a really nice Flatpak that is maintained by the maintainer himself, so you can just install that. Basically, what it gives you is a nice chronogram. It's a chronogram in the sense that time increases vertically, and you can see how the processes talk to each other. And if some process is blocking, you will get to see that as well, with the little arrows.
This, for example, showed us how we are doing a big load of NetworkManager calls at the beginning. At first I thought, all right, let's look at Plasma Network Manager, because it must be shit. And then actually it wasn't. I mean, there were some improvements we could make, which I did, and we could do some more. But the actual problem there was that Qt, on the start of every application, starts querying all of the different networks that are available on the system. And it's blocking on all of them. I have a tiny patch for Qt's NetworkManager support so that it doesn't block anymore. But we will have to look into why it is necessary to have all of the NetworkManager state copied into every Qt process.

But in general, it's very nice to be able to see who's asking what and what's happening. It's also useful if you want to improve the battery life of your system: see whether the applications are waking up, and waking up other processes to ask for things. Then you can think about whether it makes sense or not. And if it doesn't, well, fix it, right? This is something really weird, and maybe somebody here knows: I have some kind of process that keeps asking resolved for the hostname of my system. It happens every five seconds. I mean, it's fine, right? My laptop is not going to hurt because of that, but it's weird, and it annoys me. And it only annoys me because I've seen it in this tool. If I hadn't seen it, I would be happy.

In general, also look at whatever is blocking. To fix blocking calls, what I did was use GDB to add a breakpoint on the Qt function that does a blocking call, and see why it was blocking. If it's possible to remove it, then you remove it, and life is nicer. As for NetworkManager, NetworkManagerQt actually does a whole lot of blocking there, but I don't know if there's much we can do about it. And then there's grouping.
Kai actually did some improvements on that. When querying APIs, it's not very performant if you have to keep asking for things one by one. You can actually say "give me all of these objects", and it returns a big XML file that you can just parse at once. In this case, for example, it's so much faster to ask for a lot of stuff and process it locally than to do a lot of requests to another process, because otherwise you will have all of the processes swapping back and forth on your system. And that's not fun. Definitely not performant.

A good tool to look at memory is Massif. Massif is one of the Valgrind tools. I'm not going to show it today, but it's something you need to keep in mind. The big reason why Massif is important for performance is not so much that you want your processes not to take a lot of memory, which you do want, since you don't want to take all of the system's resources. It's that it's also very slow to allocate and deallocate memory all the time. So we need to use it and make sure that we don't do a lot of that.

KCachegrind is the same, but instead of using Valgrind's Massif, you use Valgrind's Callgrind, and it tells you where the CPU is spending its time. What we need to remember when using Valgrind tools is that the code is not actually running on our processor. It's some kind of virtual machine, so it will not be 100% representative of what we're doing, but in general it works really well.

Another one I really like is Hotspot. This one and Massif-Visualizer have been developed by Milian from KDAB, and he's awesome. He's also a KDE developer; that's why I'm praising him. Hotspot is really cool. This is Hotspot, and basically you get to look at where you're spending time in your process. You can filter by process. So when I was talking about blocking calls: it's bad if you're doing blocking calls in general, because it will be slow.
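Returning to the grouping point at the start of this part (asking another process for everything at once instead of object by object), the effect on the number of round trips can be modelled with a small Python sketch. Nothing here talks to D-Bus: `FakeService`, `fetch_one_by_one` and `fetch_batched` are hypothetical names, and the counter stands in for the context switches between processes.

```python
class FakeService:
    """Stands in for another process and counts the requests that reach it."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.round_trips = 0

    def get_one(self, path):
        self.round_trips += 1
        return self._objects[path]

    def get_all(self):
        self.round_trips += 1
        return dict(self._objects)


def fetch_one_by_one(service, paths):
    """One request per object: N round trips to the other process."""
    return {path: service.get_one(path) for path in paths}


def fetch_batched(service, paths):
    """One request for everything, then filter locally."""
    everything = service.get_all()
    return {path: everything[path] for path in paths}
```

Both functions return the same data. The batched version moves the filtering into the calling process, where it costs far less than waking up the service once per object.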
But it's especially a big problem when you're blocking on the main thread, because then you actually keep the rest of the system, or the process, from proceeding. If you can put it into a separate process, it's usually nice. So the reason I took this picture is that there's a big chunk over there that says "read image". This is Plasma loading the wallpaper. When I looked, it was like a big 30% or so of the whole run. And it helped me track down a bug in Qt. So this is after fixing. You can see before, after, after, before; well, it's a bit slow to switch. The patch was really simple. I can show you the patch if you like. But we didn't just improve Plasma: we improved all of the Qt users of PNG images. Basically, what it was doing was loading an image that was the same size as my display, but still scaling it. It was scaling it to the same size that it already was, which is, well, unnecessary. But it was doing it. And now it's much faster, because it's just putting the image where it has to be and not processing it. Brilliant.

Another nice tool, this one coming from GNOME, is Sysprof. Why it's interesting is that it doesn't focus on one process; it profiles the whole system. So, for example, what I did was modify my Plasma startup script to call these two things, which basically start Sysprof and then, after 20 seconds, have systemd kill it. I was only concerned about the first seconds. So I started a process that started gathering all of the information, after 20 seconds it was killed so that the data was saved on the system, and then I would analyze it.

QML is something we use a lot. We need to remember to only initialize the objects when all of the properties are set. It's not fun when we spend all of the time reacting to properties changing in quick succession. We can delay things using the QML Loader. And there's QObject tracking.
I am actually out of time, so you can look at my blog to see what it's about. But if you use QML, I think it could be interesting for you.

There's apitrace, which is nice. If you're using OpenGL, you will get to know which OpenGL calls are being made and which textures are being uploaded. If they're too big, if there are too many, anything that can be reduced, you will be able to see it there. gpuvis is a Valve-developed tool that you can use to see where every process is actually spending its time and how that relates to the VBlank of your graphics card. It's pretty cool as well.

And one of the things I did, for example, was dumping that Bash script, because Bash is not how we make things fast. We're using C++ now. We're using libraries. If you want to hear some stories about that, you can ask me when I have a beer in my hand, and I will tell you all about it. We ported away from deprecated APIs, and did Clazy-related things, or things where you can see that they could be done better, like this kind of problem. We had tens of these. It looks absurd, but whatever.

And things to do: there's Plasma Shell. And actually, every Qt process is doing Fontconfig loading wrong. If we do Fontconfig loading right, we're going to improve the startup of any Qt process. So it's not going to be only Plasma. It will be Plasma, it will be KWin, but it will also be Kate. It will be Dolphin. It will be everything. So we have to look into that. I started looking into it a couple of days ago, so I haven't fixed anything there yet. Actually, there was a guy who even did his degree thesis on it, or his bachelor's thesis. And he just didn't contribute it upstream, which is, well, sad. But oh well. We're doing more things with Wayland. Better tooling for D-Bus would be interesting. And yeah.

Thank you so much, Aleix. Any questions?

So, about the D-Bus performance stuff, there's an environment variable called Q_DBUS_BLOCKING_CALL_MAIN_THREAD_WARNING_MS.
And if you set that to zero, you effectively get a warning on the console every time something does a blocking D-Bus call. And then you can just go through all of them and fix them, right?

Cool. Any questions?

Will the changes in Qt 6 regarding QML change anything for you in startup?

Everything I've been working on has been Qt 5. There's, well, there's that change. Actually, that one was already in 5.12. There are a couple in 5.13. But in general, you should have them available. If you have a nice rolling distro, you will get them as soon as possible. I'm sure that the brilliant things that have been talked about will have an impact, definitely. But I didn't look at that.

OK. Any more questions?

Do you have a before and after picture, like the time used booting Plasma before your changes and after?

It's kind of hard to have. We could show the boot chart from yesterday and we would be able to see some differences. But my latest improvements have been in Plasma Shell itself, and it's actually kind of hard to tell when Plasma Shell is really loaded. But if you want, I can prove to you that it boots really fast on my laptop.

OK. Any more questions? OK. Thank you, Aleix.