First, who am I? I've been with KDE for a long time. I worked on KPDF, which you probably don't know anymore, because that's what we called Okular 20 years ago. I've been doing translations, releases, working on KDE games. I was the founding president of KDE Spain. I was part of the KDE e.V. board. I've done a little bit of everything. And the most important part of this slide is that I'm not a security expert. So I'm going to talk to you about something which is security-related, but it's just something I've been doing lately. So if you have questions, I might not have answers for them. Just be aware of it.

So which kind of security issues am I going to be talking about? It's going to be mainly related to wrong use of memory: using uninitialized memory, or using memory that was already freed. This means the application will either crash, because you're doing something with memory that's not yours or isn't yours anymore, or it will behave unexpectedly, because your variable is not initialized and then things happen that you don't expect to happen. People who are more experienced than me in security say that any memory-related crash can be turned into a code execution exploit. I've never seen this happening. I don't have a clue how this happens, but that's one of the mantras in the security community. If you're crashing because you're using uninitialized memory, that means someone can put code there and make it do the things they want and not what you want. So it's very important to fix this kind of bug, because it can basically be code execution, and we don't want to be doing code execution.

So which tools do we have to detect that you're using memory wrongly, or using memory after it's been freed, or whatever? The first and very basic tool is the operating system. Your application might crash if you're using the memory wrong.
It doesn't happen all the time, because the operating system and the C library are sometimes a bit lazy and will give you more memory than you need, and then if you use memory that is not really yours but that the operating system gave you anyway, it will not crash.

Then we have Valgrind, right? Valgrind will help us find these kinds of memory errors. The problem with Valgrind is that it will take 10 days to run your application if it's a bit complicated. I've tried that with Okular and PDF parsing sometimes, and I got bored after a day of having it running on my computer. For some things that run very long, it's just not feasible.

And then one of the modern tools, which is similar to what Valgrind does but is just so much faster, is the compiler sanitizers. That's something that, as far as I understand, Google has been working on. There's the address sanitizer, the memory sanitizer and the undefined behavior sanitizer, and basically what they do is add code to your application that checks for all the things that can go wrong. Since it's inside your code, it's much faster than Valgrind. Valgrind pretends to be a kind of virtual CPU, so everything is much slower because everything is intercepted by Valgrind. That's one of the things that makes the compiler sanitizers much faster. They're also a bit harder to use, because you actually have to link to them and you need to know how to do that, whereas Valgrind can just run your binary. But let's assume you know how to do that.

So what is fuzzing? It's a technique in which you basically send garbage to your application and try to make it crash, or not crash. That way you make sure that whatever the input is, your application will not crash. One could do fuzzing by hand, right?
You could just start with pdfinfo, for example, which is a binary: if you give it a PDF, it will tell you the title and the author and whatnot. You could just start with echo A, echo B, echo C and D and E, and try all the possible inputs. This is very basic and nobody does that, but it's one of the ways you could do it. Just do a for loop forever, send it random stuff and see if it crashes.

So what is OSS-Fuzz? OSS-Fuzz is the main topic of this talk. OSS-Fuzz is a fuzzing engine developed by Google. Actually the fuzzing engine is not called OSS-Fuzz, it's called libFuzzer, but OSS-Fuzz is the bigger term that's being used everywhere, so I'm using that too. Basically it's a very, very smart fuzzing tool. It's coverage based, which means it understands your code and will minimize the randomness of fuzzing. So if you have a function where there's an integer X and you check for X being bigger or smaller than 50, it will not try 51, 52, 53, because it knows that 51, 52 and 53 will all execute the same code, so why try all of those? It is really, really advanced. I don't know how they actually do that, because when I think about how to do it, I have no idea how I would do it, but they do and it works. It's really amazing.

So what is OSS-Fuzz, number two? The problem with, for example, the sanitizers and Clang is that you really need the latest version, because they are constantly improving. So sometimes if you want to run this on your distribution, it will either not work or it will be hard to set up. So they basically have a set of Docker images which are updated to the latest everything. This way you just download the Docker images, run a command, and it will start fuzzing the project. It's almost effortless.
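To make the contrast between blind input enumeration and coverage-based fuzzing concrete, here is a minimal C++ sketch. It is not how libFuzzer works internally: the target `checksPassed`, the magic string `"FUZZ"` and the search loop `coverageGuidedSearch` are all made up for illustration, and the "coverage" is simply the number of nested checks an input passes. The only idea it borrows from the talk is that an input is kept when it reaches code no earlier input reached.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy target: four nested checks. The return value says how many checks passed,
// standing in for the coverage feedback a real engine gets from instrumentation.
int checksPassed(const std::string &input)
{
    static const std::string magic = "FUZZ"; // hypothetical input that reaches the deepest branch
    std::size_t depth = 0;
    while (depth < magic.size() && depth < input.size() && input[depth] == magic[depth])
        ++depth;
    return static_cast<int>(depth);
}

struct SearchResult {
    std::string input; // best input found
    int coverage;      // checks passed by that input
    long executions;   // how many times the target was run
};

// Mutate one byte at a time and keep a mutation only if it reaches new coverage.
SearchResult coverageGuidedSearch(const std::string &seed)
{
    assert(!seed.empty());
    std::string best = seed;
    int bestCoverage = checksPassed(best);
    long executions = 1;
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t pos = 0; pos < best.size() && !improved; ++pos) {
            for (int value = 0; value < 256 && !improved; ++value) {
                std::string candidate = best;
                candidate[pos] = static_cast<char>(value);
                ++executions;
                const int coverage = checksPassed(candidate);
                if (coverage > bestCoverage) {
                    // New code was reached: this input becomes the base for later mutations.
                    best = candidate;
                    bestCoverage = coverage;
                    improved = true;
                }
            }
        }
    }
    return {best, bestCoverage, executions};
}
```

Starting from the seed `"AAAA"`, this search reaches the deepest check after a few thousand runs of the target, whereas trying every four-byte input blindly means up to 256^4 runs. A real engine uses compiler instrumentation and far richer mutations, so treat this only as an intuition for why coverage feedback cuts down the randomness.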
At this point there are about 240 projects in the OSS-Fuzz GitHub repo, if we can see them here. So yeah, there's lots of things. There's VC, there's ClamAV, the antivirus thingy, there's curl, there's dav1d, which is the new video stuff. There's FFmpeg, there's file, there's Firefox, FreeType, Ghostscript. Lots and lots of projects are in OSS-Fuzz.

What is OSS-Fuzz, number three? It's also a software as a service thing. Google has lots of servers, so they basically give you a thing that will run everything for you. It will get the Docker images that you created, run them for a while, find a bug and tell you it found a bug. They are very strict on the bug policy. When it finds a bug, it will send you an email saying: I found a bug, this is the backtrace, and this is the file you can use to reproduce it. You have 30 days to fix it. If you don't fix it in 30 days, they will make the issue public. Some people say that's very extreme, but it's a way to force people to get off their asses and fix things.

The good thing about this is that all the software needed to run this thing is free software. So if you really wanted to do that yourself for your project because you're crazy, you could do it, but you will need lots of processing power. Doing fuzzing basically means running software for a long time. So if Google is doing it for free, why would you not let them do it?

And that's what is OSS-Fuzz, number four. This is an image they have on their web page. How it works is that you write a fuzzer. We'll see some example fuzzers I've written for KDE stuff recently. It's not very hard, at least not the ones I wrote; I guess you could go into more detail. You write a fuzzer, you commit, it builds in Jenkins, blah, blah, blah, it finds bugs there and it tells you. It's basically that.
They run the fuzzer itself (well, lots of people, you're all late). You put it on the web, it runs for a while, it tells you, and then you have 30 days to fix it.

So what do we have in KDE and OSS-Fuzz? We have actually been running OSS-Fuzz for KImageFormats since January, for KCodecs since February and for KArchive since April. Kind of half related to KDE, we are also fuzzing Poppler, which is the PDF library we use, since May last year, and libical, which is for the calendaring stuff, since April. What all these have in common is that they will probably be used without you wanting to use them. The typical example is KImageFormats. Somebody will send you an email with an image and it will run code, because you will preview the image in KMail, and if you have a bug there, it will crash KMail just because somebody sent you an email with a malformed image. So the idea here is that we have to be very, very cautious about things that run without the user even pressing any button.

So I'm going to show you now how the KArchive... oh, that's something in Spanish there. I gave this talk in Spanish first, so there's one line in Spanish, whatever. Do you see this? It's big enough, right? I guess.

Right, so this is what we have in the OSS-Fuzz GitHub for KArchive. It's very easy. Let me open the few files. First, the Dockerfile is relatively easy. We have to check out everything we need. So we get zlib, bzip2 and xz, and then we get Qt, ECM and KArchive itself. Those are the things we need to be able to fuzz KArchive. Then we build it. It's not very hard. You have to build everything, but it's configure, make, configure, make, configure, make. Qt is a bit harder, so I do some stuff there for Qt, but ignore that. For KArchive it's cmake, make, and that's it.
It's not rocket science. And then the fuzzer itself. How it works is that you have to write a function whose input is basically a byte array, so it's a char pointer and a size. What we do there is basically create all the possible archive types that KArchive could read, so 7-Zip, tar, tar with gzip, tar with bzip2, tar with xz, zip and ar, and I just run all of that code.

And this found quite a few things. If we go to the KArchive log, you see my name is here: don't assert, don't assert, don't assert, don't crash, blah, blah, blah. So it found a few things, and there's more here: uninitialized memory, memory leak, bad memory. So KArchive, which I'm sure is something where people have looked at the code very closely and which was written carefully, still had a few bugs.

One of the random bugs we found, which is kind of interesting, let me find it here, is KArchive and a very long file path. KArchive has a recursive function in which, if you give it a path, it will try to find which folder it belongs to. So it will go up, up, up, up, trying to find the folder it belongs to. If you get a path which is longer than about 5,000 characters and is basically a/a/a/a/, so very short directory names, one inside the other, this thing will recurse too much and end up exhausting your stack. The stack will just grow too much and it will crash. So we did a very poor man's solution, which is just: don't recurse too much. In this case it should work, because I found out that even with a small stack it only crashes after 1,500 recursions, which is a very long path. I think Linux actually doesn't let you have a path that long, since PATH_MAX is about 4,000 characters. So even if you use directories of one character, you still need another character for the slash.
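The "don't recurse too much" fix can be sketched as follows. This is not KArchive's actual code: the `Directory` type, the function `findOrCreateDirectory` and the limit `kMaxRecursionDepth` are hypothetical names and values chosen for illustration. The sketch only shows the general pattern of passing a depth counter down a recursive parent lookup and giving up once it exceeds a fixed bound.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical in-memory directory tree, standing in for an archive's entries.
struct Directory {
    std::string name;
    std::map<std::string, std::unique_ptr<Directory>> children;
};

// Assumed limit for this sketch; a real project would pick its own value.
constexpr int kMaxRecursionDepth = 1000;

// Returns the directory for `path`, creating missing parents on the way,
// or nullptr when the path is nested deeper than the limit allows.
Directory *findOrCreateDirectory(Directory &root, const std::string &path, int depth = 0)
{
    assert(depth >= 0);
    if (depth > kMaxRecursionDepth)
        return nullptr; // refuse absurdly deep paths instead of exhausting the stack
    if (path.empty())
        return &root;

    const std::string::size_type slash = path.rfind('/');
    const std::string parentPath = (slash == std::string::npos) ? std::string() : path.substr(0, slash);
    const std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);

    // Go "up" one level first: the parent must exist before the child can be added.
    Directory *parent = findOrCreateDirectory(root, parentPath, depth + 1);
    if (!parent)
        return nullptr;

    std::unique_ptr<Directory> &child = parent->children[name];
    if (!child) {
        child = std::make_unique<Directory>();
        child->name = name;
    }
    return child.get();
}
```

With this shape, a path such as `a/b/c` resolves normally, while a path made of thousands of one-character directories is rejected with `nullptr` and the caller has to treat the archive entry as invalid. The alternative, rewriting the lookup as a loop, removes the limit altogether but is a larger change.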
So that should be fine, hopefully. Otherwise somebody has to rewrite this to not be recursive, which is a pain in the ass. Let's not do that.

So that was KArchive. KImageFormats is basically the same. If you see the fuzzer, yep, look, it's the same thing. You get the data, you create an image, you read it. It's nothing very complicated.

We've found lots of bugs in our image formats, which are a very random set of things. If you look at our image formats, we have readers for things like RAS and RGB and TGA, and the GIMP thingy, but we only support the very old GIMP format, so it's not very useful. But if you look at the log, yep, I did fix lots of bugs here. It goes past the first page, and then I think a bit more. So basically we were vulnerable to people sending us random images and crashing everything. Actually, that started because somebody sent an email to security@kde.org saying, I have this image that crashes everything, and then I started running OSS-Fuzz, and it was like, yep, there are a few more images that crash everything.

What's even more interesting is that it found issues in the PNG handler. We're only fuzzing our own code there. The inputs we give it are a GIMP file or an RGB file or whatever, but since it's coverage based, and it saw that it could end up in the PNG code, it was able to morph the file enough that it ended up inside the PNG file handler inside Qt itself. It found that in lots of cases this was being used in an uninitialized manner, which is bad. But that shows how powerful the thing is: you start by giving it a GIMP file, and it will morph the GIMP file into something the PNG handler half understands and then fails reading. So it is really, really powerful.

So, future work: we should fuzz more things, right?
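The fuzz targets described for KArchive and KImageFormats share one shape: a single function that receives a byte pointer and a size and feeds them to the code under test. Below is a minimal C++ sketch of that shape. `LLVMFuzzerTestOneInput` is the entry point libFuzzer looks for; everything else, including the `countRecords` parser and its `DEMO` file format, is a hypothetical stand-in so that the example stays self-contained instead of depending on Qt or KDE Frameworks.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <cstring>

// Hypothetical code under test: a file starts with "DEMO", followed by records
// made of a one-byte length and that many payload bytes.
// Returns the number of records, or -1 when the input is malformed.
int countRecords(const uint8_t *data, size_t size)
{
    static const uint8_t magic[4] = {'D', 'E', 'M', 'O'};
    if (size < sizeof(magic) || std::memcmp(data, magic, sizeof(magic)) != 0)
        return -1;

    size_t pos = sizeof(magic);
    int records = 0;
    while (pos < size) {
        const size_t length = data[pos];
        ++pos;
        if (length > size - pos)
            return -1; // payload would run past the end of the buffer
        pos += length;
        ++records;
    }
    return records;
}

// Entry point called by the fuzzing engine, once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    assert(data != nullptr || size == 0);
    // The result is ignored on purpose: the fuzzer only cares whether the call
    // crashes or trips a sanitizer, not whether the input was valid.
    countRecords(data, size);
    return 0;
}
```

With a recent Clang, a file like this can be built as a standalone fuzzer with `clang++ -g -fsanitize=fuzzer,address`; the engine supplies `main` itself. Inside OSS-Fuzz the build script and compiler flags come from the project's Docker setup, so this command only applies to trying a target out locally.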
We have more things in KDE that are run automatically. Baloo is one of them: you put a file in your file system and Baloo will go there and do all kinds of random stuff on it. We have to make sure that won't crash. Same thing for KFileMetaData. More things that are PIM related: I'm pretty sure that when you get an email, lots of things happen inside PIM that we should make sure are not crashing. The problem is that somebody needs to work on that. It's not very difficult, as I showed you, but it's only very easy to do if your input is a byte array. If your input is a byte array, you can plug it into the function and it will be trivial. If your input is something else, well, there's more work that needs to be done. Yep, and that's my talk. We have some time for questions, I think. Baloo has a question here.

Thank you. So considering that running this fuzzer is not that heavy, is it not possible to actually add it to the unit tests? So that every function has to pass the fuzzer in the continuous build system, and we can guarantee that all the frameworks have been tested by default.

Sorry, I didn't get your question.

Yeah, so is it not possible to put this in the unit tests by default?

Well, it is, but it needs to run for hours sometimes to find a bug. So you can't really do that. You could add unit tests for every single crash you found. That's something we could do. That's something David wanted me to do for KArchive, and I didn't do it because there are 20 or 30 files that make it crash. And this thing still runs every day, so if you regress, it will find it. It will take a while. It will not be part of a unit test. We could put unit tests for all the cases we found.
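The idea of turning each crash file into a unit test can be sketched as a small replay helper: read every saved reproducer from disk and push it through the same fuzz entry point, so a regression shows up in ordinary CI rather than hours into a fuzzing run. This is a minimal sketch under assumptions; `readFile`, `replayInputs` and the trivial body of `LLVMFuzzerTestOneInput` are made up for illustration and nothing here is taken from KArchive's test suite.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Stand-in fuzz target; in a real project this would call the library under test.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    assert(data != nullptr || size == 0);
    unsigned checksum = 0;
    for (size_t i = 0; i < size; ++i)
        checksum += data[i]; // touch every byte once and nothing beyond the buffer
    (void)checksum;
    return 0;
}

// Reads a whole file into memory; a missing file simply yields an empty buffer.
std::vector<uint8_t> readFile(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
}

// Runs every saved reproducer through the fuzz target once.
// Returns how many inputs were replayed; a crash or sanitizer report fails the test run.
size_t replayInputs(const std::vector<std::string> &paths)
{
    size_t replayed = 0;
    for (const std::string &path : paths) {
        const std::vector<uint8_t> bytes = readFile(path);
        LLVMFuzzerTestOneInput(bytes.data(), bytes.size());
        ++replayed;
    }
    return replayed;
}
```

A test would call `replayInputs` with the list of stored crash files, ideally in a build that has the sanitizers enabled so that silent memory errors are reported too. A binary built with libFuzzer can also be given reproducer files as command-line arguments, in which case it runs them once instead of fuzzing, which may be enough without writing any helper.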
That would make sure we don't regress. We haven't been doing that for now, but David wanted to do that, so I could be convinced otherwise. Any more questions? Okay. Ooh, almost. I was going to say thank you, but I waited half a second more.

Yeah, thanks for that. Unfortunately, I missed the beginning of your talk. Do I understand correctly that you add this code to some Google repository and they run the stuff? So all you have to do is provide the code.

You only write the code and upload it. They will run it every day for a bit, and they keep the state. So tomorrow they don't test the same things, they just continue.

And you get notifications whenever they find something.

Yeah, you get an email.

You get emails or something. Because it's related to that question: how do we ensure no regressions in the long run? I'm wondering if we can treat it as CI, as in: in two years, if we reintroduce the bug, will they find it through the magic of randomness? They have to somehow try the same file again.

Right, right. So the public bugs, you can see them. There's a bug tracker somewhere. I don't have a link here. When it finds a bug, it will give you 30 days to fix it, and if you don't fix it, it will be public. So for the public bugs, you can go to a webpage. It's a bug tracker, and everything there is public. And the ones that you fix, I think, are also public. So one of the things we could do, as part of the release process, is go there and make sure there's nothing new. We can also add more people to the emails that I receive. At the moment only I receive the emails. The only problem is that it needs to be a Google email.

I like the idea of having to check a webpage before releasing. I already check CI and a few other things. I could check the fuzzing thing, that sounds good.
You'll send me links and I'll look into it. There was also a question back there.

Do you have some sort of stats, some kind of hall of fame or hall of shame? How do KDE libraries compare to other open source projects in terms of number of issues or code coverage, things like that?

You can see the code coverage, but no, actually you can't. You can only see the code coverage for your own project. So I can see the code coverage for KImageFormats, but I can't see it for Firefox. So they don't have this kind of gamification thing. I don't see it. It must be somewhere else. I don't know where it is. Maybe they do have it internally, but I'm not sure if that's public. If it is, I would be interested to know where it is.

Question here. Hi, so I came late to the talk, so maybe you already talked about it, but the fuzzing, is it just a stream of random bytes that come in?

No, it's a very smart thingy and it's basically coverage based. It gives it a random number, and then it flips one bit, and then it tries to see which variable changed because you flipped that bit. And then if you have something like this, it knows that the only interesting values for X are 49 and 50, because those are the two values that will run both branches. So it's random at the beginning, but then it learns which bit influences each variable, and it will not be random. It's half random, it's smart, let's put it this way.

And can you also kind of define what kind of inputs come in? So for example, if you have an HTML parser, you don't want random bytes coming in, because-

Right, but you do want random-

Yes, there is a way to give it a dictionary, kind of like: those are the keywords that you should be flipping or working on. You can give it some direction. I really haven't done that, but I know you can kind of instruct it what to focus on.
Okay, thank you. More questions? Well, okay, thank you, Albert. Yep, thank you.