So: C, C++, or whatever, you have all those memory issues, right? You read too much, you write to the wrong place, you free the memory and then use it, and so on. In principle, the same rules would apply to any program in any language. We could, for example, feed randomized input to a Python program to see if it falls into an infinite loop, but it would be much less useful, because all those bugs that come from the lack of safety wouldn't be there, and programming is much easier in those languages. So this talk is about fuzzing of, essentially, C/C++ programs.

And if we work in this area, we have to have a whole set of tools. We ramp up the diagnostics in the compiler: we tell it to turn on all possible options, all possible warnings. We run our program under Valgrind. We use Coverity, we use LGTM, which is another static checker that can be hooked up to GitHub; sometimes it provides very nice results, and so on. And then we write tests and we put asserts in our code. All those things are tools that you just need to use, and if you don't use them, you let some bugs that could easily be caught go uncaught, maybe to be found by your users. Fuzzing is yet another tool in this whole suite, and it also builds on those other tools, because fuzzing is just generation of input; on its own it doesn't do much. I think it's a bit surprising if you haven't done it, but fuzzing tends to find different bugs, because random input is very much different from normal input, so you just hit different issues.

The concept has been around for a long time and people have been doing this for years and years, but there have been two technical developments which made it much more mainstream. The first is that we can instrument the code and know exactly which branches in the code are taken. So we take an input sample, we run the code, we look at which paths through the code were taken, and then we try to generate a sample that takes different paths; whenever we hit a different path, we know this from the coverage feedback. So we are not blind, we can see where we are going in the code. And the second thing is that people started using genetic algorithms to take this feedback data and try to generate inputs which hit new places in the code. You can think about this as a black-box optimization problem: the input space is very large, because in, let's say, a kilobyte of data, any bit of any byte can be set to an arbitrary value. Then you run the program and you get some number that specifies what the coverage was. You flip one bit and the coverage changes completely. So it's optimization in a very non-linear space, where the fitness function for neighboring points can be completely different, and in this kind of scenario genetic algorithms are generally the answer. This allows us to find bugs in this enormous space of parameters much faster than we would otherwise. And the kind of third, non-technical development is that people have made it really easy to use, or relatively easy to use, and it's just nice.
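To make the feedback loop concrete, here is a toy sketch, entirely made up for this transcript: it is nothing like a real engine, which keeps a whole corpus and uses compiler-inserted coverage counters, but it shows why keeping any input that reaches a new branch beats blind random search.

```c
/* Toy sketch of coverage-guided mutation (illustration only). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEN 4

/* Stand-in for an instrumented target: returns a bitmask of the branches
 * this input reached. Blind random search needs about 2^32 tries to reach
 * the deepest branch; with feedback it takes four small steps. */
static unsigned run_target(const uint8_t in[LEN]) {
    unsigned cov = 0;
    if (in[0] == 'F')              cov |= 1;
    if ((cov & 1) && in[1] == 'U') cov |= 2;
    if ((cov & 2) && in[2] == 'Z') cov |= 4;
    if ((cov & 4) && in[3] == 'Z') cov |= 8;  /* "the bug" lives here */
    return cov;
}

int main(void) {
    uint8_t best[LEN] = {0}, trial[LEN];
    unsigned seen = 0;  /* union of all coverage bits hit so far */

    srand(1);
    for (unsigned long iter = 0; seen != 0xF; iter++) {
        /* The genetic step, reduced to its bare minimum: mutate one
         * byte of the best input found so far. */
        memcpy(trial, best, LEN);
        trial[rand() % LEN] = (uint8_t)rand();

        unsigned cov = run_target(trial);
        if (cov & ~seen) {  /* the mutant reached a new branch: keep it */
            memcpy(best, trial, LEN);
            seen |= cov;
            printf("iteration %lu: coverage is now %x\n", iter, seen);
        }
    }
    printf("input reaching the deepest branch: %.4s\n", (const char *)best);
    return 0;
}
```

This finds the four-byte "password" in a few thousand iterations, where pure randomness would take billions.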
So, fuzzing is about generating randomized inputs for the program. And I said randomized, not random; I want to underline the fact that if you just took input from /dev/urandom and fed it directly to the program, this wouldn't be very useful. Many years ago, ten years ago or something like that, people started fuzzing system calls. So let's take mmap. It has a bunch of parameters, like the one that specifies how the mapping should be done. It's an integer, so it has four billion possible values, but of all those values only two or three are meaningful, and for any other one the function being tested will immediately return a result saying that the input parameters are invalid. And this is just not interesting. So if we feed random data to the function, we will only pass a useful value for this parameter once in every, let's say, few billion runs, and we will spend a lot of time doing nothing interesting. But once we have the coverage feedback, we learn, without knowing it a priori, that there are a few useful values, and once we hit them, we keep them and then build test samples on top of that.

So, how does it work? We have a very simple entry point function that takes a blob of data, an array of bytes with some length, and runs the code to be tested. This interface is very, very simple, and this is nice because it allows us to completely separate the stuff being tested from the whole testing framework: we can write this kind of function in any program, for any library being tested, and then, once we have a fuzzing engine that gives us inputs, we can attach it to any of those programs. It's a very nice split.

So, as part of the code being tested, we write a function like this, and we consume it in two ways. The first, straightforward one is that we write a simple main function that takes a file, reads the file into memory, and runs this function. It's just a way to test our code once we have a single sample. We compile this, run it, and maybe it crashes and maybe it doesn't. If it doesn't crash, we go back to the beginning; we haven't found a bug. But if it crashes, then of course we have something to fix, and the input that caused the crash is interesting, so we save it to a file.

Before, I said that fuzzing builds on top of the other tools. If we just do a straightforward compile and run, it won't be very useful, because the problem with memory bugs in C is that you can scribble over some memory and your program can crash, for example, a week later. It's not so easy to always have an invalid program crash. So we compile the program with all the memory sanitizers, leak checkers, and so on, and try to make the program crash as fast as possible if it actually does something wrong. And of course we turn on all the asserts and the bug checks and so on. So that's one way. And the other way is that we take the same function but link it to a fuzzing engine: something that will call the function over and over with different inputs until it finds one that actually causes a crash. So, yeah, we want to apply whatever we can to find the bugs and make the program crash as fast as possible.
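A minimal sketch of that standalone driver, assuming the entry point is called doIt as on the slides; the error handling details are my own, not from the talk.

```c
/* Standalone driver: run the code under test on one saved sample. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Entry point provided by the code being tested. */
int doIt(const uint8_t *data, size_t size);

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s SAMPLE-FILE\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    /* Read the whole sample file into memory. */
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    if (size < 0) {
        perror("ftell");
        return 1;
    }
    rewind(f);

    uint8_t *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf || fread(buf, 1, (size_t)size, f) != (size_t)size) {
        fprintf(stderr, "failed to read %s\n", argv[1]);
        return 1;
    }
    fclose(f);

    /* If this sample triggers a bug, the sanitizers abort right here
     * with a report; otherwise we exit quietly and try another sample. */
    int r = doIt(buf, (size_t)size);
    free(buf);
    return r;
}
```

Compiled together with the code under test and, say, -fsanitize=address,undefined, this is the "run one saved sample" mode; linking the same doIt against a fuzzing engine instead gives the second mode.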
So, what kind of bugs do we find? I think the first ones, the ones at the top, are pretty obvious, right? If we write past the end of a buffer and our program is compiled with AddressSanitizer, this will be caught, because there is a guard inserted, and so on. Invalid frees, unaligned accesses, and so on. And then hopefully we can also find more subtle bugs like buffer over-reads and use-after-free, but also memory leaks.

What wasn't obvious, at least for me, and I wasn't expecting this, is that fuzzing tends to find hangs or very slow runtimes in different parts of the program. For example, in systemd we had a case where a config file was being parsed and the locations of the starts of the section headers in the file were put in a list. This code was completely fine, there was nothing wrong with it, but once you had a thousand sections, access to this list became quadratic, and if you had a million, it was essentially an infinite loop. A human would most likely never write a config file with a thousand section headers, and this is exactly the kind of thing that randomized input gives you. And it's not so unlikely to happen in practice, because, for example, we do automatic configuration with Ansible, and maybe somebody writes a generator in a way that actually creates a file like this. Occasionally people would have a config file with a hundred section headers and the program would run a bit slower, but not enough for anybody to notice; with fuzzing, we discovered this quite nicely.

Right. So there are a few different ways in which the coverage feedback can be done, and the one I'm going to be talking about, libFuzzer, uses a sanitizer plug-in to annotate the code. There are two ways we can run. Once we have our entry point function, we can either run it as a single program invocation, then generate a new set of data and start the program again and again and again. This works okay, and it is the most general, but it has the overhead of starting the program over and over. A slightly better way is to do it in-process. But for this to be meaningful, we need to have no state left over: no threads, no global variables, nothing. We call the function once, and then we call it again, and we want to start from a clean slate. I wrote that 100 milliseconds is not bad, but we probably want to do thousands of samples per second, so I think we want to go below a millisecond for a single run.

And there is a number of fuzzing engines, engines which generate our inputs. libFuzzer and AFL are the two most popular ones; they both do coverage-guided input generation using genetic algorithms. There is also honggfuzz, which is a newer thing; I think all three have the same ideas and provide the same functionality, just in slightly different implementations. And there is another one, Radamsa. It's a tool that grew out of a network-protocol-testing project, and it doesn't use coverage feedback; it tries to analyze your input sample using some syntax analysis and then generates another one with similar syntax, but mutated. It generates surprisingly interesting input samples.
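To make the in-process requirement concrete, here is a deliberately naive driver, my own sketch and not how libFuzzer works: it has no coverage feedback and no corpus, it just calls the same entry point over and over, which only makes sense if each call starts from a clean slate.

```c
/* Naive in-process driver (illustration only): purely random inputs,
 * repeated calls into the same process. */
#include <stdint.h>
#include <stdlib.h>

int doIt(const uint8_t *data, size_t size);  /* code under test */

int main(void) {
    static uint8_t buf[1024];

    srand(12345);  /* fixed seed, so any crash is reproducible */

    /* Loop forever; a sanitizer abort on a bad input is the exit. */
    for (;;) {
        size_t n = (size_t)rand() % sizeof(buf);
        for (size_t i = 0; i < n; i++)
            buf[i] = (uint8_t)rand();

        /* Thousands of calls per second; if one call leaks memory or
         * leaves a global flag set, every later call is polluted. */
        doIt(buf, n);
    }
}
```

A real engine replaces the rand() part with mutation of a corpus, guided by the coverage counters that the sanitizer plug-in inserts.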
Okay, so we have that, and now we actually need to do the fuzzing, so we need to apply CPU power to this, and there's a very nice project that provides this. It's a Google-funded thing that grew out of Chromium testing: OSS-Fuzz. Essentially, the idea is that you have an open source project, you hook it up with OSS-Fuzz, you provide a number of fuzzers, and they will run the fuzzers for you, with a number of configured sanitizers, in a cluster. Bugs are reported with something like responsible disclosure: they are only accessible to project members initially, and then, after they are fixed, or sometimes anyway, they are opened up to the public. And it works automatically in a nice way: they will clone your repo, and if a fix is pushed to the repo, they will close the bugs automatically, and so on. So, in the project we have to have the fuzzers and a way to build them, essentially a build target, and on the OSS-Fuzz side we define a Dockerfile to prepare a build environment, a script to build our program (if we have the build target, we just call it), and some small metadata.

So, let's do an example. I said it's in casync, but this part is actually pretty generic. The entry point function was called doIt on my previous slide; now it's called LLVMFuzzerTestOneInput, but essentially it's the same thing, right? It takes a blob of data with some size, does whatever the code needs to do, and returns zero if everything is okay. A practical consideration is that in this case I want to test a compression function, and that function takes two parameters: a blob of data and a compression algorithm specified as an integer. I want to have both of those things in the single blob of data that comes from the fuzzing engine, so somehow I need to encode both. So I treat the start of the input as a header: I use the first byte as the algorithm and the rest as the data, but I also put in a few bytes of reserved space. The idea is that as the fuzzing progresses and we find crashers, we save them to files, and in the future we might need to add more parameters, and then we want to keep the data format stable. So this is a way to have the input samples preserved and usable in the future, even as we add more parameters and change this function, I mean, this part here. Right?
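A sketch of what such an entry point might look like. The header layout follows the description above, but the names here (struct header, decompress_blob, the FUZZ_DEBUG variable, set_log_level) are made up for illustration, not casync's actual code.

```c
/* Sketch of a fuzzer entry point that carves parameters out of the blob. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API of the library under test. */
int decompress_blob(int alg, const void *data, size_t size);
void set_log_level(int level);  /* made-up logging control */
enum { LOG_NONE = 0 };

struct header {
    uint8_t alg;          /* which compression algorithm to exercise */
    uint8_t reserved[7];  /* room for future parameters, so saved
                           * crash samples stay usable later */
};

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    struct header h;

    /* Be quiet under the fuzzing engine; an environment variable
     * (name made up here) turns logging back on for debugging. */
    if (!getenv("FUZZ_DEBUG"))
        set_log_level(LOG_NONE);

    /* Too short to even contain our parameters: not interesting. */
    if (size < sizeof h)
        return 0;
    memcpy(&h, data, sizeof h);

    /* Run the decoder on the payload; the sanitizers turn any bad
     * memory access into an immediate crash. */
    (void) decompress_blob(h.alg, data + sizeof h, size - sizeof h);
    return 0;
}
```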
Oh, I wanted to add: if we are running under a fuzzing engine, we don't want to output any logs or anything like that, we want to run as fast as possible. So something like this is there to make it possible to turn the debug logging back on, because it is useful when you are actually debugging a crash, but otherwise you want it completely disabled. And so, the actual testing: we get the data, we split it up, and then, essentially, the interesting part is that we call a decode function on the buffer, and either it crashes or it doesn't crash. Okay, so this is the actual testing part.

I said that there is also a Dockerfile, and this is the ugly part, because building each iteration of the code takes... I don't know why it likes to download so much stuff, but it's like a hundred megabytes per iteration. But it's relatively simple stuff: you copy it from some other project, adjust it a bit, and then you're done.

So, how does it work? We have a helper to call Docker for us: build the image, build the fuzzers, and then run a fuzzer and wait for it to crash. So this part installs all the dependencies that were specified, this part essentially calls our build target to build the fuzzers, and then this runs the binary. And this is an example output: it is running with libFuzzer, it is using AddressSanitizer, there is some limit on memory and a maximum time for it to run, and so on, and it just feeds samples; on this screen it gets up to two million samples, so it's pretty boring. But at some point it crashes. The slide is in black and white, but this is actually pretty colourful output from AddressSanitizer that includes information about what bad memory access we did and so on, and we have the traceback. And the important part is here at the end: we get a file, and if we now run our standalone fuzzer with this file, we get a crash, and we can start it under gdb or whatever, or Valgrind, and actually find out what the problem was. I put some links in the slides to more documentation. And I know I'm out of time, yes, so I have time for questions, please go ahead.

[inaudible audience question]

So, I wanted to say this: if you look at the advertisement page of any of those projects, libFuzzer and AFL, they have a long list of projects which they fuzz, and for pretty much every one of those projects they have a list of bugs. And another thing is that if you start fuzzing, you pretty quickly find a few easy bugs, and then it quickly falls off. So if you don't fuzz, and there is somebody who wants to find a bug, a hole, in a project, they can do simple fuzzing and easily find an issue. If you take the preventive step of doing it yourself, you actually make it much harder for them, because then they don't need to fuzz for 50 milliseconds, they need to fuzz for 5 days, and that's a completely different problem. More questions? Okay.