OK. So my name is David, David Korczynski, and I'm from Ada Logics, a small security company in the UK. Today I will be talking about securing Fluent Bit by way of fuzzing. I would like to thank both the Fluent Bit maintainers, Calyptia, and also the CNCF for making this work possible, which means that they essentially funded this work. So thank you very much for that, and of course also for developing the tools such that we could do this work in the first place.

A quick overview of what I will talk about today: I'm going to first give a brief introduction to what fuzzing is, from a developer's perspective. Then I'm going to talk about how we fuzzed Fluent Bit and what the results were, and then discuss what future work we have.

So in brief: how many here have heard about fuzzing before? Oh, one. I think there were two people. Three people. Keeps going. OK. Essentially, fuzzing is a way to automate test case generation, where test case generation in this sense means generating inputs to something like a proxy test that will explore code coverage in many different ways. The motivation here is that Fluent Bit is written in C, which makes it susceptible to memory corruption issues. The problem at hand that I will be talking about is how we can use fuzzing to generate test cases for Fluent Bit and find bugs that way. In short, the solution is to implement a lot of fuzzers and run them continuously with a project called OSS-Fuzz, which I will also go in depth with in this talk.

So this talk is about finding bugs in Fluent Bit using automatic test case generation approaches, in this case fuzzing, and doing so continuously with OSS-Fuzz. And the idea here is not just to find bugs for the sake of finding bugs; the idea is that if you keep finding bugs, eventually you will stop finding bugs, because you have found all the bugs that are present.

Fuzzing introduction. Fuzzing is essentially a way to generalize unit tests, in many ways. This is certainly not the whole truth, but it is the way we have done it for Fluent Bit. On the left, you have an example of testing an API with many different inputs: you would just have a sequential set of statements testing the given API with input 1, input 2, input 3, and so on. The way you extrapolate this into a fuzzer is to have a loop that calls into the API, and instead of using fixed inputs, you just ask the fuzzer: give me some input. There is a lot of underlying technology that means the input the fuzzer gives you is reasoned about, so that it comes up with semi-random data that makes sense in this context; I will get into what "makes sense" means in the next few slides. The way it actually looks in code is the very simple fuzzer down here in the lower right corner: there is a fuzzing stub, and it gives you a buffer of data. This is the random data given to you by the fuzzer, which you then have to propagate into the target code you are analyzing.

So here is an example of a fuzzer for a library called json-c. The fuzzer essentially explores all of the parsing code in this JSON library. You can see that the only thing it does is take the uint8_t buffer and interpret it as a char pointer, so just a typecast. Then it creates a tokener struct from the JSON library and calls json_tokener_parse_ex. After a while, the fuzzer will generate data such that the inputs passed to this function explore all the code in json_tokener_parse_ex. And this is essentially what we are looking to do for Fluent Bit.
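Since the slide itself is not reproduced in this transcript, here is a minimal reconstruction of that json-c harness based on the description above; the actual OSS-Fuzz harness may differ in its details.

```c
#include <stdint.h>
#include <stddef.h>
#include <json-c/json.h>   /* json-c */

/* Reconstruction of the json-c fuzzer described above. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Reinterpret the fuzzer's byte buffer as a char pointer. */
    const char *input = (const char *)data;

    /* Create the tokener state and hand the raw bytes to the parser;
     * over time the fuzzer's inputs will exercise all of the code
     * reachable from json_tokener_parse_ex(). */
    struct json_tokener *tok = json_tokener_new();
    struct json_object *obj = json_tokener_parse_ex(tok, input, (int)size);

    if (obj != NULL) {
        json_object_put(obj);   /* drop the reference to the parsed object */
    }
    json_tokener_free(tok);
    return 0;
}
```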
So one of the myths that I would like to debunk is that fuzzing is just random testing, and that it's not going to work for your project because you have complex data structures and so on. We just gave a talk a few minutes ago about fuzzing a lot of CNCF projects, and this is something we often hear from CNCF maintainers: "random testing is not going to work in our case." The truth is that fuzzing did start as random testing; that's how it started 30 years ago. But since then there has been a lot of academic work and a lot of practitioners really trying to optimize this technique, and modern-day fuzzers are not really random testers anymore. It's more accurate to call them genetic, mutational algorithms. I should say here that by modern-day fuzzers I mean coverage-guided fuzzers, and there are hundreds of academic papers on how to improve fuzzing from the last decade or so. So just a little heads up if you have this view or have heard that it's random testing: it's a lot more than that, and I'm going to try to argue that a little bit.

The way a fuzzer works is that it has what we call a corpus: a set of inputs, which correspond to the buffers you get as input to the fuzzer entry point, the data buffer from before. The fuzzer will, over and over again, take a seed from this corpus, mutate it a little bit (the mutation has a random element), and then execute your target program with that input. The program has been built, or compiled, in a way that includes a lot of instrumentation, which the fuzzer uses to determine what the coverage of the program was when it executed with a given input. The idea is that whenever you execute the program, you track the coverage. If the input found coverage the fuzzer had not previously seen, the mutated input is taken back into the corpus; if the coverage had already been seen, the input is just discarded. So it keeps going: take an input from the corpus, mutate it a little bit, execute the program, see what the coverage is; if it was new, save that input into the corpus, if not, throw it away. And this drastically reduces the complexity of finding inputs that explore the code under analysis.

Say, for example, you have the simple C program on the right, composed of four nested if statements that check whether the first four bytes of a given buffer equal 'A', 'B', 'C', 'D'. If we just did random testing, we would have a one in 2^32 chance of guessing this right, because we would have to guess four bytes, which is 32 bits, all at once. But using coverage-guided fuzzing, this is reduced to 2^8 per step: we only have to guess each byte one at a time, which is a one in 256 chance per byte, times four bytes. Let me show you what this means in practice.
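The slide program is not captured in the transcript; here is a minimal sketch of what such a four-if program looks like:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the slide's example: the deepest branch is only
 * reached when the first four input bytes are exactly "ABCD". */
void check(const uint8_t *data, size_t size)
{
    if (size >= 4 && data[0] == 'A') {
        if (data[1] == 'B') {
            if (data[2] == 'C') {
                if (data[3] == 'D') {
                    /* target: e.g. a bug hidden behind all four checks */
                }
            }
        }
    }
}
```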
At first, we start with no seed in the corpus, and we have to just guess the first byte: a one out of 256 chance of guessing it right. So let's say we guess it right after 256 attempts. Then we save that input, the 'A', and put it in our corpus. Now we just start guessing the next byte, because we have already advanced through the first if statement, and again it takes a one in 256 chance to get it. That keeps going, and eventually we have guessed all four bytes in about 4 × 256 = 1,024 executions. This is really where the coverage-guiding element of fuzzing, which became a thing around 10 years ago, changed the whole game. There are even examples online where, if you fuzz an image parser, the fuzzer will start to generate valid images of arbitrary appearance; there are corpora of random PNGs generated purely by coverage-guided fuzzing. So in this sense, this is why coverage guidance reduces the complexity of guessing inputs.

OK. So fuzzing requires a lot of management, because we need to save the corpus, we need to keep track of all the bugs, we need to make sure bugs are found and reported, and all this stuff. So when we integrated fuzzing into Fluent Bit, we needed some way of doing this at an infrastructure level, rather than just running our fuzzers a little bit every Monday or so. We needed a big infrastructure that takes care of all of this for us, so that we just write a few fuzzers and the infrastructure takes care of the rest. And for all of these things — running the fuzzers continuously, deduplicating any bugs found, managing the runs themselves, and all the related resource management — we have this tool called OSS-Fuzz.

OSS-Fuzz is a service, run by Google, which comes in the form of a GitHub repository. The only thing you do there is integrate your project into OSS-Fuzz by implementing a bunch of fuzzers and writing a simple Dockerfile and some build scripts to be put in the GitHub repository. Then Google will start running all of your fuzzers continuously: you integrate once, and it will just run your fuzzers indefinitely. Whenever the fuzzers find bugs, it will report them to you, and it will also deduplicate bugs and filter out some false positives. So Google takes care of building and running the fuzzers, reporting when bugs are found, and verifying when fixes land. Whenever we get a bug in Fluent Bit, we get a stack trace and a reproducing input; we then fix the bug on the Fluent Bit side, and Google verifies for us that our bugs have been fixed. They simply take care of all of the management, and we only have to write the actual code that tests Fluent Bit and do the fixes themselves.

So in terms of fuzzing Fluent Bit, the workflow, the whole procedure that we have had over the last two years approximately: the first step was to integrate Fluent Bit into OSS-Fuzz and implement a bunch of fuzzers that hit the Fluent Bit code; then let OSS-Fuzz run these fuzzers for a while; then we would see bugs start to appear; we would fix the bugs; and then we would simply rinse and repeat — write more fuzzers to explore more code, and fix all the bugs that they report.

So here is an example of a fuzzer for the Fluent Bit code.
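The slide code is not captured in the transcript, so what follows is a sketch in the spirit of the harnesses in tests/internal/fuzzers; the flb_strptime prototype shown is an assumption (it mirrors the standard strptime), and the real harness may differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumed prototype; the real declaration lives in the Fluent Bit
 * headers and mirrors the standard strptime(). */
char *flb_strptime(const char *buf, const char *fmt, struct tm *tm);

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 2) {
        return 0;
    }

    /* The fuzzer hands us one binary blob, but flb_strptime expects
     * NUL-terminated strings, so wrap the data into two of them:
     * a time string and a format string. */
    size_t half = size / 2;
    char *time_str = malloc(half + 1);
    char *fmt_str = malloc(size - half + 1);
    if (time_str == NULL || fmt_str == NULL) {
        free(time_str);
        free(fmt_str);
        return 0;
    }
    memcpy(time_str, data, half);
    time_str[half] = '\0';
    memcpy(fmt_str, data + half, size - half);
    fmt_str[size - half] = '\0';

    struct tm tm;
    memset(&tm, 0, sizeof(tm));
    flb_strptime(time_str, fmt_str, &tm);

    free(time_str);
    free(fmt_str);
    return 0;
}
```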
Essentially all of the fuzzers that we have written are very similar to the unit tests you will find in the Fluent Bit repository. This is the full source code of the fuzzer, by the way, and it will essentially explore all of the code inside the flb_strptime function. We take the data given to us by the fuzzer, which is the argument up here. We then convert this data into two NUL-terminated strings, because what we originally get is just a complete binary blob, and Fluent Bit's functions use NUL-terminated strings, so we have to wrap it into data that Fluent Bit understands. And then it calls into the strptime function. You can see the link here to the fuzzer.

Another example is a fuzzer we have for the JSON parsing logic in Fluent Bit. Again, it's a very small stub, 10 lines of code, where essentially only two of the lines are the important ones: we create a parser, in this case a JSON parser, and then we simply pass the data and size given to us by the fuzzer as input to the parsing routines. This will explore almost all, I believe, of the code in the flb_parser_do routine. There are a few caveats about some code it won't explore; I'll get into that later. But those are the small stubs that we write in order to fuzz the Fluent Bit code. Very simple, very much like unit tests; the only difference is that we don't have specific data here, we have data provided by the fuzzer.

All of this code is available: all of the fuzzers for Fluent Bit live in tests/internal/fuzzers. There is also a PDF report in the Fluent Bit repository, I think from a year ago or so, where we documented a lot of the findings at that stage and the fuzzers that we wrote. The focus so far, in terms of fuzzing Fluent Bit, has been on fuzzing the code in the src directory.

Here the code coverage achieved by the current fuzzers is visualized; I think there are around 15 fuzzers. The code coverage we have so far is just about 44% of the code in src. You can see a snippet of some of the files being targeted. If we pick one file, say flb_parser, we have achieved 91% code coverage in that file, which is achieved by, I think, three or four fuzzers together. This essentially shows that we write a few stubs of code and they explore a lot of the underlying code.

Now, why do we not necessarily get 100% code coverage on some of these? Simply because we can't, unless we do a bunch of tricks. There are a lot of error routines that check whether malloc failed, which it never really does in practice, so that code we will essentially never explore unless we do some very tricky stuff to make malloc fail eventually. There are a lot of these cases in Fluent Bit — a lot of error checking where we never actually come into the case where a system function fails. And this is the reason why some of these files do not have 100% code coverage: all the code that actually operates on the data, as opposed to the logic handling failed system routines, we do essentially explore in these various files that I showed.
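To give an idea of what such trickery can look like, here is a hypothetical sketch of fuzzer-driven allocation-failure injection; this is not Fluent Bit's actual mechanism, just an illustration of the idea.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical failure-injection wrapper, routed in place of malloc()
 * in the build under test so that error paths become reachable. */
static int alloc_countdown = 0;   /* 0 means "never fail" */

void *test_malloc(size_t size)
{
    if (alloc_countdown > 0 && --alloc_countdown == 0) {
        return NULL;   /* simulate an out-of-memory condition */
    }
    return malloc(size);
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 1) {
        return 0;
    }
    /* Let the first input byte choose which allocation (if any)
     * should fail, so the fuzzer itself explores error handling. */
    alloc_countdown = data[0];

    /* ... call the routine under test with data + 1, size - 1 ... */
    return 0;
}
```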
So what are the results we have gotten so far? The code coverage I showed is a result in itself, but in terms of bugs that we have found: first and foremost, what bugs are we finding? We find all the bugs that the sanitizers essentially give us. Sanitizers are bug oracles that are compiled into the program, in this case C code, and use heuristics to detect buffer overflows, null dereferences, memory leaks, and that kind of thing. So heap buffer overflows, stack-based buffer overflows, null dereferences, memory leaks, and integer-arithmetic issues: those are the majority of the bugs that we find.

A quick note before I show the numbers: not all bugs are necessarily true bugs. What I mean by this is that memory corruption issues are often thought of as major issues — exploitable bugs that can be used to compromise the application, Fluent Bit. But because of the way fuzzers work, they often have some level of over-approximation of the target code, because the data the fuzzers generate might not be constrained the way it would be when Fluent Bit itself calls a given API: Fluent Bit will do some manipulation of its data first, and may never call into certain APIs with arbitrary input. So in this sense, some of the fuzzers we have are over-approximations of what Fluent Bit will actually use a given routine for. That is good from a security perspective, because we over-approximate our security analysis, but it might also yield a few more bugs than what is actually real when you deploy Fluent Bit. I should also say that fuzzers themselves can have bugs, and those are included in the numbers I am going to show you, because they are a little bit difficult to filter out; even the underlying fuzzing engines can have bugs.

So, the security-relevant bugs we have found so far — at the end of these slides are a bunch of links where you can reproduce these results. They include 32 heap buffer overflows, and the majority of these overflows are just off-by-ones that are essentially not going to have a major impact, or in essence any impact, when you deploy Fluent Bit. There are seven stack-based buffer overflows, a bunch of heap double-frees and use-after-frees, and also 22 null dereferences.

This graph shows all the issues we have found in Fluent Bit, in terms of closed issues and open issues. What I mean by that is: when OSS-Fuzz finds an issue, it files it in a bug database, in this case Monorail, and what we have tracked here is the number of closed issues — issues get closed when they get fixed — and the number of open issues. You can see that it often goes such that whenever new open issues appear, closed issues increase slowly or right after, and that is because whenever a bug appears, we fix it. The reason the red line keeps going up is that closed issues accumulate as bugs get fixed. You can also see that the blue graph remains fairly low while the red one keeps increasing, and that is because when OSS-Fuzz finds a bug, it keeps running into the same bug over time; once we fix that bug, the fuzzer advances further and finds a new bug. So the blue line can stay roughly constant while the red line keeps increasing as we fix bugs.

Let me give you a few examples of bugs it found, because some of them are pretty interesting. In this piece of code it found an issue, and it was an interesting one because I couldn't see it when I just audited the code. Can everybody see what the bug is here? More or less. So, snprintf takes a format string and copies the formatted content into a buffer. The point here is that we use this val_len value as an argument later on, in flb_msgpack_gelf_value, next to its val buffer, where val holds the content of tmp, the destination buffer of the snprintf call. Now, upon successful return, snprintf returns the number of characters written to the destination. However, as the man page says in the second paragraph: if the output was truncated due to the size limit, the return value is the number of characters that would have been written. Meaning snprintf can actually return a value higher than what the buffer holds. If you are not aware that a return value larger than the size of your buffer is an error case, you will use val_len as a size indicating what was copied, even though it is actually larger than the buffer itself. This was a pretty tricky one, because it is easy to assume the value returned is just the number of bytes copied and that all is fine. So the essential fix was to check whether val_len is larger than the size of tmp. This resulted in a stack-based buffer overflow, because val_len would be passed down in the code to indicate the size of the given buffer, while being larger than the actual size of the buffer.
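Reconstructed as a minimal sketch (the variable names, buffer size, and the consume() helper here are illustrative, not the actual Fluent Bit code):

```c
#include <stdio.h>

/* Stand-in for the downstream consumer of val/val_len in the real code. */
static void consume(const char *buf, int len)
{
    fwrite(buf, 1, (size_t) len, stdout);
}

static void format_value(const char *input)
{
    char tmp[32];

    /* BUG pattern: snprintf() returns the number of characters that
     * WOULD have been written had the buffer been large enough, so on
     * truncation val_len exceeds sizeof(tmp). */
    int val_len = snprintf(tmp, sizeof(tmp), "prefix-%s", input);

    /* The fix: treat a return value >= sizeof(tmp) as truncation. */
    if (val_len >= (int) sizeof(tmp)) {
        val_len = (int) sizeof(tmp) - 1;
    }

    /* Without the check above, passing val_len as the length of tmp
     * reads past the end of the stack buffer. */
    consume(tmp, val_len);
}

int main(void)
{
    format_value("a string long enough to overflow a 32-byte buffer");
    return 0;
}
```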
Another example of a bug it found was in some of the signv4 code, which I think is related to some Amazon logic (AWS request signing). We had a routine that would compare key-value pairs, and what actually happened was that values could be null, and therefore we would get a null pointer dereference at the second string compare here. The fix was really just to check for null values. Those are the kinds of issues it will find.

And a final one: what's the bug in this case? Here we have logic around flb_strndup, and this is also a null pointer dereference: if malloc fails, it returns null, so flb_strndup will return null, and there is essentially no check on that in the code. This resulted in a lot of null pointer dereferences in the very rare event that malloc returns null. But we have implemented some heuristics that can actually force malloc to return null, which is good from the perspective that we have gotten to a stage where we need to really force rare behavior — forcing malloc to return null — in order to find bugs, because we have eradicated the bugs in all the other parts of the code.
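As a minimal sketch of that bug class and its fix (the surrounding code here is hypothetical, and flb_strndup's prototype is assumed to mirror the standard strndup):

```c
#include <stddef.h>

/* Assumed prototype, mirroring strndup(); returns NULL when the
 * internal allocation fails. */
char *flb_strndup(const char *s, size_t n);

void handle_token(const char *input, size_t len)
{
    char *copy = flb_strndup(input, len);

    /* The fix: without this check, a failed allocation inside
     * flb_strndup makes copy NULL and the code below would
     * dereference a null pointer. */
    if (copy == NULL) {
        return;
    }

    /* ... use copy ... */
}
```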
So what's future work? The future work is essentially to increase code coverage a lot more: we are aiming for 90% code coverage throughout the code by this year, and we have to use some slightly exotic techniques to actually reach that, for example forcing malloc to fail, and that kind of stuff. We have also mainly focused on the code in the src folder, and the next goal is to target a lot of the plugins as well. Then we also want to come up with some more fuzzers that are not just unit-test-like, but more like integration tests. And then we will essentially continue the current process: find gaps in the code that are not being hit by the existing fuzzers, develop fuzzers that target that code, fix the bugs that may come up, and then just rinse and repeat.

So that's it for my talk. If you have any questions, I'll be happy to answer them. Yeah. Thank you. Thank you.