So welcome to this talk: fuzzing, finding bugs and vulnerabilities automatically. I'm David, and this is Adam, and we're from a niche company called Ada Logics, where we specialize in various forms of advanced software security topics. What we are presenting in this talk is mainly work we have done in collaboration with the Cloud Native Computing Foundation. Effectively, we will talk about how we managed to fuzz a lot of CNCF projects, a bit about what the results were, and that's kind of it. So, Adam, take it away.

Yeah, so we have been fuzzing a bunch of the graduated and incubating projects in the CNCF landscape, including these ones. I believe a few are missing, for example Istio, but these projects are all being fuzzed. What we have done, and we will go into this later, is take a bunch of the amazing open source capabilities in fuzzing and bring them into these projects in various ways. Our approach has been different for each project, because the projects are so different in nature, but part of this talk is to show that fuzzing works for many different types of projects, including these ones here, and across different languages as well. The projects differ in nature; I don't know if any of these are strictly libraries, maybe Helm has some, but they are very different in nature and in architecture, and the languages differ, and so on.

So, again: what is fuzzing? Why do we fuzz? How do we fuzz? And then we're going to present a case study focused on Istio.

In very general terms, fuzzing is a way to automate test case generation; that's the origin of it. In practice, from a more pragmatic perspective, it's a way to find bugs in software, or a way to gain assurance that there are no bugs in software. Fuzzing is a technique that was introduced maybe 20 years ago, but around a decade ago, maybe a little more, came a big improvement: coverage-based, feedback-driven fuzzing. It relies on instrumenting the target program, observing how the target program behaves, and driving the automated test case generation based on what you observe in the target program. Fuzzing is often described as throwing a lot of random stuff at a program, whereas these days it's more accurately described as a genetic, mutational algorithm; genetic in this case means that it improves over time. It mutates and improves based on what it observes.

The way fuzzing works in practice is very closely related to how you write unit tests. What you see on the left here, in the test column, is an API called myAPI. The way you would unit test it is to give it, say, three different inputs, and you would hard-code those inputs. In the fuzzing world, on the right, you have a continuous loop that will in essence run forever, or for however much time you allocate to it, and you call the same function myAPI, but instead of giving it a fixed input, you ask the fuzzer to give you some input. The main point here is that what is returned from generateInput will change, and it will change based on what the fuzzer observes in the target program, aiming to explore as much code coverage in myAPI as possible.
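To make that contrast concrete, here is a minimal Go sketch of the two columns; MyAPI and the hard-coded inputs are hypothetical stand-ins rather than code from the slides:

```go
package myapi

import "testing"

// MyAPI is a hypothetical stand-in for the myAPI on the slide.
func MyAPI(s string) int { return len(s) }

// Unit test: a handful of hard-coded inputs, as in the left column.
func TestMyAPI(t *testing.T) {
	for _, input := range []string{"foo", "bar", "baz"} {
		MyAPI(input)
	}
}

// Fuzz test: the engine generates the input, and what it generates
// changes over time based on the coverage it observes in MyAPI,
// as in the right column.
func FuzzMyAPI(f *testing.F) {
	f.Fuzz(func(t *testing.T, input string) {
		MyAPI(input)
	})
}
```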
So the goal is to generate a set of inputs that optimally explore all of the code in your given target; it's a code coverage exploration technique in that sense. What you see in the lower right corner is actual Go code of such a fuzzer; it really is that simple, at least from the point of view of writing the code.

Fuzzing is now quite integrated into development infrastructure. It's not necessarily a third-party tool that you download, install and run; in C, for example, it's actually integrated into the Clang compiler. So if you want to fuzz C and C++ code, you have all you need in Clang. There's a link here to libFuzzer, which is integrated into Clang, and the following link will take you to the exact implementation of the fuzzing within the compiler infrastructure. There are also other fuzzers for C and C++, but this is an overview of the tooling you would use for each language, and you can see we have support for C and C++, Go, Python, Java and Rust. That also covers most of the projects in the CNCF landscape, perhaps with the exception of JavaScript and TypeScript, which are relatively new in the fuzzing world. So if you have projects in any of these languages, you should be able to fuzz them with the links on this page.

So, in memory-unsafe languages. Fuzzing explores code, and the reason you want to explore code is to identify the corners where the code behaves unexpectedly; what I mean by that is you aim to find bugs, and the type of bugs you find depends a lot on the language you are fuzzing. Traditionally, fuzzing has mainly focused on memory-unsafe languages, because you want to find memory corruption vulnerabilities. So if you work in C and C++, or even in Go where there is some native code (Rust also has that), or even Python if you have native modules for your Python code, you are going to look for memory corruption vulnerabilities. This works in collaboration with what we call sanitizers. Sanitizers are bug oracles: you compile the code with sanitizer support, and when a given piece of code is executed, the sanitizer checks whether you read outside a given buffer, whether you have a buffer overflow. You can see the list here: it will check for use-after-frees, and it will check for double frees, which would segfault anyway, but the sanitizer gives you a much nicer report. So we have different sanitizers, particularly for memory-unsafe languages.

However, in memory-safe languages... Yeah, so like David said, a lot depends on which language you're fuzzing, and in memory-safe languages we look for a certain overlap, in terms of crashes and panics. Taking Go as an example, we have out-of-bounds and out-of-range issues; these can be caught, as in recovered, but if they are not, they can be real issues for projects with security relevance. Nil dereferences can also be caught, but they too can have security implications. In addition to those language panics, we can also fuzz for logical issues in a program. This can be done via something like property-based fuzzing, where we set up a set of logical rules inside the fuzz harness, which could for example be that we expect a certain return value from our target API; if we don't get it, we consider it a bug and tell the fuzzer to report it as a panic, for example.
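As a minimal sketch of that property-based idea using Go's native fuzzing (the target function and the property here are invented for illustration):

```go
package absdemo

import "testing"

// Abs is a hypothetical target with a classic bug: it is expected
// to always return a non-negative value, but negating
// math.MinInt32 wraps back to math.MinInt32.
func Abs(x int32) int32 {
	if x < 0 {
		return -x
	}
	return x
}

// FuzzAbsProperty encodes the expected behavior as a rule in the
// harness: if the return value is ever negative, we report it as a
// failure, exactly as we would a panic.
func FuzzAbsProperty(f *testing.F) {
	f.Fuzz(func(t *testing.T, x int32) {
		if Abs(x) < 0 {
			t.Fatalf("property violated: Abs(%d) is negative", x)
		}
	})
}
```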
Then race conditions, which are troublesome in the CNCF landscape; one big CVE many years ago came from a race condition, if I'm not completely wrong here, and that's something fuzzing can also find. It's an area where we can do better, and we will get to future work in fuzzing, but we do catch race conditions with fuzzing. Then off-by-ones, self-explanatory. And timeouts, which can have many root causes: they can be severe and have security implications, but they can also just be a matter of runtime differences, for example if you don't allocate enough resources to your fuzzer. Still, we have had examples of timeouts being assigned CVEs. Type confusions are also issues in memory-safe languages, though not as big as in memory-unsafe ones.

And just to jump in here: these are the specific issues you run into when you fuzz memory-safe languages. Fuzzing of memory-safe languages is pretty new, maybe a year or two old in terms of really hitting the mainstream, and the security implication of most of these is usually denial of service. So availability issues are usually what you can catch in your programs; that's the security implication. We'll get to work where you can find, for example, RCE and that type of thing using fuzzing, but traditionally, as in the last two years, denial of service is mainly what you find with fuzzing in memory-safe languages.

Yeah, and we touched briefly on wanting to be better at finding bugs with fuzzing. As David said, there are traditionally bugs that we don't find with fuzzing but find by other means, and we want to bring those bug detectors into fuzzing. Now that we have really mature fuzzing engines, we want to write new bug detectors to find, for example, command injections and SQL injections, because traditionally a fuzzer would not catch it if untrusted input can somehow execute commands; but we can write, and are writing, bug detectors that detect these things. And they do exist; there have been instances where these custom bug detectors have found RCEs. We recently had an issue, a CVE in Go, based on custom bug detectors, but this is very recent, as in just the last few months, where this has become a thing in the Go fuzzing landscape.

Yeah, so as David mentioned, one or two Go versions ago there were three CVEs in Go, and one of them was found by us at Ada Logics with a custom bug detector hooked up to fuzzing. So it does have promise, and it will go further in that direction. Equally, disclosure of sensitive information: traditionally, if you somehow write sensitive information to your logs or to disk, a fuzzer would not catch that, but that's something we want to do. We also want to be better at handling files, so arbitrary file writes and reads are something we want to catch; those are security issues for the cloud native landscape. And of course race conditions can be handled better as well.

So let's have a short demo of writing a fuzzer. Say we want to fuzz this piece of code here; we have selected it from Kubernetes, from the client-go part. Let's see here.
We are in the directory here, and the API we want to fuzz is this one, the same one as in the slides from this morning. It's an API that takes two strings, a name and a text; it creates a new parser based on the name and then parses the text. So if we wanted to fuzz this API for panics, to see if there's any input we can give it that causes a crash, this is what we would write. One second, and we'll go through it real quick. We declare the package and the imports, and then we have this standard fuzz signature here. We are using the native Go fuzzing engine, which has been available since Go 1.18.

And that means it's built into Go?

Yeah, in a second you will see that we run this by way of the go binary itself, which makes it very easy for everyone to fuzz locally. If I were, for example, contributing to this parser, I would be working on this function here and I would want to check: have I ruined anything, have I introduced anything that can cause a crash? Then I would write a fuzzer like this. It takes a testing.F, and then we call f.Fuzz, which takes a function with a testing.T, and we ask the fuzzing engine to give us two strings, one we call name and one we call text. Then we pass those to Parse: this parameter here will create the parser, and this is the part that will be parsed.

Which of these variables come from the fuzzer?

With Go 1.18 we can ask for many parameters of different types, so both of these come from the fuzzing engine. We tell the fuzzing engine: give us two pseudo-random strings, and then we use them here.

So name and text are random values provided by the fuzzer?

Yeah, exactly. And if we were to run that, we would use the go binary like this, and run the fuzzer like so. Now the fuzzer is running and mutating over the corpus, and it really is that easy in Go. This is a completely valid fuzzer; it's something you can use to test your code for crashes, panics and so on, and we might commit it to Kubernetes later. It will keep running forever if we don't touch it. You can see how fast it runs; how much is it, two million executions? Yeah, and this part here, around 109,000 executions per second, which means it calls the Parse function roughly 100,000 times per second. Each time, name and text will be different from what they previously were; you will have collisions and so on, but in general it's on the order of 100,000 different inputs every second.
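Reconstructed as a sketch, the harness looks roughly like this; we're assuming the target is client-go's jsonpath Parse, which matches the description of a function taking a name and a text, so treat the details as approximate:

```go
package jsonpath

import "testing"

// FuzzParse asks the engine for two pseudo-random strings and feeds
// them straight into Parse, looking for panics and crashes.
func FuzzParse(f *testing.F) {
	f.Fuzz(func(t *testing.T, name, text string) {
		// Both name and text are generated by the fuzzing engine.
		Parse(name, text)
	})
}
```

Run with the go binary itself, along the lines of `go test -fuzz=FuzzParse` inside the package directory.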
So why do we fuzz? We fuzz to find bugs. And do we actually find bugs? The graph you see here shows the number of issues opened and closed across the projects from the first slide. The closed issues correspond to issues that have been reported and then fixed, okay? By June 2022, 1,200 issues had been closed. That means the fuzzers across all of these CNCF projects had reported 1,200 issues, and they had all been triaged and handled by the developers, fixed, done deal. So the fuzzing of CNCF projects up until June 2022 alone had found more than a thousand issues that were also fixed.

Some of these will be false positives. That's partly because there can be bugs in the fuzzer itself, and partly because you can over-approximate when you call into an API: the fuzzer will hand you all sorts of input, so if you don't call the API in the right manner, you might break things in a way that is a false positive, in a sense. You have to fuzz according to the spec, according to the threat model of the given target you are attacking. This is really important, because some APIs may not be so well defined, and then it's easy to over-approximate. For example, can you give a given application any arbitrary string, or does it actually require the string to satisfy certain constraints? The point is that there will be a number of false positives, and how many depends on who wrote the fuzzers, on how the project itself is developed, and so on; it differs a bit from project to project how they like to fuzz. What is also important to note here is that 1,200 issues take a lot of time to triage. The CNCF projects have put in a lot of investment, in terms of time, to actually handle fuzzing; it's a serious effort.

Another reason why we fuzz: here are some quotes from maintainers of some important, well-used CNCF projects. The quotes come from a blog post we wrote, which is linked at the bottom. Harvey from Envoy Proxy says the following: "Fuzzing is foundational to Envoy's security and reliability posture. We have realized the benefits via proactive discovery of CVEs and many non-security-related improvements. Fuzzing is not a write-once exercise for Envoy." There are a few points to take from this. First, it's really important for Envoy to have the fuzzing running: Envoy is written in C++, so there's memory unsafety. Second, they also found a lot of non-security-related issues: you will find reliability issues too, and not all issues found by the fuzzer, not all segfaults and so on, are actually security issues. And third, fuzzing is not a write-once exercise. Envoy has put hundreds, if not a thousand, hours into their fuzzing architecture. This is really important to keep in mind: it's a continuous effort, not something you set up once and then forget about as the project evolves. In that sense it's parallel to unit testing and integration testing.

The second quote is from Jann Fischer from Argo CD: "Not only did the fuzzers find quite a few hard-to-catch and serious bugs in our code base, we also learned a lot from analyzing and fixing the bugs, especially that the assumptions we make while writing the code are not always correct, even if we think there is proper unit testing in place." The point I really want to highlight here is that fuzzing also teaches developers a little more about their own code. It lets them think differently about it, because when you throw arbitrary random input at your application, weird things can happen, and as he says, a lot of the assumptions you hold are not necessarily true. There are more quotes in the blog post, which are quite interesting from a developer's perspective.
Okay, so in terms of how we set up fuzzing for all the CNCF projects we showed on the second slide: we start by writing a bunch of fuzzers for the project, a bunch of tests; in the case study later we give an approximate number as an example, but it varies. The approach is to write a bunch of fuzzers and run them locally to see if anything immediate comes up. After that, we merge these fuzzers upstream, and we also build OSS-Fuzz integration for these projects. OSS-Fuzz is a project run by Google that continuously runs the fuzzers of critical open source software projects, and some of the fuzzers will run for hundreds or thousands of hours. It's something we want all the CNCF projects to do; we want them all integrated into OSS-Fuzz. And then we let the fuzzers run. Envoy, as an example, was one of the first CNCF projects to integrate; they have been running now for two years, I think. Five years or so. Five years. And the same goes for Kubernetes and Argo as well; they have been running for years, really. There are cases where bugs come up after six months of running: after 20 billion executions, a bug is found. Of course, that takes a lot of infrastructure and CPU power, which the OSS-Fuzz project offers. And whenever a bug is found by OSS-Fuzz, the maintainers get notified with a bug report and a stack trace.

In terms of getting started, maybe you saw us down at the project pavilion with the CNCF fuzzing booth. We get together with the CNCF projects, usually catching the maintainers in a community meeting, and talk about how to do this: how do we approach what we described on the last slide, writing a bunch of fuzzers and integrating the project into OSS-Fuzz? There are different opinions from project to project, different ways of doing it. It's usually not a big issue to get done; usually any friction is something like a release coming up, where it might be too much to add a new thing to the project. And then, after that, we do all the work: write the fuzzers and integrate into OSS-Fuzz.
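For a sense of what that integration step involves: each project gets a directory in the OSS-Fuzz repository with a project.yaml, plus a Dockerfile and a build.sh that check out the project and compile its fuzzers. A minimal project.yaml, sketched from memory of the layout and with placeholder values, looks roughly like this:

```yaml
homepage: "https://github.com/example/project"    # placeholder project
main_repo: "https://github.com/example/project"   # placeholder
language: go
primary_contact: "maintainer@example.com"         # receives bug reports
fuzzing_engines:
  - libfuzzer
sanitizers:
  - address
```

OSS-Fuzz then builds and runs the fuzzers continuously and notifies the listed contacts with a report and stack trace when something crashes.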
So let's do a case study about Istio, from around a year ago; I think a year ago we were actually writing the fuzzers. I assume everyone knows Istio; if not, it's a widely used service mesh under the CNCF. What we did, over the course of two, three, four months, was write around 60 fuzzers and integrate them into OSS-Fuzz. And just as a full disclaimer going into this case study: Istio maintains really, really high-quality code.

So we wrote 60 fuzzers and ran them on OSS-Fuzz, and over those four months all the fuzzers combined ran almost 300 billion iterations, for a total of almost 60,000 hours. Of course, that takes a tremendous amount of CPU resources, and this is why you should integrate into OSS-Fuzz rather than running fuzzers locally on your machine for an hour every day or so; OSS-Fuzz will throw a lot of CPU at it.

Right. So, the findings. We found a bunch of issues; I think we have a blog post on this on our website, around 40 crashes that were mostly reliability-related. But one vulnerability was found, this one: an unauthenticated control plane denial-of-service attack. It was assigned a CVE of severity high, and it was found by one of the fuzzers that we wrote.

The issue was found in this API here, ExtractJwtAud, which takes a string, splits that string by dots, and expects to end up with a slice of three strings; if not, we return. Then we take the payload, which is the second item of the slice, and decode it into this payloadBytes variable. Then we create a JWT payload struct here, called structuredPayload, and we unmarshal the bytes into that struct; if that fails, we return an error. Finally, we return the structuredPayload.Aud item. The CVE is in here, if you can spot it. Hopefully you have read the blog post we wrote about it, but if not, it's a really interesting case, and in fact the Istio maintainers found the same mistake in a bunch of other high-profile projects.

So the issue was that if payloadBytes ends up being the byte slice n-u-l-l, then...

Still a string, right?

Yeah, a byte slice here. So if we have the byte slice null and pass that to json.Unmarshal, and we pass a double pointer, as you see here, then structuredPayload will end up being a nil value. That's a feature of Go; it was reported to the Go team, but it's considered a feature, not a bug. So json.Unmarshal will not return an error here; it will just set the pointer to nil. Which means that down here we end up with nil.Aud, and obviously that results in a nil dereference. That was the CVE, assigned a 7.5. And the fix: don't pass a double pointer, pass a single pointer.

I think the main point is that the Istio team had no idea that passing that specific byte slice could lead to such anomalous behavior. This is where the fuzzing really came in: it tried all forms of byte slices, and when it tried null, it hit. Istio would never have identified this themselves by writing unit tests or whatever. Right, and I think this code was a year old, so it had been sitting there for a year. And again, the Istio team maintains really, really high-quality code, and luckily this was the most severe issue.
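A sketch of the vulnerable pattern as just described; the names approximate the Istio code and the write-up, so treat this as an illustration rather than the verbatim source:

```go
package jwtdemo

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
)

type jwtPayload struct {
	Aud []string `json:"aud"`
}

func extractJwtAud(jwt string) ([]string, error) {
	parts := strings.Split(jwt, ".")
	if len(parts) != 3 {
		return nil, fmt.Errorf("not a JWT")
	}
	// The payload is the second item of the slice.
	payloadBytes, err := base64.RawStdEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, err
	}
	structuredPayload := &jwtPayload{}
	// The bug: &structuredPayload is a double pointer (**jwtPayload).
	// If payloadBytes is the byte slice `null`, json.Unmarshal sets
	// structuredPayload itself to nil and returns no error.
	if err := json.Unmarshal(payloadBytes, &structuredPayload); err != nil {
		return nil, err
	}
	// Nil dereference when the payload was `null`.
	return structuredPayload.Aud, nil
}
```

With the fix, structuredPayload is declared as a value (jwtPayload{}) and a single pointer is passed to json.Unmarshal, so a JSON null can no longer nil it out.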
So the next thing: once you have started fuzzing your project and have developed a lot of fuzzers, the question is really, have you done enough? This is not trivial to assess. You can use code coverage as the main measure, but even code coverage can lie, because you can reach different states of a given piece of code depending on which entry point you hit. So we have this tool, Fuzz Introspector, which comes from the OpenSSF, the Open Source Security Foundation. We're listing it here because it will tell you a lot about the threat model of your project: how to attack it, where your complex code is, where the entry points are, whether everything is statically reached. You might have a fuzzer that statically reaches something, as in, if you do static analysis your fuzzer should technically reach a given API, but it might be blocked dynamically because of some configuration or so. So it overlays the dynamic analysis element of fuzzing with static program analysis.

Looking ahead: well, more projects need fuzzing. So if you are involved in a CNCF project, visit github.com/cncf/cncf-fuzzing and write an issue saying you'd like the project to be fuzzed, and we'll come and help you. There's also a lot of work in terms of maintaining the existing projects. Also, reach out on the cncf-fuzzing repository if you're interested in getting involved. Improved tool support is another major item, such as the Fuzz Introspector tool I just mentioned. And finally, improving the ability to identify security issues in memory-safe languages is a really high priority for a lot of organizations, because you want to capture these command injections and so on using the various bug oracles. One of the main points you should take from this: the fact that we are improving bug oracles means that all the fuzzers written now will benefit; whenever a new bug oracle is pushed to, for example, OSS-Fuzz, all the existing fuzzers on OSS-Fuzz benefit from it. So if you choose to invest in fuzzing now in your open source project, you'll get a lot of rewards from a large community that keeps improving fuzzing. You might find some denial-of-service issues now, but the same fuzzers may find a lot more in a few months, because a lot of work is happening on the back end. If you are a CNCF project, reach out to us.

We would like to acknowledge a few organizations here: first of all, the maintainers of the various projects for collaborating; the CNCF for sponsoring this work; and the Open Source Technology Improvement Fund, which also helps sponsor some of the fuzzing work going on. That's it from our side. Fire away if you have any questions.

Thanks for the talk. I wanted to ask how you keep the types in check. In the Parse example you gave today, it's just two strings. What do you do when the fuzzing engine generates a string that's a million bytes, and your function could actually check for that and return an error if it's over a certain size? All your functions end up having to do lots of bounds checking, because the fuzzer gives absurd values that wouldn't necessarily occur in a normal case. You could do that, and you could argue the code isn't right and can be improved, versus keeping the fuzzer in check. Does that make sense?

Yeah. In your example, if you have something that assumes a string of at most size X, and you give it a bigger string without putting that constraint in your fuzzer, then where is the issue? The fuzzer gives a false positive, in a sense. Did the API document the limit? If not, the API perhaps has a documentation error, and so on. So now it comes down to debate: where is the bug? It can be in the fuzzer, it can be in the API's description, it can be a missing check in the function. This is what I was referring to earlier: some of these are false positives, some are not. Usually, in the situation you describe, you talk to the developers; it depends a lot on their view, so it's almost a political issue in a sense. But if we remove the political aspect: your fuzzers should take into consideration the code they are testing, so you should put the constraint in. In this case, in your fuzzer code itself, you would say: if the input given by the fuzzing engine is longer than X, just return and don't call into the API.
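A minimal sketch of that kind of constraint inside a harness; MyAPI and maxLen are hypothetical:

```go
package myapidemo

import "testing"

const maxLen = 4096 // hypothetical documented limit of the API

// MyAPI stands in for the real target under test.
func MyAPI(s string) {}

func FuzzMyAPIConstrained(f *testing.F) {
	f.Fuzz(func(t *testing.T, input string) {
		// Respect the target's spec: inputs over the documented
		// limit are outside the threat model, so skip them rather
		// than report a false positive.
		if len(input) > maxLen {
			return
		}
		MyAPI(input)
	})
}
```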
So, you know, some fuzzers will be hundreds of lines, 600 lines of code, just to prepare the random input the fuzzing engine gives you, doing a lot of work on it before calling into the target. The fuzz tests themselves can get very complex.

And that's acceptable, kind of, because that's just one function. But you would have to do that throughout potentially your whole code base, right?

Could you clarify a little what you mean?

Well, if you're dealing with strings that are super long, you might have to do that throughout. You'd have to write a library or something that handles these long strings, because the next function needs the same thing as those 600 lines you're talking about: I need to check if that string is really big, and that one needs to check it too.

Yeah, I mean, it is a lot of work writing the fuzzers; you must study the code that you're attacking, in a sense. What you also often try to do is fuzz the functions that are very high-level, in the sense that they reach all the rest of the code. If you satisfy the spec of that one high-level function that reaches the rest of your library, then you should be good. So you try to identify those high-level functions and then ensure that what you're giving them is what you actually should. What I also mean by that: you could, for example, fuzz strlen in C, and if you just give it an arbitrary piece of memory, you're going to find a lot of bugs, because it actually expects a null-terminated string. So if you fuzz that function, you should ensure the input is null-terminated: you take the input from the fuzzer, add a null byte at the end, and then pass it in. You do a lot, a lot of that kind of thing. We even have some libraries for how to do this in Go. If, for example, you want to fuzz functions that accept structs as input, how do you take the raw bytes given by the fuzzing engine and convert them into a large, essentially random data structure before passing it into your target? Adam has a library for exactly that, converting random bytes into a nice structure that you can then pass into your library; there's a sketch of that pattern below.
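The library in question is presumably go-fuzz-headers (github.com/AdaLogics/go-fuzz-headers); a rough sketch of the pattern, with an invented Config struct and target:

```go
package configdemo

import (
	"testing"

	fuzzheaders "github.com/AdaLogics/go-fuzz-headers"
)

// Config is a hypothetical struct-typed input to the target API.
type Config struct {
	Name     string
	Replicas int
	Labels   map[string]string
}

// ProcessConfig stands in for the real API under test.
func ProcessConfig(c *Config) {}

// FuzzProcessConfig turns the raw bytes from the engine into a
// pseudo-random Config before calling the target.
func FuzzProcessConfig(f *testing.F) {
	f.Fuzz(func(t *testing.T, data []byte) {
		fc := fuzzheaders.NewConsumer(data)
		cfg := &Config{}
		if err := fc.GenerateStruct(cfg); err != nil {
			return // not enough bytes to fill the struct
		}
		ProcessConfig(cfg)
	})
}
```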
Great, thanks. A quick follow-up, and then you can go on to others. How do you keep your fuzz harnesses and scripts in sync with the code? What are the challenges there?

The challenge is that it takes effort. You should think of it much like keeping your tests in sync, and the answer is more or less the same. The extra challenge from the fuzzing perspective is that fewer people know about fuzzing; it's a little counterintuitive, not as intuitive as testing. So usually it's a mix, and it varies from project to project. Sometimes we will do it for the project if they don't have the resources available, sometimes they will do it, and sometimes no one will do it, and then it's not working, you know? And it can get quite bad in that state, because a broken setup will start to throw a lot of issues, since it's doing things wrong. For some projects it's difficult even to maintain that the fuzzers still build, because it is an ongoing effort.

All right, there's one over here. But...

That is actually time, so I would request that you continue the discussions afterwards. Thank you, everyone, for attending, and thank you to the speakers; please come up and ask them your questions. Yeah, exactly, please come over to us after the session and continue the discussion. Thank you very much.