Yeah, so I'm David. Last time I saw you guys, I was in better shape; I'm surprised I made it here. More on that later. I'm going to talk about Nosy Neighbor. Nosy Neighbor is a tool I've been working on and am open-sourcing for everyone to use, and I'm hoping it makes some waves in the Golang community. I'll talk a little bit about the challenge of finding bugs in open-source Go projects, and a little bit about the motivation.

Ethereum really loves Go. Some stats: Geth accounts for 82 percent of execution clients, and 42 percent of consensus-layer clients are running Prysm, as of last week. These numbers actually used to be higher, 90-plus percent on the execution side and 60 to 70-plus percent on the CL. MEV-Boost, from Flashbots, is also written in Go; right now it's the only production open-source MEV client, and as of last week, I think, 48 percent of blocks on mainnet were Flashbots blocks. I won't go into MEV and all that fun stuff, but what I'm trying to hit home is that Go is critical in the Ethereum stack. I would say maybe three quarters of nodes or more are using Go at some point, so it's super important.

There's lots of code: we've got to run two chains, the execution chain and the consensus chain, and we've got the entire EVM, all kinds of stuff. So how big is it exactly? The pure-Go Ethereum stack is 583,000 lines of code. As a security researcher, I'm a deer in headlights; I'm not going to manually review all of that. We do do manual reviews here: all of these things have been reviewed at least once, most likely twice or more, but they're moving targets.
Every six months or so we have hard forks on both the EL and the CL, so if we can automate finding issues here, we should.

There are a few of us at the Ethereum Foundation on the consensus-layer security research team, and my task right now is to focus on the Go stuff. This is kind of a daunting thing. About a year ago I started looking at this and trying to understand the problems we have, so let me talk a little bit about them.

First of all, Go is memory safe, which is awesome: we don't see remote code execution issues very often. Memory safety also helps a lot in making a tool like this possible, which I'll talk more about. There are still a lot of common mistakes in Go, though, and some of them are queryable. Take the colon-equals (:=) short variable declaration. It makes writing Go really easy and human-friendly, but it also means you can get shadowed variables: if you declare a variable with := inside a loop, you shadow the outer one, and later you might reference it thinking you're talking about the first version. Similarly, when you kick off a goroutine (for non-Go people, basically a lightweight thread managed by the Go runtime), the variables the goroutine captures are not necessarily the ones you think they are, so if you have racy things later, you don't really know, and Go doesn't complain. We can query for certain things like this. This is an example of one gosec query that just says you need to manually review all of these uses of unsafe. It's probably not the best example, but CodeQL, Semgrep, and gosec are the major tools you can look up for automating queries and static analysis. Another big one is race conditions.
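Before getting to race conditions, here is a small, hypothetical illustration of the := shadowing pitfall described above; both functions are made up for this example. Plain go vet does not flag this by default, though the optional shadow analyzer in golang.org/x/tools can. (Separately, since Go 1.22 each loop iteration gets its own loop variable, which removed the most common goroutine-capture variant of this bug.)

```go
package main

import "fmt"

// firstEvenBuggy tries to record the first even number in nums, but the :=
// inside the if-block declares NEW variables that shadow the outer ones, so
// the outer result/found are never updated.
func firstEvenBuggy(nums []int) (int, bool) {
	result, found := 0, false
	for _, n := range nums {
		if n%2 == 0 {
			result, found := n, true // BUG: shadows the outer variables
			_, _ = result, found     // silences "declared and not used"
			break
		}
	}
	return result, found
}

// firstEvenFixed uses plain assignment (=), so the outer variables change.
func firstEvenFixed(nums []int) (int, bool) {
	result, found := 0, false
	for _, n := range nums {
		if n%2 == 0 {
			result, found = n, true
			break
		}
	}
	return result, found
}

func main() {
	fmt.Println(firstEvenBuggy([]int{1, 2, 3})) // 0 false
	fmt.Println(firstEvenFixed([]int{1, 2, 3})) // 2 true
}
```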
Go's design decisions, I think, were made to make parallelization very easy. It has channels and goroutines, so it's very simple to say, "go do this on another thread and come back." Google designed it to be memory safe, but also efficient and human readable, for processing massively large data sets, so we can parallelize very easily. The problem is that it's so easy that it's really easy to introduce race conditions. One thing we've been doing is running thread sanitizers. This terminal output is an example of an actual mainnet Geth bug: a race condition that was causing memory corruption. If you see memory corruption in Go, it's usually something race-related, or you're using cgo or some native library, because outside of those conditions Go is, I would say, 99 percent memory safe. There are also other sanitizers, like ASan, MSan, and UBSan. We're running nodes on Ropsten, Sepolia, the Prater/Goerli testnet, and also on mainnet with all of these sanitizers running. So we've automated querying, which is static analysis, and we've automated running sanitizers, which is dynamic analysis; we've checked both of those boxes. But what else can we do? That's where Nosy Neighbor comes in: how else can we cover these 583,000 lines of code?

Before the solution, let me rewind a bit to the problem. I already mentioned the huge attack surface. One thing I do want to point out is that denial-of-service bugs are critical for us. That's not necessarily true for other Go repos, but a blockchain cannot have a denial of service. On the Common Vulnerability Scoring System (CVSS) scale, a denial of service usually scores something like a three, not a nine.
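Going back to the race conditions mentioned above, here is a minimal sketch of the kind of bug the thread sanitizer catches. Go's race detector, enabled with the -race flag on go run, go build, or go test, is built on ThreadSanitizer. The counter functions are hypothetical examples, not code from Geth.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from many goroutines with no
// synchronization. Running it under `go run -race` or `go test -race` reports
// a data race, and the result is not guaranteed to equal n.
func racyCount(n int) int {
	count := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			count++ // unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return count
}

// safeCount guards the counter with a mutex, so it always returns n.
func safeCount(n int) int {
	var (
		mu    sync.Mutex
		count int
		wg    sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// racyCount is left uncalled here; call it under -race to see the report.
	fmt.Println("safe:", safeCount(1000))
}
```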
A denial of service isn't considered critical: there's no information disclosure, no remote code execution, and thus usually no privilege escalation, so nobody is lifting keys. But if more than 35 percent of the network is running Go under the hood, and there are denial-of-service bugs in those goroutines or repos, the Ethereum network could be brought to its knees. We obviously have a multi-client architecture, so the other clients would carry the network during that time, but it's not something we want: for those of you who are really familiar with proof of stake, we wouldn't have finality, which would be a pretty big deal. So we have this weird situation where we don't really see the worst kinds of bugs, but we care a lot about these smaller ones.

The good: I mentioned RCE is rare. We have the source, which is really great. I've been a security researcher for a long time and haven't had the source very often in my career, so this is a whole new ball game. Go is strongly typed, and the panics, stack traces, and failure reporting are excellent. If you write a fuzzer and it finds a crash, the crash doesn't bring down your fuzzer; your fuzzer doesn't commit suicide. That's really helpful.

Now a little about the tooling, which is a big deal. Go 1.18, released maybe six months ago, added native fuzzing support. This is an example from the testing library of fuzzing a function: the function under test is on the right side, and I lifted this straight from the Go native fuzzing design doc. Foo is the function under test here.
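A sketch of the harness shape described here, modeled loosely on the design-doc example. Foo below is a hypothetical stand-in, since the slide's body isn't reproduced. In a real project FuzzFoo would live in a _test.go file and be run with go test -fuzz; it sits in package main here only so the snippet compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// Foo is a hypothetical function under test: it returns the i-th byte of s.
func Foo(i int, s string) (byte, error) {
	if i < 0 || i >= len(s) {
		return 0, errors.New("index out of range")
	}
	return s[i], nil
}

// FuzzFoo is a native (Go 1.18+) fuzz target. In a real repo it lives in a
// *_test.go file and runs with: go test -run=^$ -fuzz=^FuzzFoo$
func FuzzFoo(f *testing.F) {
	f.Add(5, "hello") // seed corpus entry; doubles as a regression test
	f.Fuzz(func(t *testing.T, i int, s string) {
		b, err := Foo(i, s) // any panic in Foo is reported as a crash
		if err == nil && b != s[i] {
			t.Errorf("Foo(%d, %q) = %q, want %q", i, s, b, s[i])
		}
	})
}

func main() {
	// Outside of `go test`, exercise the function directly.
	fmt.Println(Foo(1, "hello"))
	fmt.Println(Foo(5, "hello"))
}
```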
So we can fuzz Foo: we call f.Fuzz and give it a function, saying I want an int and a string, which are the two argument types Foo takes. We can add seed corpus entries, like the 5 and "hello" right here. Any time you want to prevent a regression, you can add previous bugs to your seed corpus this way, and it will automatically tell you if the bug is reintroduced. That's really cool. It's automatically coverage guided: test cases that produce new coverage get added to the corpus and mutated on. Errors are super descriptive, and you don't need separate health checkers, because all of this is built in natively. It's awesome.

The next piece of the puzzle, and discovering it opened my eyes to the possibilities here, is that the AST is exposed. The parser library and the go/types library expose everything about the source: everything the compiler sees when it reads and compiles your code, you can see too. This is an example of the AST, pretty printed. At the top it says ast.IfStmt, meaning there's an if statement here; the first operand is x, and the value is 2. There's more information here than you'll ever want. So what can we do with this? We can parse all the Go code in a repo. We can collect all the dependencies for the packages and the types. We can collect all the function declarations, package declarations, type declarations, and interfaces. We can see what every function looks like: does it take arrays, slices, bytes, strings, complex structs? And if it takes a complex struct, what is that struct made of, recursively, all the way down?
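To show what "the AST is exposed" means in practice, this sketch uses only the standard library's go/parser, go/ast, and go/types (for types.ExprString) to list every function and method in a source string with its parameter types. It is syntactic only: resolving what a named struct is made of needs the type checker, and loading a whole repo is usually done with golang.org/x/tools/go/packages. This is an illustration, not Nosy's code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strings"
)

// listFuncs parses Go source and returns one signature string per function or
// method declaration, e.g. "Foo(int, string)" or "(*T).Bar([]byte)".
func listFuncs(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, types, vars, consts
		}
		name := fn.Name.Name
		if fn.Recv != nil && len(fn.Recv.List) > 0 {
			// Method: prefix with the receiver's type expression.
			name = "(" + types.ExprString(fn.Recv.List[0].Type) + ")." + name
		}
		var params []string
		for _, field := range fn.Type.Params.List {
			typ := types.ExprString(field.Type)
			n := len(field.Names)
			if n == 0 {
				n = 1 // unnamed parameter
			}
			for i := 0; i < n; i++ {
				params = append(params, typ)
			}
		}
		out = append(out, fmt.Sprintf("%s(%s)", name, strings.Join(params, ", ")))
	}
	return out, nil
}

const sample = `package p

type T struct{ data []byte }

func Foo(i int, s string) error { return nil }

func (t *T) Bar(b []byte) {}
`

func main() {
	sigs, err := listFuncs(sample)
	if err != nil {
		panic(err)
	}
	for _, s := range sigs {
		fmt.Println(s)
	}
}
```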
Using the parser and type information, you can go all the way down to Go's built-in types and see all the information you want about something. With this, we can generate valid fuzz harnesses for all of the functions we have types for, and we can fuzz them, round-robin them, and even fuzz them while the target is running; I'll talk more about that later. Then, yep: find bugs, profit. I know this isn't a security-specific conference, so that's a reference to an old Phrack magazine article, "Smashing the Stack for Fun and Profit." I'm not profiting off these bugs, I promise. I want to clean all of them up; that's what I'm paid to do, and that's how I profit.

And then you can repeat on every commit, which is why Nosy got its name: it's the nosy neighbor. You can integrate it into your CI. Hopefully people will do this, and then, the moment a bug is introduced, we can automatically fuzz the function, even before tests are written for it, and find the bug. It's really annoying and really nosy to developers, like the old granny across the street who's always looking into your business. That's where the name comes from. Saying "Nosy Neighbor" is way too much, so I'll call it Nosy from now on.

Nosy in action has three main stages: initialization, harness generation, and fuzzing. You can put these into one seamless action if you're integrating into your CI, but as I built the tool, the stages are separate, because you might want to debug things during harness generation.
You might want to add test corpora; there are all kinds of interesting things you can do between these steps. So I've broken it up into three steps, mostly for sanity.

This is the input: every time you run Nosy, no matter which of the three actions you tell it to do, you give it this YAML file. It contains a bunch of stuff, but the most important thing is the URL of the repo, which Nosy will pull down. You can pick a particular branch, and you can specify different Go versions. Prysm, for instance, one of the big repos I always look at, won't build with Go 1.19 right now, so this makes that easy: as long as it's 1.18 or above, since that's what supports native fuzzing, you can use an older version. There are also ignore declarations. Maybe you have a bunch of test functions, or things that use networking, or things that write to the file system that you don't want to fuzz, because you don't want to pulverize your file system. You can declare ignores at the package, function, and object level. It also has substitutions: if you're familiar with Go, you put both packages in, and Nosy adds a replace directive to the go.mod file. This is really nice if you want to knock out all of your signature checks. You get a lot more coverage that way; obviously your fuzzer isn't going to produce correct ECDSA signatures. That would be a whole other talk, and we'd have bigger problems if that were the case.

Initialization uses Docker. It basically builds a little fuzzing and build environment in Docker, for a lot of reasons. One is that we don't want to pulverize our host file system. Another is ease with dependencies: if you're looking
through a repo and dynamically writing all of these fuzzing harnesses, it's a lot easier not to use your own Go environment. So this sets up a valid Go environment inside a Docker container, adds all of Nosy's dependencies for source parsing, harness generation, fuzzing, and so on, and initializes the repo inside the container. There's a directory shared with the host, which you'll see in a bit: an asset directory that holds everything the host needs to get. So you do all your fuzzing inside this protected environment. If you're a security researcher familiar with jailing a target, this is just a chroot under the hood, except we get to stand on the shoulders of Docker: we can potentially neuter all of the networking, control things, and really jail stuff. It also means you can run this on your host computer with only a few cores and still work, if your fuzzing machine doubles as your research desktop.

Next is generate harness. It copies all the assets in and spits out a one-liner; if you copy and run that one-liner, it starts generating all of the harnesses. You can see in the lower half of this terminal that it's spitting out all of these fuzz_nosy_test.go files. Each file is placed into the directory of its respective package under test. I do that instead of putting them all in one place so that one bug doesn't prevent the whole thing from compiling and having Go complain. The other reason is that we can then fuzz internal, unexported functions, since a test file in the same package can call them. You might not want to fuzz internal functions, though.
Nosy has a flag to skip internal functions, but if you do want to fuzz everything and get serious breadth-first coverage of your target, this is the best way I've found to do it.

That brings us to fuzzing. Same thing: Nosy adds all the assets to the asset directory and spits out a one-liner, and then you start fuzzing. You can see on the right side here that it started fuzzing and found a crash right off the bat. It minimizes the test case that produces the crash and then prints the panic output. Kind of funny that it found one right away. This is an example repo that comes with Nosy so that you can all test it. I don't provide YAML files for all of the targets I'm testing, because I don't want to give you free bugs, and Nosy is still a work in progress. I'll release other stuff further down the line once I've hammered out the bugs. What I'll probably do is keep a private repo that's maybe three months ahead, and once we've shaken out all the bugs the fuzzer can find, I'll open-source the other parts. But you can copy the example YAML file for the target repo I've made and point it at the Go standard library. I haven't had time to do that, and I'm sure there are tons of bugs out there. Nosy does produce a decent number of false positives, but it does find bugs, so I have a ton of crashes to look through.
I want to get through those before I release all this other stuff.

This is an example of the round-robin: when no crash is found, this is what the output looks like. It fuzzes for 10 seconds on each function. The YAML file has a variable where you can set how many seconds to fuzz each function, and as you generate larger corpora that get better coverage, you might want to bump it up to, say, six minutes per function. This is a shot of the script that does the round-robin. It's a shell script that gets written out to the asset directory and run on the target. There are some reasons this isn't the best way to do it, so I don't think I'll do it this way forever. But you can see right here that it basically calls go test -fuzz on the function you want. If there's a testdata/fuzz directory inside that package, that means we found a crash, so the script copies it into the asset directory, where it's available on the host. If the fuzzer commits suicide, or you're done fuzzing, you don't lose it. As a lot of people who use Docker know, your container might not be persistent if you don't have some asset directory you save things to, so this prevents you from losing work you've done.

All right, example findings. I made this little repo so that everybody can see Nosy in action and have an example for pointing it at their own repos. The root causes of all of these bugs are copied from real bugs that Nosy found. I'll talk a little about the types of bugs it finds, because it doesn't find everything, but it's really good at finding a few things.
Looking at these example findings, this just shows the panic lines, identifying the types of issues we have. It looks like there are two index-out-of-range panics. The second function actually has two bugs, so I included the wrong screenshot; it should show a divide by zero. These four functions on the right are the vulnerable functions. These are the kinds of things Nosy finds immediately, even if you only give it three seconds per function. And notice what they are: panics, not remote code executions. They're panics where things stop. Blockchain software is highly social; it's listening to all these peers, every time it receives a packet. If you have a gRPC handler that's meant to recover gracefully and it panics in that goroutine, you're fine. But if you have this in core code, the panic can go all the way up the stack and completely bring down your node. These are true packets of death. They're a big deal, and this is the kind of thing that keeps me up at night, because panics are not always handled gracefully, especially in these huge systems. In the EVM, for example, if you found a panic, you might crash that part of the process, and then Geth would be completely worthless.

I've only got five more minutes, so I'm going to talk really quickly about what these harnesses look like. This is the most basic one. It uses the same native fuzzing support: the top is a test function that takes Go's testing object, and you hand it a function. All I'm saying here is that I want to fuzz this logValidatorWebAuth function; I just picked a random one. It takes a string, a string, and a string.
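For contrast with the crashing findings, here is a hypothetical packet parser with the same two bug classes (index out of range and divide by zero), wrapped in a handler that recovers. Note that recover only works in the goroutine that panicked: an unrecovered panic in any goroutine takes down the whole process. A fuzz harness should not recover like this, or the fuzzer would never see the crash.

```go
package main

import "fmt"

// parseRatio is a hypothetical packet parser: it indexes without a length
// check and divides by an attacker-controlled byte.
func parseRatio(pkt []byte) int {
	return int(pkt[0]) / int(pkt[1])
}

// handlePacket turns a panic in parseRatio into an error. This only helps
// because the deferred recover runs in the same goroutine as the panic.
func handlePacket(pkt []byte) (ratio int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("malformed packet: %v", r)
		}
	}()
	return parseRatio(pkt), nil
}

func main() {
	for _, pkt := range [][]byte{{8, 2}, {1}, {1, 0}} {
		fmt.Println(handlePacket(pkt))
	}
}
```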
For a function that takes three strings, that's all I need to tell Go's fuzzer: when you mutate, give me valid strings. Super simple.

Go's native fuzzing doesn't support complex structures as arguments, though. If you look at the second line here, what the fuzz function accepts is actually a byte array. The open-source version of Nosy ships with go-fuzz-utils from Trail of Bits for this. There are reasons for that choice that I won't go through unless I have time at the end, but the biggest one is all of these fill errors you see: it returns an error if there isn't enough data. Imagine a bunch of nested structures where my function under test needs about 2,000 bytes to fill all the data correctly. The harness just returns without reporting an issue and keeps letting the fuzzer mutate until it gets further, and since this is coverage guided, the fuzzer very quickly makes it all the way to that last line.

So what happens there? That last line is acm.Import. It takes a context variable, so I need a valid context. I basically want to test Import, but I need acm, which is an actual object. So Nosy doesn't just fuzz functions; it fuzzes methods on receivers, which is Go's version of objects. We need a valid object created, and we need the arguments for the method.

Even more complicated: here we actually have a constructor. Why make an object and fill it with random data when there are custom constructors? Large blobs for the EVM, peer structures, beacon blocks, things like this: we've already got constructors for them, so why make them ourselves?
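To illustrate the fill-from-bytes idea without guessing at go-fuzz-utils' actual API, here is a deliberately tiny, hypothetical byte provider. The key behavior matches what's described: when the input runs out it returns an error, and the harness simply returns, so the fuzzer keeps mutating toward longer inputs. PeerInfo and fillPeerInfo are made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errNotEnoughData = errors.New("not enough fuzz data")

// byteProvider is a hypothetical, minimal stand-in for a fill library: it
// carves typed values off the front of the fuzzer's []byte.
type byteProvider struct{ data []byte }

func (p *byteProvider) take(n int) ([]byte, error) {
	if len(p.data) < n {
		return nil, errNotEnoughData
	}
	b := p.data[:n]
	p.data = p.data[n:]
	return b, nil
}

// GetUint32 reads 4 bytes as a little-endian uint32.
func (p *byteProvider) GetUint32() (uint32, error) {
	b, err := p.take(4)
	if err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(b), nil
}

// GetString reads a 1-byte length prefix followed by that many bytes.
func (p *byteProvider) GetString() (string, error) {
	n, err := p.take(1)
	if err != nil {
		return "", err
	}
	b, err := p.take(int(n[0]))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// PeerInfo is a hypothetical struct argument native fuzzing can't generate.
type PeerInfo struct {
	Name string
	Port uint32
}

// fillPeerInfo builds a PeerInfo from raw bytes. In a harness:
//
//	f.Fuzz(func(t *testing.T, data []byte) {
//		info, err := fillPeerInfo(data)
//		if err != nil {
//			return // too little data: not a bug, keep mutating
//		}
//		targetFunc(info)
//	})
func fillPeerInfo(data []byte) (PeerInfo, error) {
	p := &byteProvider{data: data}
	name, err := p.GetString()
	if err != nil {
		return PeerInfo{}, err
	}
	port, err := p.GetUint32()
	if err != nil {
		return PeerInfo{}, err
	}
	return PeerInfo{Name: name, Port: port}, nil
}

func main() {
	fmt.Println(fillPeerInfo([]byte{3, 'a', 'b', 'c', 0x1e, 0x61, 0, 0}))
	fmt.Println(fillPeerInfo([]byte{3, 'a'}))
}
```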
Building them ourselves, we'd get a bunch of nil pointer dereferences and false positives. What Nosy does here is hugely inspired by, and borrowed from, fzgen; probably 60 percent of the code for this kind of harness is borrowed from it. We look for a function that returns this object, and only this object, and doesn't take the object as an input. It can return either just the object, or the object and an error. If that's the case, we say it's a constructor. Sometimes you see false positives with this, but in practice these are still really good at generating valid objects.

In this case, I need a new key manager. I didn't have to write NewKeyManager: the developer who made the key manager type wrote this constructor, and that's what they use. It takes this configuration, and c1 is a context variable. Nosy recognizes this and builds not only its own version of the harness but also a second one that uses the constructor, and whichever gets more coverage can find the bugs. So we find the constructor, and since we know its function signature, we build everything that constructor needs, have the constructor hand us the object, and then also provide everything for the method under test. In this case that's a fetch-and-validate key method whose full name I can't really read from here; it's just a random example, and there are plenty to choose from.

One thing I forgot to mention: out of the 583,000 lines of code, there are over 15,000 supported functions in those five repos, which basically make up MEV-Boost, Geth, and Prysm along with their dependencies.
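Here's a sketch of the constructor heuristic described above, using go/types on a small in-memory package: a package-level function counts as a constructor for T if it returns T or *T, optionally followed by an error, and takes no T or *T parameter. The sample types only echo the slide's NewKeyManager example; real tools like fzgen and Nosy have to handle far more (imports, multiple packages, and so on).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// findConstructors type-checks one import-free file and maps each named type
// to the package-level functions that look like its constructors.
func findConstructors(src string) (map[string][]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	// A zero Config has no Importer, which is fine for a file without imports.
	pkg, err := new(types.Config).Check("p", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err
	}
	errType := types.Universe.Lookup("error").Type()
	ctors := map[string][]string{}
	scope := pkg.Scope()
	for _, name := range scope.Names() { // sorted; package-level objects only
		fn, ok := scope.Lookup(name).(*types.Func)
		if !ok {
			continue
		}
		sig := fn.Type().(*types.Signature)
		res := sig.Results()
		if res.Len() < 1 || res.Len() > 2 {
			continue
		}
		if res.Len() == 2 && !types.Identical(res.At(1).Type(), errType) {
			continue
		}
		named, ok := deref(res.At(0).Type()).(*types.Named)
		if !ok || takes(sig.Params(), named) {
			continue
		}
		ctors[named.Obj().Name()] = append(ctors[named.Obj().Name()], fn.Name())
	}
	return ctors, nil
}

// deref strips one level of pointer.
func deref(t types.Type) types.Type {
	if p, ok := t.(*types.Pointer); ok {
		return p.Elem()
	}
	return t
}

// takes reports whether any parameter has type T or *T.
func takes(params *types.Tuple, named *types.Named) bool {
	for i := 0; i < params.Len(); i++ {
		if types.Identical(deref(params.At(i).Type()), named) {
			return true
		}
	}
	return false
}

const sample = `package p

type Config struct{ Path string }
type KeyManager struct{ cfg *Config }

func NewKeyManager(cfg *Config) (*KeyManager, error) { return &KeyManager{cfg: cfg}, nil }
func Clone(k *KeyManager) *KeyManager { return k }
func Helper() int { return 0 }
`

func main() {
	ctors, err := findConstructors(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ctors)
}
```

Clone is rejected because it takes a *KeyManager, and Helper because int is not a named type, which mirrors the "returns the object and doesn't take it as input" rule from the talk.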
Over 15,000 functions with coverage-guided fuzzing that I don't have to write harnesses for: that's the value of this tool. Notice here that Nosy didn't only create valid arguments to fuzz; it created the object, and it did that by creating valid arguments for the constructor. Everything is done to get as close to a real test case as possible.

Mistakes and learnings. I'm going to go really fast; I've got a minute and a half. Version one... actually, there's a version 0.5. Shout out to Tyler Holmes, one of my teammates: he did a big CodeQL query for various things, and we found a bunch of functions that just accept byte arrays, so we wrote some Python that would generate harnesses to fuzz those. That was version 0.5. Version 1 was also in Python. This is just to show you my pain: I'm grepping with regexes in Python, and all that gibberish felt like J.R.R. Tolkien writing Elvish. This was no good; doing it for complex structures was a total pain. So I moved on to the AST objects you get from the go/parser library. You've seen this: it's kind of cool, and it's sort of pretty printed, but it still sucks. On the fourth or fifth line you don't really know what x is; you have to do a type check on all of those, so you end up with a massive parser full of a gazillion nested case statements. Still better than Python, but really ugly. Then I ran into fzgen, which uses the go/types library to write the more complex harnesses I showed. It blew my mind; I can't believe I wasted six months on these other things. So I basically grabbed all the code that works from there and threw it in here.
It was a minimal rewrite for me. Cool. I talked about why we use Docker: these fuzzers will find the Go binary and delete it, and they'll write all kinds of crazy stuff to your file system. If that happens, you can just restart Nosy, and your host isn't screwed. Various fill libraries: I talked a little bit about Trail of Bits. I also have a proprietary version that I'll talk about a bit more.

This last slide is the one that really matters; I know I'm out of time. Things we want to do: automatic corpus bootstrapping. We already know how to dynamically write code, so we can dynamically rewrite code. Say I point Nosy at go-ethereum, and it supports 7,500 functions in it. I can instrument all 7,500 functions, run Geth normally, on mainnet if you want, save off every valid call to those functions and receivers, and then mutate on those. That's how I can bootstrap a corpus. I could also fuzz automatically in a separate goroutine, in a Docker image or something, so you're continuously fuzzing in real time, mutating on real, valid test cases. Auto object fuzzing: you can find race conditions this way.
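A sketch of the corpus-bootstrapping idea: a hypothetical record hook that an instrumented function would call on entry, encoding its arguments as a seed-corpus file. To the best of my understanding, the encoding mirrors the plain-text format go test reads from testdata/fuzz/<FuzzTarget>/ (a "go test fuzz v1" header, then one Go literal per argument). Only int, string, and []byte are handled, and the content-hash file name is just a deduplication choice.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// corpusEntry encodes one recorded call in the seed-corpus file format.
func corpusEntry(args ...any) (string, error) {
	var b strings.Builder
	b.WriteString("go test fuzz v1\n")
	for _, a := range args {
		switch v := a.(type) {
		case int:
			fmt.Fprintf(&b, "int(%d)\n", v)
		case string:
			fmt.Fprintf(&b, "string(%s)\n", strconv.Quote(v))
		case []byte:
			fmt.Fprintf(&b, "[]byte(%s)\n", strconv.Quote(string(v)))
		default:
			return "", fmt.Errorf("unsupported corpus type %T", a)
		}
	}
	return b.String(), nil
}

// corpusFileName derives a stable name from the content, so identical calls
// collapse into one file.
func corpusFileName(entry string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(entry)))[:16]
}

var (
	mu       sync.Mutex                     // instrumented code may call record concurrently
	recorded = map[string]map[string]string{} // target -> file name -> contents
)

// record is the hypothetical hook an instrumented function calls on entry.
// A real tool would write entries to disk instead of keeping them in memory.
func record(target string, args ...any) {
	entry, err := corpusEntry(args...)
	if err != nil {
		return // unsupported argument types are skipped in this sketch
	}
	mu.Lock()
	defer mu.Unlock()
	if recorded[target] == nil {
		recorded[target] = map[string]string{}
	}
	recorded[target][corpusFileName(entry)] = entry
}

func main() {
	// Pretend an instrumented Foo(i int, s string) ran twice with the same args.
	record("FuzzNosyFoo", 5, "hello")
	record("FuzzNosyFoo", 5, "hello")
	for name, entry := range recorded["FuzzNosyFoo"] {
		fmt.Printf("testdata/fuzz/FuzzNosyFoo/%s:\n%s", name, entry)
	}
}
```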
Say I know I support a constructor, and the object it returns has 10 methods on it: write a fuzzing function that round-robins those. There's some work like this in fzgen that I'd like to copy as well. If you run that with the race detector (Go's thread sanitizer), I think you'll find a ton of race conditions. Lock down networking: you can do an AST walk to look at everything reachable from a function, and if it writes to the file system, exclude it, because I'm tired of something pulverizing my file system and destroying my fuzzer. Finally, test case minimization at the end of a run, and coverage analysis. That would be really great to have today, because then I could say, look, Prysm's tests have this much coverage, and Nosy Neighbor automatically added this much. That would be really cool; maybe I'll have that for you in six months.

All right, I will open-source this in the next 24 hours; that's my promise to you. Like all procrastinating engineers, I have a creative excuse: I got bitten by one of these snakes four or five days ago. I wouldn't be here if my wife hadn't done so much to get me here. I spent a lot of time in the hospital, and I've been elevating this foot, hence the crutches. I'm starting to be able to put weight on it. But yeah, I actually have a real excuse this time; the dog didn't eat my homework. One of those guys legit bit me; it was a whole thing. Follow infosecual on GitHub or Twitter. I'll drop the repo links probably later tonight, or at the latest this time tomorrow, depending on how the rest of the day goes. Real quick, I want to thank fzgen; Trail of Bits for the fill repo; Zinshada and Justin Traglia for various things they added to this project; the go-fuzz folks; and everyone in the Gophers Slack who's been super helpful. Any questions?
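Returning to the auto object fuzzing idea from the roadmap, here's a minimal sketch: fuzz bytes are decoded into a sequence of method calls on one object, and the sequence is split across two goroutines so that running under go test -race would surface unsynchronized state. Cache, NewCache, and the op encoding are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical object with several methods and a constructor.
// The mutex is what keeps the concurrent driver below race-free.
type Cache struct {
	mu sync.Mutex
	m  map[int]int
}

func NewCache() *Cache { return &Cache{m: map[int]int{}} }

func (c *Cache) Put(k, v int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

func (c *Cache) Get(k int) (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.m[k]
	return v, ok
}

func (c *Cache) Delete(k int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.m, k)
}

func (c *Cache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.m)
}

// applyOps decodes each fuzz byte into a method call: b%3 picks the method and
// (b/3)%4 picks one of four keys, so calls collide on keys often.
func applyOps(c *Cache, ops []byte) {
	for _, b := range ops {
		k := int(b/3) % 4
		switch b % 3 {
		case 0:
			c.Put(k, int(b))
		case 1:
			c.Get(k)
		case 2:
			c.Delete(k)
		}
	}
}

// applyOpsConcurrently splits the ops between two goroutines sharing one object.
func applyOpsConcurrently(c *Cache, ops []byte) {
	half := len(ops) / 2
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); applyOps(c, ops[:half]) }()
	go func() { defer wg.Done(); applyOps(c, ops[half:]) }()
	wg.Wait()
}

func main() {
	c := NewCache()
	applyOpsConcurrently(c, []byte{0, 3, 7, 200})
	fmt.Println("entries:", c.Len())
}
```

In a harness this would be driven as f.Fuzz(func(t *testing.T, ops []byte) { applyOpsConcurrently(NewCache(), ops) }) and run with go test -race; removing the mutex from Cache is the kind of bug this is meant to surface.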
Yeah. With that, David, I'm going to say you can go ahead and take questions over to the side. Thank you so much for making it here through all of those hurdles and for giving us your great presentation. If you do have any questions for David, please feel free