Alright, thank you guys for coming and spending this fine Sunday afternoon with me. I'm going to guide you through an experience of 50 slides, 3 videos, and 1 tool. Today we're talking about something called Firmware Slap. It's something that I've spent about the last year and a half working on. It's a combination of concolic analysis and semi-supervised firmware function clustering that in the end produces a series of vulnerabilities that we know are exploitable. My name is Christopher Roberts. I'm a researcher and engineer at Red Lattice. I'm a CTF player in George Mason's competitive cyber club, and in general I'm pretty interested in finding bugs. I like learning about program analysis frameworks, and I like trying to stitch the two together whenever I get the chance. For this talk we're going to talk a little bit about exploitable bugs: a real quick background on what they are and a little bit of the more recent history, to set the framework for what we're going to be talking about today. A couple of years ago there was something called the DARPA Cyber Grand Challenge. It was a competition of automated cyber reasoning systems that were designed and tasked with finding exploitable bugs in binaries, writing exploits, and throwing those exploits. On top of that, once they found those vulnerabilities, they were also tasked with patching them. It was a tall order. And unfortunately, those systems were not quite at a point where they could analyze real-world systems. They had a couple of things working against them. Today we have a series of mitigations and protections that you as developers employ every day. You have source-code-level protections through static analyzers using things like Clang and LLVM. You have the compile-time protections offered by your compiler: your non-executable stack, your stack canaries, making your relocation sections read-only.
You have a whole other option that doesn't even let you use a number of very vulnerable functions. And finally, if all of those fail, you have your operating system to lean on just a little bit with something called ASLR. So fortunately we have all of these protections, and none of these automated reasoning systems can exploit us at all, except in the embedded case. There we're missing a couple of things. You can use static analyzers, and sometimes you might have a non-executable stack, but all in all you're really missing all of those big things that make it hard to exploit vulnerabilities. And I'm coming to you with backed-up information. I went on Amazon, right during Prime Day, and I typed in "router." These are the top five most-sold routers. If we know that they have none of these exploit mitigations, obviously this is just a show of confidence. They're telling us that these things are coded to perfection, that there is no possible way to exploit them, and so you don't need a way to mitigate exploits, right? It's really bad. There are almost no exploit mitigations on these top five routers. So for the very first demo, we're going to talk about something called the Almond 3. It's this smart home device that can hook into your network. It can act as a gateway. It can act as a router. It can hook into your home security system and set off alarms. It can tell you what the weather is. It can do everything. And it's got a touch screen. This first video is going to be finding a couple of bugs in it. Alright, so here we've got this device on the network. I'm browsing to it, and on our left we can see all of our status information. We can see whether DHCP is on and what our host name is. We can see what our IP address is. This thing is living on my network. And I've gone ahead and downloaded the firmware online. You just Google "Almond 3 Securifi firmware download."
And you get it right there. I'm going to use this little script I made called PwnFirmwareCGI against this thing. What this is going to do for me is go in and extract the file system. It's going to extract all of our shared objects. Then it's going to iterate over every single binary in this firmware. It's going to recover a series of function prototypes, hundreds of them. It's going to build a set of analysis tasks that we can run in parallel, using a system called angr. And we're analyzing those individual tasks, individual functions, for vulnerabilities. We're looking for memory corruption. We're looking for command injections. Finally, this tool is going to take that information and set it into a templated Python script that I have that just sends data to a web server. So we're using information recovered from that function prototype, we're grabbing information that takes us to the exact URL on this firmware, and we're generating the exploit. Here we've got a whole bunch of fun colors that just tell us that it's found an exploit, and it's writing those to these files. In a second we're going to see that I have two Python files that were generated automatically by Firmware Slap. The very first thing I like to do is see if Telnet's open. I'm going to check my very favorite port, 12345. And unfortunately, Telnet is not open. I don't just get my free back door. So we're going to go in and open it up ourselves using these automatically generated exploits. I'm going to launch my Telnet daemon and tell it to listen on port 12345 without credentials. I'm going to run this exploit across the network. I still have to provide the IP address. And then finally, we're going to connect in. And it runs everything as root.
Some segments of that video were sped up at two-times speed; that took the demo time from five minutes down to three minutes. Alright, so concolic analysis is a topic that's been talked about a whole bunch. I'm going to give you the 5,000-mile-high view of what it is. In essence, it combines something called symbolic analysis with concrete analysis. A way to visualize this in your head is that it represents a program as a set of equations. Each path, each possible program state, is represented as a unique set of equations, and we can ask really specific questions about these program states. I'm using angr for this tool, which is a concolic analysis framework that was used by the third-place team in the DARPA Cyber Grand Challenge. It's perfect for concolic analysis, and it's got some reverse engineering applications too. In essence, we can tell it to do this: we have some code over here on the left, and we can tell it to treat a variable as symbolic. It can be anything. As it's going through and looking at all these individual pieces of the program, it's generating constraints. It's finding each of these program paths, the one that prints out "you did it wrong" and the other one, and it's creating a different set of program constraints for each. And we're not limited to just doing this for some variable where we get user input and make that thing symbolic. We can represent a whole lot more of the program state. We can represent registers as symbolic. We can represent our stacks as symbolic. We can represent our files, our network reads, our environment variable reads, all of that, as symbolic. And instead of asking, "can you reach this point of the program," you can query for a lot more interesting conditions. You can ask if one of those network reads or one of those file reads corrupts the program counter: a buffer overflow. You can ask if some of that input taints a system command: a command injection.
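A minimal, pure-Python sketch of that "program as a set of equations" idea follows. A real engine like angr hands the branch conditions to an SMT solver instead of brute-forcing; the tiny program and the "BUG" label here are made up purely for illustration:

```python
# Toy version of symbolic path exploration: walk a tiny program, record
# which branch conditions fired along each path, then search for a
# concrete input that satisfies the constraints of the path we care about.

def run_with(x):
    """Execute a tiny program, recording the path constraints it takes."""
    constraints = []
    if x > 10:
        constraints.append("x > 10")
        if x % 2 == 0:
            constraints.append("x % 2 == 0")
            return constraints, "BUG"        # the state we want to reach
        constraints.append("x % 2 != 0")
        return constraints, "you did it wrong"
    constraints.append("x <= 10")
    return constraints, "you did it wrong"

def solve_for(outcome, domain=range(256)):
    """Stand-in for the SMT solver: find any input reaching `outcome`."""
    for candidate in domain:
        constraints, result = run_with(candidate)
        if result == outcome:
            return candidate, constraints
    return None, []

value, constraints = solve_for("BUG")
print(value, constraints)  # 12 ['x > 10', 'x % 2 == 0']
```

Swap "reach the BUG state" for "the program counter became attacker-controlled" or "user input reached system()" and you have the queries described above.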
You can also ask it to track all of the reads and writes required to trigger that vulnerability. Holy cow, this thing is awesome. Why isn't it used more? Well, if you load up a little web server and you run these three commands, it takes about five minutes before you run out of RAM. It tends to fail on really big code bases. And this web server that I'm looking at is 200 kilobytes. That's absolutely minuscule compared to what I showed you at the very beginning, where we're looking at an entire firmware that's megabytes in size. So we need something that can digest this large amount of information more quickly and more efficiently. We have a couple of challenges before us. We need to model more complicated binaries. We need to model more complicated environments. We need to model when there's a hardware peripheral in a firmware that we can't understand, or when there's an NVRAM value we can't understand. And just for kicks, let's also try to find binaries and functions that are similar to one another. In general, when you have these large system services, they tend to follow a pretty similar control flow. You start them up, they parse some massive config file, maybe they read some environment variables, maybe they accept some command line arguments, maybe they set up some sockets, maybe they parse some user input. Anyway, it's a whole lot of information that's going to consume all of your resources, all of your memory. It's going to take forever to analyze if you're just trying to do this in angr. I'm only interested in that very bottom piece where the vulnerable code is, and then I want to work my way back up. So there's a really cool technique that's been talked about just a couple of times called under-constrained concolic analysis. I'm trying to throw as many complicated words at you as possible, but all it means is we're starting from the bottom and we're working our way back up.
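Starting from the bottom means each function becomes its own independent analysis job, which is what makes the work parallelizable. A sketch of that fan-out follows; the prototypes and the placeholder `analyze()` are hypothetical stand-ins for real recovered prototypes and a real per-function analysis:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prototypes, standing in for what a decompiler recovers.
prototypes = [
    {"name": "do_ping",    "addr": 0x401000, "args": ["char *", "int"]},
    {"name": "set_host",   "addr": 0x401400, "args": ["char *"]},
    {"name": "get_uptime", "addr": 0x401800, "args": ["int"]},
]

def analyze(proto):
    # Placeholder for "seed an analysis at proto['addr'] with symbolic
    # arguments, then check for a corrupted PC or a tainted system()".
    # Here we just flag functions that take attacker-controllable strings.
    takes_string = any(t == "char *" for t in proto["args"])
    return proto["name"], takes_string

# Every function is an independent task, so the whole firmware can be
# fanned out across workers; adding cores adds throughput.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(analyze, prototypes))

print(results)
```

A thread pool keeps the sketch self-contained; CPU-heavy analysis jobs would normally go to separate worker processes or machines, which is the horizontal scaling described here.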
We can tell our analysis tasks that we don't know what those hardware peripherals are giving us, information-wise. We can tell it we don't know what NVRAM values are being given to us. We can also skip past a lot of that initialization that comes from parsing config files, from reading environment variables, from reading our command line arguments. We can jump past that and just look at the interesting code. We can start down here. And when you start at that low a level, you can break up each of those actions into separate analysis tasks. So not only are you analyzing substantially less code, which is faster, you can split it up and run it in parallel. You're using fewer resources, and you can run it faster if you just add more cores, more tasks. It's horizontally scalable. The challenge comes in trying to understand and model what each of those actions at the very bottom is. Our friends at the NSA recently released a tool called Ghidra. Ghidra has the ability to go into a given binary or a given firmware and recover a function prototype. This is information that we can use to understand what arguments, what values, what registers go into a given function, into that very bottom piece that we want to check for vulnerabilities. So we can combine the two: we can use Ghidra and angr together to build these analysis tasks with the information we recovered from Ghidra. Part of what I'm releasing later is a Ghidra plugin that goes through a given binary, iterates over every function, runs the decompiler on each of those functions, and dumps the given function prototype, telling us what it returns and what arguments it uses. We're interested in finding bugs in all of these binaries. We can use angr to build these program states, and we can run each analysis job in parallel. So let's find some more bugs. This video has sections that are sped up at about two-times speed.
So here I'm using something a little bit different. It's another Python script I cobbled together called Discover and Dump. I've got a binary here that is a CGI binary from that very same Securifi Almond device. It's going to be a lot faster now, because we're looking at just a single binary instead of the whole firmware. We provided that binary to this script, and behind the scenes it's handing it out to Ghidra. Ghidra is going through, iterating over every function, recovering those function prototypes. It's building those program analysis states for us, passing those back to angr, and then we're running our angr analysis tasks all in parallel, and before I can even finish that sentence, it's found a vulnerability for us. So let's dissect a little bit of what that vulnerability is. Up here at the top, we have a couple of arguments. We have a0, a1, a2; these are the argument registers that are provided for this function. Our tool has gone in and told us that we have three pointers here, three strings: we have a string at the top, we have a string in the middle with another value required for it, and we can see that this very first one correlates to the first argument being passed into our function. We also have a series of tainted memory values. These are all the instructions in the program that took that user-provided input and touched it in memory, concatenated it together, or did some sort of operation that eventually made its way down into a system call. This user-provided information flowed down that path and finally became a command injection. And because I like to live dangerously, I made this tool always make my command injections reboot the system. It's a very visual way to see that, yes, we won. So we talked a moment ago about function similarities, function differences, and wanting to compare one function to another.
These techniques for binary and function diffing have been used time and time again to identify where patched vulnerabilities are. When Microsoft releases a patch, you can go in and compare the patched binary against an older version, find the code that was changed, and reverse engineer it to figure out where a vulnerability or a CVE was introduced. The state-of-the-art tools for this are generally BinDiff and Diaphora. They're very visual; they're very good at looking at one thing compared against another. They use basic block counts, basic block edges, and a couple of heuristics, all tethered together. Unfortunately, they're pretty slow if you're trying to do this at scale. They're really, really good for one-off comparisons, for when you have that patch and you're trying to find where that publicly disclosed vulnerability was. I wanted to take the information I had found from this concolic analysis pass and see if there were other places in code that couldn't be analyzed that might also be vulnerable. So I decided to take a data mining approach: clustering. For you and me, we can see that there are obviously two clusters there. But we need to convince our computer that there are two clusters there; it doesn't have eyeballs the way you and I do. A classic way of doing this in data mining is extracting features. On a grid up here, we would call them x and y components, and we could find the distance between each of these points and all of the other points. A very popular form of clustering is something called k-means clustering. You extract these features, you have these x and y components, and you pick two random points, any two, and you categorize all the points based off of that. You ask which points are closest to which point you had just picked.
And you get these two very distinct clusters, these groupings. Then you take another set of two points, two points in the very middle of those groups, and you repeat this process a couple of times. It works very similarly to the way that you or I might look at one of these graphs and be able to say, here in the top right we've got red, here in the top left we've got blue. It works really well for visual information like this. Fortunately, you don't need to use x and y components and break out your Algebra 1 knowledge. You can use the existence of information. You can use string references, data references, function arguments, basic block counts: all sorts of information that you can pull out of reverse engineering tools. You can convert this into information that can be used for clustering, and you can develop a method of really quickly finding similarities between functions and binaries. There's one hang-up with k-means clustering: if you don't guess the right number of clusters, you get a really bad result. Obviously there are four here, but if we told it there were two, we'd get a whole lot of blue on one side and a whole lot of red on the other. One of the ways we can fix this is called cheating. I mean, supervised clustering. Supervised clustering is where you go in ahead of time and you say, I already know what everything is. I can tell you that there are four clusters right here, right now, and you, computer, go find me four clusters. Unfortunately, firmware writers don't just tell us which functions are vulnerable and which functions aren't, so we can't really do supervised clustering. So I made up a word just for you guys today: semi-supervised clustering. The idea is you use some known values to cluster data. If you had some sort of concolic analysis pass that found vulnerabilities for you, or maybe some CVE information, you could start with a known-good set of data.
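Once functions are reduced to existence-of-information vectors, the whole pick-assign-recenter loop described above is a couple of lines with scikit-learn. The feature columns here are hypothetical stand-ins for what you would pull out of a reverse engineering tool:

```python
import numpy as np
from sklearn.cluster import KMeans

# One row per function; columns are 0/1 "existence of information"
# features (hypothetical names):
# [references "system", has stack buffer, >10 basic blocks, reads NVRAM]
features = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
], dtype=float)

# Pick k starting points, assign every function to the nearest one,
# recenter on the group means, repeat until stable: exactly the loop
# described above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
print(kmeans.labels_)  # the first two functions group together
```

The distance math doesn't care whether the axes are x/y coordinates or "does this function reference a string"; that's why the same algorithm carries over from dots on a grid to functions in a firmware.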
You could say, I know these are vulnerable; I want to find functions that are really similar to these bad patterns. So we can set our cluster count to the number of known vulnerable functions. And we can pull a whole lot of information back out of these functions. Ghidra has a lot that you can pull out of a function, a lot of information that lets you convert it to the existence of data. You can turn a data reference, a string reference, a call reference into a zero and a one. You can normalize the information so that one feature, one set of that data, is not more important than another. Or maybe it should be; but being at offset eight million in memory shouldn't be more important than having two function arguments. There's a really cool way of doing that called the chi-squared test. If I had a whole lot more time, we could talk about it, but what you can do is import the chi-squared test from scikit-learn, and it works. Oh man, it was the easiest two lines in this entire program. What it does, at a super high level, is take all of these features, all of these zeros and ones, all of this normalized information, and tell us what's useful and what's not. Having the same file name as another function might not tell you whether they're especially similar, and everything being the exact same architecture might not be useful information. So it lets us take that information out of these binaries and just throw it away for all of our clustering. But we can make it even more complicated. We can take it a step further. We can use this thing called the silhouette score. And I'm not kidding you, the way the silhouette score is defined is: how good is a cluster. Like, you Google it and you get "how good is a cluster" again. What it's designed to do is tell you how similar every function in a cluster is.
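Those "easiest two lines" look roughly like this; the feature matrix and the vulnerable/not-vulnerable labels below are made up for illustration:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Rows: functions. Columns: binarized features, e.g. (hypothetical)
# [calls system(), is C++ ctor/dtor, references user input, same file name]
X = np.array([
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
])
# Labels from the concolic analysis pass: 1 = known vulnerable
y = np.array([1, 1, 0, 0])

# Chi-squared scores each feature against the labels: a high score means
# the feature separates the classes; near zero means it's noise to drop.
scores, p_values = chi2(X, y)
print(scores)  # the "same file name" column scores 0.0 -> throw it away
```

Features that score at zero (here, sharing a file name) carry no signal about similarity to the known-vulnerable set, so they get dropped before clustering.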
And I noticed, as I was increasing my cluster counts on this entire firmware, that I was starting to get eerily similar groupings of functions. I was finding functions that did a lot of string operations, where they were concatenating or snprintf-ing or just doing something with strings, because they had a lot of string references. I found C++ destructors and constructors, where all these objects are getting created and destroyed. I found file manipulations. I found lots of web request handling. And so that was really cool. If we go back to our horrible mismatch from before, where we just guessed two clusters, we can apply our silhouette score, and we can think about it as big circles. We want the smallest circles possible, because we want everything that's been clustered to be really, really similar. If we have a really large number, we know that our clusters of functions are not really similar, and we should increase our cluster count. And we can do this over and over again until we get a really good number, where we have a very small distance between all the functions. So for my third demo, we're going to do just that. This video is running in real time. It is taking in over a hundred binaries across this firmware. It will cluster all of them and produce a result of how similar every function in every binary is, in about a minute. We're providing a function that we're interested in. I know that this function is a command injection, and I want to find functions that are really similar to it. I'm providing a path to my extracted firmware. I'm telling it to go through and iterate over every function inside of this firmware. It's recovering that set of information out of Ghidra, and now we're clustering. Over here we have a set of numbers, and those are the numbers of clusters that we are testing against; the other numbers are the silhouette scores.
We want the smallest silhouette score possible. Right now we are mapping those cluster counts to their corresponding silhouette scores. We're going to test out 50 clusters, and we're going to see that there was a spike right around 13, the lowest silhouette score. That is the best grouping of those functions by how close they were together. So in less than a minute, we ran across an entire firmware and compared every function to every function. And I did this a lot, for way more than just this firmware, and I had a lot of really interesting results. Up near the top we have an impeccably small number: zero. It means we did a good job. It means the first function that we're comparing against is the exact same as the first function. It was a really good test to let me know that I was doing the right thing. But right below that, I have a really, really small number again. What this showed me is that there were code clones in this firmware I was looking at. Someone went in and copied and pasted code from one binary into another, and because I did this clustering, I was able to identify it automatically. And then just below that again, I had another function that was really, really similar to that very first one. As I dived farther and farther into the data, I started finding similarities across calling patterns, where data might be loaded in from an argument, snprintf'd, and then used in a system call. I'd find similar function calls, similar data references, similar file accesses, similar ways of interacting with sockets and networks. Looking at that very first function, the original function we provided, and that third function that was really close, it's actually really hard to tell that they're different at all. Outside of maybe the function name and the string it referenced, you could have just had one function in there.
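The sweep from the demo, trying a range of cluster counts and keeping the best-scoring one, can be sketched like this. One caveat: the talk tracks a distance-flavored number where lower is better, while scikit-learn's `silhouette_score` runs toward +1 for tight, well-separated clusters, so with that convention you keep the maximum. The data here is synthetic:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the per-function feature matrix: three groups.
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.4, random_state=0)

# Map each candidate cluster count to how well its grouping scores.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the cluster count whose grouping keeps members closest together.
best_k = max(scores, key=scores.get)
print(best_k)
```

Against real firmware the rows would be the binarized function features, and `best_k` would be the "spike" picked off the graph in the demo.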
So many patterns were identified automatically using this clustering and feature extraction technique. I showed you a really complicated graph earlier; we were interested in that value, and the trick for finding our cluster count was finding the biggest drop, the smallest value returned out of our silhouette score. I've just thrown a whole lot of information at you in about 30 minutes. I hope you were taking notes. I'm just kidding. I don't expect you to go out and implement this on your own. I don't expect you to try to use the first script or the second script, or to hobble through my poorly documented Python. I've built a tool. This tool is called Firmware Slap, the tool that we're talking about today, and it automates every single one of these steps. You provide a firmware. It'll locate the system root. It will locate all the shared objects. It will iterate over every function in every binary, recover those function prototypes, and build and run those analysis jobs. It'll identify those vulnerabilities for you. It'll extract the features, using scikit-learn to iterate over all of those features to see if they're useful. It will cluster those features, and it'll dump everything into JSON. And because I know not everyone is a big fan of black backgrounds with green text, it exports to Elasticsearch and Kibana. You're not limited to Elasticsearch and Kibana, though: it's JSON, loaded into whatever you like. I've included a series of scripts to load that information into Elasticsearch and Kibana if you want. But this tool is designed to talk about something a little bit bigger. You have protections in your code. If you're writing firmware, use that extra flag on GCC. You have these protections that can so easily be used to prevent a tool like this from automatically building exploits. Enable your ASLR. That would make it so much harder for any of these exploits to land. And if you're a consumer, buy a better router. It's time to bring a lot more automation to our embedded systems.
You have IoT fridges and washers and dryers and routers and smart home systems, and not all of these systems are getting vetted. Not all of these systems are going through proper QA. Vendors are trying to produce the cheapest and fastest product they can and get it to market, and that is creating a series of very, very vulnerable devices. Don't blindly trust and run these third-party systems. I'm giving you the tools to find these bugs yourself. This morning, I released Firmware Slap. You can find it on GitHub. I'm releasing the Ghidra function-dumping plugin, and I'm also releasing all of the PoCs associated with this presentation. That's where you can find it. Thank you.