for today is a computer science PhD student at UC Santa Barbara. He is a member of the Shellphish hacking team and he's also an organizer of the iCTF hacking competition. Please give a big round of applause to Nilo Redini. Thanks for the introduction. Hello everyone. My name is Nilo, and today I'm going to present my work, Karonte: identifying multi-binary vulnerabilities in embedded firmware at scale. This work is a joint effort between me and several of my colleagues at UC Santa Barbara and ASU. This talk is going to be about IoT devices, so before we start, let's get an overview of IoT devices. IoT devices are everywhere. As research suggests, they will reach 20 billion units by the end of next year. And a recent study conducted this year, in 2019, on 16 million households showed that more than 70% of homes in North America already have an IoT network-connected device. IoT devices make everyday life smarter. You can literally say, "Alexa, I'm cold," and Alexa will interact with the thermostat and increase the temperature of your room. One way we interact with IoT devices is through our smartphone. We send a request through the local network to some device — a router, a door lock — or we might send the same request through a cloud endpoint, which is usually managed by the vendor of the IoT device. Another way is through IoT hubs: the smartphone sends a request to some IoT hub, which in turn sends the request to some other device. As you can imagine, IoT devices use and collect our data, and some data is more sensitive than other data. For instance, think about the data that is collected by a security camera. As such, IoT devices can compromise people's safety and privacy. Think, for example, about the security implications of a faulty smart lock, or of the brakes of your smart car. So the question that we asked is: are IoT devices secure? Well, like everything else — this slide is a bit bad — they are not.
Okay. In 2016, the Mirai botnet compromised and leveraged millions of IoT devices to disrupt core internet services such as Twitter, GitHub, and Netflix. And in 2018, 154 vulnerabilities affecting IoT devices were published, which represents an increase of 15% compared to 2017 and an increase of 115% compared to 2016. So then we wondered: why is it hard to secure IoT devices? To answer this question, we have to look at how IoT devices work and how they are made. Usually, when you remove all the plastic and peripherals, IoT devices look like this: a board with some chips on it. Usually you can find the main chip, the microcontroller, which runs the firmware, and one or more peripheral controllers, which interact with external peripherals such as the motors of your smart lock, or cameras. So, though the design is generic, implementations are very diverse. For instance, firmware may run on several different architectures, such as ARM, MIPS, x86, PowerPC, and so forth, and sometimes they're even proprietary, which means that if a security analyst wants to understand what's going on in the firmware, he will have a hard time if he doesn't have the vendor specifics. Also, the operating environments limit the resources, which means that firmware runs small and optimized code. For instance, vendors might implement their own version of some known algorithm in an optimized way. Also, IoT devices manage external peripherals that often use custom code. Again, by peripherals we mean cameras, sensors, and so forth. The firmware of an IoT device can be either Linux-based or a blob firmware. Linux-based firmware is by far the most common: a study showed that 86% of firmware samples are based on Linux. Blob firmware, on the other hand, is usually an operating system and user applications packaged in a single binary. In any case, firmware samples are usually made of multiple components. For instance, let's say that you have your smartphone and you send a request to your IoT device.
This request will be received by a binary, which we term the border binary — in this example, a web server. The request will be received, parsed, and then it might be sent to another binary, called the handler binary, which will take the request, work on it, produce an answer, and send it back to the web server, which in turn will produce a response to send to the smartphone. So, to come back to the question: why is it hard to secure IoT devices? Well, the answer is because IoT devices are, in practice, very diverse. Of course, various works have been proposed to analyze and secure firmware for IoT devices. Some of them use static analysis, others use dynamic analysis, and several others use a combination of both. Here I listed several of them; at the end of the presentation there is a bibliography with the titles of these works. Of course, all these approaches have some problems. For instance, dynamic analyses are hard to apply at scale because of the customized environments that IoT devices work in. Usually, when you try to dynamically execute a firmware sample, it's going to check whether the peripherals are connected and working properly. In the case where you don't have the peripherals, it's going to be hard to actually run the firmware. Also, current static analysis approaches are based on what we call the single-binary approach, which means that binaries from a firmware sample are taken individually and analyzed. This approach might produce many false positives. For instance, let's say again that we have our two binaries — this is actually an example that we found in one firmware sample. The web server takes the user request, parses the request and produces some data, sets this data in an environment variable, and eventually executes the handler binary. Now, if you look, the parsing function contains a string compare: we check whether some keyword is present in the request, and if so, it just returns the whole request.
Otherwise, it constrains the size of the request to 128 bytes and returns it. The handler binary in turn, when spawned, receives the data by doing a getenv on QUERY_STRING, but it also does a getenv on another environment variable, which in this case is not user-controlled: the user cannot influence the content of this variable. Then it's going to call a function, process_request. This function eventually performs two string copies — one from the user data and the other one from the log path — into two different local variables that are 128 bytes long. Now, in the first case, as we have seen before, the data can be greater than 128 bytes, and this string copy might result in a bug, while in the second case it will not, because here we assume that the system handles its own data properly. Throughout this work, we're going to call the first type of binary the setter binary, meaning the binary that takes the data and sets the data for another binary to consume, and the second type of binary we'll call the getter binary. So current bug-finding tools are not great, because either bugs are left undiscovered, if the analysis only considers the binaries that receive network requests, or many false positives are likely to be produced, if the analysis considers all the binaries individually. So then we wondered how these different components actually communicate. They communicate through what are called inter-process communications (IPCs), which is basically a finite set of paradigms used by binaries to communicate, such as files, environment variables, MMIO, and so forth. All these IPCs are represented by data keys, which are file names or, as in the earlier example here on the right, the QUERY_STRING environment variable. Each binary that relies on some shared data must know the endpoint where such data will be available — for instance, again, a file name, or a socket endpoint, or the environment variable.
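To make the setter/getter pattern concrete, here is a minimal Python sketch of the scenario described above. The binaries in the talk are native code; the names parse_uri, the keyword "soap", and the 128-byte limit are illustrative stand-ins, not the actual firmware code:

```python
import os

KEYWORD = "soap"      # hypothetical network-related keyword
BUF_SIZE = 128        # size of the getter's local buffer

def parse_uri(request: str) -> str:
    """Setter-side parsing: one path returns the whole request
    (unconstrained), the other truncates it to 128 bytes."""
    if KEYWORD in request:
        return request            # unconstrained: may exceed 128 bytes
    return request[:BUF_SIZE]     # constrained to at most 128 bytes

def setter(request: str) -> None:
    # Share the parsed data through the QUERY_STRING data key.
    os.environ["QUERY_STRING"] = parse_uri(request)

def getter_overflows() -> bool:
    """Getter side: True if copying the shared data into a
    128-byte destination buffer would overflow (the bug)."""
    data = os.environ.get("QUERY_STRING", "")
    return len(data) > BUF_SIZE   # a strcpy here would be the bug

setter("soap" + "A" * 200)   # keyword present: data stays unconstrained
print(getter_overflows())    # the unconstrained path overflows the buffer
```

A single-binary analysis of the getter alone cannot tell these two getenv reads apart; only by following the setter's paths does the unconstrained one become visible.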
This means that, usually, data keys are hardcoded in the program itself, as we saw before. Therefore, to find bugs in firmware in a precise manner, we need to track how user data is introduced and propagated across the different binaries. Okay, let's talk about our work. Before we start talking about Karonte, we define our threat model. We hypothesize that an attacker sends arbitrary requests over the network, both LAN and WAN, directly to the IoT device. Though we said before that sometimes IoT devices communicate through the cloud, research showed that some form of local communication is usually available, for instance during the setup phase of the device. Karonte is a static analysis tool that tracks data flow across multiple binaries to find vulnerabilities. Let's see how it works. In the first step, Karonte finds the binaries that introduce the user input into the firmware. We call these border binaries; they are the binaries that basically interface the device to the outside world — in our example, the web server. Then it tracks how data is shared with other binaries within the firmware sample — in this example, it will understand that the web server communicates with the handler binary — and it builds what we call the BDG. A BDG, which stands for binary dependency graph, is basically a graph representation of the data dependencies among different binaries. Then we detect vulnerabilities that arise from the misuse of the data, using the BDG. This is another view of our system. We start by taking a packed firmware sample. We unpack it. We find the border binaries. Then we build the binary dependency graph, which relies on a set of CPFs, as we will see soon; CPF stands for communication paradigm finder. Then we find the specifics of the communication — for instance, the constraints applied to the data that is shared — through our multi-binary data flow analysis module.
Eventually, we run our insecure interaction detection module, which basically takes all this information and produces alerts. Our system is completely static and relies on our static taint engine. Let's see each one of these steps in more detail. The unpacking procedure is pretty easy: we use the off-the-shelf firmware unpacking tool binwalk. Then we have to find the border binaries. Now, we said that border binaries are basically binaries that receive data from the network, and we hypothesize that they contain parsers to validate the data that they receive. So, in order to find them, we have to find parsers which accept data from the network and parse this data. To find parsers, we rely on related work, which basically uses a few metrics to quantify, as a number, the likelihood that a function contains parsing capabilities. The metrics that we used are the number of basic blocks, the number of memory comparison operations, and the number of branches. Now, while these identify parsers, we also have to find out whether a binary takes data from the network. As such, we define two more metrics: first, we check whether the binary contains any network-related keywords, such as "soap", "HTTP", and so forth; and then we check whether there exists a data flow between a read from a socket and a memory comparison operation. Once we have all these metrics for each function, we compute what is called a parsing score, which is basically just a sum of products. Once we have a parsing score for each function in a binary, we represent the binary with its highest parsing score. Once we have that for each binary in the firmware sample, we cluster them using the DBSCAN density-based algorithm and consider the cluster with the highest parsing score as containing the set of border binaries. After this, we build the binary dependency graph. Again, the binary dependency graph represents the data dependencies among the binaries in a firmware sample.
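As a rough illustration, the border-binary scoring can be sketched as below. The weights, metric names, and the one-dimensional grouping (a simplified stand-in for DBSCAN) are all hypothetical choices for this sketch, not Karonte's actual coefficients:

```python
# Hypothetical metric weights; the real coefficients differ.
WEIGHTS = {
    "basic_blocks": 0.5,
    "memcmp_ops": 1.0,
    "branches": 0.5,
    "network_keywords": 2.0,       # e.g. "soap", "HTTP" in the binary
    "socket_to_memcmp_flow": 5.0,  # data flow from socket read to memcmp
}

def parsing_score(metrics: dict) -> float:
    """Sum of products of each metric with its weight."""
    return sum(WEIGHTS[name] * value for name, value in metrics.items())

def binary_score(function_metrics: list) -> float:
    """A binary is represented by its highest-scoring function."""
    return max(parsing_score(m) for m in function_metrics)

def border_binaries(scores: dict, eps: float = 10.0) -> set:
    """Keep the densest group of top scores (stand-in for picking the
    DBSCAN cluster with the highest parsing score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    top = scores[ordered[0]]
    return {name for name in ordered if top - scores[name] <= eps}
```

For example, with scores {"httpd": 90.0, "upnpd": 85.0, "ls": 10.0}, the top cluster contains httpd and upnpd, and those are treated as the border binaries.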
For instance, this simple graph tells us that a binary A communicates with a binary C using files, and the same binary A communicates with another binary B using environment variables. Let's see how this works. We start from the identified border binaries. We taint the data compared against the network-related keywords that we found, and run a static analysis to detect whether the binary relies on any IPC paradigm to share the data. If we find that it does, we establish whether the binary is a setter or a getter — which, again, means whether the binary is setting the data to be consumed by another binary, or whether the binary actually gets the data and consumes it. Then we retrieve the employed data key, which in the example before was the keyword QUERY_STRING. Finally, we scan the firmware sample to find other binaries that might rely on the same data key, and schedule them for further analysis. To understand whether a binary relies on any IPC, we use what we call CPFs — again, a CPF is a communication paradigm finder. We designed a CPF for each IPC, and the CPFs are also used to find the same data keys within the firmware sample. We also provide Karonte with a generic CPF to cover the cases where the IPC is unknown, or where the vendor implemented their own version of some IPC — say, for example, they don't use setenv but implemented their own setenv. The idea behind this generic CPF, which we call the semantic CPF, is that data keys have to be used as an index to set or to get some data, as in this simple example. So let's see how the BDG algorithm works. We start from the border binary — again, we start from the function that serves the request and parses the URI. And we see that here it runs a string comparison against some network-related keyword. As such, we taint the variable p. And we see that the variable p is returned from the function to these two different points.
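The BDG construction described above — edges from setters to getters that share a data key — can be sketched as follows. The tuple format and function name are illustrative, not Karonte's internal representation:

```python
from collections import defaultdict

def build_bdg(findings):
    """findings: (binary, role, ipc, data_key) tuples, as the CPFs
    would report them, where role is 'setter' or 'getter'.
    Returns setter -> getter edges labeled with the IPC and data key."""
    setters, getters = defaultdict(list), defaultdict(list)
    for binary, role, ipc, key in findings:
        (setters if role == "setter" else getters)[(ipc, key)].append(binary)
    edges = []
    for (ipc, key), sources in setters.items():
        for s in sources:
            for g in getters.get((ipc, key), []):
                edges.append((s, g, ipc, key))
    return edges

edges = build_bdg([
    ("httpd",          "setter", "env", "QUERY_STRING"),
    ("fileaccess.cgi", "getter", "env", "QUERY_STRING"),
    ("cgibin",         "getter", "env", "QUERY_STRING"),
])
# edges now connect httpd to both getters over the env data key
```

A getter whose data key has no matching setter simply produces no edge, which is why finding the data keys precisely matters.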
As such, we continue, and now we see that data gets tainted and the variable data is passed to the function setenv. At this point, the environment CPF will understand that tainted data is passed to be set in an environment variable, and will conclude that this binary is indeed a setter binary that uses the environment. Then we retrieve the data key QUERY_STRING and search within the firmware sample for all the other binaries that rely on the same data key. We find that this binary relies on the same data key, and we schedule it for further analysis. After this algorithm, we build the BDG by creating edges between setters and getters for each data key. The multi-binary data flow analysis uses the BDG to find and propagate the data constraints from a setter to a getter. Now, here we propagate only the least strict constraints: between two program points there might be an infinite number of paths and, in theory, a different and infinite set of constraints that we could propagate from the setter binary to the getter binary. But since our goal here is to find bugs, we only propagate the least strict set of constraints. Let's see an example. Again, we have our two binaries, and we see that the variable that is passed to the setter function is data, which comes from two different paths in the parse-URI function. In the first case, the data that is passed is unconstrained, while in the second case, at line 8, it is constrained to be at most 128 bytes. As such, we only propagate the constraints of the first path. In turn, the getter binary will retrieve this variable from the environment and set the variable query, which in this case will be unconstrained. The insecure interaction detection runs a static taint analysis and checks whether tainted data can reach a sink in an unsafe way.
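The least-strict rule can be sketched in a few lines. Here a constraint is modeled only as a maximum data size, with None standing for "unconstrained" — a deliberate simplification of the symbolic constraints the real analysis propagates:

```python
UNCONSTRAINED = None   # no upper bound on the shared data's size

def least_strict(path_constraints):
    """Given the maximum data size allowed on each path from the
    parsing code to the setter call (None = unbounded), keep only
    the least strict one: for bug hunting we must assume the most
    permissive data the setter can possibly share."""
    if any(c is UNCONSTRAINED for c in path_constraints):
        return UNCONSTRAINED
    return max(path_constraints)

print(least_strict([UNCONSTRAINED, 128]))  # the unconstrained path wins
print(least_strict([64, 128]))             # otherwise the largest bound wins
```

This is why, in the example, the getter's query variable ends up unconstrained: one of the two setter paths never truncates the request.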
As sinks, we consider memcpy-like functions, which are functions that implement semantically equivalent memory copies — strcpy, memcpy, and so forth. We raise alerts if we see a dereference of a tainted variable, and if we see comparisons of tainted variables in loop conditions, to detect possible DoS vulnerabilities. Let's see an example again. Here, we know that our query variable is tainted and unconstrained. We then follow the taint into the function process_request, which, as we see, will eventually copy the data from q to arg. Now, arg is 128 bytes long while q is unconstrained, and therefore we generate an alert here. Our static taint engine is based on BootStomp and is completely based on symbolic execution, which means that the taint is propagated following the program data flow. Here's an example. Assume that we have this code. The first instruction takes the result of some seed function, which might return, for instance, some user input. In the symbolic world, what we do is create a symbolic variable y and assign to it a tainted value that we call taint_y, which is the taint target. The next instruction assigns to x the value y plus five, and in the symbolic world we just follow the data flow: x gets assigned taint_y plus five, which effectively taints x as well. If at some point x is overwritten with some constant data, the taint is automatically removed. In the original design of BootStomp, the taint is removed also when data is constrained. For instance, here we can see that the variable n is tainted, but then it is constrained between two values, 0 and 255, and therefore the taint is removed. In our taint engine, we have two additions: we added a path prioritization strategy, and we added taint dependencies. The path prioritization strategy prioritizes paths that propagate the taint and de-prioritizes those that remove it.
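The taint-follows-data-flow idea can be sketched with a toy engine. This is a stand-in only: the real engine works on symbolic expressions inside angr, not on variable names:

```python
class TaintEngine:
    """Toy taint propagation: taint tags follow assignments and
    disappear when a variable is overwritten with a constant."""

    def __init__(self):
        self.taint = {}                 # variable -> set of taint tags

    def from_seed(self, var, tag):
        # y = seed(): introduce a fresh taint tag for user input
        self.taint[var] = {tag}

    def assign_expr(self, dst, srcs):
        # x = f(y, ...): the union of the sources' tags flows into dst
        tags = set().union(*(self.taint.get(s, set()) for s in srcs))
        if tags:
            self.taint[dst] = tags
        else:
            self.taint.pop(dst, None)

    def assign_const(self, dst):
        # x = 5: overwriting with constant data removes the taint
        self.taint.pop(dst, None)

te = TaintEngine()
te.from_seed("y", "taint_y")    # y = seed()
te.assign_expr("x", ["y"])      # x = y + 5 -> x carries taint_y
te.assign_const("x")            # x = 5     -> x is untainted again
```

The sink check then reduces to asking whether the expression reaching a memcpy-like call still carries any taint tag under the propagated (least strict) constraints.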
For instance, say again that some user input comes from some function and the variable user_input gets tainted. It is then passed to another function called parse. Here, if you look, there are possibly an infinite number of symbolic paths in this function, but only one will return tainted data while the others won't. The path prioritization strategy prioritizes this path over the others. This is implemented by finding the basic blocks within a function that return non-constant data; if one is found, we follow its return before considering the others. Taint dependencies allow smart untainting strategies. Let's see the example again. We know that the user input here is tainted; it is then parsed, and then we see that its length is computed and stored in a variable n. Its size is checked, and if it's greater than 512 bytes the function returns; otherwise it copies the data. Now, in this case, it might happen that, if the strlen function is not analyzed because of some static analysis imprecision, the taint tag of cmd might be different from the taint tag of n. In this case, though n is checked and gets untainted, cmd is not untainted, and this string copy can raise a false positive. To fix this problem, we basically create a dependency between the taint tag of n and the taint tag of cmd, and when n gets untainted, cmd gets untainted as well, so we don't have more false positives. This procedure is automatic: we find functions that implement strlen-semantically-equivalent code and create the taint tag dependencies. Okay, let's see our evaluation. We ran three different evaluations on two different data sets: the first one composed of 53 of the latest firmware samples from 7 vendors, and the second one on 899 firmware samples gathered from related work.
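The taint-dependency mechanism can be sketched as a small bookkeeping structure. The tag names t_cmd and t_n are illustrative, matching the strlen example above:

```python
class TaintTags:
    """Sketch of taint-tag dependencies: when a strlen-like function
    stores the length of cmd into n, untainting n (e.g. after the
    n > 512 check) must also untaint cmd."""

    def __init__(self):
        self.tainted = set()   # currently live taint tags
        self.deps = {}         # tag -> tags that fall with it

    def add_dep(self, tag, depends_on):
        # 'tag' must be untainted whenever 'depends_on' is untainted
        self.deps.setdefault(depends_on, set()).add(tag)

    def untaint(self, tag):
        self.tainted.discard(tag)
        for dependent in self.deps.get(tag, ()):
            self.untaint(dependent)

tags = TaintTags()
tags.tainted = {"t_cmd", "t_n"}          # cmd and its length n are tainted
tags.add_dep("t_cmd", depends_on="t_n")  # n was derived from strlen(cmd)
tags.untaint("t_n")                      # the size check untaints n...
# ...and t_cmd is untainted with it, suppressing the false positive
```

Without the dependency, the string copy of cmd would still look tainted after the length check and would be reported spuriously.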
In the first case, we can see that the total number of binaries considered is 8.5K — a few more than that, actually — and our system generated 87 alerts, of which 51 were found to be true positives; 34 of them were multi-binary vulnerabilities, which means that the vulnerability was found by tracking the data flow from the setter to the getter binary. We also ran a comparative evaluation, in which we basically tried to measure the effort that an analyst would go through in analyzing firmware using different strategies. In the first one, we considered each and every binary in the firmware sample independently and ran the analysis for up to seven days per firmware sample. The system generated almost 21,000 alerts while considering only almost 2.5K binaries. In the second case, we found the border binaries — the parsers — and statically analyzed only them, and the system generated 9.3K alerts. Notice that in this case, since we don't know how the user input is introduced, in this experiment we considered every IPC found in the binary as a possible source of user input — and this holds for all of these experiments. In the third case, we ran the BDG, but we considered each binary independently, which means that we don't propagate constraints; we ran a static single-binary analysis on each one of them, and the system generated almost 13,000 alerts. Finally, we ran Karonte, and the generated alerts were only 74. We also ran a large-scale analysis on 899 firmware samples, and we found that almost 40% of them were multi-binary, which means that the network functionalities were carried out by more than one binary. The system generated 1,000 alerts. Now, there is a lot going on in this table — the details are in the paper; in this presentation I'll just go through some of it. We found that, on average, a firmware sample contains four border binaries, a BDG contains five binaries, and some BDGs have more than 10 binaries.
Also, we plotted some statistics, and we found that 80% of the firmware samples were analyzed within a day, as you can see from the top-left figure. However, the experiments presented a great variance, which we found was due to implementation details. For instance, we found that angr would take more than seven hours to build some CFGs, and sometimes the outliers were due to a high number of data keys. Also, as you can see from the second picture from the top, the number of paths does not have an impact on the total time, while, as you can see from the bottom two pictures, performance is heavily affected by firmware size. By firmware size, here we mean the number of binaries in the firmware sample and the total number of basic blocks. So, let's see how to run Karonte. The procedure is pretty straightforward: you take a firmware sample, you create a configuration file containing the information about the firmware sample, and then you run it. Let's see how. This is an example of a configuration file. It contains a few pieces of information, but most of them are optional. The only ones that are not are these: the firmware path, which is the path to your firmware sample, and these two — the architecture of the firmware, and its base address if the firmware is a blob. All the other fields are optional, and you can set them if you have some information about the firmware. A detailed explanation of all of these fields is in our GitHub repo. Once you have set up the configuration file, you can run Karonte. We provide a Docker container — you can find the link in our GitHub repo — and I'm going to run it now, but it's not going to finish, because it would take several hours; all you have to do is merely run it on the configuration file, and it's going to do each step that we saw. Eventually I'm going to stop it, because it would take several hours anyway. It will produce a result file — I ran this yesterday, so you can see it here.
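For reference, a minimal configuration might look like the following. The field names here are illustrative assumptions based on what the talk describes (firmware path and architecture required, base address required only for blob firmware); the exact schema is documented in the Karonte GitHub repo:

```json
{
  "fw_path": "/path/to/firmware.bin",
  "arch": "ARM",
  "base_addr": "0x40000000",
  "border_bins": [],
  "data_keys": []
}
```

The optional fields (such as known border binaries or data keys) let you feed in anything you already know about the firmware and skip the corresponding discovery steps.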
There is a lot going on here; I'm just going to go through some important information. One thing you can see is that these are the border binaries that Karonte found. Now, there might be some false positives — I'm not sure how many there are here — but as long as there are no false negatives, or their number is very low, it's fine. In this case — wait, oh, I might have removed something... oh, it's here, perfect. In this case, this one, httpd, is a true positive: it is the web server that we were talking about before. Then we have the BDG. In this case, we can see that Karonte found that httpd communicates with two different binaries: fileaccess.cgi and cgibin. Then we have information about the CPFs. For instance, we can see here that httpd has 28 data keys and that the semantic CPF found 27 of them — and there might be one other here somewhere that I don't see. Anyway, then we have the list of alerts. Now, some alerts might be duplicates because of loops. You could go ahead and inspect all of them manually, but I wrote a utility that you can use, which basically filters out all the loop duplicates for you. Now I have to remember what I called it... this one, yeah. And here you can see that, in total, the system generated eight alerts. So let's see one of them. Oh, and I recently realized that the path that I'm reporting in the log is not the full path from the setter binary through the getter binary to the sink, but only the part from the getter binary up to the sink. I'm going to fix this in the next few days and report the whole path. Anyway, here we can see that the key Content-Type contains the user input, and it's passed in an unsafe way to the sink at this address. The binary in question is called fileaccess.cgi, so we can see what happens there.
If you look here, we have a string copy that copies the content of haystack to destination. Haystack comes basically from this getter, and if you look, destination comes as a parameter of this function, and this buffer is only 0x68 bytes big. And this turned out to be an actual true positive. Okay. So, in summary: we presented a strategy to track data flow across different binaries, and we evaluated our system on 952 firmware samples. Some takeaways: analyzing firmware is not easy, and bugs will persist; firmware samples are made of interconnected components; static analysis can still be used to efficiently find vulnerabilities at scale; and finally, communication is key for precision. This is the bibliography that I used throughout the presentation, and I'm going to take questions. Thank you, Nilo, for a very interesting talk. If you have questions, we have three microphones: one, two, and three. If you have a question, please go to a microphone and we'll take your question. Yes, microphone number two. Do you rely on imports from libc or something like that? Or do you have issues with statically linked binaries, stripped binaries — or is it all semantic analysis of a function? So, okay, we use angr. For example, if you have an indirect call, we use angr to figure out what the target is. And to answer your question: some CPFs do rely on libc — for instance, the environment CPF does, and it checks whether the setenv or getenv functions are called. But we also use the semantic CPF: in cases where information is missing — there is no libc, or some vendor re-implemented their own functions — we use this CPF to try to understand the semantics of the function, to understand whether it's, for example, a custom setenv. Okay, thanks. Microphone number three. In embedded environments, you often also have the case that the getter works on DMA — some kind of vendor driver working on DMA.
Are you considering this? And the second part of the question: how would you then distinguish this from your generic IPC? Because I can imagine that they look very similar in the actual code. So, if I understand your question correctly, you mentioned the case of MMIO, where some data is retrieved directly from some address in memory. What we found is that these addresses are usually hardcoded somewhere. The vendor knows that, for example, from address A to address B there is data from some peripheral. So when we find such a hardcoded address, we treat it as a read of some interesting data. And this would also be distinguishable, by your generic CPF, from a DMA driver, by using this fixed address, you mean? Yeah, that's it — that's what the semantic CPF does, among other things. Sure. Another question from microphone number three. What's the license for Karonte? Sorry? The software license — I checked the git repository and there is no license. That's a good question. I haven't thought about it yet. I will. Any more questions from here or from the internet? Okay, then a big round of applause to you again. Great talk.