So our next talk is called "Exploring the Hidden Attack Surface of OEM IoT Devices: Pwning Thousands of Routers with a Vulnerability in Realtek's SDK for eCos OS." We all know about Realtek, so give a big hand to these guys. This is their first time at DEF CON, so give them a big cheer, and hopefully you'll learn something and probably patch your routers at home. All right, bye. Well, welcome, everyone. Yeah, welcome to "Exploring the Hidden Attack Surface of OEM IoT Devices." Today we'll be sharing with you a vulnerability we found in Realtek's SDK for eCos OS, and with this vulnerability we managed to pwn multiple router models from many different vendors. So we'll be starting with how we picked our initial target for this research, and then we'll move on to the initial reconnaissance phase, and we'll talk a little bit about eCos, which is the operating system that these devices run. After that, we'll talk about how we analyzed the firmware and how we found the vulnerability in question, and then we'll discuss exploitation and post-exploitation strategies on these kinds of devices. Then we'll talk about automating firmware analysis to detect the presence of the vulnerability in other router models, and finally, we'll be closing with some takeaways. But first, let us introduce ourselves. My name is Octavio Gianatiempo, and I'm a security researcher at Faraday, and here with me is Octavio Galland, who was also a security researcher at Faraday at the time of this project, and now he's a research intern at the Max Planck Institute in Germany. And also Emilio Couto and Javier Aguinaga are part of the team and contributed to the research that we'll be sharing with you today. Emilio couldn't come, but Javier is over there. Well, Octavio and I were the main researchers on this project, and we are computer science students at the University of Buenos Aires in Argentina. Anyone from Argentina in the audience? And I am also a biologist, but that's a long story maybe for another time.
And we are CTF players with our team, Fernet Injection. Argentinians will understand the pun in the name, and we mainly focus on the reverse engineering and pwn categories. And the most important thing is that when we started tackling this project, we had no prior hardware hacking experience. So our motivation to choose an IoT device was their reputation for being insecure, and we thought it would be a great opportunity to put our skills in reverse engineering to the test, and hopefully, if we got lucky, our exploitation skills too. So how did we pick our initial target? Well, for us, a router was an obvious choice, because if you manage to pwn a router, you get access to a local network. And in this era of working from home, this may also be an opportunity to break into an enterprise network. And we decided to choose a popular target to maximize the impact of our findings, and also a relatively cheap one, because we thought that for a vendor that is designing a cheap router, maybe security is not a priority. And keeping this in mind, we looked for the top-selling router on a local e-commerce site, and we settled on this one. It's called the Nebula 300 Plus. The brand is Nexxt, and Nebula 300 Plus is the model. And it's a pretty standard 300 megabit Wi-Fi router that is based on a Realtek SoC, the RTL8196E, which has a 32-bit MIPS processor that can also handle 16-bit instructions to reduce program size. And it's configured in big-endian mode. And as you can see here, at the time of making this slide, this router had almost 40k sales in Argentina alone, in this marketplace. And here it says that it's the second top-selling router, but actually the first one was a repeater, so this one is the top-selling router. And it's even recommended by the e-commerce site itself, because it has pretty good reviews from the customers. But, well, a typical customer doesn't have the tools or the skills to reverse engineer the firmware of the router and find a vulnerability.
So what do they know about security, right? So we bought this router and downloaded the firmware from the vendor's website. And the first thing we noticed was that it had a bootloader and a compressed kernel image. We ran that through binwalk, and we managed to decompress this kernel image, but we couldn't guess the loading address to start the reverse engineering process. So we decided to crack open the device to hook up to the UART interface, but there weren't pins on the board or a place to solder them. But luckily, this SoC from Realtek, the RTL8196E, has UART capabilities and has some pins assigned to UART. So since we were working from home and on a budget, we designed this little contraption with a cork, because Argentina. And if you don't know, wine is very good in Argentina. And two thin wires, and this device hovered just over the SoC, barely touching the pins. And with that, we managed to get our first UART output from this device. And luckily for us, it had a lot of addresses printed on the screen. And among them, there was this start address, which is the address of the first function that gets called within the kernel code. And with that, we guessed the loading address, and we could start our reverse engineering process. So the first thing we noticed was that this giant binary, which is the kernel image, is composed of software from many different origins. It has a real-time operating system called eCos, a libc implementation, a web server called GoAhead, and a lot of custom code, mainly written by the vendor, but it also has code from other sources. So let's talk a little bit about eCos. This is an open source, real-time operating system. It's POSIX-compatible, and it's designed to be lightweight and customizable.
The idea is that the developer can choose which modules and packages of the kernel to include in the build process and bundle that up with their code to achieve a tailored solution that can run on an embedded device that has limited hardware specs. So another key characteristic to achieve this lightweightness is that eCos has only a single process, but to be able to achieve concurrency, this process can spawn multiple threads. And these threads can access the whole memory space. There is no built-in memory protection, there are no privileges, and every time a thread crashes, an exception handler gets called. And for this device that we were looking at, this exception handler just reboots the device. So once we knew the approximate composition of this image, we started reverse engineering the custom functionalities of this router. So since many parts of the software stack were open source, that was good news. We had, well, the operating system, the libc, everything was open source. We wanted to build those components ourselves with debug symbols enabled, then generate function signatures from them and apply them to the binary we already had, so that the reversing process would be easier. Unfortunately, the vendor did not provide a source release for this device, so we couldn't download a zip package and run make on it and have it work. So we looked for the compiler being used within the firmware image. There was a string indicating the version, of course. And we tried to use that to build the firmware with debug symbols, but without the exact build configuration, such as the compiler options, optimizations enabled and so on, we couldn't generate matching function signatures. In the slide, there's a footnote with what that approach would look like, had it been possible. So we had to do without function signatures. However, as I just said, we had the code for the operating system and the web server, which was GoAhead.
The libc too; all of this is open source. And there were many parts which were not actually open source, but the code was leaked online. So that helped a little bit with the reversing process. So we basically went about the reversing process as usual, but reading the source code as reference. And now that we had the source code, we could compare it with the code on the device. And we noticed that there were a few functionalities in the device which were not available in the upstream code. One of these functionalities was a shell that was exposed through UART and Telnet. Since this device runs eCos, it's not a Linux shell. It doesn't have many things that one would expect from a shell. But it basically allows us to inspect the configuration of the device, inspect the threads, the networking options and so on. And it also provided us with a great starting point for the reversing process, because we could just look up strings relating to those commands within the image and build our understanding of the device from there. One of those commands, or rather a group of these commands, was particularly interesting because they allowed us to read and write memory. And this sounds pretty basic, but this was a very low-level primitive. It was a command in which you could just plug an address and it would try to read from that address or write to it without any checks. So we could use that to modify the code that was running on the device, or we could make it crash if we tried to access an invalid address. And this will be very useful throughout the talk. So when we moved on to trying to inspect the threads running on the device, we also noticed that this is kind of a design decision by eCos. Basically every functionality resides on its own thread, as Octavio said before. We only have one process, so we cannot have multiple processes or services. Everything has its own thread.
And this is really interesting because, as you can see, for instance, the DHCP server is a thread just as privileged as the network support thread, which implements the whole network stack. So basically everything lives together in the same space and there are no privileges whatsoever. And in order to communicate, these threads can basically exchange messages among themselves, the messages being just C strings. Well, there are a few API calls that any thread can make using an ID and the message that they want to exchange. And on the slide you can see an example: when the reset button is pressed for long enough, a message gets sent to a thread which restores the factory settings. And one last thing that we tried to do during the initial reversing stage was to debug the firmware, because that would have been really useful when trying to build up knowledge about how this thing worked. But unluckily there was no JTAG interface on the board. When we looked at the SoC's documentation, we noticed that there were a few pins that provided JTAG functionality, but they were used for GPIO on this specific device. They had the two functions, and when we tried to switch them over to JTAG mode, the device crashed. So we had to do without JTAG, which was somewhat hard. Now this is what happens when the device crashes. We got, well, a dump indicating which thread caused the crash, the type, that is, the reason for the crash or the exception, and a dump of all the state, that is, the contents of all the registers and a stack trace. We didn't include it here, but there's also the contents of the top of the stack. So even though we cannot debug the firmware properly using JTAG and attach to it via GDB or anything, if you think about it, getting such a dump is kind of the functionality that one would expect from a debugger when the execution hits a breakpoint, right?
When you use a debugger and you hit a breakpoint, you can usually inspect the state of the program and the processor. And while a real debugger would also allow you to modify those values and resume execution, that was not possible in this case, but it was good enough. And more importantly, it was the only thing we had. So in order to set these breakpoints of sorts, what we did was overwrite the address where we wanted the execution to stop with an invalid instruction. And when the execution hits that address, the thing will crash, we will get the dump, and then after the reboot, the device reverts back to a clean firmware, so we could use that as a sort of rudimentary debugging mechanism. Well, with that out of the way, we were able to build an initial understanding and move on to trying to find a vulnerability. So during this reversing effort we identified a lot of libc functions. Remember that we had to do this manually and, as you might know, many of these functions are dangerous or potentially dangerous. So we decided to write a script to search for calls to strcpy, memcpy and similar functions with the destination argument located on the stack and a source argument that was not hard-coded. Sifting through that list of results, we found this piece of code that is very interesting because it uses strchr to search for a space two times in an input line and then, as you can see, it uses strcpy to copy from there onwards to the stack without checking its size. So this is a classic stack buffer overflow, as you might see in a CTF. But before we can understand what this function does in the context of the router, we first must talk about voice over IP and the SIP and SDP protocols.
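The copy pattern just described (two strchr calls for a space, then an unchecked strcpy to a stack buffer) can be modeled in a few lines of Python. This is only a rough sketch: the buffer size, the helper names, and the exact position the copy starts from are assumptions for illustration, since the talk does not give them.

```python
# Hypothetical size: the real size of the stack buffer is not given in the talk.
STACK_BUF_SIZE = 64


def tail_after_second_space(line: bytes):
    """Mimic two chained strchr(line, ' ') calls and return what follows."""
    first = line.find(b" ")
    if first == -1:
        return None
    second = line.find(b" ", first + 1)
    if second == -1:
        return None
    # Assumption: the copy starts right after the second space.
    return line[second + 1:]


def would_overflow(line: bytes, buf_size: int = STACK_BUF_SIZE) -> bool:
    """True if an unchecked strcpy of the tail would write past the buffer."""
    tail = tail_after_second_space(line)
    # strcpy writes the tail plus a terminating NUL byte.
    return tail is not None and len(tail) + 1 > buf_size


normal = b"m=audio 49170 RTP/AVP 0"
oversized = b"m=audio 49170 " + b"A" * 200
print(would_overflow(normal), would_overflow(oversized))
```

A fixed version of the original C code would compare the tail length against the destination size before copying, which is exactly the check `would_overflow` performs here.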
So every time a voice over IP call is made, first a session must be established using the SIP protocol, and alongside this, the Session Description Protocol, or SDP, is used to negotiate the network parameters and media types that will be used when the actual call takes place over another protocol such as RTP. And both the SIP and SDP protocols are application layer and text-based. So here you can see an example SIP message, and it has two parts: a SIP header that resembles HTTP, and it can have SDP data alongside. And the important thing for us is that it has IP addresses and ports even though this works on layer seven. The IP addresses and ports in the SIP header will be used, for example, in this case by the callee to respond to this message, and the ones in the SDP data will be used to establish a media session. And in this case, it is an audio session, as described in the field that starts with "m=audio", and the IP in the "c" field will be used to make that connection. So what happens when a device like this is in a local network behind a router that does network address translation? Well, these IP addresses and ports in the SIP message will be local ones. And as this message traverses the router, the router has to change them to the external WAN IP address of the router and an external port to ensure that the callee can respond. When this fails, the call might not ring, or one of the ends might not have audio, for example. So here you can see the same message before and after this functionality rewrites it. And this functionality is called SIP ALG, or application layer gateway. So now we can go back to the vulnerable code and understand it better. The code starts reading lines from the SDP part of this message and it will use a scanf-style call to try to match the media description field. And from there, it will try to extract the port in an attempt to rebuild this media field and replace it with an external port.
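As an aside, the NAT rewriting that a SIP ALG performs on SDP data can be sketched as follows. This is a simplified illustration, not the SDK's code: the function name and the addresses are made up, only IPv4 "c=" lines and "m=" lines are handled, and a real ALG also rewrites the SIP header and tracks port mappings.

```python
import re


def rewrite_sdp(sdp: str, wan_ip: str, wan_port: int) -> str:
    """Replace the local IP in 'c=' lines and the port in 'm=' lines."""
    out = []
    for line in sdp.split("\r\n"):
        if line.startswith("c=IN IP4 "):
            # Connection field: swap the LAN address for the WAN address.
            line = f"c=IN IP4 {wan_ip}"
        else:
            # Media field: m=<media> <port> <proto> <fmt ...>
            m = re.match(r"m=(\w+) (\d+) (.*)$", line)
            if m:
                line = f"m={m.group(1)} {wan_port} {m.group(3)}"
        out.append(line)
    return "\r\n".join(out)


local_sdp = "v=0\r\nc=IN IP4 192.168.0.10\r\nm=audio 49170 RTP/AVP 0\r\n"
print(rewrite_sdp(local_sdp, "203.0.113.5", 40000))
```

Note that this sketch rebuilds the media line from parsed pieces with no fixed-size buffer involved, which is the part the C implementation discussed in the talk gets wrong.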
It will search for the two spaces and then it will copy the rest of the information, which includes the protocol and the format, to the stack. So this function, which is part of the SIP ALG feature of the router and rewrites SDP data in SIP messages, has a stack buffer overflow. And the router should crash if we send a message that has, for example, a lot of A's after the media port in this media description field. And since this functionality has to rewrite both incoming and outgoing packets, we might crash the router with an incoming packet too. So we sent a UDP packet crafted like this, with a lot of A's after the port, to a random port on the router, using the router's WAN IP address, the external IP. And when we looked at the UART interface, we saw that the router had crashed with a lot of A's on the stack and with control of the program counter. So this means that no open ports are required to trigger this vulnerability and that it can be triggered from the WAN. And more importantly, this is a hidden attack surface, because there's no place in the documentation of this router that mentions that it has this SIP ALG functionality. And it cannot be disabled through the router's web interface. We found that it can only be disabled through the command line that is available through Telnet and UART, but there's no way to persist such a configuration, so every time the router resets, it will become vulnerable again. And also, port scanning wouldn't have revealed the presence of this feature. So once we knew that the router had this hidden attack surface and that it was triggerable from the WAN, we decided to try to exploit it. Okay, so the upside of trying to write an exploit for an eCos device was that, at least on this particular device, there was no ASLR nor any kind of protection against executing writable memory or writing to executable memory. And well, that implied that all the addresses were deterministic.
For instance, we knew where our shellcode would land: everything we sent in the packet would arrive at a specific address. So we could just go with the usual approach, which will be familiar to a lot of people, of writing shellcode on the stack and then using the overflow to overwrite the return address to make it point to our shellcode. The two caveats are that the shellcode cannot contain null bytes, because it will be copied over using strcpy, and that in this architecture we have two separate caches, one for data and one for instructions. So we cannot write self-modifying code, which was our first approach to try to avoid using null bytes. We can't do that because it leads to cache coherency issues. So what we do is we send an otherwise completely normal packet, only that after the audio port we include some padding, our shellcode and, as you'd expect, the address where our shellcode will land. So when we send this payload, our shellcode executes. Within the shellcode, we enable Telnet and send a message to the firewall service in order to turn it off. And then we continue execution normally. And it's very important that we continue execution, because if we fail to resume execution after the exploit is done and we crash a thread, not only will the thread crash, but the whole device will go down and the exploit will not work. And after that, well, we connect to Telnet using a backdoor password, although that is not strictly necessary, because at this stage we have full control of the device and we could set the password if there was no backdoor. So that was it for the exploitation. At this point, we have a shell, which isn't strictly necessary. We could do everything with shellcode, but it's easier this way. We cannot use a second-stage binary, like wget a binary and run it, because there is no file system. This is not a Linux system.
So this time we resorted back to the memory modification commands that we talked about earlier. So if we look at how commands are handled in eCos, we notice that there is a global array which has one entry for each possible command. Each entry consists of a pointer to the command name and a pointer to the function responsible for handling invocations of that command. So what we do is we look for an unused memory region, inject our custom code in there, and then modify the global array of commands to make one of the handlers point to our code. Again, there is one more caveat here. The code we are injecting is not a binary, it's just raw machine code. So it has to be self-contained, or otherwise only depend on functions available within the firmware, provided of course that we know the addresses. So within this code, we have access to basically everything that's available on the device. What we used for our second stage for the PoC was the eCos API, which includes thread management functions, and the libc. And using this, we implemented a multi-threaded TCP connect port scanner. A SYN port scanner would have been better, but eCos didn't provide support for the raw TCP sockets needed for that. So we had to do a TCP connect scanner, and we used the multi-threading that eCos provides to reduce scan times. All of this needs to be built statically into a self-contained binary. We use a custom linker script in order to be able to specify the loading address, so that all the jumps will make sense in the context of the router, using a compiler which is compatible with the one used to build the image on the original device, and we kind of fake the library calls with the addresses that we already know. We can upload this using Telnet with the command for writing memory. And from there we can just execute it. All of this is open source and will be uploaded to a repository shortly.
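To make the command table idea concrete, here is a rough sketch of how such a table could be read out of a firmware image for analysis. Everything specific is an assumption: the entry layout (exactly two 32-bit big-endian pointers, with no extra fields), the loading address, and the command names are illustrative and not taken from the actual firmware.

```python
import struct


def read_cstring(image: bytes, offset: int) -> str:
    """Read a NUL-terminated ASCII string starting at a file offset."""
    end = image.index(b"\x00", offset)
    return image[offset:end].decode("ascii", errors="replace")


def parse_command_table(image: bytes, load_addr: int, table_addr: int, count: int):
    """Return (name, handler_address) pairs for `count` table entries.

    Assumes each entry is two big-endian 32-bit pointers:
    a pointer to the command name and a pointer to its handler.
    """
    entries = []
    for i in range(count):
        # Virtual address -> file offset, given the image's loading address.
        off = table_addr - load_addr + 8 * i
        name_ptr, handler_ptr = struct.unpack_from(">II", image, off)
        entries.append((read_cstring(image, name_ptr - load_addr), handler_ptr))
    return entries


# Synthetic image: two names, padding, then a two-entry table at offset 16.
LOAD = 0x80000000  # hypothetical loading address
image = b"help\x00mem\x00".ljust(16, b"\x00")
image += struct.pack(">IIII", LOAD + 0, LOAD + 0x100, LOAD + 5, LOAD + 0x200)
print(parse_command_table(image, LOAD, LOAD + 16, 2))
```

The same address arithmetic (virtual address minus loading address) is why knowing the correct loading address matters so much for the rest of the analysis.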
And one thing we didn't go into, but which is interesting, is gaining persistence, and there's a footnote which you can check to see about that. So I think it's time for the demo of the full payload. So as you can see, we start by logging in to the admin panel of the router. Here is the WAN IP address. And if we go to the administration part, we can check that Telnet is not enabled by default. We can try to use Telnet anyway, but obviously it will fail. And we can also check if the Telnet port is open, but it's not. So now we run our exploit, which will begin by building the second stage, and then it will send the SIP message to enable Telnet. Now the Telnet port is open on the device and we can use this shell to upload the second stage. So this is rewriting the command handler, and when it finishes, we'll have a new command on the router. And as Octavio said before, this command is a port scanner. So we can use it to scan the router itself, which now has Telnet enabled alongside this web interface. And we can choose another device on the network and scan it with nmap as a ground truth. So this device has some open ports, and we can replicate this scan using the router this time. And it works. So once we managed to pwn this router model, we decided to try to pwn other models using the same vulnerability. The first thing that caught our attention was that among the commands that were available in the command line exposed through Telnet and UART, there was a subset of them that were called Tenda commands, and as you recall, this router's maker is Nexxt, and Tenda is another manufacturer. So this was interesting, and we decided to search for the hardware specs of Nexxt and Tenda devices. And we found that many of them are based on SoCs from Realtek from the same family, the RTL819x. Here you can see on the left the device that we were doing our research on, and on the right the Tenda AC5, which also has a SoC from this family.
So we downloaded the firmware images and we found that they ran eCos too. And we managed to find another vendor that uses these SoCs and runs eCos on their devices, and when we looked at the user interface to configure the router through the browser, we found that they were very similar, differing only in the branding. Moreover, many of these devices are very similar physically, even in their packaging. So all of this suggests that these are OEM-manufactured devices, maybe manufactured by one or two companies. So all these routers are built alike, and we wanted to know if they could be pwned alike. So we manually searched for the presence of the vulnerability in these firmware images, and it was in many of them. But before we moved on to pwning all the routers, we decided to disclose this vulnerability, and we reflected on the fact that this vulnerability was shared by many different vendors, but this feature, the SIP ALG functionality, is kind of low-level; it's part of the network stack. So we thought that it was unlikely to have been written by one of the vendors, and we decided to contact Realtek directly. They quickly confirmed that the vulnerability was part of their SDK for eCos-based routers, access points and repeaters, and this meant that all vendors that use this SDK and run eCos on their devices might have this vulnerability if they don't review the code that Realtek provides. So this motivated us to automate firmware analysis to try to detect more vulnerable devices. So if we take a look at the vulnerable snippet again, we can see that it has a pretty recognizable structure. There are basically two calls to strchr looking for spaces in a given input, and there is a strcpy which copies everything after the second space to a buffer on the stack, and we thought it might be possible to create a signature for this.
So if we think in terms of the pattern that we want to detect, we basically want to detect calls to strcpy, but again, given a raw firmware image, we don't know which function is strcpy, so we just want to detect calls to any function which takes two arguments, the first one being a buffer on the stack, and we can check whether a call has a stack buffer as its first argument using Ghidra's intermediate representation API. And from there, we can check that the second argument comes from a call to a different function, again with two arguments, the second one being a constant. We repeat this last step again and we check that the first argument to that previous call also comes from a call to the same function, which we hope will be strchr. And lastly, we check that these constant values equal hex 20, which is ASCII for space, and if we find such a code pattern, we basically assume that F corresponds to strchr, G corresponds to strcpy, and that the firmware is indeed vulnerable. So we end up trying to detect this pattern using a Ghidra script, and scanning the whole code for this pattern would be very time-consuming on top of the Ghidra analysis that needs to run first. So in order to narrow down the search space, we only look for this pattern within the functions that reference SIP-related strings such as "m=audio" or "SIP INVITE" or any of those. But there's a big problem that needs to be sorted out first, and that is that in order to be able to get the string references right, we need to be able to calculate the loading address for the kernel. In our case, when we manually reversed this device, we got the loading address from the UART output. But if we want to automate this, we must do it statically. We cannot go out and buy every device that we want to scan.
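The matching logic described above can be sketched on a toy expression tree. This is not the Ghidra script from the talk and does not use Ghidra's API: the `Call`, `StackBuf` and `Const` classes are hypothetical stand-ins for what an intermediate representation would expose, and pointer arithmetic between the calls (such as adding one to skip the space) is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackBuf:
    offset: int  # offset of the buffer relative to the stack pointer


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Call:
    func: str    # name or address of the callee (unknown semantics)
    args: tuple  # each argument is a StackBuf, Const, Call or opaque str


def _is_space_search(node) -> bool:
    """A two-argument call whose second argument is the constant 0x20."""
    return (isinstance(node, Call) and len(node.args) == 2
            and node.args[1] == Const(0x20))


def matches_pattern(call) -> bool:
    """Match g(stack_buf, f(f(x, 0x20), 0x20)) with f different from g."""
    if not isinstance(call, Call) or len(call.args) != 2:
        return False
    dst, src = call.args
    if not isinstance(dst, StackBuf) or not _is_space_search(src):
        return False
    inner = src.args[0]
    return (_is_space_search(inner) and inner.func == src.func
            and src.func != call.func)


vulnerable = Call("FUN_g", (StackBuf(-0x40),
                  Call("FUN_f", (Call("FUN_f", ("line", Const(0x20))), Const(0x20)))))
print(matches_pattern(vulnerable))
```

When a match is found, the heuristic from the talk labels the inner function as strchr and the outer one as strcpy; a looser or stricter matcher would trade false positives against missed variants.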
So if we look at the UART output once again, we can see that at some stage in the boot process, the kernel needs to be decompressed, and someone is responsible for both decompressing the kernel and deciding where the kernel will be loaded in memory. So we reverse engineered the bootloader and we found this piece of code. Again, the names were added by ourselves. And we can see that there's a function, which we have called decompress kernel, which takes the kernel loading address, and it gets called right in between the calls to printf which print the debug messages we were seeing earlier. So once again, if we try to detect this code pattern, we can make use of the fact that there are several calls to printf and that we know the offsets between the strings that are being referenced. So we want to detect a code pattern that looks like this: several calls to the same function using those strings as arguments and, in between the first two calls, a call to a different function which takes at least one argument. But because we don't know the loading address for the bootloader either, we cannot get the string references right. So we need to rely on the offsets that, as I just said, we already know. So we want to detect this code pattern, and we must make sure that the differences between the arguments of the calls to the function f match the differences in offsets between the strings that it should be printing. And if we find a matching piece of code, then we assume that the first argument to the second call is the kernel loading address. And in order to do this, we use Capstone, which basically works on disassembled instructions. It's much lower level than Ghidra's IR API, but it was good enough, because the analysis that we were conducting was rather primitive. And by the way, there's also an alternative approach for figuring out the kernel loading address statically, which is in the footnote, but we tried that and it didn't work on our device.
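The offset-difference trick described above can be illustrated without a disassembler. In this sketch the list of calls is assumed to have been recovered already (in the talk that step is done with Capstone), each call is a hypothetical `(target, [argument values])` pair, and the message strings and addresses are invented for the example.

```python
def find_kernel_load_addr(calls, blob: bytes, messages):
    """Find f(msg0), g(addr, ...), f(msg1), f(msg2)... and return addr.

    The bootloader's own loading address is unknown, so string pointers
    cannot be resolved directly. Instead, the differences between the
    pointer arguments are compared with the differences between the
    strings' file offsets.
    """
    offsets = [blob.find(m) for m in messages]
    if len(messages) < 2 or -1 in offsets:
        return None
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    n = len(messages)
    for i in range(len(calls) - n):
        window = calls[i:i + n + 1]
        middle = window[1]                   # candidate decompress call
        prints = [window[0]] + window[2:]    # candidate printf calls
        f = prints[0][0]
        if any(c[0] != f for c in prints) or middle[0] == f:
            continue
        if not middle[1] or any(not c[1] for c in prints):
            continue
        args = [c[1][0] for c in prints]
        if [b - a for a, b in zip(args, args[1:])] == deltas:
            return middle[1][0]
    return None


# Toy bootloader blob with three hypothetical debug messages.
blob = b"\x00" * 16 + b"Decompressing...\x00" + b"done\x00" + b"Jumping\x00"
msgs = [b"Decompressing...", b"done", b"Jumping"]
base = 0x80500000  # unknown to the detector; only differences are used
calls = [(0x1000, [0x1234]), (0x2000, [base + 16]),
         (0x3000, [0x80000000, 1]), (0x2000, [base + 33]), (0x2000, [base + 38])]
print(hex(find_kernel_load_addr(calls, blob, msgs)))
```

Because only differences between arguments are compared, the result does not depend on the value of `base`, which is the whole point of the approach.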
So we automated all of this with a higher-level script, which invokes the Capstone script first to detect the loading address, loads the binary into Ghidra, and then runs the second script, which detects the vulnerable function call. And all of that is open source and, again, will be available in the repo shortly after this talk. So we ran this scan against the models that we found by basically googling for devices using this chip or devices using this OS. We identified four vendors. We ran the script against all models from these four vendors. We identified 13 vulnerable models which, at the time of this project, amounted to over 100,000 sales in Latin America alone, on one e-commerce site alone. Not only that, but they were actively being sold, because a few months in, 30k more devices had been sold. But then the guys at Faraday, with the help of Daniel Del Fino and Federica, who's here in the audience, figured out a way to detect more potentially vulnerable devices, provided that these devices expose the HTTP interface through the WAN. So this gave us 63,000 more devices to look at, that is, devices in the wild, not different models. But again, these are only the devices that are exposing the HTTP interface through the WAN, so this is very much a lower bound. We started digging through those models individually and, again, we noticed that there were many brands and many devices from those brands, and they all looked alike. They all used the same chip and everything. Moreover, they physically resembled one another. And after running the script on more of these devices, we managed to identify 31 models from 19 vendors, including, well, Tenda, of course, D-Link, Zyxel, and a few more.
Well, in case you have a device that looks like that, or if the web interface looks like that, you can download the firmware from the vendor's website or hit that endpoint which these devices provide to dump the firmware, run it through our tool, and please let us know if you find more vulnerable devices. So with that being said, we can move on to the takeaways of this talk. So as a recap, we started researching a router that was top-selling in Argentina, and we found a vulnerability in undocumented functionality. This vulnerability can allow an attacker to achieve remote code execution on this router without user intervention and through the WAN interface. And it cannot be disabled via the router's web interface. It can only be disabled via the command line, which is kind of difficult for a normal user. And even in that case, this configuration does not persist, and when the router resets, it becomes vulnerable again. So why does this matter? Well, because it was a hidden attack surface: there was no place in the documentation that mentioned this feature. And the fact that it ended up being in Realtek's SDK meant that it affected many models from many different vendors. And it also shed light on the fact that vendors don't do source code review, because the majority of the devices that use this SDK ended up being vulnerable. So you might be wondering, well, you found this stack buffer overflow on a cheap router, but expensive routers should be more hardened, right? Well, at least for these vendors, and especially for Tenda, which is the one that has the highest number of devices affected, the expensive router models might offer the users more functionalities, such as configuring your router with your phone, using the cloud and things like that. But they are also based on this SDK and have the vulnerability. And you might also be wondering, well, enterprise routers should be more hardened, right? Because these are all home-grade routers.
And for that, we refer you to the latest talk from the Flashback Team, where they found a vulnerability in the VPN functionality of a Cisco router, and it's a stack buffer overflow, pretty similar to the one we found, and they discuss other similar vulnerabilities, but in enterprise routers. So although the security of internet-connected devices has improved recently, buffer overflows can still be found in 2022. And you might be wondering, well, why hasn't this been reported yet despite being a classic stack buffer overflow? Well, these are our thoughts. From a manufacturer's point of view, they don't have a security mindset. In fact, when we reported this vulnerability to Realtek, they thought that the only thing an attacker could achieve by exploiting it was to reset the router, and that it would be very hard for an attacker to achieve code execution using this vulnerability. From a vendor's point of view, it is clear that they don't review the source code provided by Realtek. From a researcher's point of view, we think that the fact that this binary image is a giant blob composed of software from many different origins, and in which applying function signatures is difficult, might be a little bit daunting. And from the user's point of view, well, they don't even know that the routers have this feature. So after we reported this vulnerability, it was assigned CVE-2022-27255, and Realtek patched this vulnerability on March 25th. But to the best of our knowledge, and up to this date, no vendor has released patched versions of their firmware. And even after that, users would still have to update their devices to fix this issue. So we think this vulnerability will be around for some time. So to conclude, IoT devices can have vulnerabilities in undocumented functionalities, and this makes it harder to audit them. And code introduced in the supply chain might never get reviewed by the vendors. And when these devices are OEM manufactured, well, they end up sharing code.
And this means that they also share vulnerabilities. And from an attacker's point of view, this is a perfect scenario, because they can find high-impact bugs with little prior knowledge and with little investment. So here are some references, if you want to dig a little bit deeper into the topics we covered in the talk. And well, thank you very much. And if anyone has a question, go ahead.