Hello everyone, and thank you for joining my talk today, called "Don't Be Silly, It's Only a Light Bulb". Before we begin, just a brief introduction. My name is Eyal Itkin, and I'm a vulnerability researcher working at Check Point Research. In my research projects I tend to focus on embedded devices and network protocols, and in this case I combine the two. You can read more about my recent publications on my Twitter account. The motivation for this specific research started a few years ago, in 2016, when I heard of a new smart IoT device: smart light bulbs. I was already quite familiar with many smart gadgets, even a smart air conditioner, which allows you to activate it remotely on your way home so the temperature will be just fine when you arrive. But why would I need a smart light bulb? I only need light when I'm physically in the room, and I'm fully capable of toggling the switch on and off when I enter and leave the room. Why would I need an app for that? Well, I figured that it won't catch on, people won't bite, and that's it. And apparently I was wrong, because according to a report from the end of 2018, more than 400,000 households in the UK were already using a smart lighting solution, so people are apparently buying this gadget. Okay, but if I connect my light bulb to my mobile phone or to the internet, there will be some security risk, right? But when I ask people about it, I always receive the same response: don't be silly, it's only a light bulb. It's fine. Obviously, I'm not the first to study this test case, and there is some great prior work before me. We will start with a light bulb worm by Colin O'Flynn and Eyal Ronen, who presented their research at Black Hat USA in 2016. Later on, they developed the research into "IoT Goes Nuclear: Creating a ZigBee Chain Reaction". This time they were joined by Adi Shamir and Achi-Or Weingarten, and you can read about it on their website.
I really recommend reading about that research because it was so cool: they actually took over all of the light bulbs on campus in a war-flying demo from their drone. Since I was already familiar with Eyal Ronen's work, and both Eyal Ronen and Achi-Or are old colleagues of mine, I talked with Eyal and together we decided that, with his help, I would continue their research and take it to the next level. So, what did they find? Well, it turns out that attackers can remotely steal a light bulb from a given ZigBee network and force it to join their own. Once it joins their network, attackers can send the light bulb a malicious firmware update to fully take it over. And if that wasn't enough, it turns out that even a regular light bulb can steal other light bulbs, and you don't need any fancy RF equipment or an antenna in order to do it. Using these three primitives together, the researchers claimed that you can take over a selected set of light bulbs in a given city. From each light bulb you take over the other light bulbs in close proximity, and so on and so forth, in order to propagate and take over all of the light bulbs in the city in a nuclear-like chain reaction. Sadly for us, the vendor did fix the last vulnerability, and we were left with only the first two. Putting it visually, we can see the network diagram over here. On the right, we can see a smart light bulb which communicates with its bridge, the controller, over ZigBee radio. The bridge is connected to both the radio network and the computer network, with an Ethernet cable, so we can control the light bulb from our mobile phone through the controller. The attack itself starts when the attacker transmits ZigBee factory reset messages to the light bulb in order to confuse it and convince it to join the attacker's own network. Once it joins that network, the attacker can send a malicious over-the-air firmware update to the light bulb to fully take it over.
And while the original research only focused on the light bulbs, we are focused on a much broader target, and we want to take their research to the next step. We want to infiltrate the network. In order to infiltrate the network, we are hoping to find a vulnerability so that our light bulb will be able to remotely exploit a ZigBee vulnerability in the bridge itself in order to take it over. Once we take over the bridge, the bridge can launch an attack inside the computer network and hopefully allow us to take over the computers inside the network itself. Now that we understand our attack vector, it's time to get started. But first, some preliminary slides on ZigBee. ZigBee is a suite of high-level protocols for close-proximity ad hoc networks. It's an IEEE 802.15.4-based spec that defines a short-range, low-power radio. Not to be confused with IEEE 802.11, which stands for the more commonly used Wi-Fi. The maximum transmission unit, or MTU, over ZigBee is only 127 bytes long. And I'll repeat that: the maximum packet we can transmit over the air contains only 127 bytes. And with this harsh limit, we should be able to somehow exploit the vulnerabilities that we hope to find in the bridge. ZigBee has a full network stack of its own. Like any other network stack, it starts with the lower layers, which are the physical layer and the medium access control layer, which acts as Ethernet acts in normal computer networks. On top of that, we have the network layer, which is responsible for both routing and encryption. Above that, we have the application support sublayer, which is responsible for routing our packet to the respective application in the upper layers. This application could be ZDP, which stands for ZigBee Device Profile and is responsible for managing different aspects of the ZigBee protocol. It could be ZCL, the ZigBee Cluster Library, which is responsible for managing the configurations of the different devices.
And it could be whatever other application the vendor decided to implement. Now that we understand how ZigBee works, it's time to meet our target. The target for this research is going to be the Philips Hue smart lighting, now under Signify. Signify controls approximately one third of the market in the UK, so it's a broad target. And we're going to focus specifically on the bridge because, as we mentioned earlier, the bridge is connected to both the radio network and the computer network, and we hope to bridge the two. Specifically, we're going to focus on the second generation of hardware for the bridge, which is the rectangular version seen on the right. If we remove the plastic cover from the bridge, we can see the board. Looking at the board, we can see on the right the main CPU, which is a Qualcomm CPU usually found in Wi-Fi enabled devices. On the left, we can see an Atmel CPU, which we'll refer to as the ZigBee modem. And like any other embedded device, we have a clear pinout for serial debug, which turned out to be quite useful throughout this research. The main CPU is a MIPS architecture, which is quite unique, and the operating system is Linux. This is quite refreshing because in prior research projects we mainly had to handle real-time operating systems, which are harder to debug. This time, since we have Linux, we hope to be able to root it and access it directly over an SSH connection in order to extract the files of the firmware itself, and hopefully to install a remote GDB server in order to debug the device. And luckily enough, Colin O'Flynn details in his blog exactly how to root the bridge, with a step-by-step guide followed by a video. Once finished, we get a root SSH connection. When I started this research, Eyal told me that the bridge is using a single huge process that's pretty much doing everything, and this is ipbridge.
ipbridge is a single process that acts as the brain of the device. It is responsible for parsing incoming ZigBee messages, maintaining the different ZigBee state machines, and pretty much doing everything the device needs. And like on any other embedded device, ipbridge runs with root privileges, which means that once we take over this process, we've won. We don't need any other vulnerability. Now that we have seen all of this data, it's time to start looking for vulnerabilities, and ipbridge is going to be our target. We had quite a slow start because, due to technical issues, I couldn't root the bridge. It turned out that I didn't have the right hardware equipment, as my younger brother pointed out. And the package we ordered got delayed, like in pretty much any other research project I've done thus far. Instead of sitting idle, we decided to start working on the old firmware version, which we received from the original research team. This is a firmware version which is a few years old, and we really hoped that it didn't change a lot throughout the years, but we really didn't know. We simply had to cross our fingers and hope for the best. When analyzing ipbridge, we soon enough saw something quite odd. The code expects strings to be found inside incoming messages instead of compact binary fields: we can see strings such as ZDP or ZCL or even connection, but the maximum message size is only 127 bytes. We don't have enough room for textual strings. We have enough room for compact binary fields, and that's it. So what's going on? And things got complicated. It turns out we forgot about the ZigBee modem. The ZigBee modem uses Atmel's BitCloud SDK in order to parse incoming ZigBee messages before they arrive at the main CPU. Effectively, it acts as a co-processor that handles the lower ZigBee network layers instead of the main CPU. It converts the parsed messages to textual form and sends them over to the main CPU to be handled.
And, of course, we don't have the firmware for it, so it's a black box. It reduces the attack surface on the main CPU, because complex parsing is offloaded to the modem, and if the main CPU doesn't handle a specific task, it can't be vulnerable when doing so. And we don't fully control the messages which are sent to the main CPU, because they are constructed on the modem itself. And if that wasn't enough, it puts a huge question mark on everything we find, because maybe the modem checks it. Maybe there's no check at all. So we'll have to add this to the list of ongoing issues and really hope for the best. Our first vulnerability attempt focused on the ZigBee Cluster Library, which is ZCL. ZCL is responsible for managing configurations, and it is really similar to SNMP in traditional IT networks. It offers a read attribute, write attribute interface, and it supports multiple data types. The first data type is Boolean. Then we have an unsigned integer of 8 bits, an unsigned integer of 32 bits, and even a dynamic array. And a variable-size data type in an embedded device is a true recipe for vulnerabilities. It looks promising. When we dove in to look at how ZCL arrays are parsed in the code, we saw the following code snippet. Here we can see that at first a one-byte length field is read from the message, and then a fixed buffer of 43 bytes is allocated on the heap. Don't be confused by the delay slot in MIPS. The assignment into register A0 appears after the call instruction, but it is executed before the malloc function actually starts running, and that's why it's called a delay slot. Effectively, we allocate a fixed-size buffer on the heap and that's it. Later on, we read the entire blob from our attacker's packet into the fixed-size buffer. And what does it mean? Well, a memcpy with a controlled one-byte length into a fixed-size heap buffer fits the definition of a heap-based buffer overflow. Sadly for us, it isn't a vulnerability until we have a working proof of concept to trigger it.
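As a rough illustration of the pattern just described, here is a minimal Python sketch that models the heap as a flat byte array. It is not the bridge's code: the function names, the toy heap layout and the neighbouring chunk are hypothetical, and `FIXED_BUF_SIZE` assumes the 43-byte figure given in the talk's exploit recap. The unchecked copy trusts the one-byte length field, while the checked variant shows the missing bounds test.

```python
FIXED_BUF_SIZE = 43  # assumption: fixed heap allocation size quoted in the talk


def copy_array_unchecked(heap: bytearray, buf_off: int, message: bytes) -> int:
    """Mimic the flawed parser: trust the 1-byte length, ignore the buffer size.

    Returns how many bytes landed past the end of the fixed buffer.
    """
    length = message[0]                      # attacker-controlled, 0..255
    payload = message[1:1 + length]
    heap[buf_off:buf_off + len(payload)] = payload
    return max(0, len(payload) - FIXED_BUF_SIZE)


def copy_array_checked(heap: bytearray, buf_off: int, message: bytes) -> bool:
    """Same copy, but reject anything larger than the destination buffer."""
    length = message[0]
    payload = message[1:1 + length]
    if len(payload) > FIXED_BUF_SIZE:
        return False
    heap[buf_off:buf_off + len(payload)] = payload
    return True


# Toy "heap": the fixed buffer followed by 32 bytes of a neighbouring chunk.
heap = bytearray(b"\x00" * FIXED_BUF_SIZE + b"N" * 32)
message = bytes([70]) + b"A" * 70            # 70 payload bytes after the length

overflow = copy_array_unchecked(heap, 0, message)
print(overflow)                              # bytes written into the neighbour
```

With a 70-byte payload the copy runs 27 bytes past the 43-byte buffer and overwrites the start of whatever chunk sits next to it, which is exactly the primitive a heap exploit needs.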
We don't have the latest firmware, and things might have changed. The modem might block large ZCL arrays, we don't know, and we can't test it because we don't have the radio equipment to transmit the attack and check it. Luckily enough, the package arrived just in time, and we could now root the bridge, extract the latest firmware, and check if something had changed. And of course, it had. It turns out that ZCL arrays are no longer supported, and instead support was added for ZCL strings. And if we look at the support for strings, we can see that the code was changed. Now we allocate a dynamic buffer on the heap using the one-byte length field, plus one additional byte for the added null terminator. This means we don't have any vulnerability this time, which is bad. Of course, we still have some good news because, as you can see, the firmware was shipped with debug symbols. For some unknown reason, the new firmware version was shipped with the names of all of the functions in the firmware, which was really quite useful for us when reverse engineering it. So what do we have? The latest firmware version dropped support for ZCL arrays, because now they use strings instead, and the string parsing doesn't have a vulnerability in it. It's a good time to search for other vulnerabilities, and we did try. We really, really did try, and we covered most of the firmware, and still we found pretty much nothing. And then we remembered that the ZCL strings were handled correctly and then sent to another thread. But what does this thread do with the incoming ZCL strings? So we traced it down to this code snippet. In this code snippet we can see the same memory allocation on the heap, using the same fixed size we saw earlier. So that's quite suspicious. We can even see that later on the buffer is copied into the newly allocated heap buffer using the length field stored in the struct itself, without any size checks.
The only condition for reaching this code snippet is to have the fixed opcode of 16 stored in our struct. And when we go back to how the ZCL string struct is initialized, we can see that it indeed uses the same opcode 16. So it turns out that someone forgot to finish the migration from arrays to strings. Strings should have been marked with F for strings and then duplicated using a call to strdup. Instead, for some reason, internally they are still marked with 16 for arrays and are handled using the same vulnerable code snippet we saw in the earlier firmware version, which is quite good from our perspective. So the original vulnerability still exists; it is just buried deeper, that's all. Time to start transmitting ZigBee messages and hope to trigger the vulnerability. In order to play around with ZigBee, like the previous research, we decided to use the ATmega256RFR2 Xplained Pro board. It enables us to send and receive ZigBee frames over the radio. It should be computationally equivalent to a light bulb, which is quite important, because we are going to run our entire exploit from this board, and if we succeed, then the same exploit could be performed from a light bulb we took over using the previous research findings. In addition, it turned out to be even more important because there are really strict timing constraints in ZigBee, and the timing constraints dictate that we actually use C code on the board itself. We can't use Python from our laptop, for example. This means that the entire ZigBee stack that we need to implement should be in C, and the entire exploit should also be implemented in C. So it's not going to be easy, but we'll have to live with this. And the vulnerable flow is only accessible during the process called commissioning. And what is commissioning? Well, commissioning is the process of pairing and associating a new light bulb. We have two types of commissioning.
We have classic commissioning and we have Touchlink commissioning. The Philips Hue app on Android, which we use, initiates a classic commissioning process, and so we focus on this one. In theory, the ZigBee specs explain the entire process. In practice, there's a lot of room for vendors to do as they wish and, trust me, they did. When analyzing the protocol we got quite stuck, because there's no documented flow, or at least no documented flow we could find. There's enough documentation for the structure of each specific message, but we don't know which message is supposed to be sent, when, and as a response to what, so we don't have any clear flow of commissioning. In addition, we can't sniff a full conversation between our light bulb and the bridge, because too many messages are sent, they're sent too fast, and we simply miss some of them. And if that wasn't enough, we needed a cryptographic key. There is a transport key which is shared between all of the ZigBee Light Link devices, but we don't have the key ourselves, and since we need to decrypt the messages from an early phase in the commissioning, we will need the key. Luckily for us, we are not the first to tackle this issue, and in this blog post you can find all of the information for all of the keys used in Touchlink commissioning and in classic commissioning, with the values of the keys themselves. So we took the keys and inserted them into Wireshark, and Wireshark can now automatically decrypt the messages we sniff. The analysis and the implementation part took a lot of effort. A lot of work days. But eventually it worked. We managed to fake our own light bulb: you can see a brand new Check Point Research light bulb with the model CPR123. So now our board is doing all of the classic commissioning phase, faking our own light bulb, so we could continue on. In order to help other researchers with this task, we decided to open source the full pcap of the classic commissioning as we sent it from our board.
So instead of having to manually analyze the commissioning phase, other researchers can now look at the decrypted pcap with all of the messages we sent. The lessons we learned so far are as follows. Without user interaction the bridge won't accept new light bulbs, which is a good security precaution, but it doesn't help us. We will need to somehow bypass this limitation. In addition, we have approximately one minute to commission as many light bulbs as we want; there is no check on that. The user will only see the light bulbs in the app after they finish the specific phase we labeled the ZCL phase, which is pretty much the last phase in the commissioning. If we fake other light bulbs and stop before the ZCL phase, the user won't be aware of them, which is pretty good for our attack. And the really good news is that we managed to trigger the vulnerability during the ZCL phase, so we have a success. We really have a vulnerability. There is no state machine check in place: we can simply send whatever ZCL strings we want, whenever we want, without any request asking for them, so we can pretty much do anything we want with the long ZCL strings as long as we do it quickly, during the commissioning. Once the light bulb is fully added to the network it can no longer trigger the vulnerability; the vulnerability is only accessible during the commissioning phase. Time to start the exploit. Just a basic recap of our vulnerability before we continue. We have a linear heap-based buffer overflow. Our overflow is limited to 70 controllable bytes, and this is because we only started with 127 bytes, which is the MTU on ZigBee, and ZCL is quite high in the ZigBee stack, so we lost a lot of bytes to headers. We will have to settle for the 70 bytes we have. We don't have any byte constraints on our payload, which is good.
We can send null bytes, non-printable bytes, pretty much everything we want, and the destination buffer is allocated on the heap using a fixed size of 43 bytes. Time to focus on the heap. The bridge uses uClibc, which is an old libc implementation used in embedded devices. The chosen heap implementation inside uClibc is malloc-standard, which is dlmalloc behind the scenes. Effectively, it is much like glibc but with way fewer sanity checks. This is because uClibc stands for micro libc: it is meant to be used by devices without a lot of CPU power, so it has to be slim. And being slim means less code and fewer sanity checks. All of our freed buffers will fall into the range of what are called fast bins. There's a bin for each buffer size in multiples of 8 bytes, with 16 being the minimal size. So we have a bin for all of the buffers up to 16 bytes, a bin for 16 to 24, a bin for 24 to 32, and so on and so forth. Inside each bin we will have a singly linked list of all of the free buffers. When examining the code for free in uClibc, we see the following check. If the size is less than or equal to some configurable amount called max_fast, we will use the fast bins and store our buffer inside them. In order to find the respective fast bin for our buffer, we will use the fastbin_index macro we can see over here. In short, the macro divides the size by 8 and subtracts 2, because 16 should be the minimal size. But there's no check in place that the size is indeed bigger than or equal to 16, so if we are able to corrupt a buffer on the heap and modify its size, we could change the size to be 8 or even 0, giving us indices of minus 1 and minus 2 respectively. In order to understand what will happen if we use negative indices, we can see the malloc_state struct over here and the variables before the fast bins array. So if we use index minus 1, we will store our buffer on top of max_fast, right before the fast bins array.
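The index arithmetic described above can be checked with a small Python sketch. It only models the calculation (size divided by 8, minus 2) and a toy picture of the words surrounding the fast bins array; the slot names and the `slot_written_on_free` helper are illustrative assumptions, not uClibc source.

```python
def fastbin_index(size: int) -> int:
    """Divide the chunk size by 8 and subtract 2, with no lower-bound check."""
    return (size >> 3) - 2


# Toy picture of memory around the allocator state (names are illustrative).
# Slot 0 is whatever sits just before the struct, slot 1 is max_fast,
# and the fast bins array starts at slot 2.
MEMORY_SLOTS = ["<word before malloc_state>", "max_fast",
                "fastbins[0]", "fastbins[1]", "fastbins[2]"]
FASTBINS_BASE = 2


def slot_written_on_free(size: int) -> str:
    """Name the slot that would receive the freed chunk's pointer."""
    return MEMORY_SLOTS[FASTBINS_BASE + fastbin_index(size)]


for size in (16, 24, 8, 0):
    print(size, fastbin_index(size), slot_written_on_free(size))
```

Sizes of 16 and 24 map to the first two legitimate bins, while corrupted sizes of 8 and 0 produce indices of -1 and -2, which point at `max_fast` and at the word just before the struct.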
Storing a pointer over the configurable size limit for fast bins would be too risky. It would simply ruin the heap. But what will happen if we use index minus 2? Well, index minus 2 will store the pointer to our buffer before the malloc_state variable, and what sits before the malloc_state struct in memory turns out to be an unused variable, simply nothing. So if we free a buffer with a size of 0, it will be stored somewhere in memory before the malloc_state, and no one will reach it. The next such buffer will be joined to it, creating a ghost linked list we can refer to as the dev-null fast bin. Since malloc is not aware of this bin, it won't extract buffers from it, and all of the buffers we free will be joined to this bin but won't be returned to the heap itself, effectively enabling us to gradually shape the heap into the desired structure. The plan for our heap overflow is quite simple. We can see our buffer in blue and a buffer adjacent to it in memory in purple. The plan is to modify only the size and the pointer of the adjacent buffer. If we change the size to 1, which means a size of 0 bytes plus a flag of 1 for "the previous buffer is in use", then the buffer will go to the dev-null linked list, because size 0 maps to index minus 2, and the pointer will be changed to our arbitrary address. Effectively, we want to get this structure. We want to corrupt the singly linked list inside the fast bin so that malloc will think that a free buffer is stored at our arbitrary address. The goal is to confuse malloc so it will allocate a buffer at an arbitrary address and then let us write our data into this buffer, effectively giving us a write-what-where exploit primitive. The heap shaping strategy is as follows. If we overflow a free fast bin buffer, then we've won. This is exactly what we aim for. We corrupted the singly linked list inside the fast bin and we will get our allocation.
If we overflow a used buffer, then when it is freed it will go to the dev-null fast bin and we won't see it any longer. If we overflow a used buffer that lives forever, oh well, nothing will happen and no one will notice it. But if we overflow a free large buffer, then we're in trouble. We'll probably crash soon, so we really don't want to do that. If done correctly, we will get the desired Malloc-Where primitive. Malloc-Where will grant us the ability to write over the GOT. The GOT is the global offset table. It's a global table of function pointers used to execute library functions such as free, malloc, sleep, memcpy and pretty much every other library call. And luckily for us, the GOT is at a fixed address. The modified function pointer will jump to our shellcode, or at least in theory, because it sounds easy on paper but it's way harder in real life. Time to construct the shellcode. Location, location, location: we need to store a binary shellcode at a fixed global address. The problem is that we only get textual messages from the ZigBee modem, so we don't get any controlled binary data into the main CPU. We found only one good candidate for such a buffer, and this is the ZigBee phone book. This is an array of ZigBee addresses seen by or advertised to the bridge. It can hold up to 65 records of 16 bytes each, giving us an upper bound of approximately one kilobyte for the entire shellcode. It's not a lot of bytes, but it should be enough for us. Each record inside the ZigBee phone book is constructed as follows. The first 8 bytes are for the extended network address, which we fully control. After that we have an additional 2 bytes for the short network address, again fully controlled. And then our streak of luck ended, which is quite reasonable, because now we have 6 bytes for miscellaneous fields and we don't really control them. So we have 10 consecutive bytes we control in each record, but we don't control the entire memory.
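To make the record budget concrete, the following Python sketch packs attacker-chosen bytes into records shaped like the ones just described (8-byte extended address, 2-byte short address, 6 miscellaneous bytes). The field order comes from the talk; the big-endian packing, the zero-filled miscellaneous field and all function names are assumptions made purely for illustration.

```python
import struct

# Assumed layout: 8-byte extended address, 2-byte short address, 6 bytes of
# miscellaneous fields. Big-endian packing is an assumption for illustration.
RECORD_FORMAT = ">QH6s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
MAX_RECORDS = 65
CONTROLLED_BYTES = 10


def pack_record(chunk: bytes, misc: bytes = b"\x00" * 6) -> bytes:
    """Place up to 10 attacker-chosen bytes into the two address fields."""
    chunk = chunk.ljust(CONTROLLED_BYTES, b"\x00")
    extended = int.from_bytes(chunk[:8], "big")
    short = int.from_bytes(chunk[8:10], "big")
    return struct.pack(RECORD_FORMAT, extended, short, misc)


def spread_over_records(data: bytes) -> list:
    """Split data into 10-byte chunks, one per phone book record."""
    chunks = [data[i:i + CONTROLLED_BYTES]
              for i in range(0, len(data), CONTROLLED_BYTES)]
    if len(chunks) > MAX_RECORDS:
        raise ValueError("data does not fit in the phone book")
    return [pack_record(c) for c in chunks]


records = spread_over_records(bytes(range(25)))
print(RECORD_SIZE, MAX_RECORDS * RECORD_SIZE, len(records))
```

The full table is 65 × 16 = 1040 bytes, but only 10 of every 16 bytes are controlled, so any payload stored this way is interrupted by 6 uncontrolled bytes after every chunk.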
Oh, and about that: it turns out the bridge is unstable when it gets more than 20 records, so this is going to be quite a small shellcode. The initial plan was simple. It looks infeasible to restore the execution flow, because we pretty much wrecked the heap. Instead, our shellcode will try to patch the binary on the file system and install a backdoor. After the process crashes, a daemon will restart the binary with our backdoor, and then we will trigger it from another light bulb. The problem is that both the patch and the file path don't fit in 10 consecutive bytes each, so we can't encode them properly in the ZigBee phone book. The solution for this problem is purely academic. An ideal shellcode, by reduction, would use the 10 consecutive bytes per record to build a decoder, and the decoder would decode the rest of the records so that they are nicely ordered in memory. In MIPS16, a jump to the next record costs us only 2 bytes, giving us 8 spare bytes in each record to construct our decoder. But there are problems with this plan. We need to clear the instruction cache before jumping to the unpacked shellcode, because this is a MIPS CPU, and we crashed when we tried without it. And if we use sleep to clear the cache, as people usually do, the watchdog will kill the process because it doesn't respond. And we don't have enough records to silence the watchdog, sleep, and also use a decoder, so the ideal shellcode is good on paper but it's not practical. Instead, we went for a bolder design. We will restore the execution flow. We really don't have any choice. This means we will call mprotect to modify the permissions of the code in memory from executable to executable and writable, and we will install the backdoor in RAM during the execution. A few days and one handcrafted shellcode later, we have a shellcode that fully restores the execution. It restores the GOT, it restores the heap, and it restores pretty much everything we need.
In addition, the shellcode only costs us 16 records, which is well within budget, less than 20. And here we can see the shellcode in IDA. From the first record we jump to the second, and then we jump to the third, and so on and so forth, until we install the backdoor and restore the execution flow. When we connect all of the dots together, the backdoor shellcode gives us an arbitrary write primitive. Our exploit will fake an additional legitimate light bulb, which will leverage the arbitrary write primitive we installed and use it to write Scout's loader to memory. Upon execution, Scout will load a full payload, which in this case we chose to be EternalBlue. Here we can see that Scout dropped the file /tmp/exploit with root privileges, and this exploit file will use EternalBlue to attack computers inside the computer network, which makes this a good time for our demo. Here we can see the light bulb and the user using it. We can see that the app works fine and the user can modify the color of the light bulb as he wishes. When the attack starts, the attacker will hijack the light bulb and steal it from the network, and once it joins the attacker's network, the attacker will change the light bulb to a hideous color to annoy the user, so that the user will see that something is wrong with his light bulb. So we took over the light bulb and now it's ours. When the user sees that something is wrong, you can see that he no longer controls the light bulb: he can't change the light bulb's color, and it appears as unreachable in the app. The only way to reinstall the light bulb is to delete it and tell the bridge to search for new light bulbs, and this is where our attack officially starts. We will fake a legitimate light bulb so the user will be happy. This is our light bulb, and we will fake it so that the user can actually configure the light bulb as he wishes, so the user won't see that something is wrong with our fake light bulb.
Behind the scenes, the full attack will take place. Here we can see that the bridge is connected to the laptop, so it's connected to the computer network, and now, behind the scenes, we will use our vulnerability, shape the heap and take over the bridge, and from the bridge we will use EternalBlue to attack the computer inside the network and pop a calc. So our attack worked. Time for the coordinated disclosure. The vulnerability was reported to Signify on the 5th of November 2019. The vendor confirmed the vulnerability on the same day, which is quite impressive. It's even more impressive because not only did the vendor acknowledge receiving our report on the same day, it actually acknowledged the existence of the vulnerability, which it found in the code according to our report. So the vendor did pretty much everything in a single day, which is very impressive. Later on they shipped a patch via an automatic update in January, and you can find full details and the advisory in our blog post. The official CVE issued for this specific vulnerability is CVE-2020-6007, and if you have Philips Hue lighting in your office or in your home, don't worry: all of the products should have received the update by now. Conclusion. Even with an MTU of 127 bytes, ZigBee vulnerabilities are indeed exploitable, as we've shown in our demo. Security mitigations only work when they are on by default. The ipbridge binary was compiled to use static, fixed addresses, so we were able to target the GOT, which was at a fixed address, even without an information leak vulnerability. The binary was not compiled with any security mitigations and the GOT was writable, so we could do pretty much everything we wanted once we had an initial vulnerability. Still, there was ASLR in place for the heap, for the stack and for the loaded libraries, and that's thanks to Linux.
So if it's on by default you will get it, but if you need to manually do something in your Makefile, you won't get it in most common cases. Smart devices are becoming more popular by the minute, and yet we can't even trust our own light bulbs, so maybe we should do something about it. We can't finish this research and this presentation without saying special thanks to everyone that helped make this research possible. So we have Eyal Ronen, for the research idea and the active guidance; Colin O'Flynn, for the detailed write-ups on rooting the bridge and the entire hardware analysis of it; Peter, for publishing the ZigBee transport keys for the light bulbs, which we used and couldn't have continued without; and Yaron Itkin, for the crucial hardware support along the way. Thanks, little brother. And last but not least, the entire CPR research team, for their support throughout this very long research. Until next time, thank you for joining this talk.