These gentlemen have come all the way from China to talk to you guys about breaking Google Home and exploiting it with SQLite. Let's give them a gigantic round of applause. Have a great time. Smile, I'm going to take pictures. Have fun.
Thank you. Hello everyone. Thank you for coming. We are very excited to share our research at DEF CON 27. The title of our talk today is Breaking Google Home: Exploit It with the SQLite Magellan Vulnerability. First, let me introduce my teammates and myself. We are senior security researchers at the Tencent Blade Team. This is my teammate, Qian Wenxiang. He is now focusing on browser security and IoT security. And this is Li Yuxiang. He is focusing on mobile security and IoT security. My name is Wu Huiyu. I'm a bug hunter, a GeekPwn winner, and also a speaker at DEF CON, HITB and POC.
Next, I will introduce the Tencent Blade Team. Our team was founded by the Tencent Security Platform Department in 2017, and now we are focusing on security research in areas such as AI, IoT devices, mobile devices, cloud and blockchain. In the past years, we have reported more than 200 security vulnerabilities to many companies such as Google, Apple, Amazon, Microsoft and so on. Last year, we shared how to break Amazon Echo at DEF CON 26, so we came again this year. You can contact us at blade.tencent.com.
Next, let's take a brief look at the outline. First, we will introduce the attack surface of the Google Home smart speaker. Then, we will share the details of how we found and exploited vulnerabilities in SQLite and libcurl. Finally, we will summarize our research.
Okay, let's start with the first part. Smart speakers are the most popular smart home devices of the past three years. Amazon, Google and some Chinese companies are the main players in this field. In the fourth quarter of 2018, Amazon's and Google's market shares were very similar.
After we shared how to break Amazon Echo and the Xiaomi AI speaker at DEF CON 26, we began to study the security of the Google Home smart speaker. The Google Home family includes four devices, all of which have similar hardware. We chose the best-selling Google Home Mini as the main test device.
The first part is about hardware analysis. We found that the Google Home Mini uses a Marvell CPU and a Toshiba flash chip, but we did not find any debugging or flashing interface. So we could only choose to extract the firmware directly from the flash chip. Similar to the Amazon Echo, we first used a heat gun to desolder the chip from the board. By analyzing the datasheet of the chip, we found a test socket for BGA67 chips. The difference is that the pins of the BGA67 chip are very thin, so we did not find an adapter that can connect the test socket to the programmer. So we designed a new adapter. Its main function is to extend the pins of the test socket to a larger pitch, so that we can easily connect the chip to the programmer. And this is the finished product that we finally produced. We connected it to the RT809H programmer, which is a universal programmer that supports reading and writing most flash chips.
Finally, we got the raw image data from Google Home's flash chip. We also needed to parse the OOB data and the ECC check bits according to the specification in the datasheet, and extract the complete system firmware. By analyzing the file system, we can see that Google Home is using a lite Chrome OS. The main functions are implemented by the Chrome browser, and its update speed is a little slower than the Chrome browser.
Okay, my part ends here. My teammate, Li Yuxiang, will continue with the next part.
Thanks, Huiyu Wu, for the introduction. Next, let's introduce the security overview of Google Home.
On the OTA mechanism, the firmware-related resources of Google Home are open to the public. These resources include the bootloader, the kernel and related binary programs, and some even have symbols. Google Home uses an HTTP request to download the firmware. We can also simulate the request for device updates. The latest OTA package can be obtained with the curl command in the picture below. So we can easily get the firmware and then try to analyze it.
Through the analysis of the firmware and related resources, we believe that the security mechanisms adopted by Google Home are worth learning from. Above all, enabling secure boot on IoT devices is a good example. The bootloader, boot image and system image are protected by the security mechanism throughout the boot process. The details are as follows. The bootloader and boot image use the same signature method and are verified with SHA and RSA signatures. In the bootloader, we have seen that there is no logic to provide unlocking. In addition, Google Home also performs integrity verification on the system image.
Let's look at the main programs in the firmware. The firmware has many directories, as shown in this picture. One of them contains the main program, called cast_shell. The program is just like Chrome, so it is protected by a sandbox. On Chrome OS, the sandbox mechanism mainly includes setuid, user namespaces and seccomp-BPF. In addition, the system also enables ASLR, and cast_shell also adds NX and stack canaries.
Next, we will introduce the attack surface of Google Home, combined with existing security research and our testing. Google Home's attack surface includes the following four aspects. Google Home has multiple ports open. One of them, port 8008, is an HTTP server. We can control some basic operations of Google Home through this port in the LAN. There is also port 8009. This port, which uses the Cast protocol, is the target of our attack. We will describe it in the following sections.
And the wireless protocol is also an attack surface. Google Home uses a Marvell chip, so some ideas from Marvell Wi-Fi and BLE firmware attacks can be tried. It's also possible to try to find vulnerabilities in the bootloader and make it load malicious firmware. Finally, we can try to find other hardware interfaces.
In this section, we will show you how to extend the attack surface of Google Home. First, let's introduce some basic knowledge about the Cast protocol. Google allows developers to develop Cast apps and publish them to the app store. In general, a Cast app includes a sender and a receiver. Sender devices may be mobile devices or PCs running Chrome. The receiver is one of Google's IoT devices, such as Google Home. The entire architecture is as follows. The user accesses the sender application URL, and the sender application uses the Cast protocol to find the receivers in the LAN. When a receiver is discovered, the sender application will communicate directly with the receiver application on the receiver using the Cast protocol. In detail, Google Home will pull the URL of each receiver application according to the Cast app ID and access the receiver application with the Chrome renderer.
After our attempts, we found that the Cast protocol has the following security risks. The Cast app can be any web page. The Cast app in the app store may be malicious. And senders can directly trigger the Cast protocol, which may even require no user interaction. Based on these security risks, we can convert an attack on Google Home into an attack on a browser.
Let's take a look at the specific steps. The attacker registers as a Cast developer, so Cast applications can be developed and distributed. When you publish an application, you need to specify a Cast receiver URL. However, Google does not audit the link of Cast apps. Hence we can specify it as a web page with any content. Then we remotely trigger Google Home to visit that web page.
If the attacker and Google Home are in the same LAN, the attacker can also send a Cast protocol message such as the launch app request. This request will directly trigger Google Home to visit the Cast receiver URL. To make matters worse, if the router in the victim's home turns on UPnP port forwarding, the attacker can also complete a remote silent attack from the Internet. The attacker modifies the Cast receiver URL web page to a malicious page. Hence Google Home may be exploited by the attacker after visiting the page. So now we only need a Chrome RCE to exploit Google Home. Now we will introduce our vulnerabilities.
Okay, thank you for the introduction. Now I will continue with part three: fuzzing and manual auditing of SQLite and libcurl. First, why do I audit these two libraries? Because third-party libraries are always sweet. They have less code and focused functions, and almost every device has them installed. And the most important thing is that Google Home and Google Chrome are using them too.
Before introducing the code audit part, I would like to mention some previous research. First, Michał Zalewski's case: fuzzing has significantly improved SQLite's quality. And then there's a talk at Black Hat in 2017, which also explains the idea of exploiting SQLite. After that, there doesn't seem to be a lot of news about vulnerabilities in SQLite, but we have found some. In the next part, we will introduce the code auditing and the exploiting of Magellan, which is a set of vulnerabilities in SQLite, and Dias, which is a set of vulnerabilities in libcurl.
The Chromium project comes with a fuzzer for SQLite. We made some simple changes to it, such as adding some syntax-based files. When we looked back, we found that a lot of crash files had been generated. However, these test cases could only trigger null pointer dereferences. One of these crashes is caused by duplicate primary keys.
And when I was debugging, I typed the first three CREATE TABLE statements to see what was going on in memory. However, I was surprised to find that the .tables command showed six tables. The question is, what do those content, segdir and segments tables stand for? The SQLite manual shows that these tables are called shadow tables. There are five types of shadow tables: content, segdir and segments are for FTS3 and FTS4, and stat and docsize are FTS4 only. However, because those tables are treated like standard tables, you can create the corresponding stat and docsize tables even if you are operating on an FTS3 table. You can create stat and docsize with a simple CREATE TABLE statement, because SQLite itself is doing this too. And FTS3 and FTS4 share some code, which means stat and docsize might change the code flow under some conditions. For example, one of our exploits simulates an FTS4 table in an FTS3 environment. This is useful because some software, like Chromium, uses only FTS3 and explicitly disables FTS4. This extends the attack surface by entering some code branches that should never be entered.
The shadow tables are used as containers to store the content and the full-text search metadata. What is shown on this slide is the definition and meaning of each shadow table. We can see that almost every shadow table has a field of type BLOB. That's because, to support full-text queries, FTS maintains an inverted index that maps from each unique term or word that appears in the data set to the locations in which it appears within the table contents. It is complicated, but all we need to know is that, compared to other fields, those blobs may have some important influence on FTS queries. In SQLite, raw binary data is typically written as an X, a single quotation mark, hexadecimal digits, and a closing quotation mark. However, the blobs here are binary data that represents an entire B-tree.
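As a minimal sketch of the shadow tables described above, the snippet below uses Python's standard sqlite3 module (a choice made for this example, not the speakers' tooling) to create an FTS3 table and list every table SQLite then knows about. The table name `t` and column name `body` are hypothetical. It assumes the linked SQLite library was built with FTS3 enabled; if not, the first function returns an empty list. The second function shows the X'...' blob literal syntax.

```python
import sqlite3


def list_fts3_tables():
    """Create an FTS3 table and return the names of all tables in the database."""
    conn = sqlite3.connect(":memory:")
    try:
        # 't' and 'body' are hypothetical names used only for illustration.
        conn.execute("CREATE VIRTUAL TABLE t USING fts3(body)")
    except sqlite3.OperationalError:
        # This SQLite build does not include the FTS3 module.
        conn.close()
        return []
    conn.execute("INSERT INTO t(body) VALUES ('hello shadow tables')")
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    conn.close()
    return names


def blob_literal():
    """Return the bytes produced by the SQL blob literal X'0102FF'."""
    conn = sqlite3.connect(":memory:")
    value = conn.execute("SELECT X'0102FF'").fetchone()[0]
    conn.close()
    return value


if __name__ == "__main__":
    print(list_fts3_tables())
    print(blob_literal())
```

On a build with FTS3, the list contains the virtual table itself plus its content, segments and segdir shadow tables, which is why a single CREATE statement shows up as several tables.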
Since it represents such a complex structure, is it possible to create a memory corruption vulnerability by destroying the data of the structure? Let's read the documentation of the serialized data structure first. You can check the SQLite manual to easily get the definition. I will show you a simplified version so you can understand what I'm modifying when I'm trying to exploit it. Basically, I'm modifying bytes in the different functional sections to mislead the code flow.
First, segment B-tree leaf nodes. The first term stored on each node, which is term one in the figure above, is stored verbatim. Each subsequent term is prefix-compressed with respect to its predecessor. Interior nodes, the non-leaf nodes, have a different structure. And then the doclist format. A doclist consists of an array of 64-bit signed integers, serialized using the FTS varint format. Each doclist entry is made up of a series of two or more integers, as follows: first the docid value, then zero or more term-offset lists. In general, those blobs are just serialized B-tree data. When SQLite wants to perform operations on those tables, it will simply deserialize, or parse, those blobs and get a complete B-tree.
Then I found some problems. The problems are mainly related to merge and match, because they are deeply related to the B-tree. Two of them are in merging the nodes of the tree, and the other is in traversing the nodes of the tree to try to find and match the content. Also, the last one is the crash I mentioned before. It was the one that brought this series of problems to my attention. And the software must enable FTS3 support and support external SQL queries to trigger these problems.
Okay, the first one, CVE-2018-20346, which is also the main vulnerability we use later to exploit Google Home. It requires a lot of complex preconditions. That is, we have to carefully construct a lot of tables and content. But once the preconditions are met, exploiting the vulnerability will be very simple and stable.
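The FTS varint format mentioned above can be sketched as follows. This is an illustrative Python version for non-negative values only, not SQLite's own C code: each byte carries seven bits of the integer, least significant bits first, and the high bit says whether more bytes follow. The function names are hypothetical. Note that this decoder refuses to read past the end of the buffer, which is the kind of boundary check the blob parsing in this talk depends on.

```python
def encode_fts_varint(value):
    """Encode a non-negative integer: 7 bits per byte, least significant first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_fts_varint(buf, offset=0):
    """Return (value, bytes_consumed); raise if the buffer ends mid-varint."""
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return value, pos - offset
        shift += 7


if __name__ == "__main__":
    encoded = encode_fts_varint(300)
    print(encoded.hex(), decode_fts_varint(encoded))
```

A parser walks a blob by decoding one varint, advancing its cursor by the number of bytes consumed, and repeating, so a single corrupted length byte changes how everything after it is interpreted.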
The vulnerability is located in the function fts3AppendToNode. It can be triggered with a special command, merge=1,2, which sets the parameters of the merge. As you can see, this function will try to append a node to another. The node is stored in the blob, so this function and its caller will first parse the B-tree and then get the start position and the length of the binary data of the node that will be processed. And last, it performs a memcpy operation to copy them into a new blob that represents a new tree.
Okay, let's go to its caller function, fts3TruncateNode. It will get the binary offset and length from the blob data that describes the node being processed. Then the node information is returned in a reader object that is passed to the vulnerable fts3AppendToNode. To control the memcpy in fts3AppendToNode, we need to control aDoclist and nDoclist, which are returned from nodeReaderNext. aDoclist is the source char pointer into the blob data and nDoclist is the node size, which are the second and third parameters of memcpy respectively.
To control them is not a difficult thing. Let's read the code, and then you will know why I'm saying that it's easy. To save the space of storing an integer, SQLite uses a variable-length integer algorithm. You can just consider sqlite3Fts3GetVarint32 as a function that converts a bunch of bytes into an integer. It will store the result in nSuffix. Then it will move the current cursor by adding the count of bytes it read to iOff. The data for a node is stored as a length followed by the corresponding data, with exactly the size given by the length. Normally they should appear in pairs, but we can modify the blob so that, at the boundary at the end of the blob, there is only the length field but no data field.
Okay, let's go to fts3AppendToNode. Since in the last step aDoclist and nDoclist are controlled, we can now overflow the buffer of pNode->a in line 310.
Or we can copy some raw memory data into it, as long as it does not exceed the allocated size. Then we can query the new table to get the leaked raw memory. By setting up adjacent tables, we can overflow the function pointers of the table very accurately to achieve code execution.
And here's another one, CVE-2018-20506, in the function fts3ScanInteriorNode. The precondition of this one is rather simpler. All you need to do is modify the shadow table and set a node in segdir to a non-root node. You can change the blob data to change the code flow. Query the modified table with the keyword MATCH. Then the code will scan every node inside the B-tree, and the code will trigger memory corruption. Because it has many constraints, it was considered to be very hard to exploit. But this one is exploitable anyway. You can check the wonderful write-up by a Korean researcher named Ki Chan Ahn.
And last, CVE-2018-20505. This one is very much like a combination of the previous two. The vulnerable function is fts3SegReaderNext. You can modify the shadow table and mislead the code flow. All three of these vulnerabilities can be modified to leak raw memory. So we can also use them to leak the addresses of, for example, functions, structures, global variables and constant variables to bypass ASLR.
And here's another one, libcurl. Our target is remote code execution, but libcurl is already used by a lot of users and its code iterates very quickly too. To find a vulnerable function, here are some guidelines for finding problems by reading the code quickly. First, find the big functions. Functions with a lot of lines are not recommended in software engineering, and functions that are too long should be refactored into shorter functions. Usually such a large function is difficult to test, and there will be a lot of attack surface. And most of the functions enabled in libcurl are related to remote interactions and communicate with a remote server more than once.
After carefully sifting through the protocols, we confirmed that NTLM over HTTP was what we wanted to test. And here are the problems we have found in libcurl. We named them Dias, the name of another famous navigator. The first one, CVE-2018-16890, is a vulnerability in the NTLM type-2 message. It can leak at most 64 kilobytes of client memory per request to the attacker. The result is very much like Heartbleed, but it's the client version. And the second one, CVE-2019-3822, is a vulnerability in the type-3 message. It will result in a stack buffer overflow. The libcurl author also wrote about this on his blog and said this is a very bad security issue.
Okay, let me show you how this happened. For the first one, the vulnerable function is ntlm_decode_type2_target. It reads a 32-bit integer from the type-2 message header. Then we know we can set the target info offset to a large value and the target info length to a crafted value, so that if you add them together, an integer overflow will happen. The overflowed result is very small and will pass the check in line 1805. Next, memcpy will copy data out of bounds. For example, if we use the values above, it will actually copy data from in front of the buffer variable in a 32-bit environment. And then the data will be sent to the attacker in the type-3 message, leaking a maximum of 64 kilobytes of data per request to the attacker.
And let's go to CVE-2019-3822. This vulnerability is located in a big function named Curl_auth_create_ntlm_type3_message. In the beginning, this function declares a lot of variables in stack memory. The NTLM buffer is a big buffer which has around 4 kilobytes of memory. Then the function tries to read the NT response length from the type-2 response, which is sent from the server. An attacker could return a value bigger than the buffer size to the client. Next, in line 779, this check should verify whether the size of the data is bigger than the NTLM buffer's remaining size. But the implicit unsigned cast prevents the check from operating correctly.
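The two arithmetic problems described in this part can be sketched with 32-bit unsigned arithmetic emulated in Python. This is a simplified model, not libcurl's source: the function names, the exact shape of each check, and the sample values are assumptions made for illustration. The first function models an offset plus length sum that wraps around, and the second models a remaining-size calculation where a subtraction is carried out as unsigned.

```python
MASK32 = 0xFFFFFFFF


def u32(value):
    """Reduce an integer to 32-bit unsigned, as C unsigned int arithmetic would."""
    return value & MASK32


def type2_bounds_check_passes(target_info_offset, target_info_len, size):
    """Assumed check: reject when offset + length exceeds the message size.

    The sum is computed in 32 bits, so a huge offset can wrap to a small value.
    """
    return not (u32(target_info_offset + target_info_len) > size)


def type3_space_check_passes(size, ntresplen, bufsize=1024):
    """Assumed check: accept when size < (bufsize - ntresplen).

    With an unsigned ntresplen the subtraction is unsigned too, so a length
    larger than the buffer yields a huge 'remaining' value instead of a negative one.
    """
    remaining = u32(bufsize - ntresplen)
    return size < remaining


if __name__ == "__main__":
    # Hypothetical attacker-chosen values.
    print(type2_bounds_check_passes(0xFFFFFFF0, 0x20, 64))
    print(type3_space_check_passes(64, 2000))
```

In both cases the check is written as if the arithmetic could not wrap, so the fix is to compare the operands before adding or subtracting them.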
This check compares two unsigned variables and a macro, NTLM_BUFSIZE, which simply defines the number 1,024. But this value is a signed integer in the view of the compiler. When signed and unsigned values are compared in the same place, one of them must be cast in order to compare them. And this is the problem. The compiler prefers to convert signed numbers to unsigned numbers. So if we have an NT response length greater than the NTLM buffer size, the result will be a large unsigned number. It is of course bigger than the remaining size, so it will pass the check. In line 781, because the NT response length is bigger than the remaining size of the NTLM buffer, there will be a stack buffer overflow.
When the code triggers the stack buffer overflow, the overflowed variable is in the middle of a lot of stack variables. Although the program may have stack cookies, an attacker could choose not to overflow that many bytes, but instead overwrite stack variables and control the flow of the code. I marked the position where it triggers the buffer overflow. You can see that in this big function, there are almost 18 lines after it. Many of them operate on heap or stack memory, and the operations are based on the values of those stack variables. That's the reason why I say a big function is never a good coding practice.
Okay, my part is done. Li Yuxiang will go ahead and introduce how to exploit it.
Thank you for my partner's introduction. Now let's review the following vulnerability. The key to triggering the vulnerability is how to use the INSERT statement to control the variables used by the memcpy function, including pNode->a, pNode->n, aDoclist and nDoclist. The inserted data is as follows. First, the entire data is stored in the buffer of pNode->a. We can control the size of the pNode->a buffer by modifying the length of the INSERT statement. With heap feng shui, we are able to allocate pNode->a in an appropriate area to cover the target memory.
In this statement, the orange color indicates the values of pNode->n and nDoclist, which are of varint type. nDoclist is used to control the length to be overflowed. pNode->n is the offset of the write into the target memory area. The green part is the aDoclist data, which can be used for an out-of-bounds read or overwrite.
In the next section, we will introduce the ideas for using these vulnerabilities. We expect to find a function pointer on the heap that can be used. When creating an FTS3 table, the tokenizer is created by default, and the default tokenizer is the simple tokenizer. As shown in the following picture, simple_tokenizer is a structure on the heap, and its member base is the sqlite3_tokenizer structure. The member pModule of the sqlite3_tokenizer structure points to the tokenizer module. The tokenizer module contains some interesting callback functions, such as xCreate and xOpen. In addition, when inserting into the FTS3 table, the xOpen callback function will be triggered. If the function address of xOpen can be modified to an arbitrary address, the PC can be hijacked.
And how do we use the available function pointer for PC hijacking? Here two conditions need to be met. First, after the heap overflow, it is necessary to be able to operate on the FTS3 table. In addition, the hijacking needs to be complete before the memory is released, otherwise a crash will interrupt it. By analyzing the logic from memcpy to free, we found an interesting function, which performs a SQL UPDATE operation before releasing the memory, as shown in the code tagged in blue. Finally, we used a SQL trigger to perform an FTS3 table INSERT operation before the SQL UPDATE operation and before the memory is freed, thus triggering the xOpen callback function so that the PC can be hijacked.
The next step is to make a memory layout. The SQL logic of cast_shell, just team a lot. A feasible memory layout idea is as follows.
First, by creating multiple FTS3 tables, we will create multiple simple_tokenizer structures. We drop the previously created FTS3 tables at the appropriate time, and their simple_tokenizer structures will be freed. When we then allocate payloads of the same size as simple_tokenizer, there is a high probability that the payloads will be allocated in the previously released simple_tokenizer structures. So our payloads have a greater chance of overwriting the simple_tokenizer structures of existing FTS3 tables. With the SQL trigger that performs an operation on the FTS3 tables before the free, there is a chance that the tokenizer module callback functions will be triggered to hijack the PC.
When we have the ability to hijack the PC and control the R0 register, we only need to be able to leak information and bypass ASLR. As the previous sections also introduced, we can try to adjust pNode->a and leak the memory after it on the heap. We need to disclose the following two types of addresses. First, leaking the address of cast_shell: based on this address and offsets, we calculate the required ROP gadgets. Second, leaking the address of the last heap: according to this address and offsets, there is a high probability of calculating the address of the heap spray. Lastly, heap spray and ROP. After these three steps, we can achieve RCE in Google Home's renderer. cast_shell is a large binary program that contains many available ROP gadgets, so using the ROP gadgets in cast_shell is more stable and convenient.
In the picture on the right, the red line marks the heap spray units, which contain the fake tokenizer module and the ROP gadgets. The xCreate and xOpen addresses in the fake tokenizer module structure will be assigned the stack pivot address. The blue line marks our fake simple_tokenizer structure. When an FTS3 table operation is triggered, SQLite will get the malicious xOpen through the simple_tokenizer structure, and then we get RCE. Above is the screenshot of our RCE in cast_shell.
The red blocks on the left show the registers we can control; they are R0 and R11. The function address is read from R11, and finally it goes to BLX. As a result, we are able to hijack the PC. The image on the right shows the result of our shellcode. We execute JavaScript code that fetches a URL and reads navigator.appName in the exp.html code. Normally, navigator.appName is read-only and is Netscape, but our shellcode changes the appName to AI.
Let's look at some actual attack scenarios. There are three types of attack vectors to attack Google Home remotely. First, the attacker is located in the LAN. The attacker sends the launch app ID 1 command through the Cast protocol, and Google Home will pull leak.html from the application store according to the app ID and load it. At this time, the leaked data can be obtained by the attacker. Second, the attacker sends the launch app ID 2 command, and Google Home loads exp.html, so the RCE happens. The entire process does not require user interaction. In the other two scenarios, you don't need to be on the same LAN to start the attack. The attacker induces the victim to access the URL of the sender application. And we can scan the network for routers, and if a router has UPnP forwarding enabled, we can try to launch remote attacks from the Internet.
Conclusion. Here is the timeline of Magellan. We reported it in November, and it was quickly fixed by SQLite and Chromium. In December, Google decided to give a total of 10,000 in rewards for the batch of vulnerabilities. And there is an enhancement, the defense-in-depth flag (SQLITE_DBCONFIG_DEFENSIVE), which disallows modifying shadow tables from untrusted sources. For backwards compatibility, it is off by default in SQLite. So if you are using SQLite with FTS, you may want to enable this one to prevent an attacker from modifying your shadow tables. But the good news is, it is on by default in Chrome from a commit last November.
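As a hedged sketch of the defense-in-depth flag just mentioned, the snippet below tries a direct write to an FTS3 shadow table with and without defensive mode. It uses Python's sqlite3 module, where `Connection.setconfig` and the `SQLITE_DBCONFIG_DEFENSIVE` constant are only available on Python 3.12 or later with a recent SQLite; in C the same option is set through `sqlite3_db_config`. The function name, table name `t`, and the harmless `UPDATE` are hypothetical, and the function reports "unsupported" when the build lacks FTS3 or the flag.

```python
import os
import sqlite3
import tempfile


def shadow_write_outcome(defensive):
    """Return 'allowed', 'blocked' or 'unsupported' for a direct shadow-table write."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # Step 1: build a database file that contains an FTS3 table.
        setup = sqlite3.connect(path)
        try:
            setup.execute("CREATE VIRTUAL TABLE t USING fts3(body)")
            setup.execute("INSERT INTO t(body) VALUES ('hello')")
            setup.commit()
        except sqlite3.OperationalError:
            return "unsupported"  # no FTS3 module in this SQLite build
        finally:
            setup.close()

        # Step 2: reopen it the way an application handling untrusted SQL would.
        conn = sqlite3.connect(path)
        try:
            if defensive:
                flag = getattr(sqlite3, "SQLITE_DBCONFIG_DEFENSIVE", None)
                if flag is None or not hasattr(conn, "setconfig"):
                    return "unsupported"  # needs Python 3.12+ and SQLite 3.26+
                conn.setconfig(flag, True)
            try:
                # A no-op write; an attacker would replace the blob instead.
                conn.execute("UPDATE t_segdir SET root = root")
            except sqlite3.DatabaseError:
                return "blocked"
            return "allowed"
        finally:
            conn.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("default:  ", shadow_write_outcome(False))
    print("defensive:", shadow_write_outcome(True))
```

FTS3's own internal updates to its shadow tables keep working in defensive mode; only ordinary SQL statements that target them directly are refused.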
Here is the timeline of Dias. These were quickly fixed and released by libcurl in two weeks. Also, we sent out a notice to urge vendors to disable the vulnerable FTS3 feature before the patch came out, if they don't use it. And we have notified the security teams of Apple, Intel, Facebook and Microsoft about how to fix the problem and how to mitigate the threats in some of their products.
The last part is the security advice. One, enhance your system with the newest available defense-in-depth mechanisms in time. Two, keep your third-party libraries up to date. Three, improve the quality of security auditing and testing of third-party libraries. Four, introduce security specifications into development and testing. Thank you.