 Hello everyone. Uh welcome to my talk called uh select code execution from just about anything using SQLite. Where we will gain code execution using malicious SQLite databases. So quick who am I? My name is uh Omar Gal. I'm a vulnerability researcher from Tel Aviv. I've been working in checkpoint research for the past three years. This is where I done this research and I've recently moved on to a new startup called Hunter AI. So the uh agenda for today will start with a little motivation and a backstory for this research. We'll then talk a little bit about SQLite internals. Uh we will examine the attack surface available to a malicious database and discuss some previous uh work done in the field of SQLite exploitation. We will then move on to memory corruptions and the possibility of exploiting them using nothing but SQL queries and demonstrate our innovative technique called query oriental programming or COP. That's a registered trademark. Uh we'll take it for a spin and a couple of demos and discuss some possible future work and conclusion. So let's get started. The motivation for this research is quite obvious. SQLite is one of the most deployed pieces of software in the world. Whether it's PHP 5, 7, Android, iOS, Mac OS, it is now built into Windows 10. It is practically everywhere. Yet querying a database uh is considered safe. And hopefully by the end of this talk you will realize why this is not necessarily the case. So it all actually started with password stealers. And it's a pretty popular type of malware. And there are several uh types in uh in this field but the story is usually all the same. First of all a computer gets infected. Then the malware collects the stored credentials as they are maintained by various clients. Some of these clients actually store your secrets within SQLite databases. Uh so the malware's just ship those databases to its C2 server where they are all parsed. And uh your secrets are stored in a collective database with the rest of the loot. So it all actually began when my colleague and Iomwe were looking at the leaked sources of a very known password stealer. And then we thought to ourselves these guys are just harvesting a bunch of our DB's and parse them in their own backend. Can we actually leverage the load and query of an untrusted database to our advantage? Uh if we could this might actually have much bigger implications just because SQLite is so popular and used in countless scenarios. And so began the longest CTF challenge of my life. So uh a couple of words about SQLite. So unlike most database engines it doesn't have the client server architecture. Instead it just reads and writes directly to the file system to files. So you have a complete database with multiple tables and indices and triggers and views and everything is contained within a single file. So let's examine for a second the attack surface available to a malicious database. Uh again this is the leaked sources of a very uh known password stealer and we have two main points of interest here. The first one is SQLite open where our potentially malicious database is about to be uh loaded and initially parsed. And the second point is obviously the query itself. Now do you notice that this query is uncontrollable to us right? It is hard coded within our target. Yet we have some control over the database so we should uh it should be beneficial to us to learn this process. So let's break the attack surface in two. First of all the SQLite open. So actually it's a bunch of setup and configuration codes, a bunch of loaded uh a bunch of modules are uh loaded. And then we move on to straightforward header parsing. And actually the header is not that long it's only 100 bytes long. And it was already fuzz to death by AFL. So probably this is not a very promising path to pursue. The second part of our surface uh the query itself and it might be a bit more interesting because using SQLite author's words the select statement is the most complicated command in the SQL language. And actually uh SQLite is a wonderful virtual machine. So every query uh is first compiled to uh some byte code. And this is also known as the preparation step. Uh so SQLite prepare would actually work and expand your query. So for example every time you write an asterisk it goes and behind your back replace this asterisk with all the column names. Uh so SQLite uh locatable actually verifies that all the relevant objects do exist in the and locates them uh in memory. So where does this object exist? Uh every SQLite uh database has an SQLite master table that defines the schema for the database. And this is its structure. You can see that it's actually a table and uh every entry here there has the type of the object. So like a table or a view and its name and root page because if you recall everything is contained within this file and a field called SQL. And SQL is actually the DDL describing the object. So DDL stands for data definition language. And you can sort of look at it like uh your header files in C. So they are used to define the structures and the names and the types that are used within the database. Furthermore also appear in plain text within the file. So let me show you an example. Here I create a very simple database. I create a table and then I insert a couple of strings into it. Then I quit the SQLite interpreter and I hex-dump the file. And you can actually see the DDL in the text the DDL of SQLite master in plain text within this file. Uh so let's go back to query preparation for a second. So we have SQLite locate table uh that attempts to find the structures describing the table that we are interested in querying. Uh so it reads the available uh in SQLite master. And if it's the first time that it's doing so it has a callback function for every DDL statement. This callback uh actually validate that the DDL is indeed valid and it builds the internal data structures of the object. So then we thought about what, what about DDL patching? Can we simply replace the SQL query within the DDL? And turns out that this is the uh uh callback function that I mentioned and you see that the DDL is verified to begin with create space and only then it moves on to preparation. So that's actually a constraint. Our DDL must begin with create. However uh it does leave some room for flexibility because uh to judge by the documentation actually many things can be created in SQLite. So you can create an index and a table and a trigger and a view and something we still don't understand called virtual table. And then create a view gave us an interesting idea because views are simply prepackage select statements, right? And they are queried similarly to a table. So selecting a column from a table is semantically equivalent to selecting a column from a view, right? Uh so we move on to the concept of query hijacking. We are going to patch SQLite master DDL with a view instead of a table. And our patched view can actually have any select that we wish. And now using our select sub query we can actually interact with the SQLite interpreter. And this is a huge step forward, right? Because the query was uncontrollable by us at first. So now we have some control over it. Let me show you an example. So let's say that the original database had just a single table. And it's called dummy and it has two columns inside it. The target software uh would actually query it uh the following way. We just select those two columns out of the table, right? But the following view can actually hijack this query. So if we create a view that it's called dummy, it has uh the exact number of columns inside and the same name. And every column can have any sub query that we wish inside it. So again let me show you a practical example. Here I create a database. And inside it I create a view with two columns. Every column is actually a function. So the first column would call the SQLite version function that simply returns the SQLite version in use. And the second column will utilize SQLite own implementation of printf. That's right they have their own implementation of printf. They must be insane. So now when the target software actually queries uh this uh what it thinks to be a table but it's actually a view, we can actually see our two function executing. And again we just gain some control over the query and this is a huge step forward. But then the question is what do we do with this control? What primitives does SQLite have? Do we have any system commands? Can we write to other files in the system? Maybe read some more files. Uh so this is a really good uh point to look back at previous research done in the field because we are definitely not the first one to know this SQLite huge uh potential in terms of exploitation, right? So a reasonable place to start is SQL injections, right? And there are a couple of known SQL injection tricks in SQLite. The first one has something to do with attaching another database and creating a table and then inserting a string into it uh so you are able to write your own web shell and this is all very nice but we can't do it because if you recall we can't start any statement with attach. Our DDL must begin with create so it's not a very uh uh useful trick for us. Another common thing to do in SQL injection uh is loading uh remote extension like here in this example you can see it actually loads the interpreter DLL. However it is also disabled by default so again no go with this trick. Let's talk about memory corruptions for a second because SQLite is such a complex piece of software and it everything is written in C. Uh so in his amazing blow post finding bugs in SQLite the easy way Michael Zalewski uh the author of AFL described how he found 22 bugs in just under 30 minutes of fuzzing and this is pretty amazing. And interestingly since then that was uh version three eight ten that was 2015 SQLite uh actually started using AFL as part of their uh remarkable test suit. Yet these memory corruption all proved uh to be really difficult to exploit without a convenient environment right but the security research community soon found the perfect target. And it was web SQL. So web SQL is essentially a web page API for storing data in databases. It is queried from JavaScript using SQL and it has an SQLite backend and it is available in Chrome and Safari. Here we see an example of how to uh query SQL to SQLite using JavaScript. It's very straightforward. But in other words what I'm hearing here is that we have untrusted inputs into SQLite available uh reachable from any website on the internet uh in two of the world's most popular browsers. Uh and now every memory corruption bug could actually be leveraged with the comfort and knowledge of JavaScript exploitation. So there have been uh several impressive researchers uh that were published uh about web SQL. From very low hanging fruits like CV 2015 70 36 that was an untrusted pointer dereference in FTS tokenizer uh to more complex exploit presented in blackhead 2017 by the awesome chai team that was a type confusion in the FTS optimizer uh to the recent Magellan bugs that were just uh uh presented by the Tencent team that was an integer overflow in uh FTS segment reader. And if you guys are paying even a tiny bit of attention here you might see an interesting pattern arises right. All these functions starts with FTS. So what is FTS? I've never heard of it and googling it just even left me more confused. Well after some time I came to the realization that FTS actually stands for full text search and it is something called a virtual table module and it allows for textual searches on a set of documents. Again using SQLite offers words it's just like Google for your SQLite database. So that's pretty cool. And a virtual table actually allows for plenty of cool functionality in SQLite. So we have FTS that we just described. Uh there is also R3 a virtual table that allows for some uh clever geographical indexing or the CSV virtual table that lets you treat your database as you would a CSV file. And actually all these virtual tables are queried just like regular tables. But behind the scene some dark magic happens because every query invoke a callback method on something called shadow tables. So shadow tables. Uh for example let's look at a virtual table that is created using FTS uh virtual table module right. So we create a table and we insert a string into it. Now obviously to allow for some efficient search we need to have some metadata right. We need to have some indexing or pointers or token or some tokens or stuff like that. So obviously we have some the raw text and some metadata. But this one virtual table is actually comprised from three shadow tables. So the raw text would go to a shadow table called content and the metadata would go to the segdeer and segments. And actually each of these shadow tables uh they pass information between them. They have all those interfaces. And those interfaces given that they have such uh trusting nature they are really fertile ground for bugs. Uh many of the bugs in SQLite are uh uh presented there. So uh it's a really good thing to look at if you're hunting for bugs. And let me show you an example of a bug that I found in one of the interfaces of the Archie virtual table. Uh so again uh Archie is a virtual table module and it is available in macOS and iOS and now also built into Windows 10. And it's used for some geographical indexing. I'm not entirely sure what that means but the structure should be the following. A table should begin with ID that is supposedly an integer and then you have a bunch of coordinates like X and Ys. Uh so Archie interfaces would obviously expect ID to be an integer. But if we create a virtual table using the Archie module and will insert uh a value where ID is definitely not an integer but it's a string and then we use the uh Archie node that is one of the Archie interfaces. We get the following CVE. That's an out of bound read. Uh and it's really cool because this bug is now available in iOS and macOS and as you can see here also in Windows 10. So let's go back to exploitability for a second. We realize that virtual tables do have bugs right? And using our method of query hijacking we can actually trigger them at the C2 and cause it to segfault. But gaining flow control requires some formal scripting right? Because segfaulting is not enough. We write we want to uh write a complete exploit. We want to bypass ASLR. We want to have some logic. However we don't have the JavaScript interpreter so we don't have any variables or arrays to use like you would in web SQL exploitation. However we do vaguely recall hearing somewhere that SQL is to ring complete. So we decided to put it to the test in terms of exploitation. And we started writing our own wish list for exploitation primitives. So obviously we'll need to leak some memories right? We want to bypass ASLR. Uh we'll have to do some common tests like unpacking 64 bit pointers to integers and then do some arithematics with them right? We want to calculate where functions are or where the heap is located. Afterward we want to pack those integers back to little and 64 bit pointers. And maybe we also want to fake some objects in memories. This is a really uh powerful primitive that is uh really helpful in many vulnerabilities. And uh we want to know how to heap spray uh because why not? It might be really really useful. So the question remains can all these primitives be achieved with nothing but SQL? And the question is, yes it is. So I proudly present to you something we call query oriented programming. And to explain it we will uh exploit the unfixed CVE 2015 7036. And you might ask yourself what how come a four year old bug is still unfixed but this is exactly our point. It was only considered vulnerable from uh the context of the untrusted web SQL. So it was mitigated accordingly. It's just not available in web SQL anymore. However so it was it is black listed actually unless you compile SQLite with a certain flag. Uh but these platforms are still vulnerable. So we have PHP 5 and PHP 7 and iOS and macOS all still vulnerable to this CVE. So let's describe the vulnerability a little bit. So we said that it has something to do with the tokenizer. So a tokenizer is a set of rules for extracting terms from a document or a query. And the default is called simple just split those strings by white spaces. However if you like you can uh register custom tokenizer using the FTS tokenizer function in an SQL query. I will repeat it slowly. You can actually register a function using an SQL query. You are about to pass a raw pointer to an SQL query. This is absolutely insane. Actually I have no idea how to really use this functionality other than in my exploit. So FTS tokenizer is actually an overloaded function. And if you call it with one argument, so simple again this is the name of the default tokenizer. It actually spits back the address of this tokenizer. So to make it a bit more readable we'll use the hex decoder. And we can see that we actually got a pointer to Lib SQLite but it's the other way around because little endianity. And uh if you call it with two arguments so the first one is the tokenizer name, the other one is a raw pointer. Uh you actually override the address of this uh tokenizer. So everyone trying to use this tokenizer, everyone instantiating a virtual table will try to uh initialize it and will then crash and burn with segmentation fault uh at the address given to it. So a little recap. Uh we established that SQLite is a wonderful one shot for many targets. One good bug in SQLite is one good bug in so many platforms. And we realized that is a complex machine written in C and using query hijacking we can now actually start triggering bugs. And we aim to write a full exploit uh uh implementing all necessary exploitation primitives uh with nothing but SQL queries. So our exploitation game plan is as follow. We are going to leak some pointers and then we'll calculate some function addresses. Uh we will create our own fake tokenizer with some pointers to system to allow us to actually execute code. We will override the default tokenizer and trigger it right. And then something will happen and I know what you think is going to come up next. But we are not going to profit, we are going to grab your grandma's yahoo password because this is what is available in password stealers. So starting with the memory leak uh we first of all we need to gain a leak to lib SQLite right. Uh so we already know how to do it right. The FDS tokenizer just give us the address. But we have a slight problem because it's in little indian so we need to flip it. So surely we can read the pointer in a reverse fashion two characters at the time and concatenate everything together. So our leak would eventually uh be the following query and we actually now have a leak to lib SQLite. This is really really cool. Another thing that is going to be useful for us is a leak to the heap right. So I'm going to do something pretty similar to the archery bugs that I've shown you. I'm going to confuse a virtual table interface. So again I'm creating a virtual table and I'm inserting a string into it and then I will confuse the match interface uh that is usually passing a pointer. But instead of uh passing it to some other interface that is expecting this pointer I will simply pass it to the hex decoder again. So now we read this uh pointer and it is indeed a pointer to the heap again in uh little indian but using the trick above we should be fine in flipping it. So we can cross that off the list. We know how to leak memories now it's time to unpack some pointers. Uh but before we do that we actually have a slight problem because unlike uh browser uh web SQL exploitation we don't have any javascript variables or arrays to use. Uh and this is a big problem because we need to create some logic. We want to calculate things. We want to store things and naturally storing things in SQLite uh requires you to have some insert uh but we can't use insert right. We have to begin with create. And we can only create tables and views and indexes and triggers. So then we thought about chasing uh chaining a couple of views together and use them sort of as pseudo variables. Again let me show you an example. We create a view called little indian leak uh and we use the vulnerability as mentioned in the earlier slide. Now we create another view leak and notice how it refers to the first view right. It's selecting from little indian leak. Again we are doing the trick from before. And now we're actually uh we actually remain with a pseudo variable called leak that is actually all these chains together. So we have one pseudo variable that contains all the calculations from before. And again this is a huge step forward because we want to create some logic. Uh our exploit should bypass ASLR and those kind of things. We need to be able to store things and uh this would really help us. So unpacking 64 bit pointers. Uh to calculate the base of an image or the heap we have to convert our pointers to integers. And this can be done using the following query. So again we start with sub SDR reading one character in a reverse fashion. Then we take this character and we use it in in SDR. In SDR is just like SDR char. And we get the value of this hex character because it is one base. You have the minus one on the side. And then you have some multiplication and shifting dark voodoo. And you simply return it for every character in the pointer resulting in this monster query. But eventually we actually unpacked uh the pointer and now we have the integer value. So again we can cross that off the list as well. Moving to pointer arithmetics and it is actually really really easy right now when we have uh integers at hand, right? All we have to do is uh in one sub query we use our pseudo variables and the other one can have any constant them we wish. And now we know where lib escalite is located. So uh packing 64 bit pointers because we read some pointers and we manipulated them. Now would be a good time to write them somewhere. Uh so we thought char is going to be really useful here because we are used to char being the reverse of hex, right? And it actually works fairly well on some values. But on higher values this was a problem because they were translated back to their uh two byte code points uh in unicode. And this was a huge obstacle for us. And we actually uh uh had a really big problem with this. And after bashing our head against the documentation for about a week we suddenly had the strangest epiphany that our exploit is actually a database. And if I want any conversion to happen at all I can simply uh prepare a key value table in advance while I'm generating this database and then use some sub queries. Uh so this is the python function that is actually generating this hex map key value table. And you can see that it's a very simple for loop from zero to ff uh just inserting all the values that I wish. And now our conversion uh use sub queries that are selecting from this hex map so you see how the view is referring to hex map. And again we have some uh shifting and modular dark magic and we concatenate everything together resulting in this query. So now we also know how to pack 64 bit pointers we are moving forward. Let's talk about faking objects in memory because writing a single pointer is definitely useful but it's not enough, right? Uh we all want to fake objects in memory it's a really powerful primitive and if you recall uh FTS tokenizer actually require us to assign a tokenizer module so we need to fake one. And a tokenizer module is the following struct. Uh we don't really care about most of it uh it starts with an integer and then it has three function pointer. First of all we have the create which is the constructor then we have the destroy which is the destructor. Obviously we want both of them to be valid because we don't want to crash during our exploitation. The third function pointer uh open actually tokenizes a string. So it gets a string as an argument. This would be a really good place to put our system function so we can execute code as we wish. So I've used most of my SQL knowledge by now but I still have one more trick up my sleeve. I remember join queries. So we are going to fake an object using the following join query, right? We can see that we started we created a view called fake tokenizer and it's concatenating a bunch of A's and then uh a packed version of the simple create address and then another packed pointer and then a bunch of B's with uh a join query. So actually if we verify it from a debugger we can actually see that it works pretty good. The memory section begins with a bunch of A's and then we have two pointers and then we have a bunch of B's. So we have successfully faked an object in memory using nothing but SQL queries so far. Uh so for our final primitive uh we want to do some heap spray. Uh because now we have our malicious tokenizer, right? And using our leaks we also know where the heap is but we are not entirely sure where our tokenizer is between uh it is inside the heap. So it's time for some spraying and ideally that would be some repetitive form of our fake object primitive, right? Uh so repeat sounded like a really good option to us. Uh sadly SQLite did not implemented for us so we had to do it by ourself. Stack overflow for the win we found this really elegant solution. So the following query uh it uses the zero blob function that returns a blob consisting of n bytes uh they're all zeros, right? And then we'll use the replace function. We're going to replace each and every of those null bytes with our fake tokenizer. So again the colors here are pretty bad but verifying it with PONDBG uh we actually see that we got perfect consistency. Our fake tokenizer is repeating itself every 20 bytes so this is perfect. And it looks like Christmas came early because we can go on to pawning shit. So again our target is the following code, right? We have the password stealer and it selects a column called body rich from a table called notes, right? So we are going to create a view that is called notes and it will have three sub queries inside. We're going to start with a sub query called uh a heap spray and then we'll override the symbol tokenizer and then we'll trigger our malicious tokenizer. And you might ask yourself what is a heap spray? Well heap spray is obviously a cop train. It's another view called heap spray that utilizes our heap spray crazy capabilities. We start with a bunch of a's and then we concatenate one of the pointers that we are interested in. P64 simple create for example. Needless to say that P64 simple create is another view, right? That actually refers it's a packed version of U64 simple create. The party goes on because U64 simple create actually goes back to U64 lib SQLite that utilizes some of our pointer arithmetic capabilities, right? And everything is derived from our initial leak, the U64 leak. But it's actually turtles all the way down. U64 leak refers to leak, right? And our unpacking capabilities. And leak goes all the way back to our initial vulnerability. And every time I describe this cop train, this is how I must look, right? I must look insane. But luckily for you guys you don't have to look like me. Because we created cop dot pi. And cop dot pi is a really useful Python library generating these crazy long statements, right? In something in like the style of pond tools. So creating the past chain is actually only this like four lines of Python. And it's going to be available in our GitHub right after this presentation. So now that we have everything in order, let's show our first demo for today where we will own a password sealer backend that is running the latest PHP 7. So this is our module, this is our panel. Obviously it's just a model that we set up from the leaked sources. And you can see all those victims that are infected. And we are trying to go to p dot PHP where our web shell should be. Obviously it's still not there because we have yet to exploit it. Moving on to the attackers computer we see that we have two scripts. The first cop dot pi will actually generate our malicious database. And we see using LS that a database was indeed created. Now we are going to emulate an infection, right? So we are going to send this SQLite database to the C2 server as if we were infected and it is a bunch of password that is interested in. And this process takes a bit of time so we can look at all the crazy statements above. We started with the leaks and then we unpacked them and packed them again and manipulated. And you see that at the bottom our payload is actually creating p dot PHP with the simplest web shell. So after our exploit will run successfully we'll go to that page. Now that it's finished we go to p dot PHP again. And we see we got 200, the page to exist. So now we can actually execute some code on the back end. Who am I? W, W, the data. And obviously cutting ETC password. And we got it. Yeah. So actually if you think of it what we just demonstrated is that anyone querying our malicious database can actually be exploited. And this is a lot of fun. And given the fact that SQLite is so popular it opens up the door to wide range of attacks. Let's explore another really interesting use case. So this is iOS persistency. iOS uses SQLite extensively. It is everywhere, right? And persistency is really hard to achieve on iOS because all executable files must be signed. SQLite database being data only are not signed, right? There's no need to sign them. And iOS and MacOS are both compiled with the enable FDS tokenizer compile time flag. So we plan on regaining code execution after the reboot by replacing any SQLite DB, right? So as our target we chose the contacts DB. So this is the name of the contacts. It's address book SQLite DB. And these are a couple of tables inside it. There's nothing really special about those tables. They're just here as an example. We will replace the DB with our malicious DB, right? That will start with two DDL statements. And you guys are already familiar with them. The first one will override the simple tokenizer with a bunch of A's. The second one will actually trigger it, right? It will start an FDS virtual table, try to construct our malicious tokenizer. And now what we're going to do is we're going to go over each and every of the original tables. And using query hijacking we are going to rewrite them as views redirecting the execution to our malicious DDL statements. So you see we select from override and then we select from crash. And then we go and we do it for the second table as well. And we reboot and secure boot was actually bypassed and we got the following CVE. And this is really interesting because you can see that the crashing address is 41, 41, 41, 49. And this is exactly as we expected, right? This is where the constructor X create should be. But that's actually not everything. Because the contact DB within your iPhone is actually used and shared by many, many different processes. So whether it's the contact app or FaceTime, Springboard, WhatsApp, Telegram, XPC, proxy, many, many processes, some of them are more privileged than others. And we've actually established the fact that we can execute code on the querying process. So this actually means that we got privilege escalation using our tricks, right? And actually there's nothing special about contact DB. Any shared database can be used to achieve our goals. All these methods and techniques were reported to Apple and they gained those CVEs if you want to go and read about it later. So if you'll take anything away from this research, I don't want it to be the crazy SQL gymnastics or I don't want it to be a bunch of CVE number. I want it to be the following. Quering a database might not be safe. Whether if it's across reboots or shared between processes or shared between users, querying a database might not be safe. And with COP, actually these memory corruption can now be reliably exploited using nothing but SQL. And we really think that this is just the tip of the iceberg. So far, SQLite was only examined through the very narrow lens of Web SQL. And while browser pawning is really exciting, SQLite has so much more potential from exploitation perspective. So we really want to see where the community will take this research to. And we do have a couple of ideas of our own. So obviously something really cool to do in future work would be to expand our primitives and gain something powerful as absolute read or write. And second, our exploit was actually a really sketchy POC, right? It had a bunch of hard coded constants inside. But you can actually make it really, really clever. Because if you think that your exploit is actually a database, you can choose the right COP chain for you based on the results of functions like SQLite version or compile option, like getting the compile option used. So you can dynamically create your COP chain to be exact to the target that you are exploiting. Obviously we think that these techniques can be used to privilege SQLite almost everything. All we have to do is find a database that is writable by a weak user and queried by a more powerful user. So it's also interesting to look at other platforms than iOS or macOS. And something, another thing that is really interesting is that many of the primitives that we have shown are not exclusive to SQLite. You can actually port many of them like the packing and unpacking. You can port them to other database engines. So this would be also a really cool path to pursue to see how these techniques work in other database engines. And that's it. Thank you so much.