Hi everyone. I'm Smea, Smealum, Jordan, whatever the fuck you want to call me. Today I'm going to talk to you about jailbreaking the Nintendo 3DS. And you might be wondering, okay, why does this matter? Well, truth be told, it really doesn't. It's mostly just a way to piss off Nintendo. The reason Nintendo doesn't want us to hack their consoles is that they want to sell games; they want to make money off their games. And unfortunately, once you hack these consoles, it becomes possible for people to play games for free. I'm not really happy about that. The thing is, it's also a really interesting target in terms of security properties, in terms of hacking stuff. So we're kind of in the middle here: we're trying to do interesting things, but, you know, bad results also happen. I'm not trying to give people the ability to steal games, but it kind of happens. Anyway, the first thing to do when talking about hacking the 3DS is to introduce the 3DS. Right, what is it? The 3DS is a game console. It was originally released in 2011, and a new one was released in 2014. They're essentially the same thing, except that the New 3DS, which, you know, is a great name, has twice the CPU cores, a higher clock frequency, and more memory, basically twice the amount of main memory. Beyond that, they are basically the exact same thing. They run the same operating system, which I'm about to get into; it's a really cool microkernel architecture. And in addition to the main CPU, which is what runs your games and stuff, they both have a secondary CPU, the ARM9. The ARM11 is what you can see in the CPU box here. It's what runs all your games and all your apps. Basically, anything that hits the screen, anything that you can interact with, runs on that CPU.
On the other hand, you have the ARM9, which is the console's security/IO CPU. The ARM9 is responsible for a bunch of security tasks and for brokering access to a bunch of hardware. I've shown some hardware devices here. This is not an exhaustive list; it's just a few examples that will come in handy later. The idea is that the ARM9 has access to, you know, everything. It has the keys to the kingdom. Well, it doesn't literally have the keys, because the keys are in this crypto hardware blob over there. But it has the ability to talk to the crypto hardware blob, which means it can encrypt and decrypt content, which is really all we care about. It also has access to the NAND chip, which is all the permanent storage, as well as to the SD card. If you take a look at the ARM11, on the other hand, first off, it does not have access to ARM9 internal memory, which makes sense. But it also does not have access to the crypto hardware, and it does not have access to the NAND chip. So any time the ARM11 wants to access a file on disk anywhere, it has to ask the ARM9 very nicely to give it access. That gives the ARM9 the ability to broker access to resources, kind of like a sandbox model. Now, what actually runs on the ARM11 is, as I mentioned, a microkernel-based architecture, which I think is very cool. The idea is that you have as little code as possible inside the kernel, right? Because that is the highest level of privilege on that CPU. You want as little code in there as possible, and ideally all your drivers and stuff in user mode.
And that's what you see to the right here, in the base memory region: a bunch of processes, called system modules, which are essentially user mode drivers. If you think of a monolithic kernel model like, say, Windows, all these drivers would actually live inside the kernel. What that means is that whenever you compromise one of those drivers, you gain access to the entire system. Whereas here, if you compromise a driver, you only gain access to whatever that driver had access to, because this operating system gives each process as little privilege as possible, you know, the principle of least privilege. So, for example, a game only has access to a small portion of the system call table. In addition to games, which run in the application memory region, you also have applets, which run in the system memory region. Applets include anything that can run at the same time as your game: stuff like the home menu, the web browser, the notes app, any of that crap that can run alongside a game. That's why they're in a separate memory region. The point here is that the game and the home menu actually have access to the same set of system calls, which is fairly limited. As you can see, it's basically half of the system call table. But if you take a look at one of the system modules, it has access to that same set of system calls plus new ones, which are privileged system calls: things that, for example, allow you to create a service and advertise that service. And then, if you have an especially special system module, it might have access to a system call that is only accessible from that particular process and nowhere else.
In addition to that, you need the ability to talk to these drivers, right? Because they're not in the kernel; they're in these little pockets, these processes here. The way this works is that any given driver, or any given system module, can advertise a service. Then, through the kernel, a game can connect to that service and talk to it directly. The cool thing is that, much like system call filtering, there's a service access list. For example, a game might not be able to access the am:sys service, AM standing for application management. It's a service that lets you install and uninstall games or applications or whatever. It makes no sense for a game like, you know, Zelda to try to install and uninstall other titles. However, it does make sense for the home menu, so the home menu has access to it. So you have a very granular level of privilege control on the system. What that means is that even if you compromise a game, you might not be able to reach all of the attack surface that you want. And that's actually a really good security model. Beyond that, I just want to mention that, like I said earlier, the ARM9 handles a bunch of tasks, such as crypto tasks, as well as brokering access to physical storage. So to complete certain tasks, a request actually has to go from one process to another and then to the ARM9. You end up with this very deep stack of privilege levels living one on top of the other. It's not as simple as user mode, kernel mode, and then the security processor; there are actually different layers and different levels of privilege between those.
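As a rough illustration of the two filters just described, here is a toy Python model of a per-process syscall mask plus a per-process service access list. The syscall numbers, the `Process` and `Kernel` classes, and the rule that registering a service requires a privileged syscall are all assumptions made for this sketch; only the idea that the home menu may connect to am:sys while a game may not comes from the talk.

```python
from dataclasses import dataclass

# Hypothetical syscall numbers, for illustration only (not the real 3DS table).
BASE_SYSCALLS = frozenset({0x01, 0x02, 0x08, 0x0A})   # what games and applets get
PRIVILEGED_SYSCALLS = frozenset({0x47, 0x48})         # e.g. creating/advertising a service


@dataclass
class Process:
    name: str
    syscalls: frozenset                  # per-process syscall mask
    services: frozenset = frozenset()    # per-process service access list


class Kernel:
    def __init__(self):
        self.services = {}               # service name -> owning process

    def svc(self, proc, number):
        # Syscall filtering: anything outside the caller's mask is rejected.
        if number not in proc.syscalls:
            raise PermissionError(f"{proc.name}: syscall {number:#x} denied")
        return 0

    def register_service(self, proc, name):
        # Advertising a service needs a privileged syscall (assumed to be 0x47 here).
        self.svc(proc, 0x47)
        self.services[name] = proc

    def connect(self, proc, name):
        # Service access list: a process may only open services it was granted.
        if name not in proc.services:
            raise PermissionError(f"{proc.name}: no access to {name}")
        return self.services[name]


kernel = Kernel()
am = Process("am", BASE_SYSCALLS | PRIVILEGED_SYSCALLS)
home_menu = Process("home menu", BASE_SYSCALLS, frozenset({"am:sys"}))
game = Process("game", BASE_SYSCALLS)
kernel.register_service(am, "am:sys")
```

In this model the home menu and the game share the same syscall mask; only their service lists differ, which mirrors the point that a compromised game still can't reach every service.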
So let's take a look at this physical memory separation, because, as I mentioned, you have the application memory region, the system memory region, and the base memory region. These are actually physically separated memory. You have the FCRAM, which is the main bank of RAM, 128 megabytes, and it's divided into these three regions such that whenever you allocate virtual memory, the physical backing memory never crosses from one region to another. If you allocate memory from a game, it will be in the application memory region; it will never end up in the base memory region. That might seem kind of trivial, but it will come up later. Then the kernel for the ARM9 lives in ARM9 internal memory, so you can't mess with it from the ARM11. And WRAM is what contains all the memory that pertains to the ARM11 kernel. So the cool thing about this really deep security model is that, in theory at least, compromising the whole system should take a number of exploits, right? First off, you need to even get code execution on the machine, which is not trivial, because Nintendo is not Apple or Android; it doesn't just let you create your own apps and run them on your console. So you first need to compromise an application. From there, you might have code execution, but it'll be unprivileged, so you want to escalate your privilege to get more attack surface. One way of doing that is to compromise a system module. From there, you might have access to more system calls, and maybe those system calls are more vulnerable than the others. Then maybe you can use them to compromise the kernel.
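A minimal sketch of the region rule, assuming a 64/44/20 MB split of the 128 MB FCRAM into application, system, and base (the real split depends on the console's memory mode, so treat those numbers as an assumption). What it models is the property from the talk: an allocation made for a region only ever receives physical pages from that region, even when that region is full.

```python
PAGE = 0x1000
MB = 1 << 20

# Assumed split of the 128 MB FCRAM; the real split depends on the memory mode.
REGIONS = {
    "APPLICATION": (0, 64 * MB),
    "SYSTEM": (64 * MB, 44 * MB),
    "BASE": (108 * MB, 20 * MB),
}


class RegionAllocator:
    """Hands out physical pages, never letting one region borrow from another."""

    def __init__(self, regions):
        self.free = {name: list(range(start, start + size, PAGE))
                     for name, (start, size) in regions.items()}

    def alloc(self, region, npages):
        pool = self.free[region]
        if len(pool) < npages:
            # Running out in one region does not spill into a neighbouring one.
            raise MemoryError(f"{region} region exhausted")
        pages, self.free[region] = pool[:npages], pool[npages:]
        return pages
```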
And from there, you have the complete attack surface into the ARM9, the security processor. Assuming you compromise that, you get total control. Of course, there are these arrows on the left to signal that this is the theory; in practice, you can sometimes go from number one straight to number four. Sometimes someone has a bug that goes straight to number four. But in this case, we are going to explore a bug chain that does every single one of these steps. So it's kind of unnecessarily complicated, but I do want to show that, in theory, the security model can be really cool and really effective. So the first thing is actually getting code execution on the machine. Okay, this is supposed to be animated, but I guess not. So, for a little bit of history, there have been two classes of entry points on the Nintendo 3DS. One of them is things like the Cubic Ninja exploit from a couple of years ago: a kind of bug that is trivial and really should not be exploitable on any modern platform, but is, because Nintendo lacks a number of mitigations on the 3DS, which doesn't make for great security. The other is that the 3DS also has web browsers, right? It has the actual web browser applet, and it has the YouTube app, which is just a web browser with a fancy coat of paint. From those, you can trivially bypass these mitigations anyway. No one believes a web browser exploit is ever going to be stopped by ASLR or stack cookies. So the conclusion is that even though all these trivial bugs are still exploitable on the 3DS and really shouldn't be, at the end of the day, the threat model Nintendo needs to adopt is that user mode will be compromised, right?
So they need to base themselves on that, and that means you end up with a lot of low-hanging fruit. The exploit I'm going to talk about today is in the mcopy app, the microSD network system transfer thingy. It basically just lets you access the files on a microSD card over the network, and it's implemented as an SMB server. And because SMB is a notoriously secure protocol, of course you end up being able to find vulnerabilities really trivially. The way I did this took, like, an hour or so. I just grabbed pysmb from GitHub and modified it a little bit to actually talk to Nintendo's SMB server implementation, because for some reason it didn't work out of the box. After that, I added these six lines of fuzzing code, which just flip bits randomly. If you want to see what that looks like in practice: the 3DS is running the SMB server here, and the fuzzer is running in the background. It should just take a couple of seconds, and then, of course, the 3DS crashes, because this super shitty SMB fuzzer actually works surprisingly well. So at this point the 3DS has crashed. The 3DS does not normally give you a register dump, a crash dump like that, but this is running custom firmware, because that's how we do development these days. I'm not going to go too deep into how that bug works, because, first off, I'm not an SMB expert. I literally just ran that fuzzer, found that bug, and exploited it. But to give a basic idea: this is the packet that crashes the console, shown in its normal form. It's a Session Setup AndX packet, whatever the fuck that means. And it has a couple of data blobs in there, such as the NTLM response data blob.
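The talk doesn't show the six lines of fuzzing code, so here is a hypothetical stand-in for that kind of mutator: take a captured packet and invert a few randomly chosen bits before sending it. The name `flip_bits` is made up for this sketch, and how you would hook it into pysmb's send path is not shown, since that depends on the pysmb version.

```python
import random


def flip_bits(packet, flips=4, rng=None):
    """Return a copy of `packet` with `flips` randomly chosen bit positions inverted.

    If the same position is picked twice the two flips cancel, so at most
    `flips` bits end up different from the original.
    """
    rng = rng or random.Random()
    data = bytearray(packet)
    if not data:
        return bytes(data)
    for _ in range(flips):
        bit = rng.randrange(len(data) * 8)   # any bit in the whole packet
        data[bit // 8] ^= 1 << (bit % 8)     # invert just that bit
    return bytes(data)
```

In a fuzzing loop you would send `flip_bits(captured_packet)` over and over and watch for the console to stop responding; passing a seeded `random.Random` makes a crashing mutation reproducible.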
What I mean by data blob is something with a variable length, so as an attacker, I control the length of these blobs. Here that's the NTLM response data blob and the domain name data blob. The domain name is just a string that says WORKGROUP. The vulnerability is truly trivial. The code checks the length of the NTLM response data blob, and if it's not one specific value, it takes another code path. That code path copies data onto the stack, into a fixed-length buffer, using a length that is controlled by the attacker. So, obviously, you can overwrite the entire stack with a packet crafted just like this: you make the NTLM response blob 0x10 bytes instead of 0x18, and then you make the WORKGROUP blob hundreds of bytes long instead of, like, 10 bytes or some shit. That overwrites the entire stack. You're able to overwrite a return address, redirect the CPU's execution flow, jump into existing code, and essentially get remote code execution. I say remote code execution, but in practice that actually just means ROP, which stands for return-oriented programming, just real quickly for people who are not familiar with it. It means we're not necessarily able to inject new code, because of a mitigation called DEP, data execution prevention. What DEP does is prevent you from injecting code into the process and just jumping to it, because any memory that is writable is not executable. That's actually a really good mitigation, and Nintendo enforces it really strictly.
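To make the bug pattern concrete, here is a toy Python simulation of one stack frame: a fixed-size buffer followed by a saved return address. The buffer size, the return address value, and `handle_setup` are hypothetical; the only details taken from the talk are that the expected NTLM response length is 0x18 and that any other length leads to a copy whose size comes from the packet. The `fixed=True` flag shows the obvious repair, clamping the copy to the buffer size.

```python
import struct

BUF_SIZE = 0x20          # hypothetical size of the fixed stack buffer
SAVED_LR = 0x00100000    # hypothetical legitimate return address


def handle_setup(ntlm_len, domain, fixed=False):
    """Simulate one stack frame and return the return address the function would use."""
    # Frame layout: [BUF_SIZE-byte buffer][4-byte saved return address]
    frame = bytearray(BUF_SIZE) + struct.pack("<I", SAVED_LR)
    if ntlm_len == 0x18 or fixed:
        n = min(len(domain), BUF_SIZE)   # bounded copy
    else:
        n = len(domain)                  # bug: copy length comes from the packet
    # Copying past the end of the buffer clobbers the saved return address.
    frame[0:n] = domain[:n]
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]
```

With a normal `WORKGROUP` string the function returns to `SAVED_LR`; with a 0x10 length and a long domain blob, the last four bytes of the blob become the return address.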
Under normal circumstances, there will never be memory in user mode that is both writable and executable, and memory that was ever writable will never become executable. That's really well enforced. So instead of injecting shellcode and jumping to it, you have to reuse existing code inside the process. That's what ROP is: you overwrite a return address and jump to a tiny piece of code, which we call a gadget, set up so that it then jumps to another gadget, and another, and another. From there you can do arbitrary computations, call arbitrary system calls, and do pretty much whatever you want. The thing is, my personal aspiration for hacking the 3DS was to run homebrew on it: games made by amateurs, applications made by amateurs, that sort of shit. And writing homebrew in ROP is not ideal; you want to write it in actual native code. So: we can't create new executable memory, and we can't reprotect writable memory as executable. But what if we don't need to? What if we can just overwrite memory that's read-only? The way to do that is through DMA, right? You have a bunch of devices that have direct access to memory, and the GPU is one of them, because the GPU needs to be able to, for example, read a texture from memory in order to render something. It also needs to be able to write a framebuffer to memory. And the GPU actually has access to all of FCRAM, all of WRAM, and all of VRAM, which means you can use the GPU to render over code pages, right?
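The W^X rule described above can be modeled as a small permission checker. This is a sketch of the policy as stated in the talk (no page is ever writable and executable at once, and a page that was ever writable never becomes executable), not Nintendo's actual kernel logic; `AddressSpace` and `set_perm` are names invented here.

```python
from enum import Flag, auto


class Perm(Flag):
    R = auto()
    W = auto()
    X = auto()


class AddressSpace:
    """Toy W^X policy: never W+X at once, and once-writable pages never become X."""

    def __init__(self):
        self.perms = {}              # page number -> current Perm
        self.ever_writable = set()   # pages that have been writable at some point

    def set_perm(self, page, perm):
        if Perm.W in perm and Perm.X in perm:
            raise PermissionError("writable+executable mapping refused")
        if Perm.X in perm and page in self.ever_writable:
            raise PermissionError("page was writable once; it can never become executable")
        self.perms[page] = perm
        if Perm.W in perm:
            self.ever_writable.add(page)
```

With both rules in place, user mode has no way to turn attacker-written bytes into code, so the only option left is chaining code that is already executable, which is exactly what ROP does.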
In practice, it's not that simple, because otherwise you'd just be able to overwrite the kernel. Even though the GPU technically has access to WRAM, which includes the ARM11 kernel, there is a register that lets Nintendo limit the range the GPU can access through DMA. So we're not able to just overwrite the kernel. In fact, we can't even overwrite system modules or the home menu, because the system and base regions are not accessible to the GPU. But because the GPU does need to access textures and stuff from the current game, we do have access to the first half of FCRAM. So if you look at this diagram, physical memory is at the bottom and virtual memory is at the top, and this is your text section: readable and executable, not writable. We just use the GPU to overwrite physical memory at the bottom, and then, because that's how memory works, the change shows up in virtual memory, and you can just jump to it. So basically we use the GPU to render code into these physical pages, thereby overwriting existing code. Right. Nintendo did not like that very much, and they tried to put a wrench in our plans. They realized that whenever people used their ROP chain to make the GPU overwrite code with other code, they relied on a specific hardcoded address, because this code page was always at the same location in physical memory, so we didn't need to do anything fancy. Before their mitigation was put in place, you had these four blobs of code in virtual address space, and they corresponded trivially to these four blobs in physical address space. So their idea was: well, let's just jumble it up.
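Here is a toy model of why GPU DMA gets around page permissions, assuming a single register-style limit on the physical range the GPU may touch. CPU writes go through the page table and respect the read-only bit; DMA writes go straight to physical memory and are only checked against that limit. The page numbers and the `Machine` class are hypothetical.

```python
PAGE = 0x1000


class Machine:
    """Toy machine: CPU accesses go through a page table, GPU DMA hits physical memory."""

    def __init__(self, phys_pages, dma_limit):
        self.phys = bytearray(phys_pages * PAGE)
        self.dma_limit = dma_limit       # hypothetical range register: GPU may touch [0, dma_limit)
        self.page_table = {}             # virtual page -> (physical page, writable)

    def map(self, vpage, ppage, writable):
        self.page_table[vpage] = (ppage, writable)

    def _translate(self, vaddr):
        ppage, writable = self.page_table[vaddr // PAGE]
        return ppage * PAGE + vaddr % PAGE, writable

    def cpu_write(self, vaddr, data):
        paddr, writable = self._translate(vaddr)
        if not writable:
            raise PermissionError("CPU write to read-only page")
        self.phys[paddr:paddr + len(data)] = data

    def cpu_read(self, vaddr, n):
        paddr, _ = self._translate(vaddr)
        return bytes(self.phys[paddr:paddr + n])

    def gpu_dma_write(self, paddr, data):
        # DMA never consults the page table, only the physical range limit.
        if paddr + len(data) > self.dma_limit:
            raise PermissionError("outside GPU DMA range")
        self.phys[paddr:paddr + len(data)] = data
```

A read-only code page backed by a physical page inside the DMA window can be rewritten by the GPU and the CPU then sees the new bytes, while anything backed by pages past `dma_limit` (the kernel, the system and base regions) stays out of reach.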
That way, as an attacker, I no longer know the layout: the order is jumbled up, I don't know the size of the blocks, and I don't know which order they were placed in. So if I write to the same physical location as before, it might show up in the block I wanted, or it might show up in some other block, and I won't know what I just wrote over. We call this physical ASLR, PSLR for short, because that's really what it is. And the thing is, it's actually kind of a shitty mitigation. A good mitigation creates extra work for the hacker every time they write an exploit. With this one, you only have to bypass it once, because it turns out ROP, as has been known for about 10 years, is Turing complete, so you can do arbitrary computations with it. You can just write a for loop that searches for the physical piece of memory you want to overwrite, and then overwrite it. We basically had to write this ROP chain once, and then we could reapply it to every exploit. So, not a great mitigation. What that means in practice is that you just run the exploit on the computer, it connects to the console, hacks the console, and then we have code execution: the actual homebrew menu running on the console. And you can do that over the network, on any console running 11.7 or whatever. So that's the first stage. At this point we have compromised unprivileged user mode, which is the first step in our four-exploit chain, and now we want to somehow escalate privileges. Because the thing is, okay, we have code execution. This is great.
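A minimal simulation of PSLR and the search-based bypass, assuming the attacker knows a few bytes that are unique to the code page they want to hit. Loading places pages in random physical slots; the bypass just scans physical memory for the signature, which is the kind of loop the talk says was written once in ROP. The page size and layout are made up for this sketch.

```python
import random

PAGE = 0x100   # toy page size


def load_with_pslr(code_pages, phys_pages, rng):
    """Place each code page in a random free physical slot (toy physical ASLR)."""
    phys = bytearray(phys_pages * PAGE)
    slots = rng.sample(range(phys_pages), len(code_pages))
    for page, slot in zip(code_pages, slots):
        phys[slot * PAGE:(slot + 1) * PAGE] = page
    return phys, slots


def find_page(phys, signature):
    """The bypass: scan physical memory for bytes known to start the target page."""
    for slot in range(len(phys) // PAGE):
        if phys[slot * PAGE:(slot + 1) * PAGE].startswith(signature):
            return slot
    return None
```

Because the search runs at exploit time, reshuffling the layout on every load costs the attacker nothing extra once the loop exists, which is why this mitigation only had to be beaten once.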
But we only have access to the basic unprivileged system calls, right? Attacking the kernel from there is totally doable and has been done several times, but ideally we want more attack surface. Likewise, if we want to escalate to an actual system module, which might have access to more system calls, well, mcopy, the application we just compromised, only has access to a few services. So ideally we want a way to migrate to another process that has better privileges. And it turns out we can actually do that. I showed you a slide earlier about what the GPU can reach through DMA, and I said it only has access to the application region. I kind of lied about that. It actually has access to a little bit of the system region as well. It doesn't have access to the home menu's code section, but it does have access to the home menu's heap. What that means is that any memory dynamically allocated by the home menu, I can read and write through the GPU. From there it's pretty trivial to find some C++ object on the heap, modify its vtable pointer, for example, and have it jump into other code. And then you get code execution. Well, actually you just get ROP in the home menu. That's the annoying thing: you can't use the GPU to get native code execution inside the home menu. So in practice we had to write this whole service that runs in the home menu, all in ROP. But once again, ROP is Turing complete, so you can do whatever the fuck you want. So at that point we have compromised these two processes.
What's interesting is that, as you can see, even though we don't have access to any additional system calls, we do have access to the services the home menu had access to. One of those services lets us, for example, kill processes and create new processes. So the idea is that, since we have code execution inside the home menu process, we can kill the mcopy process, replace it with another process, use the GPU to take over that process, and so on. That means that, in theory, we have access to the privileges of any process that lives in the application region and that we can start. Any service that any game or app has access to, we have access to as well. So we have the biggest attack surface we could possibly get from unprivileged user mode, which means we can start looking into some more esoteric services, such as ldr:ro. It lives in the RO process. If you think of Windows, Windows has DLL files, dynamically linked libraries. Well, it turns out the 3DS does as well. They don't call them DLLs; they call them CROs, which stands for CTR Relocatable Object, probably. I'm kind of just guessing at this point. RO is an interesting process for us to go after because it has access to a very special system call that lets it create new executable memory, which will come in handy later. As for how it works: you first allocate a piece of memory inside your application, load your CRO into it from the file system or wherever, and then ask ldr:ro to load it for you.
What I mean by load it is that, because it's a DLL, it's supposed to be executable code, so it needs to be reprotected as executable at some point. An application process doesn't have the ability to do that, but ldr:ro does. So the first thing it does is lock the memory away from the application, then apply the dynamic linking stuff to it, which just means relocating some pointers and so on, and then reprotect the relevant pages as executable. By locking, I mean my application can no longer write to that memory, which makes sense, because ldr:ro does not want us messing with it while this is happening. And because the application itself provides this CRO blob, the linker has to be built defensively against malformed CROs. They actually did fix a bunch of bugs there, and as far as I can tell there aren't many vulnerabilities you can exploit by just feeding it a malformed CRO. The thing to notice, though, is that, as mentioned, the application is the one that allocates the memory containing the CRO. That means its physical memory is in the application region, which means, once again, we can use the GPU to mess with the CRO blob while it's being relocated; locking only takes away the application's own access, not GPU DMA. And that sounds like it could be a problem: the loader was built defensively against malformed CROs, but what about CROs that are being modified on the fly? Well, it turns out it's not secure against that at all. If you look at this code, which lives in the RO process and is part of the relocation step, the first thing it does is go through all the offsets in the CRO header and rewrite them so they stop being offsets into the CRO and become actual pointers into the CRO.
So it just adds the base address of the CRO to each offset in the header, after checking, of course, that the offset is within the bounds of the CRO. That could be fine. The thing is, the resulting pointer, which the RO process uses later on, lives in physical memory that we can overwrite. For example, this is the pointer to the segment table in the CRO. What you can see is that it's loaded back from the CRO here and then used directly to both read and write memory. So as an attacker, if I modify that pointer after it's been checked, I can get RO to read and write at arbitrary locations in its own process, which is not great for them. In practice, we end up with three kind of weird memory corruption primitives. The first lets us write an arbitrary value at an arbitrary location, as long as there's a byte with the value 2 eight bytes after the location we're trying to overwrite, and, for some reason, the value four bytes after the target can't be zero. The second one is the same, except the byte has to be 3. And the last one, all the way at the bottom, allows any value there, as long as what we're overwriting is not zero, and it doesn't actually overwrite the target; it increments it by some other value. So basically we have these arbitrary read and write primitives. Well, really, just arbitrary write. And they're not really arbitrary, in the sense that we have these weird constraints. But of course, it's not that hard to exploit this in practice. What I want is ROP inside this process, and to get ROP, I just want to overwrite a return address on the stack. This is showing what I can and cannot overwrite with these primitives.
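The relocation bug is a time-of-check/time-of-use problem, and a toy model shows it clearly. In this sketch, which is not RO's actual code, `on_between` stands in for a GPU DMA write that lands after the bounds check; the vulnerable path reloads the pointer from shared memory, while `trust_memory=False` keeps the checked value in a private local. Addresses and field offsets are hypothetical.

```python
import struct

CRO_BASE = 0x1000   # hypothetical address where the CRO buffer lives
CRO_SIZE = 0x100    # hypothetical CRO size


class Loader:
    def __init__(self, mem, on_between=None):
        self.mem = mem                  # shared buffer: GPU DMA can also write here
        self.on_between = on_between    # simulates a DMA write landing mid-relocation

    def read32(self, addr):
        return struct.unpack_from("<I", self.mem, addr)[0]

    def write32(self, addr, value):
        struct.pack_into("<I", self.mem, addr, value)

    def relocate_field(self, field, value, trust_memory=True):
        off = self.read32(field)
        if off >= CRO_SIZE:                     # time of check
            raise ValueError("offset out of bounds")
        ptr = CRO_BASE + off
        self.write32(field, ptr)                # offset rewritten as a pointer, in shared memory
        if self.on_between:
            self.on_between(self.mem)
        # Time of use: the vulnerable path reloads the pointer from shared memory.
        target = self.read32(field) if trust_memory else ptr
        self.write32(target, value)
        return target
```

The bounds check is correct on its own; the problem is only that the checked value is stored back into memory the attacker can still reach and then trusted on the way out.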
What you see in orange here are the return addresses on the stack, which are what I want to overwrite, and what's in yellow is what I can actually overwrite. And as you can see, there's actually no overlap between the return addresses I want to overwrite and the locations in memory I can overwrite. The thing is, we do have this third corruption primitive here, which lets us, well, increment memory at an arbitrary location with much fewer constraints. It doesn't need the byte 3. So what I'm going to do is use it at this location, where we meet all of its constraints, to place a byte 3, which then lets me use the second corruption primitive to overwrite this return address with an arbitrary value. There's a little bit more to the full exploit, but that's the basic idea, right? It's pretty simple. At this point I have ROP execution inside this process, and I have more privilege than before. In practice, that means I have access to a few more system calls, and I get to look at them and see if I can use them to take over the kernel. So, taking over the kernel. Now that we've taken over this process called RO, we have access to more system calls, and one of them that is really interesting is ControlProcessMemory. First off, it's interesting because RO is literally the only process that has access to it. So in a sense, you can think of RO as an extension of the kernel that has one very specific purpose, and it uses this very special system call, built just for it, to achieve it.
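The trick of using the looser increment primitive to plant the byte a stricter primitive needs can be modeled on a plain byte array. The constraints below follow the talk's description loosely (a 2 or 3 byte eight bytes past the target, a non-zero value four bytes past it, and an increment that only works on non-zero values); the function names and memory layout are hypothetical.

```python
import struct


def u32(mem, addr):
    return struct.unpack_from("<I", mem, addr)[0]


def prim_write(mem, addr, value, tag):
    """Model of the two constrained write primitives (tag = 2 or 3)."""
    if mem[addr + 8] != tag or u32(mem, addr + 4) == 0:
        return False                          # constraints not met: nothing happens
    struct.pack_into("<I", mem, addr, value)
    return True


def prim_increment(mem, addr, delta):
    """Model of the looser primitive: adds to a non-zero 32-bit value."""
    cur = u32(mem, addr)
    if cur == 0:
        return False
    struct.pack_into("<I", mem, addr, (cur + delta) & 0xFFFFFFFF)
    return True


def overwrite_return_address(mem, ret_slot, new_value):
    """Plant the byte 3 the second primitive needs, then use it on ret_slot."""
    low = mem[ret_slot + 8]
    if not prim_increment(mem, ret_slot + 8, (3 - low) & 0xFF):
        return False
    return prim_write(mem, ret_slot, new_value, tag=3)
```

The increment only needs to get the low byte right, so any carry into the next byte is harmless for the check that follows.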
The thing about ControlProcessMemory is that it's really the same thing as ControlMemory, except that it works cross-process, as long as you have a handle to the other process, and it has fewer constraints. One of those relaxed constraints, as mentioned, is that it can create new executable memory or reprotect existing memory as executable. That's really useful if we just want to run homebrew, right? We don't need to mess with the GPU anymore; we can create arbitrary new executable memory and do whatever we want. But the other interesting thing is that it also bypasses some of the restrictions ControlMemory has on where it's allowed to map memory. That means we can map the null page, address zero, which is notoriously not allowed, because a lot of bugs rely on the ability to place memory at address zero: a lot of bugs come down to dereferencing a pointer that is null, which should not have been null and was not checked properly. That's interesting for us, because if we can find a null dereference bug inside the kernel, which would normally just be a denial of service bug, all of a sudden we might be able to turn it into an actual code execution bug. So what's a good target for null dereferences? Typically memory allocation, because if you have a memory allocation primitive and you run out of memory, or the allocation otherwise fails, it returns null. If the code calling the kernel's malloc, or whatever it is, doesn't check that the pointer is null, then all of a sudden you have a null dereference, and things become interesting for us. So here's basically how the kernel's allocator for kernel objects works. It's a linked list. Well, it's a slab heap.
What that means is that for each type of object, we're going to have one memory blob that's subdivided into individual objects, and whenever these objects are not in use, they're part of a free list. So you have the list head, and each free object links to the next one. Allocating an object just means popping a free object off this free list and using it for whatever you want, and freeing an object is just pushing it back onto the free list. So what happens if we run out? Well, if we run out, the free list head ends up pointing to null, so the next allocation just returns null, and all of a sudden we might have the null dereference bug that we want. Of course, what that means is that the code that uses this allocation function has to check that the resulting pointer is not null; if it's zero, it should just throw an error, and usually it does. But what you can see in that last example there is that it's allocating a new linked list node and checking that the node is not null, and if it is not null, it initializes it to zero. Then it uses the node without checking anything: it does the null check for the zero-initialization, but even if the node was null, it just uses it without caring. So it's kind of an odd programming pattern, but somehow it ends up in literally every location where the kernel uses these linked list objects. So the idea then becomes: if we can make the kernel run out of these linked list nodes, we can make one of these linked lists live in the null page, which is controllable by us from user mode, and once we have that, we might be able to actually take over the kernel. So the question is, how do we actually make the kernel run out of these linked list nodes?
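To make the allocator failure mode concrete, here is a toy Python simulation of a slab heap with a free list, plus the check-then-use-anyway node pattern described above. It is a sketch, not the 3DS kernel's code: the class and function names, the 12-byte node layout (next, previous, and object pointer at offsets 0, 4, and 8), and the dict standing in for memory are all assumptions made for illustration.

```python
NULL = 0
NODE_SIZE = 12  # hypothetical layout: next at +0, prev at +4, object pointer at +8


class SlabHeap:
    """Toy slab heap: one blob split into fixed-size slots, free slots on a free list.

    `memory` is a dict standing in for the address space (address -> 32-bit value).
    A free slot stores the address of the next free slot in its first word.
    """

    def __init__(self, memory, base, slot_size, count):
        self.memory = memory
        self.head = NULL
        # Push every slot onto the free list, highest address first,
        # so that `base` ends up at the head.
        for i in reversed(range(count)):
            self.free(base + i * slot_size)

    def alloc(self):
        slot = self.head
        if slot != NULL:
            self.head = self.memory[slot]  # pop: head = slot->next_free
        return slot  # NULL once the free list is exhausted

    def free(self, slot):
        self.memory[slot] = self.head  # push: slot->next_free = head
        self.head = slot


def new_list_node(heap, obj):
    """Hypothetical reconstruction of the buggy pattern: the null check only guards zeroing."""
    node = heap.alloc()
    if node != NULL:
        for off in (0, 4, 8):
            heap.memory[node + off] = 0  # zero next, prev and object pointer
    # ...but the node is then used unconditionally, so a NULL node means a write into page 0.
    heap.memory[node + 8] = obj
    return node
```

With a two-slot heap, the third `new_list_node` call gets `NULL` back and stores its object pointer at address 8, inside the null page, which is exactly the kind of write that user mode can observe and control once it has that page mapped.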
Well, a good way to do that is to look at other system calls and how they work, and one of them is WaitSynchronizationN. It is a system call that is basically the same thing as WaitSynchronization1; the only difference is that WaitSynchronization1 waits on one object, whereas WaitSynchronizationN waits on N objects, which I know is pretty obvious. What I mean by waiting on an object is waiting on a kernel object, like a thread: when you're waiting on a thread, your current thread waits until that other thread is dead, and then your thread gets woken up. You can also wait on an event object, or on a mutex; waiting on a mutex just means waiting until that mutex is not locked anymore. That's the basic idea. WaitSynchronizationN takes N objects as input, in practice up to 256, and it waits on them, and as soon as one of them is signaled, it wakes up your thread. Now, it has to wait on these 256 objects somehow, so it has to keep track of them somehow, and the way it does that, of course, is a linked list. So the idea then becomes: if we can create as many threads as we want and have each one of them wait on as many objects as we want, then we're going to be creating a bunch of these linked lists that each have as many as 256 nodes in them. There are only about 1500 linked list nodes allocated in the kernel, so after a few attempts we should be able to actually get it to run out. So that is essentially what we do, and we can actually trigger a null dereference bug that way.
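A quick back-of-the-envelope check of the exhaustion arithmetic, as a Python sketch. The pool size is the talk's approximate figure of about 1500 nodes, and 256 is the per-call object limit quoted above; `threads_to_exhaust` is a hypothetical helper, and it assumes each waited-on object costs one node. Real counts vary because the rest of the system is allocating nodes too, which is part of why it takes a few attempts.

```python
import math

NODE_POOL = 1500   # approximate number of kernel linked list nodes, per the talk
MAX_HANDLES = 256  # objects one WaitSynchronizationN call can wait on


def threads_to_exhaust(pool=NODE_POOL, per_wait=MAX_HANDLES, in_use=0):
    """Threads needed, each blocked in one full-size WaitSynchronizationN call,
    to consume every free node (assuming one node per waited-on object)."""
    free_nodes = pool - in_use
    return math.ceil(free_nodes / per_wait)


print(threads_to_exhaust())            # 6, since 1500 / 256 is about 5.9
print(threads_to_exhaust(in_use=500))  # 4, fewer threads if the system already holds nodes
```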
The thing is, it's not necessarily trivial to exploit, because it turns out linked lists are used in the kernel a lot. The problem is: what if another process tries to use a system call, because, you know, that's what processes do, and it needs a linked list node, because that's what the kernel does, and the kernel has run out? It's basically going to crash, because that other process does not have the null page mapped into it. So that's not great. And then the other thing is that even our own process might make another system call, or even the current system call might keep going: if you look at this, it's just going to continue allocating new linked list nodes over and over again. So even after we've triggered the vulnerability, we're still going to keep allocating new nodes, and the problem with that is that the next time a node is allocated, even if it's in our current process, null is going to be returned again. If it's for another list, you end up with two lists that collide into the same node, and you end up with all these kernel linked lists mangled into one another. That gets really messy really quickly, and we want to avoid that if possible.
So the way to handle that is to do this thing I just call just-in-time freeing, but it's really not as fancy as it sounds. The idea is that as soon as a linked list node gets allocated in the null page, the kernel is going to write data to that null page. And because the 3DS has multiple CPU cores, you can have one thread trigger the null dereference bug by allocating more and more linked list nodes, and have the other CPU core read from the null page at all times. As soon as it sees that zero has changed to a pointer value, be it the next pointer, the previous pointer, or the object pointer, it can take action: that core (core zero) just signals an object that another thread was waiting on, which frees a bunch of that thread's linked list nodes. The next allocation then just uses one of the nodes that was just freed, and that's basically how we get around the whole issue. Now the question is: we're able to trigger this bug, and we're able to do it without crashing everything, which is great, but how do we actually exploit it to get code execution in the kernel? Well, I'm sure people who are familiar with linked list bugs will know that you basically want to do this through the unlinking phase, so let me just explain how that works. Imagine you have this linked list, where each node points to the next one and the previous one. When you free, let's say, node two, what happens is that you have to update the next pointer of node one and the previous pointer of node three. It's pretty straightforward.
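The just-in-time freeing trick is a race between two cores. The Python sketch below simulates it deterministically by interleaving one step of each core: the allocating core keeps handing out nodes, and the watching core polls page 0 and, once it changes, releases the nodes held by a helper thread's wait. Everything here (`NodePool`, `run`, the pool sizes, the marker value written to page 0) is hypothetical; a real implementation would poll actual memory from a thread pinned to the other core.

```python
NULL = 0


class NodePool:
    """Toy pool of kernel linked list nodes handed out from a free list."""

    def __init__(self, count, base=0x1000, node_size=12):
        self.free_nodes = [base + i * node_size for i in range(count)]

    def alloc(self):
        return self.free_nodes.pop() if self.free_nodes else NULL

    def free(self, node):
        self.free_nodes.append(node)


def run(pool_size, helper_nodes, allocations, watcher=True):
    """Return how many allocations landed in the null page."""
    pool = NodePool(pool_size)
    null_page = {}  # user-mapped memory at address 0
    # Nodes held by a helper thread blocked in WaitSynchronizationN.
    held = [pool.alloc() for _ in range(helper_nodes)]
    null_hits = 0
    for _ in range(allocations):
        # Allocating core: the kernel allocates a node and writes its pointers into it.
        node = pool.alloc()
        if node == NULL:
            null_hits += 1
            null_page[0] = 0x1234  # stand-in for the next/prev/object pointer write
        # Watching core: poll page 0; once it changes, signal the helper's object,
        # which wakes the helper and frees its wait-list nodes back to the pool.
        if watcher and null_page.get(0, 0) != 0 and held:
            for n in held:
                pool.free(n)
            held = []
    return null_hits
```

With `run(8, 4, 8)` only one allocation hits the null page; with `watcher=False`, every allocation after exhaustion returns `NULL`, which is the colliding-lists situation described earlier.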
Now, the thing is, in our case we actually control node two, right? We have full control over the next pointer value and the previous pointer value. So let's say that next points to 0xBABE and previous points to 0xDEAD. If we try to unlink this bad node, it's going to write the value of previous into the node that next points to, and the value of next into the node that previous points to. That means you're going to be writing 0xBABE to 0xDEAD and 0xDEAD to 0xBABE (give or take the offset of the field being written), so we can use this to write an arbitrary address to an arbitrary location, as long as both addresses point to valid writable memory. So we end up with this primitive, which is obviously super powerful, because we can use it to, say, overwrite a function pointer, and that's essentially what we end up doing. If you look at the code that invokes the linked list unlinking, right after freeing the kernel object it makes an indirect call through a vtable, so if you can overwrite that vtable pointer, then you have the ability to jump to any location in code that you want.
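The unlink write primitive can be reproduced on a simulated address space. The Python sketch below uses a plain doubly linked list unlink with no integrity checks, matching how the talk describes it; the field offsets, the `Memory` class, and the `TARGET`/`PAYLOAD` addresses are hypothetical. Note the side effect: the second write clobbers a word next to the payload, which is why the payload has to live in writable memory that you are willing to have overwritten.

```python
MASK32 = 0xFFFFFFFF
NEXT_OFF = 0  # hypothetical node layout: next pointer at +0
PREV_OFF = 4  # previous pointer at +4


class Memory:
    """Sparse simulated 32-bit address space of words."""

    def __init__(self):
        self.words = {}

    def read32(self, addr):
        return self.words.get(addr & MASK32, 0)

    def write32(self, addr, value):
        self.words[addr & MASK32] = value & MASK32


def unlink(mem, node):
    """Unlink `node` from a doubly linked list without validating its pointers."""
    nxt = mem.read32(node + NEXT_OFF)
    prv = mem.read32(node + PREV_OFF)
    mem.write32(prv + NEXT_OFF, nxt)  # prev->next = next
    mem.write32(nxt + PREV_OFF, prv)  # next->prev = prev


# Benign case: n1 <-> n2 <-> n3, then unlink n2.
mem = Memory()
n1, n2, n3 = 0x100, 0x200, 0x300
mem.write32(n1 + NEXT_OFF, n2)
mem.write32(n2 + PREV_OFF, n1)
mem.write32(n2 + NEXT_OFF, n3)
mem.write32(n3 + PREV_OFF, n2)
unlink(mem, n2)  # now n1.next == n3 and n3.prev == n1

# Attack: a node that landed at address 0, whose fields user mode controls.
TARGET = 0xFFF01000   # hypothetical address of a kernel vtable pointer
PAYLOAD = 0x00000010  # hypothetical address of code in the null page
mem.write32(0 + NEXT_OFF, PAYLOAD)
mem.write32(0 + PREV_OFF, TARGET - NEXT_OFF)
unlink(mem, 0)  # writes PAYLOAD to TARGET, and TARGET to PAYLOAD + PREV_OFF
```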
The only kind of annoying thing is that it turns out the function that frees the kernel object is going to panic if it tries to free a null object, which is kind of weird, because the allocation function does not really seem to care about returning a null object. But whatever: you end up being able to exploit this anyway by being vaguely tricky, and by vaguely tricky I really just mean the following. You overwrite this vtable pointer; the only thing is that you have to overwrite it with the address of a node that's going to be overwritten, and because of that you have to make the null page readable, writable, and executable. Then you just put a piece of code in there, in the second node, that jumps to some other location in user-mode memory, and you get kernel code execution that way. It's not that complicated.

At this point we have access to everything on the ARM11. We have compromised the ARM11 kernel, which means that by extension we have also compromised literally every other process that is running on the ARM11. We can just do whatever we want: we can run whatever games we want, and we can access all the hardware that is accessible to the ARM11. That's pretty great, but for whatever reason it's not enough; we still want to take over the ARM9, because, you know, I guess that's cooler. It actually doesn't really give you access to much more, but it does allow you to write directly to the NAND chip, which is nice and definitely useful for other exploits. So we want to be able to do that, and again, we don't have the ability to write directly to ARM9 memory, but we do have the ability to talk to the ARM9 in other ways. The ARM9 is responsible for certain services, such as accessing permanent storage, as mentioned, but it also does other things, and one of those things is backwards compatibility.
So the 3DS is actually able to run old DS games, and the way that's done is by basically turning the 3DS into a DS in terms of hardware: it pokes a bunch of weird hardware registers, which actually brings up a third CPU that is kind of hidden, and then it turns the ARM9 into a DS-mode CPU. It's kind of crazy. The thing is, in order to do that, it actually has to kill the current operating system and start another operating system that is going to do the bring-up for all of this. The operating system that we've been working on so far is this NATIVE_FIRM thing, FIRM presumably standing for firmware, but there are other firmwares that can run on the 3DS: you have SAFE_MODE_FIRM, which is what runs when you do a system update on your console, and you have TWL_FIRM, which is the one we're going to be interested in. TWL is just the codename for the DSi, so that's where the name comes from. In terms of actually launching another operating system, what the 3DS does is basically have the ARM9 do everything. First, the ARM9 loads the firmware image from permanent storage into memory that we cannot alter from the ARM11, which is ARM9 internal memory, and it uses its crypto hardware to decrypt and authenticate it, making sure it's actually the Nintendo special sauce and not something that we've altered somehow. Then it copies each individual section to where it's supposed to go, because you have code that needs to run on the ARM9 (that's pretty easy, it's already in ARM9 memory), but you also have code that needs to run on the ARM11, and that has to be copied to FCRAM, WRAM, and so on. Now, once it's done that, the ARM9 is not just going to start executing its own code; it is first going to tell the ARM11, hey, please start running the code on your end too, because I need you to do that as well. The thing is, we've
compromised the ARM11, so we can basically tell the ARM9 to fuck off, keep running whatever code we're already running, and do whatever we want. That becomes interesting if you take a look at how TWL_FIRM works: it first has to load a ROM, a DS game image, from somewhere, and then turn into a pseudo-DS-mode sort of thing. It can load its ROM from three locations: from an actual physical game card, from the NAND permanent storage, or, for some reason, from FCRAM, which of course we have complete control over from the ARM11. That's interesting because, well, it's a ROM loader, it's a file format parser, so there might be bugs in there, and it turns out there are. So from the ARM11 we can actually mess with that ROM and inject something. The only thing is, of course, DS ROMs are signed, so Nintendo actually checks that the DS ROM is valid before using it, and that should kill the idea, except that for some reason it does not check the signature if the ROM is coming from FCRAM, which is completely baffling, because that is the one location that we have control over. Honestly, I don't know why; I did not really care to reverse engineer why that happens, but it does, and so we are actually able to move forward with this idea. The DS ROM contains two memory images that have to be copied, because these are the code images that are going to be run by the ARM9 and the ARM7, which are the two CPUs of a DS. Essentially, the TWL_FIRM loader has to copy these ARM9 and ARM7 sections into whatever memory corresponds to the DS-mode RAM address. So you have this formula to convert a Nintendo DS-mode physical address into a 3DS physical address, and there are two things to notice. First off, it's a kind of weird hardware compatibility mode, so you only have two bytes that
are used every eight bytes, which is why we have to multiply by four. The other thing is that you can see with this formula that you can actually produce any 3DS physical address, as long as you pick the right Nintendo DS physical address: we can cover the entire memory space, even though that's not meant to be the case. And because we have control over the DS-mode addresses that the ARM9 and ARM7 sections are going to be loaded at, we can turn those into any 3DS physical address that we want, for example memory that is used by the ARM9 to execute code. So we could maybe overwrite the actual 3DS-mode ARM9 code and take it over, as long as their checks are not good enough. And it turns out, of course, their checks are not good enough: they did not do an integer overflow check, which again is kind of sad, and there are also no bounds checks on the section sizes. That means we can have these kind of crazy address values. Let's say we want to overwrite this address over there, because that is the ARM9 thread stack: we can just create this fake Nintendo DS-mode address and give it this crazy size, about one gigabyte of memory, which would never be valid for a Nintendo DS ROM. If you check the math, it's actually going to give you the address that you want, and it's also going to pass all the checks we just mentioned. So at this point we basically end up with the ability to write about a gigabyte of memory to an arbitrary location through DS physical space, and that should be a problem, because we actually don't have a gigabyte of memory to overwrite. I'm going to skip over this a bit because I'm running out of time, but the idea is that we can just overwrite the return address for this function here, because the memory is being copied in tiny blocks
instead of big blocks, so you actually end up being totally fine. Now, the only thing is that it's copying two bytes at every eight-byte boundary, so you can't just overwrite code, because you can only overwrite two bytes in every eight-byte slot, which is super annoying. What that means in practice is that if I want to overwrite this call stack, I can only overwrite the bytes that are highlighted in orange. In terms of making a ROP chain, that's not ideal, but we can make it work, because the ARM9 doesn't really have DEP or anything, so we can use this to place the address of actual code that we control, since that code can be in writable memory. In this case, we place that code into the Nintendo DS-mode ROM header, and then we just overwrite one return address at the top there and make it jump to a gadget that skips a bunch of the call stack and then returns into this address, and we get code execution on the ARM9. And yeah, at this point we have full control of the entire machine. We started with nothing, went over the network, sent one magic packet, and it gives you full access to everything. At that point you can read and write everything, you can mess with the crypto engine, and you can do whatever the fuck you want. So thank you for your time. All the code for these exploits is available on GitHub. I want to give special thanks to a few people: Derek, Ned Will, Yelzé, Pluto, Nairwirt. And if you want to follow me on Twitter, that's my handle, smealum. I'm not tweeting very interesting things, so, you know, please don't. And yeah, have a good DEF CON.
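The two TWL_FIRM loader quirks described above, 32-bit wraparound in the DS-to-3DS address conversion and a copy that only lands two bytes out of every eight, can be modeled in a few lines of Python. This is a sketch under loud assumptions: the real conversion formula and bounds were on a slide and are not in the transcript, so `BASE_3DS`, `DS_RAM_START`, `DS_RAM_END`, the simple "scale by four" mapping for section start addresses, the flawed check in `section_ok`, and the target address are all hypothetical stand-ins chosen only to show the shape of the bug.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical constants; the real formula and limits are not given in the transcript.
BASE_3DS = 0x20000000      # assumed 3DS physical address that DS-mode address 0 maps to
DS_RAM_START = 0x02000000  # assumed lower bound the loader enforces on a section address
DS_RAM_END = 0x02400000    # assumed upper bound the loader enforces on a section's end


def ds_to_3ds(ds_addr):
    """Map an (aligned) DS-mode section address to 3DS physical space.

    Only 2 of every 8 bytes are used, so offsets scale by 4; the result is
    computed in 32-bit arithmetic and silently wraps around.
    """
    return (BASE_3DS + 4 * ds_addr) & MASK32


def section_ok(ds_addr, size):
    """Flawed check: the end address wraps at 32 bits and overflow is never tested."""
    end = (ds_addr + size) & MASK32
    return ds_addr >= DS_RAM_START and end <= DS_RAM_END


def forge_section(target):
    """Find a (ds_addr, size) pair that passes section_ok and lands at `target`."""
    offset = (target - BASE_3DS) & MASK32
    if offset % 4:
        raise ValueError("target not reachable under this mapping")
    # 4 * 2**30 == 2**32, so four DS addresses alias every reachable 3DS address.
    for k in range(4):
        ds_addr = offset // 4 + k * 0x40000000
        size = (DS_RAM_START - ds_addr) & MASK32  # wrapped end lands on DS_RAM_START
        if size and section_ok(ds_addr, size):
            return ds_addr, size
    raise ValueError("no candidate passes the check")


def strided_copy(dest, payload, phase=0):
    """Copy `payload` into `dest`, but only into the first 2 bytes of each 8-byte slot."""
    src = iter(payload)
    for off in range(len(dest)):
        if (off - phase) % 8 < 2:
            b = next(src, None)
            if b is None:
                break
            dest[off] = b
    return dest


ds_addr, size = forge_section(0x080FF000)  # hypothetical ARM9 stack address
stack = strided_copy(bytearray(16), b"\xAA\xBB\xCC\xDD")
```

Under these made-up constants the forged size differs from the roughly one gigabyte mentioned in the talk; the point is only that a wrapped end address passes the check. In `stack`, only offsets 0, 1, 8, and 9 change, mirroring the orange bytes: at most the low two bytes of every other 32-bit stack word can be controlled.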