 You have been here on stage before. You successfully tampered with the Wii. You successfully tampered with the PS3 and got some legal challenges over there. Some unfounded legal challenges, yes. And then you fucked, and excuse my French over here (by the way, that is number 8021 to get the translation on your DECT phone), so you fucked with the Wii U as well. And well, Console Hacking 2016. Yes. Here we go. So I'm a lazy guy, so I haven't turned on my computer yet for the slides, so let me do that. Hopefully this will work. So my computer is a little bit special. It runs a lot of open source software. It runs FreeBSD. It even has things like OpenSSL in there, and Nginx, and Cairo, I think. And WebKit. It runs a lot of interesting open source software. But we all know that BSD is dying, so we can make it run something a little bit more interesting and hopefully give a presentation about it. Let's see if this works. It's a good start, black screen, you know. Like, it's syncing to disk and file systems shutting down. There we go. And yes, I run Gentoo Linux. This is the does-Wi-Fi-work moment. Hopefully. NTP. Yeah, no, NTP failed. Well, that's a bit annoying, but it still works. Hello. Yeah, it takes a bit to boot. It doesn't run systemd, you know. So it's sane. It's a tiny bit slower, but it's sane. There we go. This is the does-my-controller-work moment. Bluetooth and the install one. Okay, it does. All right, so let's get started. So this is Console Hacking 2016: PS4, PC Master Race. I apologize for the horrible Nazi joke in the subtitle, but it's a Reddit thing. So yeah, PC master race. Why? Well, PS4, is it a PC? Is it not a PC? But before we get started, I would like to dedicate this talk to my good friend, Ben Byer, who we all knew as bushing. Unfortunately, he passed away in February of this year, and he was a great hacker. He came to multiple congresses. One of the nicest people I've ever met. I'm sure some of you who have met him would agree with that. 
And if it weren't for him, I wouldn't be here. So thank you. So the PS4, is it a PC? Is it not a PC? Well, it's a little bit different from previous consoles. It has x86. It's an x86 CPU. It runs FreeBSD, runs WebKit. It doesn't have a hypervisor, unfortunately. Then again, the PS3 had a hypervisor and it was useless. So there you go. And so this is different from the PS3, but it's not completely different. It does have a security processor that you can just ignore because it doesn't really secure anything. So that's good. All right. So how to own a PS4? Well, you write a WebKit exploit and you write a FreeBSD exploit. Duh, right? Everything runs WebKit, and FreeBSD is not exactly the most secure OS in the world, especially not with Sony customizations. So this is completely boring stuff. Like, what's the point of talking about WebKit and FreeBSD exploits? Instead, this talk is going to be about something a little bit different. First of all, after you run an exploit, well, you know, step three, question marks, step four, profit. What is this about? And not only that, though: before you write an exploit, you usually want to have the code you're trying to exploit. And with WebKit and FreeBSD, you kind of do, but not the build they use, and it's customized. And it's annoying to write an exploit if you don't have access to the binary. So how do you get the binary in the first place? How do you dump the code? That's an interesting step. So let's get started with step zero: black box code extraction, the fun way. A long time ago, in a hackerspace far, far away, fail0verflow got together after 31C3. And we looked at the PS4 motherboard, and this is what we saw. So there's Aeolia, the southbridge; that's a code name, by the way. Then there's the Liverpool APU, which is the main processor. It's a GPU and a CPU, which is done by AMD. And it has some RAM. 
And then the southbridge connects to a bunch of random crap, like the USB ports, a hard disk, which is USB for some inexplicable reason. The internal disk on the PS4 is USB. Like, it's SATA to USB and then USB to the southbridge. Even though it has SATA, like, what? The Blu-ray drive is SATA. The Wi-Fi/Bluetooth is SDIO, and the Ethernet is GMII. Okay, how do we attack this? Well, GDDR5... what just? Oh, I have a screensaver, apparently. That's great. I thought I killed that. Let me killall the screensaver real quick. Something had to fail. It always does. I mean, of course, I can SSH into my PS4, right? So there you go. Okay, I could have sworn I fixed that. Anyway, so yeah, which one of these interfaces do you attack? Well, USB, SATA, SDIO, GMII (that's the raw Ethernet interface, by the way): all of these are CPU-controlled. The CPU issues commands and the devices reply. The devices can't really do anything on their own. So you can't write to memory or anything like that. You can exploit USB if you find a bug in the USB driver, but we're back to the no-code issue. So GDDR5, that'd be great. We could just write to all memory and basically own the entire thing, but it's a very high-speed bus. It's definitely exploitable. If you're making a secure system, don't assume we can't own GDDR5, because we will, but it's not the path of least resistance. So we're not going to do that. However, there's a thing called PCI Express in the middle there. That's interesting. PCIe is very fun for hacking, even though it might seem intimidating, because it's bus mastering, which means you can DMA to memory. It's complicated, and complicated things are hard to implement properly. It's robust. People think that PCIe is this voodoo high-speed thing. No, it's not. It's high-speed, but you don't need matched traces to make it work. It will run over wet string. Like, you can hotwire PCIe with pieces of wire, and it will work, at least at short distances anyway. Believe me, it's not as bad as you think. 
It's delay-tolerant, so you can take your time to reply. And the drivers are full of fail, because nobody writes a PCIe driver assuming the device is evil, even though, of course, everybody should, because devices can and will be evil, but nobody does that. So what can we do? Well, we have a PCIe link. Let's cut the lines and plug the southbridge into a PC motherboard that we stick on the side. So now the southbridge is a PCIe card for us, and we connect the APU to an FPGA board, which then can pretend to be a PCIe device. So we can man-in-the-middle this PCIe bus, and it's now x1 width instead of x4 because it's easier that way, but it'll negotiate. That's fine. So how do we connect the motherboard and the FPGA? There's, of course, many ways of doing this, but how many of you have done any hardware hacking, even Arduino or anything like that? Raise your hand. I think that's about a third to a half or something like that, at least. And when you hack some hardware, you melt some hardware. After you blink an LED, what is the first interface you use to talk to your hardware? Serial port. So we run PCIe over RS-232 at 115 kilobaud, which makes this PCIe... I said it was delay-tolerant. So it makes this PCIe 0.0002x, and eventually there was a gigabit Ethernet port on the FPGA, so I upgraded to that, but I only got around to doing it in one direction. So now it's PCIe 0.0002 in one direction and point point in the other direction, which has to make this one of the most asymmetric buses in the world, but it works. Like, believe me, this is hilarious. You can run PCIe over serial. Also, we were ASCII-encoding, so half the bandwidth. Yeah, it's fine. It's fine. So PCIe 101: it's a reliable packet-switched network. It uses a thing called transaction layer packets, which are basically just packets you send. And it can be memory read and memory write, I/O read, I/O write, configuration read, configuration write. 
It can be a message signaled interrupt, which is a way of saying, hey, listen to me, by writing to an address in memory, because you can write to things, so why not write for interrupts? It has legacy interrupts, which are basically emulating the old "set this wire low for interrupt and high for no interrupt" thing. You can tunnel that over PCIe. And it has completions, which are basically the replies. So if you read a value from memory, the completion is what you get back with the value you tried to read. Okay, so it's PCIe, right? So we can just go wild with DMA. We can just read all memory, dump the kernel. Hey, it's awesome, right? Except there's an IOMMU in the APU. But of course, the IOMMU will protect against devices. It will only let you access what memory is mapped to your device. So the host has to allow you to read and write the memory. But just because there's an IOMMU doesn't mean that Sony uses it properly. So here's some pseudocode and, you know, it has a buffer on the stack. It says, please read from flash to this buffer with the correct length. Can anyone see the problem with this code? Well, it maps the buffer and it reads, and then it unmaps the buffer. But IOMMUs don't just map byte foo to byte bar. They map pages, and pages are 64K on the PS4. So Sony has just mapped 64K of its stack to the device. So we can just DMA straight into the stack, basically the whole stack, and take over. So now we get code execution, a FreeBSD kernel dump, and a WebKit and OS libs dump, just from mapping the flash. Okay, that's step zero. So we have the code, but this is not, you know, that's not the PS4 that we did this on. It's a giant mess of wires. Someone here knows about that, you know, flying all over the place. But we're going to make a nice exploit. Say we've done that because, as I said, WebKit, FreeBSD, whatever. What comes after that? Okay, we want to do something. Of course, we're going to run Linux. And how do you go from FreeBSD to Linux? 
It's not a trivial process, but you use something that we call ps4-kexec. So how does this work? It's, you know, it's simple, right? You just want to run Linux? Just jump to Linux, right? Well, kind of. You need to load Linux into contiguous physical RAM, set up boot parameters, shut down FreeBSD cleanly, halt secondary CPUs, make new page tables, disable, blah, blah, blah. A lot of random things. I'm not going to bore you with all this crap because you can read the code, but there's a lot of, like, iteration in getting this to work. Now, let's assume that you do all this magical cleanup and you get Linux into a nice state and you can, you know, jump to Linux. Okay, now we jump to Linux, right? It's cool. Yeah, okay. You can technically jump to Linux and it will technically run for a little bit. And then it'll stop. And you're not going to get any serial or any video or anything. What's going on here? Okay, let's talk about hardware. What is x86? x86 is a mediocre instruction set architecture by Intel. It's okay, I guess, you know. It's not great. It's okay. The PS4 is definitely x86. It's x86-64. What is a PC? Ah, a PC is a horrible, horrible thing built upon piles and piles of legacy crap dating back to 1981. And the PS4 is definitely not a PC. Then again, that's practically Sony-level hardware fail, so it could be, but it's not. Okay, so what's going on? Well, a PC, a legacy PC, basically has an 8259 programmable interrupt controller, an 8253 programmable interval timer, a UART at I/O 0x3F8, which is the standard address for a serial port. It has a PS/2 keyboard controller, 8042. It has an RTC, a real-time clock, with the CMOS. Everyone knows the CMOS, right? MC146818 is the chip number for that. An ISA bus. Even if you think you don't have an ISA bus, your computer has an ISA bus inside the southbridge somewhere, and it has VGA. The PS4 doesn't have any of these things. So what do we do? Well, okay, let's look a little bit at how a PC works and how a PS4 works. 
This is a general, simple PC system. There's an APU or an Intel Core CPU with a southbridge; Intel calls it PCH, AMD, FCH. There's an interface that is basically PCIe, though Intel calls it DMI and AMD calls it UMI, whatever, DDR3 RAM and a bunch of peripherals and SATA, whatever. A PS4 kind of looks like that, right? So you'd think this can't be that hard. What's so hard about this? Because all the crap I mentioned earlier is in the southbridge in a PC, right? So the PS4 has a southbridge, right? So the southbridge, the AMD standard FCH, implements Intel legacy from 1981. The Marvell Aeolia (Marvell is the maker of the PS4 southbridge) implements Intel legacy from 2002. What does that mean? Ah, that's no southbridge. That's a Marvell Armada SoC. So it's not actually a southbridge. It was never a southbridge. It's an ARM system on a chip, a CPU with everything. It's a descendant of Intel's StrongARM or XScale. It has a bunch of peripherals. And what they did is they stuck a PCIe bridge on the side and said, hey, x86, you can now use all my ARM chip. So it exposes all of its ARM peripherals to the x86. They added some stuff they really needed for PCs, and it has its own RAM. Why do they do this? Well, it also runs FreeBSD on the ARM in standby mode. And that's how they do the whole download updates in the background, get content, update, whatever. All that crap is because they have a separate OS on a separate chip running in standby mode. Okay, that's great, but it's also batshit insane. Yeah. So quick recap. This is what a PCI bus number looks like. It has, sorry, a device number. So it has a bus number, which is 8 bits, a device number, which is 5 bits, and a function number, which is 3 bits. So you've probably seen this in lspci, if you've ever done that. This is what a regular southbridge looks like. So it has a USB controller, a PCI, you know, ISA bridge, SATA, whatever. And it has a bunch of devices. 
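The 8/5/3-bit split described above can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the `bb:dd.f` text form is the one lspci prints.

```python
# Sketch: packing and unpacking a PCI bus/device/function (BDF) identifier.
# Layout from the talk: 8-bit bus, 5-bit device, 3-bit function.
# Function names here are hypothetical helpers, not any library's API.

def encode_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a single 16-bit identifier."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus is 8 bits, device is 5 bits, function is 3 bits")
    return (bus << 8) | (device << 3) | function


def decode_bdf(value: int) -> tuple:
    """Split a 16-bit identifier back into (bus, device, function)."""
    return (value >> 8) & 0xFF, (value >> 3) & 0x1F, value & 0x7


def format_bdf(bus: int, device: int, function: int) -> str:
    """Render in the hexadecimal bb:dd.f form that lspci uses."""
    return f"{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    packed = encode_bdf(0x00, 0x14, 4)
    print(hex(packed), format_bdf(*decode_bdf(packed)))
```

Because the function field is only 3 bits wide, a single device number can expose at most 8 functions, which is the limit the next part of the talk runs into.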
So one southbridge pretends to be multiple devices because you only have 3 bits for a function number. So you can only have up to 8 functions in one device. So the, you know, Intel southbridge just says, well, I'm device 14, 16, 1A, 1B, and just a bunch of devices, and you can talk to all of them. If you lspci on a roughly unpatched Linux kernel on the PS4, you get something like this. So the Aeolia, first of all, clones itself into every PCIe device, because they were too lazy to do "if device equals my number, then reply, otherwise don't reply". No, they just said, oh, just reply to every single PCIe device that might be queried. So Linux sees the southbridge, you know, like 31 different times, which is kind of annoying because it gets really confused when it sees 31 clones of the same southbridge. And then it has 8 functions: ACPI, Ethernet, SATA, DMC, PCI Express. Okay, 8 functions, so all 3 bits. It turns out 8 functions are not enough for everybody. Function number 4, PCI Express glue, has a bridge config, MSI interrupt controller, ICC (we'll talk about that later), HPET timers, flash controller, RTC, timers, 2 serial ports, I2C, all this smashed into one single PCIe device. So Linux has a minimum system requirement to run on anything. You need a timer, you need interrupts, and you need some kind of console. The PS4 has no PIT and no PIC and no standard serial, so none of the standard PC stuff is going to work here. The board has test points for an 8250-standard serial port in a different place. So we want dmesg over that. Okay, fine. Linux has earlycon, which we can point to a serial port and say, please send all your dmesg here very early, because I really want to see what's going on. It doesn't need IRQs. You set console=uart8250, you set the type, the address, the speed, and you'll see it says 3200 instead of 115 kilobaud. That's because their clock is different. So you set 3200, but it really means 115K. And that gets you dmesg. 
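The "3200 really means 115K" trick is just divisor arithmetic. The sketch below assumes the kernel computes the 8250 divisor from the standard PC UART clock of 1.8432 MHz (an assumption for illustration, not something stated in the talk), and then derives what input clock would make that divisor produce 115200 baud. The implied clock is a back-of-the-envelope figure, not a measured value.

```python
# Sketch: why asking for 3200 baud can yield 115200 baud on non-standard hardware.
# Assumption: the divisor is computed against the standard PC UART clock.

STANDARD_PC_UART_CLOCK_HZ = 1_843_200  # classic PC value; 16x oversampling


def divisor_for(baud: int, uart_clock_hz: int = STANDARD_PC_UART_CLOCK_HZ) -> int:
    """8250-style divisor: clock / (16 * baud), rounded to nearest integer."""
    return (uart_clock_hz + 8 * baud) // (16 * baud)


def actual_baud(divisor: int, uart_clock_hz: int) -> float:
    """Baud rate the hardware really produces for a given divisor and clock."""
    return uart_clock_hz / (16 * divisor)


def implied_clock_hz(divisor: int, observed_baud: int) -> int:
    """Input clock that would make `divisor` produce `observed_baud`."""
    return divisor * 16 * observed_baud


if __name__ == "__main__":
    div = divisor_for(3200)                 # what the kernel would program
    clock = implied_clock_hz(div, 115_200)  # clock consistent with the talk
    print(div, clock, actual_baud(div, clock))
```

Under that assumption the requested 3200 baud programs a divisor of 36, and a UART fed by a clock 36 times faster than the PC standard turns that into 115200 on the wire.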
That actually gets you, you know, Linux booting, uncompressing, whatever. That's pretty good. Okay, we need a timer, because otherwise everything explodes. Linux supports the TSC, which is a built-in CPU timer, which is super nice and super fun. The PS4 has that. But Linux tries to calibrate it against the legacy timer, which on the PS4 doesn't exist. So that's fail. So again, the PS4 really is not a PC. So what we need to do here is define a new subarchitecture, because Linux supports this concept. It says, is this not a PC? Is this a PS4? The bootloader tells Linux, hey, this is a PS4. And then Linux says, okay, I'm not going to do the old timestamp calibration. I'm going to do it for the PS4, which has special code that we wrote that calibrates against the PS4 timer. And it disables the legacy crap. Okay, so now this is not a PC. This is officially not a PC anymore. Okay, now we can talk about ACPI. You might know ACPI for all its horribleness and all its evilness and all its Microsoftiness. But ACPI, most people associate with suspend and hibernate. It's not just power. It has other stuff too. So we need ACPI for PCI config, for the IOMMU, for the CPU frequency. The PS4, of course, has broken ACPI tables, because, of course, it would. So we fixed them in ps4-kexec. Okay, now interrupts. We have timers. We have serial. We fixed some stuff. The PS4 does message signaled interrupts, which is what I said, the non-legacy, the nice new thing, where you just write a value. And what you do is you tell the device, when you want to interrupt, please write this value to this address. The device does that. And the CPU interrupt controller sees that write and says, oh, this is an interrupt, and then just fires off that interrupt into the CPU. That's great. It's super, you know, super fast and very efficient. And the value directly tells the CPU that's the interrupt vector you have to go to. Okay, let's see. 
That's the standard MSI way there; your computer does MSI that way. This is how the PS4 does MSI. The Aeolia ignores the MSI config registers in the standard location. Instead, it has its own MSI controller, all stuffed into function four, which is that glue device, yeah, glue. Each function gets a shared address in memory to write to, and the top 27 bits of data, and every subfunction, because they crammed a lot of things into one place, only gets its own different five bits. And all MSIs originate from function four, so, like, this device has to fire an interrupt and it goes to here, and then that device fires an interrupt. Like, what the hell is going on? Like, seriously, this is really fucked up. And the eyes are missing in the front there. But, yeah. So, yeah, driver hell. Now, the devices are interdependent, and the IRQ vector allocation is not sequential, so that's not going to work, and you need to modify all the drivers, and, like, this is really painful to develop for. So, what we ended up doing is there's a core driver that implements an interrupt controller for this thing, and then we have to make sure that it loads first, before the device drivers, and Linux has a mechanism for that, and we have to patch drivers. Some drivers, we patch to use these interrupts; some drivers, we wrap around to use these interrupts. Unfortunately, because of the top bits thing, everything has to share one interrupt within a function. Thankfully, we can fix that with the IOMMU because it can redirect interrupts, so you can say, oh, interrupt number zero goes to here, one goes to here, two goes to here. So, that's great because it's consecutive, right? Zero, one, two, three, four, five is obviously going to have the same top bits, but we have to fix the ACPI table for that because it's broken. But this does work, so this gets us interrupts that function, and they're individual. So, look at the checklist. We have interrupts, timers, early serial, late serial with interrupts. 
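To make the "shared top 27 bits, five bits per subfunction" constraint concrete, here is a small Python sketch. It models only the bit arithmetic as described in the talk; the function names and the example data value are hypothetical and do not correspond to real Aeolia registers.

```python
# Sketch: MSI data words where the top 27 bits are shared per function and
# only the low 5 bits differ per subfunction. Illustrative model only.

SUBFUNCTION_BITS = 5
SUBFUNCTION_MASK = (1 << SUBFUNCTION_BITS) - 1  # 0x1F


def msi_data_for(shared_data: int, subfunction: int) -> int:
    """Combine the function-wide upper bits with a subfunction index."""
    if not 0 <= subfunction <= SUBFUNCTION_MASK:
        raise ValueError("only 5 bits are available per subfunction")
    upper = shared_data & 0xFFFFFFFF & ~SUBFUNCTION_MASK
    return upper | subfunction


def share_upper_bits(values) -> bool:
    """True if every data word has identical top 27 bits."""
    return len({v >> SUBFUNCTION_BITS for v in values}) == 1


if __name__ == "__main__":
    # Hypothetical shared value; consecutive indices 0..5 keep the same top bits.
    words = [msi_data_for(0x0000_0040, i) for i in range(6)]
    print([hex(w) for w in words], share_upper_bits(words))
```

This is why a consecutive block of interrupt indices (0, 1, 2, ...) works with the scheme: such values differ only in their low bits, while arbitrarily allocated vectors generally would not share the upper 27 bits.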
We can get some user space. We can, you know, stash some user space and binaries into the kernel and it'll boot and you can get a console, but you get a console and you try writing commands and sometimes it hangs. Like, okay, what's going on there? So, it turns out that FreeBSD masks interrupts with an AMD proprietary register set. We have to clean that up too, and that fixes serial and all the other interrupts. This took ages to find. It's like, why do interrupts on CPU zero sometimes not run? Yeah. I ended up dumping register sets and I saw this FFFF here and not FFFF. What's that? But yeah, like, tracking through the stack to find this was really annoying. All right, so we have the basics. We have, like, a core platform we can, you know, run Linux on, even though it won't do anything interesting. Add drivers. So, we have USB xHCI, which has three controllers in one device. Again, just, you know, let's make it insane. We have SDHCI. That's SDIO for the Wi-Fi and the Bluetooth. Needs a non-standard config, needs quirks. Ethernet needs more hacks. It's still partially broken. It only runs at gigabit speeds. If you plug in a 100 megabit switch, it just doesn't send any data. Not sure why. And then all of this works fine in Linux 4.4, and then just three days ago, I think, I tried to rebase on 4.9 so we have the latest and the greatest. And everything failed and DMA didn't work and all the drivers were just throwing their hands up in the air, and what's going on here? Aeolia strikes back. So, that's what, you know, the Aeolia looks like normally. So you have, again, it's an ARM SoC. It's really not a device. It's like its own little system, but it maps the low two gigabytes of its address space to memory on the PC, and then the PC has a window into its registers that it can use to control those devices. 
So, the PC can kind of play with the devices, and they DMA to the same address, and that works great because it's mapped in the same place, and then it has its own RAM, you know, in the top of the address space. This works fine, but now we add an IOMMU, because we needed it for the interrupts, and the IOMMU inserts its own address space in between and says, okay, you can map anything to anything you want. It's great. You know, it's a page table. You can say this address goes to that address. Linux 4.4 did this: it would find some addresses at the bottom of the IOMMU address space, say, you know, page one goes to this, page two goes to that, page three goes to that, and say, device, you can now write to these pages and they go to this space in the x86. That works fine. It turns out Linux 4.9, or somewhere between 4.4 and 4.9, started doing this: it would map pages from the top of the IOMMU address space, and that's fine for the IOMMU, but it's not in the window in the Aeolia. So now, you know, you say, Ethernet, DMA to address FE something, something, something, and instead of DMAing to the RAM on the PC, it DMAs to the RAM on the Aeolia, which is not going to work. So, yeah, effectively the Aeolia implements 31-bit DMA, not 32-bit DMA, because only the bottom half is usable. This is all really fucked up, guys. Seriously. And this is littered all over the code in Linux, so this needed more patches, and it works, but, yeah, painful. Okay, devices. Devices work. Now for something completely different. Who can tell me who this character is? That's Starsha from Space Battleship Yamato, and apparently that's the code name for the PS4 graphics chip. Or at least that's one of the code names, because they don't seem to be able to agree on what the code names are. Like, it's got Liverpool in some places and Starsha in other places and Fabie J in other places, and we think Sony calls it Starsha and AMD calls it Liverpool, but we're not sure. 
We're calling it Liverpool everywhere just to avoid confusion, but yeah. Okay, what's this GPU about? Well, it's an AMD Sea Islands generation GPU, which is spelled CI instead of SI because SI was taken. It's similar to other chips in the generation, so at least it's not a batshit crazy new thing, but it does have quirks and customizations and oddities and things that don't work. What we did is we took Bonaire, which is another GPU in that generation that is already supported by Linux, and just added a new chip and said, okay, do all the Bonaire stuff and then change things and hopefully adapt it to the PS4. So, hacking on AMD drivers. Okay, well, they're open source, but AMD does not publish register docs. They publish 3D shader and command queue documentation, so you get all the user space 3D rendering commands, that's documented, but they don't publish all the kernel hardware register documentation. That's what you really want for hacking on drivers, so that's annoying. And you're thinking, the code is the documentation, right? Just read the Linux drivers, that's great. Well, yeah, but they're incomplete and they have magic numbers, and you don't know if you need to write a new register that's not there, and it really sucks to try to write a GPU driver by reading other GPU drivers with no docs. So what do we do? We're hackers, right? We Google. Every time you need information, hopefully Google will find it, because Google knows everything. And any tidbit you can find in any forum or code dump somewhere, that's great. One of the things we found is, we Googled this little string, "R8XX GPU" in quotes. And you get nine results. And the second result is this place. It's siliconkit.com, okay. It's an XML file. And if we look at that, it looks like it's an XML file that contains a dump of the Bonaire GPU register documentation. But it's, like, broken XML and it's incomplete. It stops at one point. But what's this doing here? And where does this come from, right? 
So let's dig a little deeper. Okay, Google, what do you know about this website? Well, there's some random things. Like "what the hell no" .txt and "what the hell yes" .txt and some XSL files. Oh, sorry, XSL, like XML style sheets. And then there's a thing at the bottom there called RAI.grammar.4.txt. Hmm, I wonder what that is. And it looks like it's a grammar, a BNF notation description for the syntax of some kind of register documentation file. It just looks like an AMD internal format. But it's on this website. Okay. So we have these two URLs: slash pragmatic slash bonaire.xml and slash RAI slash RAI.grammar.4.txt. Let's try something. How about maybe pragmatic slash bonaire.rai? Nah, it's a 404. Okay, pragmatic slash RAI slash bonaire.rai. Ah, bingo. So this is a full Bonaire, or almost full Bonaire, register documentation with, like, full register field descriptions, breakdowns, all the addresses. It's not 100%, but it's like the vast majority. This seems to be AMD internal stuff. And I looked this guy up, and apparently he worked at AMD at some point. So, but yeah, this is really, really helpful, because now you know what everything means, and debug registers and, yeah. So I wrote a working parser for this format, not the XML. This guy was apparently writing an XML parser or something, like, to convert this thing to XML, but it was all broken. Oh, he was writing PHP, by the way. But there you go. So I wrote a working one in Python, and you can dump it and then you can see, you know, what each register means, and it'll tell you all the options. You can take a register dump and map it to the, you know, basically documentation. You can diff dumps. You can generate defines. It's very useful for AMD GPUs. And this, grossly speaking, applies to a lot of AMD GPUs. Like, they share a lot of registers. So this is useful for anyone hacking on AMD GPU stuff. Over 4,000 registers are documented just in the main GPU address space alone. So that's great. Okay. So we have some docs. 
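As a rough idea of what "map a register dump to documentation" and "diff dumps" can look like, here is a minimal Python sketch. It is not the parser from the talk and does not read the RAI format; the register name, offset, and fields below are invented purely for illustration.

```python
# Sketch: decoding raw register values into named fields and diffing two dumps.
# The register database here is hypothetical, not real AMD documentation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    lsb: int    # position of the least significant bit
    width: int  # number of bits


@dataclass(frozen=True)
class Register:
    name: str
    offset: int
    fields: tuple


# Hypothetical register description, keyed by offset.
EXAMPLE_REGS = {
    0x1000: Register("EXAMPLE_STATUS", 0x1000, (
        Field("BUSY", 0, 1),
        Field("QUEUE_DEPTH", 4, 4),
        Field("ERROR_CODE", 8, 8),
    )),
}


def decode(reg: Register, value: int) -> dict:
    """Split a raw register value into its named bit fields."""
    return {f.name: (value >> f.lsb) & ((1 << f.width) - 1) for f in reg.fields}


def diff_dumps(regs: dict, before: dict, after: dict) -> list:
    """List (register, field, old, new) for every field that changed."""
    changes = []
    for offset in sorted(before.keys() & after.keys()):
        if before[offset] == after[offset]:
            continue
        reg = regs.get(offset)
        if reg is None:  # undocumented register: report the raw values
            changes.append((hex(offset), "raw", before[offset], after[offset]))
            continue
        old, new = decode(reg, before[offset]), decode(reg, after[offset])
        changes.extend((reg.name, n, old[n], new[n]) for n in old if old[n] != new[n])
    return changes


if __name__ == "__main__":
    print(decode(EXAMPLE_REGS[0x1000], 0x0231))
    print(diff_dumps(EXAMPLE_REGS, {0x1000: 0x0231}, {0x1000: 0x0230}))
```

Diffing at field level rather than on raw words is what makes a dump taken before and after some operation readable: a single changed bit shows up as a named field flipping rather than as two opaque hex values.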
How do we get to a frame buffer? Uh, so if you, uh, you know, the PS4 has HDMI, that's easy, right? The GPU has HDMI. And if you query the GPU information, you actually get that it has an HDMI port and a DisplayPort port. Okay, maybe it's unconnected. That's fine, right? Ah, but if you actually ask the GPU, it tells you HDMI is not connected. DisplayPort is connected. Okay. Yeah, they have an external HDMI encoder from DisplayPort to HDMI, because just putting a wire from A to B is too difficult, because this is Sony. So let's put in a chip that converts from protocol A to protocol B. Yeah, yeah, yeah. Yeah. Yeah. And okay, it's, uh, it's, yeah. It's a Panasonic DisplayPort to HDMI bridge. Not documented, by the way. Requires config to work. That's why it doesn't just work, even though some bridges do. And you'd think, okay, it's hooked up to the GPU I2C bus, because GPUs have in the past used these bridges, not this one in particular, but other AMD cards have had various chips that they stuck in front. And the code has support for talking to them through the GPU I2C interface, right? Right? That's easy. Yeah, you wish. This is Sony. Enter ICC. So remember the ICC thing in the Aeolia? It's an RPC protocol you use to send commands to an MCU that is somewhere else on the motherboard. It's a message box system. So you write some message to a memory place and then you tell it, hey, read this message, and then it writes some message back and it tells you that's the reply. You access it via the Aeolia, not via the GPU. You use it for things like the power button, the LEDs, turning the power on and off, and also the HDMI encoder I2C. So now we have a dependency from the GPU driver to the Aeolia driver, and two different PCI devices and two different, yeah. And okay, again, ICC, but this I2C is a simple protocol. You read a register, you write a register, that's all you need. It's super simple, right? Right? 
Now let's make a bytecode fucking scripting engine to issue I2C commands and delays and bit masking and everything. And why, Sony, why? Like, why would you do this? Well, because ICC is so slow that if you actually try to do one read and one write at a time, it takes two seconds to bring up HDMI. Yeah, like, yeah. I don't even know at this point. I have no idea. And by the way, this thing has commands where you can send scripts in a script to be run when certain events happen. So yo dawg, I heard you like scripts. I put scripts in your scripts so you can I2C while you I2C. Like, let's just go even deeper at this point, right? Because, yeah, yeah, yeah, yeah. Okay, we wrote some code for this. You need more hacks. It needs all DisplayPort lanes up. Linux tries to downscale; that doesn't work. The memory bandwidth calculation is broken. The mouse cursor size is from the previous GPU generation for some reason. I guess they forgot to update that. So with all this crap, you get a frame buffer, but X won't start. Ah, well, it turns out that the PS4 uses a unified memory architecture. So it has a single memory pool that is shared between the x86 and the GPU. And games just put a texture in memory and say, hey, GPU, render this. And that works great. And this makes a lot of sense. And their driver uses this to the fullest extent. So there's VRAM; you know, the legacy GPUs had a separate VRAM. And all these integrated chipsets can emulate VRAM using a chunk of system memory. And you can usually configure that in the BIOS if you have a PC that does this. And the PS4 sets it to 16 megabytes, which is actually the lowest possible setting. And, yeah, 16 megs is not enough to have more than one full HD frame buffer. So obviously that's going to explode in Linux pretty badly. So what we do is we actually reconfigure the memory controller in the system to give one gigabyte of RAM to the VRAM. And we did that in ps4-kexec. 
So it's basically doing, like, BIOS-y things. We're reconfiguring the northbridge at this point to make this work. But it works. And with this, we can get X to start because it can allocate its frame buffer. But, okay, it's 3D time, right? Yeah, GPU acceleration doesn't quite work yet. So we got at least, you know, X. But let's talk a bit about the Radeon GPU for a second. So when you want to draw something on a GPU, you send it a command. And you do this by putting it into a ring, which is really just a structure in memory that's just a list of commands. And it goes and it wraps around, right? So that way you can queue things to be done on the GPU, and then it does it on its own and you can go and do other things. So it has a graphics ring for drawing, a compute ring for GPGPU, and a DMA ring for copying things around. The commands are processed by the GPU command processor, which is really a bunch of different CPUs inside the GPU that are called F32, and they run proprietary AMD microcode. So this is a custom architecture. Also, the rings can call out to IBs, which are indirect buffers. So you can say, basically, call this piece of memory, do this stuff there, return back to the ring. And that's actually how the user space thing does things. So, you know, it says, draw all this stuff, and it tells the kernel, hey, draw all this stuff, and the kernel tells the GPU, jump to that stuff, read it, come back, keep doing stuff. This is basically how most GPUs work, but Radeon specifically works like, you know, with this F32 stuff. Okay. The driver complains: ring zero test failed. Thankfully, it tests them. So at least, you know, it has nice diagnostics. How does the test work? It's really easy. It writes a register with a value, and then it tells the GPU with a command, please write this other value to the register, runs it, and then checks to see if the register was actually written with the new value. So the write doesn't happen. It never gets there. 
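The ring and the ring test described above can be modelled in a few lines. The sketch below is a toy simulation, not driver code: the class, the command tuple format, the scratch register name, and the two magic values are all stand-ins chosen for illustration, and `stalled=True` simply models a command processor that never consumes the ring.

```python
# Toy model of a command ring with wraparound, plus a ring test in the style
# described in the talk: seed a register from the CPU, queue a command that
# overwrites it, then check whether the new value arrived.

SCRATCH = "SCRATCH_REG"        # hypothetical register name
SEED_VALUE = 0xCAFEDEAD        # illustrative "before" value
TEST_VALUE = 0xDEADBEEF        # illustrative "after" value


class ToyGpu:
    def __init__(self, ring_size: int = 8):
        self.ring = [None] * ring_size
        self.wptr = 0  # where the CPU writes the next command
        self.rptr = 0  # where the GPU reads the next command
        self.registers = {}

    def mmio_write(self, reg, value):
        """Direct register write from the CPU side."""
        self.registers[reg] = value

    def mmio_read(self, reg):
        return self.registers.get(reg, 0)

    def ring_write(self, command):
        """Queue a command; the write pointer wraps around at the end."""
        self.ring[self.wptr] = command
        self.wptr = (self.wptr + 1) % len(self.ring)

    def process_ring(self):
        """GPU side: consume commands until the read pointer catches up."""
        while self.rptr != self.wptr:
            op, reg, value = self.ring[self.rptr]
            if op == "SET_REG":
                self.registers[reg] = value
            self.rptr = (self.rptr + 1) % len(self.ring)


def ring_test(gpu: ToyGpu, stalled: bool = False) -> bool:
    gpu.mmio_write(SCRATCH, SEED_VALUE)
    gpu.ring_write(("SET_REG", SCRATCH, TEST_VALUE))
    if not stalled:  # a stalled command processor never runs the command
        gpu.process_ring()
    return gpu.mmio_read(SCRATCH) == TEST_VALUE


if __name__ == "__main__":
    print(ring_test(ToyGpu()), ring_test(ToyGpu(), stalled=True))
```

The value of the test is that it exercises the whole path (ring memory, pointers, command processor, register write) with one observable result, which is why a failure here says only "something is stuck" and needs the debug registers mentioned next to narrow down.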
Thankfully, thanks to that RAI file earlier, we found some debug registers that tell you exactly what's going on inside the GPU, and it shows the command processor is stuck waiting for data in the ring. So it needs more data after a NOP command. Yeah. NOP is hard. Let's go stalling. So packet headers in this GPU thing have a size field that is the size minus two. Whoever thought that was a good idea. So a two-word packet has a size of zero. Then AMD implemented a one-word packet with a size of minus one. And old firmware doesn't support that and thinks, oh, it's 0x3FFF. So I'm just going to wait for a shitload of data in the buffer, right? It turns out that Hawaii, which is another GPU in the same gen, has the same problem with old firmware. So they use a different NOP packet. So there was an exception in the driver for this and we had to add ours to that. But again, getting to this point, many, many, many hours of headbanging. Yeah. Okay. We fixed that. Now it says ring three test failed. That's the SDMA ring. That's for copying things in memory and it works in the same way. It puts a value in RAM, tells the SDMA engine, hey, write a different value, and checks. This time, we see the write happens, but it writes zero instead of 0xDEADBEEF or whatever. Okay. So I tried this. I put two write commands in the ring, saying write to one place, write to a different place. And this time, what I saw is that it wrote one to the first destination and zero to the second destination. And again, okay. It's supposed to write 0xDEADBEEF, which is what you see there. It says, you know, 0xDEADBEEF is that word with the value. It writes one. Well, there's a one there. That wasn't there before. It was a zero because it was padding, right? So yeah, it turns out they have an off-by-four error in the SDMA command parser and it reads from four words later than it should. Again, this took many hours of headbanging. And it was like, randomly try two commands. Oh, one, one, one. Yeah.
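Both bugs in this stretch can be reproduced in a toy model. The sketch below makes several assumptions that are not spelled out in the talk: a 14-bit count field (chosen so that minus one reads as 0x3FFF), a five-word write packet laid out as header, destination low, destination high, count, data, and a made-up opcode value. It illustrates the behaviour described, not the actual hardware parser.

```python
COUNT_MASK = 0x3FFF   # assumed 14-bit count field, matching the 0x3FFF in the talk

def encode_count(total_words):
    """Header count field for a packet of total_words words (size minus two)."""
    return (total_words - 2) & COUNT_MASK

def old_firmware_length(count_field):
    """Packet length as firmware without one-word packets would interpret it."""
    return count_field + 2

SDMA_NOP = 0x0     # padding word, zero as described in the talk
SDMA_WRITE = 0x2   # hypothetical opcode value for this sketch

def build_ring(dests, value=0xDEADBEEF, size=16):
    """Queue one write packet per destination, then pad with NOPs."""
    ring = []
    for dst in dests:
        ring += [SDMA_WRITE, dst, 0, 1, value]   # header, dst_lo, dst_hi, count, data
    return ring + [SDMA_NOP] * (size - len(ring))

def run_sdma(ring, data_skew=0):
    """Parse the ring; data_skew=4 models reading data four words too late."""
    memory, i = {}, 0
    while i < len(ring):
        if ring[i] == SDMA_WRITE:
            dst, count = ring[i + 1], ring[i + 3]
            for n in range(count):
                memory[dst + 4 * n] = ring[i + 4 + n + data_skew]
            i += 4 + count
        else:
            i += 1
    return memory

if __name__ == "__main__":
    print(hex(encode_count(1)), old_firmware_length(encode_count(1)))
    print(run_sdma(build_ring([0x1000]), data_skew=4))
    print(run_sdma(build_ring([0x1000, 0x2000]), data_skew=4))
```

With the skew applied, a single write lands on padding and stores zero. With two writes, the first one picks up the count word of the second packet (a one) and the second picks up padding, which matches the "one, then zero" pattern observed on stage.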
So it reads four words too late, but only in ring buffers. Indirect buffers work fine. That's good because those come from user space, so we don't have to muck with those. We can work around this, because it's only used in two places in the kernel, by using a fill command instead of a write command. That works fine. Again, how did they even make these mistakes? Okay. But still the GPU doesn't work. The ring tests pass, but if you try to draw, you get a bunch of page faults. And it turns out that what happens is that on the PS4, you can't write the page table registers from actual commands in the GPU itself. You can write to them from the CPU directly. You can just do a memory-mapped register write, an MMIO write. But you can't tell the GPU, please write this to the page table register. So the page tables don't work. The GPU can't see any memory, so everything's broken. Linux uses this. FreeBSD doesn't. It uses direct writes. And we think this is maybe a firewall somewhere in the Liverpool, some kind of security thing they added. We can directly write from the CPU, but that breaks the regular flow, like it's not asynchronous anymore, so this could break things. That's a really hacky solution. I would really like to fix this. And I'm thinking maybe the firewall is in the firmware, right? But it's proprietary and undocumented firmware. So let's look at that firmware. It's a thing. It runs microcode, right? This CP thing. It's undocumented, but we take the blobs out of FreeBSD, and that's great because we don't have to ship them. Let's dig deeper into those blobs. So how do you reverse engineer an unknown CPU architecture? That's really easy. You run an instruction and see what it did, and then just keep doing that. Thankfully, we can upload custom firmware, so it's actually really easy to just have a two-instruction firmware that does something and then writes a register to a memory location. And that's actually really easy to find.
If you first look for the write-memory instruction, it's really easy to find in the binary, because you see GPU register offsets that stand out a bit in one column. So long story short, we wrote f32dis, which is a disassembler for the proprietary AMD F32 microcode. I shamelessly stole the instruction syntax from ARM, so you may recognize that if you're used to ARM assembly. And this is not complete, but it can disassemble every single instruction in all the firmware in Liverpool for PFP, ME, CE, MEC and RLC, which are five different blocks in the GPU. As far as I know, this has never been done before. All the firmware was like a voodoo black magic thing that's been shipped; even the non-AMD kernel developers don't know anything about this. And you can disassemble the desktop GPU stuff too, so this could be good for debugging strange GPU shenanigans on non-PS4 stuff. All right, alas, it's not in the firmware. It seems to be blocked in hardware. I found the debug register that actually says there was an access violation on the bus when you tried to write this thing, and I tried a bunch of workarounds, and I even bought an AMD APU desktop system, dumped all the registers, diffed them against the ones I had on Linux, and tried setting every single value from the other GPU, hoping I'd find some magic bits somewhere, but no. They probably have a setting for this somewhere, but it's a sea of ones and zeros, good luck finding it. It does work with the CPU write workaround though, so hey, at least we get 3D, and it's actually pretty stable, so if there's a race condition, I'm not really seeing it. So checklist, what works, what doesn't work. We have interrupts and timers, the core thing you need to run any OS. We have a serial port, we can shut down the system and reboot, and you think that's funny, but actually it goes through ICC, so again, it needs some interesting code there. I actually just implemented that what, four hours ago, because pulling the plug was getting old.
The power button works, USB works. There's a funny story with USB: it used not to work, and we said, fix it later, there seemed to be some special code missing, and then someone pulled a repo from the USB-not-working branch, and tested it and said, oh, it's working. It seems we fixed it by accident, by changing something else. The hard disk works, which is via USB. Blu-ray works, I wrote the driver for that also four hours ago, three hours ago now, yeah, something like that, and I spent 20 minutes looking for someone in the hack center that had a DVD I could stick in to try it. Apparently I'm from the past if I ask for DVDs, so yeah. But it does work, so that's good. Wi-Fi and Bluetooth work, Ethernet works except only at gigabit speeds, frame buffer works, HDMI works, it's currently hard-coded to 1080p, so yeah, it does work. We can fix that by improving the encoder implementation. 3D works with the ugly register write hack, and S/PDIF audio works, so that's good. HDMI audio doesn't work, mostly because I only got audio roughly working in general recently, and I haven't had a chance to program the encoder to support the audio stuff yet, because again, more annoying hacks there. And the real-time clock doesn't work, and if you think that's simple, well, the clock device is simple, but ever since the PlayStation 2, the way Sony has implemented real-time clocks is that instead of reading and writing the time on the clock, which is what you would think is the normal thing to do, they never write the time on the clock. Instead, they store an offset from the clock to the real time in some kind of storage location. And there's a giant mess of a registry, as it's called, in the PS4, and I don't even know where it's stored. It might be on the hard drive, it might be encrypted, so basically getting the real-time clock to actually show the right time involves a pile of nonsense that I haven't had a chance to look at yet. But we have NTP, right? So it's good enough.
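The offset-based clock scheme is easy to express in code. This is a minimal sketch of the idea only: the class and its fields are hypothetical, and nothing here reflects where or how the PS4 actually stores the offset.

```python
class OffsetClock:
    """Wall clock built on a hardware counter that is only ever read."""

    def __init__(self, read_hw_counter, stored_offset=0):
        self._read_hw = read_hw_counter   # callable returning raw seconds
        self.offset = stored_offset       # would live in persistent storage

    def now(self):
        """Current wall time: raw counter plus the stored offset."""
        return self._read_hw() + self.offset

    def set_time(self, wall_seconds):
        """Setting the time changes only the offset, never the counter."""
        self.offset = wall_seconds - self._read_hw()
```

The consequence for Linux is the one described above: reading the hardware counter alone gives a meaningless value, and the correct time is only available once the stored offset has been located and decoded.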
All right, oh, and we have blinking lights, important. The power LED does some interesting things if you're on Linux, so that's good. So, the code: you can get the ps4-kexec code on our GitHub page; that has the kexec and the hardware configuration and the Linux bootloader stuff. You can get the ps4-linux branch, which is our fork of the kernel, based on 4.9, which is the latest public version, I think. You can get our Radeon patches, which are three, I think, really tiny patches for user space libraries, just to support this new chip. Really simple stuff, the NOP thing and a couple of commands. And the RAI and f32dis thing I mentioned, you can get radeon-tools at that GitHub repo. I just pushed that right before this talk. So if you're interested, there you go. And if you're going to get the RAI file, well, you probably want to run before the guys at that website realize they really should take that down. But I'm sure the Internet Wayback Machine has it somewhere. So yeah. Okay. Well, that's everything for the story of how we got Linux running on the PS4. And you can reach us at that website or fail0verflow on Twitter. Thank you. So I hope that wasn't too fast. Sorry. I had to rush through my, like, 89 slides a little bit because I really wanted to do a demo. And then again, this kind of is the demo, right? But we can try something else. So maybe I can shut this... If I can aim with my controller. This is really not meant to be a mouse. That's not the right button. Come on. Yeah, I think it is. Close, close, maybe. Yes. So we have this little icon here. I wonder what happens if it works. So we have internet access. Hopefully Wi-Fi works. Actually, let me just check real quick. Because this could work really badly if we don't. ping 8.8.8.8, right? Yeah, we have internet access. Okay, Wi-Fi works. Okay. Wonder what happens if we click that. It takes a while to load. This is not optimized for... So the CPUs on this thing are a little bit slow. But hey, you know, it works.
And now it's a real game console. And this is... There we go. Okay. So yeah, I think we can probably take some Q&A because this is a little bit slow to load. But we can try a game, maybe. Well, if you're up for Q&A, I think there will be some questions. So shall we start with one from the internet? Testing, testing. Okay, hey. The internet wants to know if most of your research will be published or if stuff's going to stay private. Well, all of this... I mean, the publishing is basically the code and the explanation I just gave. As I said, everything's on GitHub. So all the drivers we wrote, all the... In that case, I guess the spec is also the code. If you really want to, I could write some wiki pages on this. But roughly speaking, what's in the drivers is what we found out. The really interesting bit, I think, is the F32 stuff from the AMD GPU, and that we have a repo for. But absolutely, if you have any general questions on any particular device or any details, feel free to ask. I don't know. Again, it would be nice if we wrote a bunch of docs and everything, but it's not really a matter of not wanting to publish them. It's lazy engineers not wanting to write documentation. But the code is at least... The things we have on GitHub are pretty clean. Okay, so someone is piling up on four. Guys, if you have questions, you see the microphones over here. Just pile up over there and I'm going to... Four, please. Just a small question. How likely is it that you upstream some of that stuff? Because, I mean... So, there's two sides to that. One side is that we need to actually get together and upstream it. The code, some of it has horrible hacks. Some of it isn't too bad. So, yeah, we want to upstream it. We have to sit down and actually do it. I think most of the custom x86-based machine stuff in the kernel is doable. The drivers are probably doable. Some people might scream at the interrupt hacks. But it's probably not terrible.
And if they have a better way of doing it, I'm all ears, the other kernel devs. The Radeon stuff is quite fishy because of the encoder thing that is really non-standard. And also, understandably, AMD GPU driver developers that work for AMD may want to have nothing to do with this. And in fact, I know for a fact that at least one of them doesn't. But, I mean, they can't really stop us from upstreaming things into the Linux kernel, right? So, I think as long as we get the code into a state where it's doable, it's fine. But I think most likely the non-GPU stuff will go in first if we have a chance to do that. And of course, if you want to try upstreaming it, go ahead, it's open source, right? So... Over to microphone one, please. Hi. First, I think I should implore you to try and find Trammell Hudson and troll him into using your FreeBSD kexec implementation in Heads instead of having to run all of Linux in it, as a joke. But my real question is if the reason you used Gentoo was because systemd was yet another hurdle in getting this to run. I run Gentoo on my main machine. I run Gentoo on most of the machines I care about. I do run Arch on a few of the others, and there I live with systemd. But the reason why I run Gentoo is, first, it's what I like and use. And second, it's super easy to use patches on Gentoo. You get those things we put on GitHub, which are just patch files. It's not really a repo because they're so small. It's not worth cloning everything. Just get those patch files, stick them in /etc/portage/patches, have a little hook to patch it, and that's all you need. So it's really easy to patch packages in Gentoo. That's one of the main reasons. Yes. Number three, please. Will there be new exploits, a new way to boot Linux on the PS4 with modern firmware, because finding one with firmware 1.76 is really rare. That was 4.05.
But again, our goal is to focus on the... I just told you the story of the pre-exploit thing, because I think that's a good, like, hacker story, good knowledge for trying new platforms. And the next thing we're working on... The reason why we don't want to publish the exploit or really get involved in the whole exploit scene is that there's a lot of drama. It's not rocket science, and it's not super custom code. This is WebKit and FreeBSD. It's actually not that hard. And we know for a fact that several people have reproduced this on various firmwares. So there's no need for us to be the exploit provider. And we don't want to get into that because it's a giant drama fest, as we all know anyway. So please, DIY it this time. Thanks. And what is the internet saying? Testing? Okay, the internet wants to know if you ever had fun with the BSD on the second processor. Oh, that's a very good question. And I myself haven't. I don't know if anyone else has looked at it briefly. One of the commands for rebooting will boot that CPU into FreeBSD. And there's probably fun to be had there. But we haven't really looked into it. And over to five, please. I was wondering if any of that stuff was applicable to the PS4, the additional one, however it's called, the new one. Sorry, have you ever tested? Sorry, say that again. Sony brought out a new PS4. Oh, the Pro, you mean the PS4 Pro? Yes. Yeah. So Linux boots on the Pro. We got that far. GPU is broken. So we would like to get this ported to the Pro and also working. It's basically an incremental update. So it's not that hard, but the GPU needs a new definition, new chip, all that stuff. Yeah, I get it. You've found some friends here. Yeah. But yeah, as you can see, the 3D works and there you go. You will hear a buzzer. When you hear the buzzer, look down at the floor. Good. You probably have to look up and down in this game. You have to do a physical and mental wellness exercise. Ready for another one? There is a framed painting on the wall.
Well, then number three, please. I want to ask you if you want to port these Radeon patches to the new AMDGPU driver, because AMDGPU now supports the Southern Islands GPUs. Yes, that's a very good question. Actually, the first attempt we made at writing this driver was with AMDGPU. And at the time it wasn't working at all. And I was a bit concerned about its freshness at the time. And it was experimentally supporting this GPU generation. I'm told it should work. So I would like to port this, you know, move to AMDGPU. Now that we have a working implementation and we've got the code cleaned up much better, we know where all the nits are. I want to try again with AMDGPU and see if that works. That's a very good question because the newer gen might require that driver, maybe. So, yeah. Thank you. Well, then I'm going to guess we ask the internet again. Okay. The internet states that about a year ago you argued with someone on Twitter that the PS4 wasn't a PC. And now you're saying it kind of is, or something. And what's up with that? So again, my reason for saying it's not a PC is that it's not a PC. It's not an IBM Personal Computer compatible device. It's an x86 device that happens to, you know, be structured roughly like a current PC. But if you look at the details, so many things are completely different. It really isn't a PC. Like on Linux I had to define, you know, a PS4 sub-arch. It's x86, but it's not a PC. And that's actually a very important distinction because there's a lot of, you know, things you've never heard of that are x86 but not PCs. Like for example, there's a high chance your monitor at home has an 80186 CPU in it. So, yeah. So nobody's piling up at the microphones anymore. Is there one last question from the internet? Yes, there is. And the question is if there was any decryption needed. No. So this is purely, you know, you exploit WebKit, you get user mode, you exploit the kernel, you get kernel mode, jump to Linux.
There's no security, like there's nothing stopping you from doing all this stuff. There's a sandbox in FreeBSD, but obviously you exploit around the sandbox. Like there's nothing, you know, there's no hypervisor, there's no monitoring, there's nothing saying, oh, this code should not be running. There's no, like, integrity checking. You know, they have a security architecture, but as is tradition for Sony, you can just walk around it. So, yeah. The PS3 was notable for the fact that the PSJailbreak, which is effectively a USB piracy device that was released by someone, basically used a USB exploit in the kernel, and only a USB exploit in the kernel, to effectively enable piracy. So, when you have like a stack of security and you break one thing and you get piracy, that's a fail. This is basically the same idea, except I have no idea what you need to do to do piracy and I don't care. But, yeah, Sony doesn't really know how to architect systems. That's it then. That's it? All right, thank you very much. Here we go, that's your applause. Thank you.