Alright, so let's get this show on the road. Please settle down, take a seat, and please give a warm welcome to our first-time DEF CON speaker, Ulf.

Thank you, everyone, for coming and listening to me. Today we are going to direct-memory-attack the kernel. My name is Ulf Frisk, and helping me with the demos today I have Martin Bergwist. Today we are going to totally own Linux, Windows, and OS X kernels by DMA code injection. We're going to dump memory at speeds in excess of 150 megabytes per second. We're going to both pull and push files from the target system. We're going to execute code and spawn a system shell. After this talk, I will be open-sourcing the software that makes all this possible. And since we are talking about a hardware-based attack, you also need hardware, which is already available for purchase online for less than $100.

But first, a little bit about myself. My name is Ulf Frisk. I work as a penetration tester, primarily with online banking security. I'm employed in the financial sector in Stockholm, Sweden. I have a Master of Science in computer science and engineering. Most recently, I've been taking a special interest in low-level Windows programming and DMA. This has been a bit of a learning-by-doing project for me, learning more about 64-bit assembly and operating system kernels. Actually, in order to be able to do this talk, I have to put up this slide. I need to point out that this talk is given by me as an individual. My employer is not involved in any way whatsoever in what I'm doing here today.

But I'm here today to present PCILeech. PCILeech is the combination of the PLX Technologies USB3380 development board, a custom firmware, and custom software. In the image here, you see this development board in the mini PCI Express form factor. To the left, you see the PCI Express side. It's the side that goes into the target computer or, if you want to call it that, the victim computer.
The USB3380 is able to send both DMA reads and writes into the target system's main memory. To the right, you see a USB 3 connector, which allows us to connect this board to a controlling computer. Once connected, the controlling computer is able to transfer memory at very high speeds over USB 3 straight into the memory of the target computer. What's very nice about this hardware is that it requires no drivers at all on the target computer. It just works. It's hardware only. And with this piece of hardware, I'm able to get well over 150 megabytes per second in DMA transfer speed. Unfortunately, this chip is only capable of 32-bit addressing, and that means that you're only able to access the lower 4 gigabytes of memory with this card. As we'll see later on, that's not really a problem in practice.

Actually, the USB3380 has been presented here at DEF CON before. It was presented two years ago as the NSA Playset SLOTSCREAMER device by Joe Fitzpatrick and Miles Crabill. So I really want to thank Joe for bringing this really nice piece of hardware to my attention. Thank you very much, Joe. If I compare PCILeech to SLOTSCREAMER, it's exactly the same hardware. It's a different firmware and different software. This also means that if you already have a SLOTSCREAMER device, you should be able to reflash it and try this software on it. It's faster. SLOTSCREAMER was able to achieve around 3 megabytes per second, something like that, and the PCILeech device is able to achieve well over 150 megabytes per second. PCILeech is also capable of kernel implants. In fact, it relies heavily on kernel implants.

But what makes all this possible is of course PCI Express. PCI Express is a high-speed serial expansion bus. Or it's not really a bus, since it's point-to-point communication, but anyway, it's packet based. To the upper right, you see a schematic of PCI Express.
You have the PCI Express root complex anchored within the CPU chip. From this root complex, you have several serial lanes that you can connect PCI Express endpoints to. You can also connect PCI Express switches and bridges. So you can say that PCI Express forms a small device network within a computer. Depending on how much bandwidth a device needs, it can consume between one and 16 serial lanes. A graphics card that needs lots of bandwidth typically consumes 16 lanes. PCI Express is designed to be hot-pluggable, and it comes in many form factors and variations. It comes in the standard form factor that you all know from graphics cards and similar things. It comes in the mini PCI Express form factor, as you saw on the previous slide. It comes as ExpressCard, which goes into laptops, and Thunderbolt also encapsulates PCI Express. What's nice about PCI Express from our point of view is that it's DMA capable, and that means that it circumvents the CPU cores, so PCI Express endpoints can read and write memory directly.

But what is direct memory access, and how does it work? You have the CPU core. It usually executes code in something called a virtual address space. And you have a memory management unit, which is built into the CPU and uses page tables to translate these virtual addresses into physical addresses. It actually translates pages, and a page is typically four kilobytes long. It can be larger as well, but in most cases it's four kilobytes. PCI Express devices have traditionally been able to access all physical memory directly, without any limitations whatsoever. But CPUs nowadays have something called an IOMMU, which works in a similar way to the memory management unit for the CPU. This allows for virtualization of device addresses as well. So in theory, operating systems should be able to protect themselves fully against DMA attacks if the IOMMU is fully used.
But as we will see later on, that's not really the case.

Actually, this is the complete firmware of the PCILeech device. It's a whopping 46 bytes in total. The first two bytes are a header, or actually the first byte is the header, 5A, and the 00 that follows tells the chip to just load the configuration data into the configuration registers at power-on. Next, we have the length, which is in little-endian. So 2A is 42 bytes of configuration data. Then we have the USB controller register. We need to enable the USB 3 port on this board because it's disabled by default. First, you have the address of the register, the 2310 here. Then you have a DWORD, four bytes or 32 bits, which is programmed into that register at power-on. This enables the USB 3 port. Then we set the PCI Express vendor ID and product ID to a Broadcom SD card. This is pretty much just a leftover from the SLOTSCREAMER software I started toying around with. And then in green here, we enable the four DMA endpoints, which are capable of high-speed DMA transfers between USB 3 and PCI Express. We set the first endpoint to a write endpoint, which allows us to write from USB into the main memory of the target computer at high speeds. Then we set the following three endpoints to read endpoints. The reason we set three endpoints to read is that reading is much more common than writing, and we can get a little more transfer speed out of this chip if we do multi-threaded access.

And at last, we set the USB vendor ID and product ID to Google Glass. The reason I'm doing this is that I wrote this program for Windows. Windows has a very nice user-mode USB stack called WinUSB. But in order to activate it for a certain piece of hardware, you need to sign a small configuration file with a driver signing certificate. And those are kind of expensive. So I didn't want to purchase one.
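As a rough sketch of the firmware layout described above (not the actual PCILeech firmware image), the structure can be modeled as a 4-byte header followed by seven 6-byte entries, which adds up to the 46 bytes quoted in the talk. The register addresses and values below are placeholders, and both the 2-byte address / 4-byte value split and the little-endian address encoding are assumptions inferred from the byte counts.

```python
HEADER = 0x5A   # header byte quoted in the talk
MODE = 0x00     # second byte; per the talk: load the registers at power-on
ENTRY_SIZE = 6  # ASSUMED: 2-byte register address + 4-byte value

# Seven PLACEHOLDER entries, in the order the talk lists them.
# Addresses and values are hypothetical, not the real register map.
PLACEHOLDER_ENTRIES = [
    (0x0001, 0x00000000),  # USB controller register (enables the USB 3 port)
    (0x0002, 0x00000000),  # PCI Express vendor ID / product ID
    (0x0003, 0x00000000),  # DMA endpoint 1: write
    (0x0004, 0x00000000),  # DMA endpoint 2: read
    (0x0005, 0x00000000),  # DMA endpoint 3: read
    (0x0006, 0x00000000),  # DMA endpoint 4: read
    (0x0007, 0x00000000),  # USB vendor ID / product ID
]


def build_firmware(entries):
    """Serialize (address, value) pairs behind the header and length field."""
    body = b"".join(
        addr.to_bytes(2, "little") + value.to_bytes(4, "little")
        for addr, value in entries
    )
    return bytes([HEADER, MODE]) + len(body).to_bytes(2, "little") + body


def parse_firmware(blob):
    """Check the header and length field, then return the (address, value) pairs."""
    if blob[0] != HEADER:
        raise ValueError("unexpected header byte")
    length = int.from_bytes(blob[2:4], "little")
    body = blob[4:]
    if length != len(body) or length % ENTRY_SIZE:
        raise ValueError("length field does not match body")
    return [
        (int.from_bytes(body[i:i + 2], "little"),
         int.from_bytes(body[i + 2:i + 6], "little"))
        for i in range(0, length, ENTRY_SIZE)
    ]


firmware = build_firmware(PLACEHOLDER_ENTRIES)
print(len(firmware), firmware[:4].hex())  # 46 5a002a00
```

The point of the sketch is only the arithmetic: seven register writes of six bytes each give the 42 (0x2A) bytes of configuration data, and the four header and length bytes bring the total to 46.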
So it was much easier to find a device out there that actually uses WinUSB already and pretend to be that device.

Let's get into the kernels. Most computers today have more than four gigs of memory. If you're able to get a kernel module into a system, it should be able to access all memory and also be able to execute code. So what we can do is search for kernel structures, code signatures, or whatever in lower memory using DMA, patch that code, and hijack the execution flow of the kernel that way. When we are doing this, we need to keep in mind that PCI Express DMA works with physical addresses, while kernel code runs in a virtual address space.

I divided exploitation into three stages. First we have stage one, which is pretty much just a hook. Then we have stage two, which is the stager for the final stage three kernel module implant. We start by trying to locate the kernel, a driver, or whatever in kernel space that we can target. Usually at the end of the kernel itself or of a driver, there is some free space in the last page, because it's usually not completely filled. And that page is already executable. So we put our stage two code in there, which is around 500 bytes. Then we search for a function to hook. Once we find that function, we overwrite it with a call into the stage two code, which is already written into the kernel. When a thread starts executing the hooked function, it immediately jumps to the stage two code. The very first thing the stage two code does is restore the stage one code to its original state. Then we check whether we are the first thread running here, since we might be running in a multi-threaded environment. If we are not the first thread, we immediately jump back to the now unhooked stage one function and resume the normal execution flow for that kernel thread. But if we are the first thread running here, we locate the base of the kernel.
And we need that in order to look up some function pointers that we are going to need later on. For example, we need those function pointers in order to allocate two pages of memory. The first page we use as a buffer that the PCILeech main control program running on the other computer can access with DMA in order to communicate with the kernel module that we are going to insert. The second page is for the kernel module, the stage three code itself. Then we write a small stage three stub into the second page. This is pretty much just a tight loop. And then we create a new kernel thread in that loop. At the very end of the stage two section, we write the physical address of the buffer we allocated into the place where the stage two code is located. The PCILeech main control program is polling this location all the time with DMA. Once it receives the physical address, it writes the complete stage three contents to the address it received. Then the loop, which is already executing in the stage three kernel thread, senses that the complete stage three contents have been written. So it exits the loop and starts by setting up a DMA buffer, which is around four to 16 megabytes in size, in lower memory. And then it starts looping, waiting for commands. The commands are pretty much read memory, write memory, execute code, or exit.

Let's start by attacking Linux. The Linux kernel is located in low physical memory. If kernel address space layout randomization is not enabled, it's located at 16 megabytes in physical memory. If KASLR is enabled, it slides in two-megabyte chunks. So once we find the kernel, we search for a more or less random function to hook. In my code, I chose vfs_read, since it gets called pretty often and it works fine. Then I search for a function called kallsyms_lookup_name. This is pretty much the equivalent of GetProcAddress in Windows.
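The search for a Linux kernel that may have slid in two-megabyte steps can be sketched as follows. This is an illustration under stated assumptions, not PCILeech's actual search code: `read_phys` is a hypothetical callback standing in for a DMA read, the signature bytes are whatever the caller chooses, and starting the scan at the 16 MB default base is an assumption.

```python
MB = 1024 * 1024
DEFAULT_BASE = 16 * MB     # kernel location without KASLR, per the talk
KASLR_STEP = 2 * MB        # KASLR slide granularity, per the talk
DMA_LIMIT = 4 * 1024 * MB  # the USB3380 can only address the lower 4 GB


def candidate_bases(start=DEFAULT_BASE, limit=DMA_LIMIT, step=KASLR_STEP):
    """All 2 MB-aligned physical addresses the kernel could start at.

    ASSUMPTION: the slide never moves the kernel below the default base.
    """
    return range(start, limit, step)


def find_kernel(read_phys, signature, probe_len=4096):
    """Probe the first bytes at each candidate base for a signature.

    read_phys(address, length) is a HYPOTHETICAL DMA read returning bytes.
    Returns the first matching base address, or None.
    """
    for base in candidate_bases():
        if signature in read_phys(base, probe_len):
            return base
    return None


def fake_read(address, length):
    """Stand-in for a DMA read: pretend the kernel slid by five steps."""
    if address == DEFAULT_BASE + 5 * KASLR_STEP:
        return b"\x00" * 8 + b"KERNELSIG"
    return b"\x00" * length


print(hex(find_kernel(fake_read, b"KERNELSIG")))  # 0x1a00000
```

Because only 2 MB-aligned addresses need probing, the whole region below 4 GB is covered in about two thousand small reads.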
kallsyms_lookup_name lets me pass in a kernel symbol name, and it looks up the function pointer or symbol address for that name. Then we write the stage two code and write the stage one code. Then we wait for the stage two code to return with the physical address of stage three. We write the complete stage three code, and then it's demo time.

In this demo, I will show how we can use a generic kernel implant to both pull and push files from and to a Linux system. And we're also going to dump the memory. Is the demo supposed to be like this? Sorry about that. Here you see a Kali Linux computer, and we will try to log on to that computer with the root account. That one was not working here. We will reboot the computer afterwards and do that demo after the Windows demo. But we will start by dumping the memory on this computer anyway. So let's dump the memory. We'll use the Linux 64-bit kernel module. We're going to dump the memory and store it in C:\temp here. So first we insert the kernel module into the running kernel. Then we receive execution, and then memory dumping starts.

Memory dumping works the following way. The kernel module first asks the running kernel for the physical memory map. In computers, physical memory is not one big chunk of memory. You have memory-mapped PCI Express devices in between, and if you read those you can crash the computer. You also have unreadable memory, such as system management mode memory, that you can't read. So it first queries the computer for the memory map and reports this back to the PCILeech main control program. Once the main control program knows the physical memory map, it can ask the inserted kernel module to read certain memory chunks. Dumping memory is usually pretty fast. It should be well over 150 megabytes per second. But in this demo I have to use a crappy USB hub. So the speed is a little bit lower.
But it should still be well in excess of 100 megabytes per second, as you can see here. Thank you very much. Now that we have dumped the memory, let's try to run Volatility on it as well. I'm running the linux_pstree command here just to show you that it's working. At the very bottom you see the PCILeech kernel thread for the inserted kernel module. And if you scroll up here we see lots of kernel threads and user-mode processes here and the system gets very close.

Let's move on to Windows 10. In Windows 10 the kernel is located at the top of physical memory, which is kind of boring for us since we can't access it directly. This is a problem for us if the computer has more than around three and a half gigs of RAM. The reason is that memory-mapped PCI Express devices and other things push the last bytes of memory well above four gigs. So this means that the kernel executable is not reachable directly, and most drivers are also loaded above four gigs, so they're not reachable either. But if we look at the memory structures below four gigs, we see that the page tables for the kernel itself and the important kernel drivers are actually loaded below four gigs in their entirety. So let's attack the page table.

Paging on a 64-bit system works this way. First you have a virtual address, or linear address, at the top in red here, that you wish to translate into a physical address, and this is what the memory management unit does. The memory management unit starts by reading the physical address in a CPU register called CR3 in order to find the physical base address of a table called page map level 4. You take the topmost bits from the virtual address to point out which entry in that table to use. In the PML4 entry you have the physical address of the page directory pointer table, and you take some more bits from the virtual address to point out the entry in that table, which contains the physical address of the page directory.
Take some more bits from the virtual address to point out the entry in that table, which holds the physical base address of the page table itself. Take some more bits from the virtual address and you get the page table entry. That's the entry we're going to target and corrupt in order to gain kernel execution. What's nice about this is that all four paging structures here are actually loaded below four gigs, so we can access them using DMA.

Kernel address space starts at the address you see here on this slide. Windows has kernel address space layout randomization, which means that there are no fixed virtual addresses between reboots. The kernel is loaded at different places and drivers are loaded at different places as well, so we can't rely on addresses. But if you take a page table entry and have a look at the lowest three bits and the highest bit, which tell you whether the page is present, whether it's read-only or writable, whether it's a user or a supervisor (kernel) page, and whether it's executable or non-executable, those four bits together form what I call a page signature. And if you look at a driver, or the kernel itself, you can call its collection of page signatures a driver signature. So what I'm doing is searching for the driver signature by walking the page table. Once I find the correct driver to target, I locate the page and rewrite the physical address in the page table entry to a place below four gigs that I control over DMA.

So let's continue on to the Windows 10 demo. In this demo I will use a page table rewrite in order to implant a kernel module. I'm going to execute code, I'm going to dump memory, I'm going to spawn a system shell, and also try to unlock the computer. So let's switch over to the demo. Here we have a Windows 10 computer, and we will try to log on to that computer without using a password.
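The four-level walk and the page signature described before the demo can be sketched as follows. The bit positions are the standard x86-64 ones for 4 KB pages (nine index bits per level, present / writable / user in bits 0 to 2, no-execute in bit 63); the function names and the example values are made up for illustration and are not taken from PCILeech.

```python
PRESENT = 1 << 0    # bit 0: page is present
WRITABLE = 1 << 1   # bit 1: read-only (0) or writable (1)
USER = 1 << 2       # bit 2: supervisor (0) or user (1)
NO_EXECUTE = 1 << 63  # bit 63: executable (0) or non-executable (1)
PHYS_MASK = 0x000FFFFFFFFFF000  # bits 51:12 hold the physical page address


def split_virtual_address(va):
    """Return the table index used at each paging level, plus the page offset."""
    return {
        "pml4": (va >> 39) & 0x1FF,
        "pdpt": (va >> 30) & 0x1FF,
        "pd": (va >> 21) & 0x1FF,
        "pt": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


def page_signature(pte):
    """The four flag bits the talk calls a page signature."""
    return (bool(pte & PRESENT), bool(pte & WRITABLE),
            bool(pte & USER), bool(pte & NO_EXECUTE))


def driver_signature(ptes):
    """Sequence of page signatures for consecutive page table entries."""
    return [page_signature(pte) for pte in ptes]


def retarget_pte(pte, new_phys):
    """Keep the flag bits, swap in a page-aligned physical address below 4 GB."""
    if new_phys & 0xFFF or new_phys >= 1 << 32:
        raise ValueError("need a page-aligned address below 4 GB")
    return (pte & ~PHYS_MASK) | new_phys


# Illustrative entry: present, writable, kernel, non-executable, backed above 4 GB.
example_pte = NO_EXECUTE | 0x123456000 | WRITABLE | PRESENT
print(split_virtual_address(0xFFFFF80000001234))
print(page_signature(example_pte))
print(hex(retarget_pte(example_pte, 0x7000)))
```

Matching on flag patterns rather than addresses is what makes the search independent of the randomized load addresses: the flags of a driver's pages stay the same across reboots even though its virtual and physical locations change.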
As you can see, we couldn't log on to that computer without using a password on the domain account. But what we can do is insert the PCILeech device into the computer, and once we've done that we can try to load a kernel module into the running kernel by using a page table hijack. In Windows 10, because we are looking for driver signatures, we need to target a specific driver version. So let's do that and use the page table hijack here. We search for the page table location. We hijack the page table, and we wait for a kernel thread to start executing there. We receive execution, and we have loaded the kernel module at this memory address.

And now we can try to remove the password requirement on that computer, which by the way is fully BitLockered, so that we can log on to it without using a password. In order to do this we need to specify that we are going to use the unlock implant. It works in a similar way to Inception, but this is all done in kernel code because we are inserting this kernel module into the target system. In order to insert it we also need to specify the memory address we just received, so let's do that. Zero is success here, and it says zero, so let's try to log on. As you can see, it's quite easy to log on to this computer.

Let's try to dump the memory of that computer as well. We need to specify the kernel module address of the loaded kernel module here as well. Dumping memory works in a similar way to Linux. First we ask the kernel module that is already inserted to report the physical memory map back to the PCILeech main control program running on my demo computer here. Then the control program asks the running kernel module to read certain memory chunks that it knows are accessible and store them in the DMA buffer that was already allocated in lower memory. Memory dumping takes around a minute on an 8 GB system, and of course once you have dumped all memory you can run memory forensics tools on it, such as Volatility.
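The dumping scheme described above, where the control program gets the physical memory map and then requests only readable chunks, can be sketched like this. It is a simplified illustration, not PCILeech's code: the memory map format (a list of start and length pairs) and the function name are assumptions, and the chunk size stands in for the 4 to 16 MB DMA buffer mentioned earlier in the talk.

```python
def plan_reads(memory_map, chunk_size):
    """Yield (address, size) read requests covering only readable ranges.

    memory_map: list of (start, length) tuples for readable physical memory
                (ASSUMED format). Gaps such as memory-mapped devices or
                SMM memory are simply not listed, so they are never touched.
    chunk_size: maximum bytes per request, bounded by the DMA buffer size.
    """
    for start, length in sorted(memory_map):
        end = start + length
        address = start
        while address < end:
            size = min(chunk_size, end - address)
            yield address, size
            address += size


# Tiny made-up map: one page at 0, then three pages at 1 MB; 8 KB chunks.
example_map = [(0x100000, 0x3000), (0x0, 0x1000)]
for address, size in plan_reads(example_map, 0x2000):
    print(hex(address), hex(size))
```

Driving the reads from the kernel's own memory map is what keeps the dump from touching device memory that could crash the target.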
From a memory dump you should also be able to, for example, extract credentials with Mimikatz or things like that. And this works on fully BitLockered computers, by the way. So let's wait until the memory dump is complete, and then let's try to spawn a system shell. In order to spawn a system shell we can use the pscmd kernel implant, and we also need to specify the memory address of the kernel module that is already inserted into the kernel. So let's try to run it. Let's spawn a system shell, and it's as easy as that. So let's check who we are. Thank you very much. As you can see, we are SYSTEM here, and once we're SYSTEM of course we can do everything. We can disable BitLocker, we can spy on other users' files and do whatever, but let's not do that here.

Because this is a Windows demo, there is one more thing missing here. We need to specify the kernel module address here as well. We're missing a blue screen. I was missing that one. So let's run the psblue kernel implant here. And as you can see, Windows doesn't like me.

Actually, Windows 10 does have some very nice anti-DMA features built into the Enterprise version, but they are not enabled by default. Windows 10 can be made rather secure against DMA attacks if the virtualization-based security features are enabled, like Credential Guard and Device Guard. But it's often quite easy for users to mess around with settings in the UEFI, for example to disable VT-d or disable Secure Boot and things like that. And then these virtualization-based security features will be disabled in Windows as well. We'll come to recommendations later on.

But let's target the last missing operating system here, which is OS X. OS X is just like Linux. The kernel of OS X is located in low physical memory. Its location depends on the KASLR slide, which is in two-megabyte chunks. OS X nowadays enforces kernel extension signing.
System Integrity Protection means that users can't write to certain folders. And kernel extension signing means that you can't load unsigned drivers. Pretty much all Macs today have Thunderbolt, but Thunderbolt is actually protected with VT-d. OS X actually uses the IOMMU to protect itself from DMA attacks. So that's kind of boring. So what can we do to change that? We can visit Apple's website. Thank you, Apple. Apple tells us on their website, in plain text, how to disable VT-d. So yeah, it's as easy as that.

In OS X, using DMA, we first search for the Mach-O kernel header. Mach-O is the format of binaries on the Mac, including the kernel. Then we search for a nice, more or less random function to hook. I think I hooked memcpy in this example. Then we write the stage two code into the memory of the target computer. Then we write the stage one code. We wait for the stage two code to return with the physical address of the stage three code. We write the stage three code, and then it's demo time.

In this demo I will show you how to disable VT-d in order to gain DMA access. And then we're going to dump the memory and unlock the computer. So here you have a Mac. To the right here you have an ExpressCard-to-Thunderbolt converter, which you don't really need for this part. All you need to do to disable VT-d is power on the Mac, which we will do in a second. It was kind of slow here. I think the movie was very slow. Let's try to reopen it. Let's move on here. We boot into recovery mode by pressing Command-R when we are starting the computer. Then you enter recovery mode. There is no password for recovery mode. Then you start the terminal and type nvram boot-args="dart=0", just as Apple tells you on their website, and VT-d is now fully disabled. So once VT-d is fully disabled, we should be able to target the computer over Thunderbolt. So let's do that.
Here you have a MacBook Air with that adapter connected to the right. Let's try to log on to that computer without using a password at all. As you saw, we couldn't log on to that computer, which is kind of boring. So let's insert the PCILeech adapter into the converter here. Let's start by loading a kernel module into the running OS X kernel. And it's as easy as that. We say that we're going to load the kernel module and that we're going to target OS X. The kernel module is loaded at this address. And then we should be able to remove the password requirement on this Mac. So let's run the Apple 64-bit unlock implant here. We need to specify the memory address of the already inserted kernel module as well. Zero is success here, and we have a status of zero, so we should be able to log on. So let's try to do that. And we're in. Thank you very much. Thank you very much.

So what can we do about this in order to protect ourselves better? Of course we can purchase hardware without any DMA ports whatsoever. It's the low-tech variant. It works perfectly fine. If we have Windows with auto-booting BitLocker and things like that, we should be able to disable things like ExpressCard ports in the computers. You can usually do this in the UEFI settings. But then you probably also need to change the BitLocker settings so that BitLocker at least triggers if this port is re-enabled. Of course, if you don't want to have your Mac's security disabled in recovery mode, you can set a firmware password on the Mac in order to protect yourself. And setting a BIOS password on the PC is also a good idea. Of course, pre-boot authentication is always nice to have. And of course the long-term solution here is for the operating system vendors to actually make full use of the IOMMU that is already in the hardware. Windows 10 has some very nice virtualization-based security features going on there.
So Microsoft seems to be doing some very nice work as well.

So what can we use PCILeech for? Of course we can use it for awareness. That's partly why I'm doing this talk. You saw today that full disk encryption is not invincible in any way. It's excellent for forensics and malware analysis. Sometimes you want to run malware samples on real hardware, and you don't want to pollute that system with lots of diagnostic software or whatever. So it could be nice to have a kernel implant in a hardware device. You can use it to load unsigned drivers into operating system kernels. It's a good pen testing tool. I do realize that law enforcement might use this tool as well. But please, if you want to take a look at this, don't do evil with this tool.

PCILeech targets 64-bit operating systems. It runs on 64-bit Windows 7 and 10 at the moment. It's able to read up to four gigs natively. And if you're able to insert a kernel module, it should be able to read all memory of the target system that the kernel can read. And if a kernel module is inserted, obviously you can execute code on the target system as well. I have kernel modules for Linux, Windows, and OS X at the moment. It's written in C and assembly in Visual Studio. It has a modular design. I tried to make it as modular as possible. You should be able to create your own signatures very easily and also create your own kernel implants. Actually, to the right here you see a very minimal kernel implant. It's in assembly, and it reads some control registers of the CPU and prints them on the screen of the computer running the PCILeech main control program.

Maybe we should. But we are missing one thing here. We should try the Linux demo again here. See if we have better luck this time. So as you saw, we couldn't log on with toor as the default password. So let's pull a file from the Linux system. A nice file to pull is the shadow file.
And it's as easy as pulling a shadow file from a running Linux system which uses the input by the way. And then we can open the shadow file and have a look at it. The root account here has a very long password hash. Of course we could try to crack it, but it's no fun doing that. So let's replace it instead with the default password hash of toor. This is the default password hash of toor. So let's write the file back. We're going to push it back to the Linux system, and we are going to use the file push kernel implant here. And now it should be on the target system. So let's try to log on here. See if it works better this time. And as you can see very.

So when you leave here today, I want you to remember that inexpensive universal DMA attacking is here. It's the new reality of today. Physical access is still very much an issue. You should be aware of potential evil maid attacks, for example if you bring your Mac to security conferences. And please do remember that full disk encryption is not invincible.

After this talk I will be making the GitHub repo public at this address here. Please give me a couple of hours to do that, but I will definitely do it today. And thank you very much to Joe for SLOTSCREAMER. You've been a huge source of inspiration for my work here. So thank you very much, Joe. Also thank you to Inception for being a big source of inspiration for my work. And also thank you to the guys at PLX Technologies for creating this wonderful chip. So thank you. Thank you very much for today.