Hello, everybody. My name is William Woodruff, and the name of this talk is "It's Coming from Inside the House: Kernel-Space Fault Injection with KRF."

A little bit about me: I'm a security engineer at a company called Trail of Bits. We're a small cybersecurity consultancy based out of New York, although we're about half remote these days. In my day job I do a combination of general security engineering, mostly on open source work, as well as government-funded program analysis research, mostly involving automated exploitability reasoning and automated exploit patching.

So this slide is kind of a lie; this talk doesn't really have two parts. It's just to keep me honest about the stuff I want to cover. The first part of this talk is that faults are vulnerabilities. I'll cover what a fault is, but for the time being, just accept that. I'll talk about why handling faults is hard, and why, given POSIX and Linux I/O semantics, we sometimes can't even faithfully handle them. I'll also talk about why failing to handle faults leads to real security vulnerabilities in actual code, and why the cloud makes it easier than ever to write fault-vulnerable code; interestingly, the cloud itself may be vulnerable to dangerous faults. Finally, I'll wrap things up with a demo of our tool, KRF, which I wrote over a weekend as sort of a joke, and you'll see why it's kind of a joke. I'll show you how it works, how to use it on your own tools, and the results we've gotten from it in real security audits we've done.

So to get started: what even is a fault? In the context of systems programming, and specifically in the context of this talk, by "fault" I mean a well-specified failure mode for (usually) kernel-managed resources.
That'll be something like your paradigmatic EACCES, EPERM, or EIO, returned by a syscall when the kernel is unable to service your request. As I'm sure all of you know, almost anything that touches the kernel can fail in some pretty well-defined way. open(2) can fail because you gave it a bad path, or because you don't have access to that path, or because it's been interrupted by signal delivery, or whatever; fork(2) is the same. But you also have really rich failure modes, where some combination of an LSM and ACLs causes a particular path to fail that would otherwise succeed for a given user. Different filesystems support different things, so you try to set an extended attribute and it fails. Rich failures that don't necessarily correspond to our intuitions about how syscalls normally fail can occur.

One thing I think is really important is to think about faults as part of the design contract for writing secure (in heavy quotes) C. It is your obligation as a C programmer to handle faults when they can be expected to occur. You can't just sweep them aside, as I personally am so frequently guilty of doing, because hardware is fundamentally unreliable, resources are always eventually exhausted, and users always eventually challenge their permission boundaries.

And unfortunately for us, handling faults is hard. (I actually got my slides in the wrong order, so let's start here.) There's a lot of Unix baggage associated with fault handling on Linux. 99% of the time, the right thing to do is: clear errno (or sometimes you don't have to), make your call, check your return value, and then check errno. Historically that wasn't always correct, because, thank you POSIX 1990-whatever, errno didn't used to be thread-local. That's fixed now. errno doesn't always have to be cleared.
Clearing it anyway isn't a problem, but because it doesn't always have to be cleared, programmers act inconsistently and don't clear errno. You also get inconsistent return values: most syscalls return -1, some return NULL, some return (void *)-1, and some return some kind of error enum if you're inside of libc. And finally, because this is C and C++, there's absolutely no enforcement of any of this. So user-space programmers get lazy, like myself, and just don't bother checking errors at all, and then things fail, and then we get sad.

To make things even worse, some faults can't really be handled at all. I think some of you probably saw the recent "fsyncgate" thing with Postgres: fsync(2) can, I think, return success in some cases even if the dirty buffers inside the kernel haven't been fully flushed, so you end up with silent data corruption inside major databases that assume fsync guarantees the success of every previous sync. And that's terrible. Google's solution, I think, has been to side-channel those errors with a Linux kernel module, broadcasting them via netlink to another service, which then presumably kills off the database process before it can actually be corrupted.

There's a similar issue with close(2) and interruption. Some of you might have seen Colin Percival's blog post from a few years ago about how close(2) is fundamentally broken: if close(2) receives an EINTR, the state of your file descriptor is no longer determined, so it could be open or closed. And if you have another thread interacting with your process and the file descriptor really was closed, that thread could reuse the file descriptor number; so if you then go to close it again, you suddenly close the wrong file. And that's great, right? Who doesn't love that? It only took eight years, but he came up with a galaxy-brain solution.
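The clear-errno/call/check-return/check-errno discipline described above can be sketched concretely. This is a minimal illustration, not from the talk's slides; the function name is mine:

```c
// Minimal sketch of the errno discipline: clear errno, make the call,
// check the return value, and only then consult errno for the reason.
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Returns 0 on success, -1 on a reported fault.
int checked_open_close(const char *path) {
    errno = 0;                      // clear errno (not strictly required,
                                    // but avoids stale values)
    int fd = open(path, O_RDONLY);  // make the call
    if (fd == -1) {                 // check the return value first...
        // ...then check errno for the specific fault (ENOENT, EACCES, ...).
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return -1;
    }
    if (close(fd) == -1) {
        // As the talk notes, after EINTR the descriptor's state is
        // undefined; here we just report the fault.
        fprintf(stderr, "close: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

The key point is ordering: errno is only meaningful after the return value has signaled failure, since successful calls are allowed to leave garbage in it.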
I feel bad, because I think a previous speaker already said "galaxy brain," but it was already in my slides. It took him something like eight years to come up with this crazy solution involving a cookie and a pipe and, I think, a select(2): you basically select on the pipe, and if the cookie comes out, you know that nothing else has closed the descriptor. But nobody's going to do that, so it's broken. And EINTR in anything, really: signalfd has this issue too, and it's usually just safer to die, but programs don't always just die, and things go awry.

The reason I think this is interesting is that faults are, or at least I like to think of them as, exploit primitives. Here's code that probably doesn't occur in the real world, but it takes the general form of code that does: some kind of tight event loop where you read data from a user or from some source, perform some operations on it, and pass it to some serialization function. What you really have there is a heap-spray primitive combined with a read primitive combined with some serialization. Each of these things individually is not a vulnerability, but combined together: you build up your heap spray, you spray your heap full of bad YAML or whatever, your read faults, and all of a sudden you're doing arbitrary deserialization. YAML in this case, because it's C, probably isn't a huge issue, but you can imagine some serialization format that supports arbitrary calls or something, and that could be pretty bad.

Unfortunately, for at least my purposes, faults are rare. Normal programs perform on the order of thousands to billions of syscalls, and, as a testament to the stability of our kernels, very few of them fail, even fewer fail predictably, and of those that fail predictably, even fewer fail in ways that are immediately obvious as exploitable.
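A hypothetical sketch of the fault-vulnerable shape described above; the names and the "serializer" are illustrative, not from any real codebase. The bug is that read(2)'s return value is ignored, so on a fault the stale, attacker-controlled buffer contents flow into serialization anyway:

```c
// Contrived fault-vulnerable pattern: an ignored read(2) fault leaves
// stale (possibly attacker-sprayed) bytes in the buffer, which are then
// serialized as if they were fresh input.
#include <string.h>
#include <unistd.h>

// Stand-in for a real serializer (hypothetical): records what it was fed.
static char last_serialized[16];
static void serialize(const char *buf, size_t len) {
    memcpy(last_serialized, buf,
           len < sizeof(last_serialized) ? len : sizeof(last_serialized));
}

// One iteration of the vulnerable loop.
void vulnerable_iteration(int fd, char *buf, size_t len) {
    // BUG: return value ignored. If read() faults (EBADF here, or an
    // injected EIO), buf still holds whatever was there before.
    read(fd, buf, len);
    serialize(buf, len); // serializes stale data on failure
}
```

Individually, an unchecked read is a code smell rather than a bug; it only becomes an exploit primitive when an attacker can both pre-fill the buffer and reliably trigger the fault.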
That makes just going out into the world and trying to make things fail really hard. That said, faults might not be as rare as you think, especially in our modern magical world of containers and namespaces and all these other things that are suddenly imposing very sensible limits on processes, in ways the original programs weren't designed to consider. One thing I find interesting, since I look at a lot of containerized software for my job: people often just containerize some old program that was not written with seccomp in mind, and all of a sudden the conditions around inherited resources change. The program can't open files it assumed it could open, and that's a fault, one that can be predictably triggered by someone with knowledge of the program.

Similarly, users do dumb things. Users love to unplug random peripherals when they stop working; they plug their monitors in over and over, eject hard drives, and do other bad stuff. Those things all cause faults. The kernel is usually good at handling those faults and propagating them correctly to user space; user-space programmers then ignore them. So those are predictable sources of faults.

And then, because of the Unix user model, any program running as the same user can by default clobber other resources owned by that user: pipes, shared memory, shared objects, things like that. So if I'm running as one program under a user, and that user has another program running that talks via some privileged socket to the kernel or to a more privileged user, I can reliably cause a fault in that program, and maybe I can escalate my privileges that way.
But like I said, faults are generally rare, and for vulnerability and resiliency research purposes I want to make them less rare when I'm actually auditing programs. That means I don't want to wait for faults to happen; I want to make them happen. So let's do fault injection. This is a common thing; people have been doing it since the 80s, and thankfully there are a ton of flexible approaches to fault injection within the Linux ecosystem.

One that I've used a bunch, for programs where I have total control over the build, is to relink the program with faulty functions and wrappers: I'll have some LLVM pass that does the right thing for me, and the output binary will have faulty wrappers around my real functions. But this isn't always great, because I don't always have the source. There's also LD_PRELOAD, which you're all probably familiar with: you use it to interpose a library in front of your actual linker path, and bam, you have a wrapper. And finally there are dynamic instrumentation methods. I'll go through a few of those and talk about why they're good, why they're bad, and why I went the route I did, which is not necessarily a good reason.

To give some background, here's a contrived dynamic linkage scenario. Obviously this isn't what dynamic linkage really looks like, but it helps for visualization. You have your curl binary, and inside of main it calls curl_easy_setopt. curl_easy_setopt has a well-defined set of failure conditions: it can succeed and return CURLE_OK, or it can fail for some reason and return an unknown-option error or whatever. And those failure routes are probably going to occur pretty rarely in the real world.
What I want to do is interpose my fake easy_setopt and apply some custom logic, in this case really just an RNG: if the RNG says zero, I bail out with my unknown-option failure; otherwise I use dlsym to get the real curl_easy_setopt, which then performs business as usual. That's really easy; it's a great way to do first-level fault injection, or any sort of dynamic interposition of functions. And like I said, it's conceptually simple and easy to use, because all you have to do is set LD_PRELOAD to whatever your shared object is, toss it in front of your binary, and off you go.

Unfortunately this comes with a bunch of downsides. One major one is that the year is 2019 and everybody loves Go, and Go is statically linked. LD_PRELOAD doesn't work with static binaries, because static functions don't hit the dynamic linker. It doesn't work with syscall(2) or asm intrinsics either, because you're not actually calling the libc function that eventually dispatches to the syscall. LD_PRELOAD isn't interposing the syscall itself; it's interposing the wrapper, which in turn causes other problems, like unintuitive interposition. As most of you probably know, open(3) is really openat(2) in glibc, fork(3) is really clone(2), and there are a dozen-plus other cases like that. This usually isn't an issue in terms of actually causing faults, but it does cause issues when you go back to triage your faults and figure out what actually failed. And finally, I hate maintaining state inside an LD_PRELOAD library. I don't know about any of you, but every time I do it I break something, and I don't like thinking about it. So I'm not using it.

What else is there? There's dynamic instrumentation. ptrace(2) is a really awesome syscall, but it's also really slow. I should say it's awesome first.
There's a really good blog post, by Chris Wellons I think, on using ptrace to intercept syscalls, as well as to add a syscall within user space. But because you have to make, I think, a minimum of three ptrace calls for each syscall you interpose, you're essentially adding a 3x overhead for every syscall the program makes. And because you're attaching ptrace to your inferior process, you can't easily debug that process with another debugger. There are also dynamic instrumentation frameworks: DynamoRIO and Intel Pin. I've used these a bunch. They're easy to use once you get to know them, but their performance varies, and, concerningly, their correctness varies depending on the host you're on. They also take a long time to learn if you're not already familiar with them.

And finally, as others have mentioned throughout this conference, there are many, many systems within the kernel for performing various forms of introspection, interposition, and hooking of syscalls. I think every single one of these, except maybe eBPF, can do what I want. Unfortunately I'm a bit of a Linux kernel noob, so I don't know how to use any of them. I know you can do this with seccomp, with SECCOMP_RET_ERRNO; I know you can do this with kprobes somehow; and I'm pretty sure you can do it with LSM hooks. I just didn't try any of those, sorry. Among them there are probably lots of really good, fast approaches, and I'd have to talk to a bunch of kernel people I know to figure out how to use them.

So the remaining question is: can we do better than dynamic instrumentation, and can we do it faster? The answer is yes, with KRF. And I should say, this is a bad approach; you should never do this, but I did it. What KRF does internally is take your syscall table and replace the slots of interest with faulty wrappers. And if that sounds bad, that's because it is bad.
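As an aside before getting into KRF's internals: the seccomp route mentioned above really can do fault injection without any kernel module. A minimal sketch, assuming Linux with seccomp filtering enabled; the choice of getpriority(2) and EACCES is arbitrary, and note the filter cannot be removed for the remainder of the process:

```c
// Make getpriority(2) fail with EACCES for this process, using a
// classic-BPF seccomp filter and SECCOMP_RET_ERRNO. Linux-only.
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

int install_getpriority_fault(void) {
    struct sock_filter filter[] = {
        // Load the syscall number from the seccomp data.
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        // If it's getpriority, fall through to the errno return;
        // otherwise jump over it to ALLOW.
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpriority, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EACCES),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    // Required so an unprivileged process may install a filter.
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Unlike KRF, this operates on a supported kernel interface and scopes the fault to one process tree, at the cost of a fixed errno per rule rather than a randomized dispatch table.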
Inside of those wrappers, if the call is targeted, it redirects to a faulty syscall that returns some errno based on another dispatch table. I auto-generated those dispatch tables from the man pages, which list the set of acceptable faults for each syscall. If the call isn't interesting, we redirect to the normal syscall. And on module unload, we just memcpy the original syscall table back into place, and everything proceeds as normal.

From a bird's-eye view it looks like this: we have a wrapped sys_read that performs a targeting check. If the call is not targeted, we perform the real sys_read; if it is, we return a fault. In reality that's a dispatch table returning a real errno based on another targeting check. Then inside our module init, we look up the syscall table and just swap out the __NR_read slot, which is very bad; don't do that.

From a high-level view: your runtime function call, which might be fread, or Java NIO read, or some other high-level language call, trickles down to a glibc call to read(3), which eventually performs a syscall(2) to read(2), which then hits our krf_sys_read. krf_sys_read performs a targeting check: if the call is not targeted, it dispatches to the normal sys_read; if it is, we go to an internal sys_read. From there we roll an RNG and either fault or, one out of N times, return success, just so we don't completely destroy every program we run under it. This code is actually all macroed up, and it no longer looks quite like this because we drastically improved the codegen, but what it amounts to is just this: we define this fault wrapper with a KRF macro, and (we no longer use an LCG; we use a better RNG now) if the call is targeted and the RNG fires, we perform our fake syscall, and if not, we perform the real one.
I sort of glossed over targeting before, but we actually support a variety of targeting strategies. A few nice ones: targeting by user or group ID, so you can tell KRF, "I want to fault every syscall performed by processes owned by this user or this group." That makes it really easy to fault a big family of processes at once, but there are multi-process applications with multiple sub-users, so that involves extra hassle. We can also currently target by process ID, or inode, or really anything that's available in the task struct we're exposed to during syscall context.

My personal favorite, though, is using personality(2) for this. personality(2) is a syscall, but there's also a personality field inside the task struct, used internally to dispatch different versions of syscalls based on a process's disposition. So you could, in theory, run a SunOS or BSD process on Linux and have it dispatch to a different version of some call. I think this is actually used internally to fix null-page stuff for SunOS, because I think SunOS has a valid null page; I might be misremembering that. Another nice thing about personality(2) is that children inherit it, so we can just exec off, and all of our children will pick up that personality and correctly be loaded into this faultable state for KRF.

And this is how you use it. For the 99% of you running a kernel above, I think, 3.17, it's as simple as cloning down our repo, make and make install, and then make insmod. I'll go over a quick summary of each of the commands you use from user space to manipulate KRF. I should say: please do not run this on real hardware. It is a rootkit. It is a rootkit that rewrites your syscall table in unsafe ways. It will break your kernel and destabilize your host.
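The two personality(2) properties the talk relies on, querying a persona flag back and having children inherit it, can be seen with a tiny sketch. ADDR_NO_RANDOMIZE is used here only as an arbitrary, harmless example flag; KRF's actual encoding is its own:

```c
// Demonstrate the personality(2) behavior described above: a process can
// tag itself with a persona flag, read it back (personality(0xffffffff)
// queries without changing anything), and children inherit the tag.
#include <sys/personality.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns 0 if the child observed the inherited tag, nonzero otherwise.
int tag_and_check(void) {
    unsigned long old = personality(0xffffffff);   // query current persona
    personality(old | ADDR_NO_RANDOMIZE);          // set our example "tag"

    pid_t pid = fork();
    if (pid == 0) {
        // Child: the persona is inherited automatically across fork
        // (and would also survive exec).
        unsigned long mine = personality(0xffffffff);
        _exit((mine & ADDR_NO_RANDOMIZE) ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

That inheritance is what lets a single exec helper mark an entire process tree as faultable without tracking PIDs.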
We use it to find bugs, but it's full of bugs itself, so please do me a favor, and do yourself a favor, and don't do that. From user space we have these three commands. I didn't put the logging command up here, because all you really do is run it and it spits out logs. krfctl is used to set the parameters for the module. That includes setting the list of syscalls you want to fault; in this case I set read, write, open, and close as my faultable syscalls. We also have a notion of profiles within KRF: if you want to fault every syscall we tag as an IPC syscall, the SysV IPC stuff and so on, you can specify that profile. You can fault an entire process ID. And you can use a clear flag to wipe out the syscall table state and return it to a safe, neutral state. Then there's the krfexec helper, which really just sets the personality, sets some rlimits, and execs off, to make the process faultable under KRF. You don't actually need that unless you're using the personality faulting mode; if you're just using the UID or GID mode or whatever, you can run the program as normal and KRF will pick it up.

I sort of glossed over this before, but does KRF actually work? The answers are both yes and no. We've used it on our actual audits: we do a lot of audits of smart contracts, as well as audits of code that interacts heavily with the kernel, basically just systems code. We have successfully found vulnerabilities in native components during smart contract audits. We've also used it to find a denial-of-service vulnerability, or a weakness rather, in Kubernetes during the Trail of Bits audit, which some of you may have seen. I'd like to give a special shout-out to Bobby Tonek for that one, because he wrote all of the instrumentation to make it run under Kubernetes. But there are also things about KRF that are not ideal.
For one, it trashes programs in unrealistic ways: your average read is not going to fail, but KRF makes the average read fail. KRF makes things fail that almost never fail in the real world. It finds bugs that are real in the sense that they're things you should have handled, but fake in the sense that they'll almost never actually happen. That's something we've been working on from a triage perspective. Right now, KRF spits out segfaults and core dumps like nobody's business, and we have this giant pipeline, which I had an intern work on this summer, for triaging those core dumps and figuring out whether a crash is actually realistic given real-world conditions. They've made really excellent progress on that, and I'll share some of their analysis work if you want to come up and talk later. But thank you. That's it. Any questions?

Q: Have you looked at the fault injection system in the kernel? It doesn't cover syscalls, but it would be interesting to see if you could attach to it in some way. It wasn't mentioned anywhere here, so I thought you'd probably looked at it; I'm curious what you found.

A: Yeah, when I was doing the initial write I did take a look at it, and the first thing I saw was that it wasn't connected directly to syscalls. I might have incorrectly ignored it based on that initial apprehension, but that's something I should definitely take another look at.

Q: You said you use an RNG to decide when to fault things. Based on your targeting, is there a way to seed that RNG with a predictable value, so you can reproduce the exact failure path and know exactly when the process is going to fault, at which index?

A: Yeah, we support exactly that. We have a procfs node exposed for setting the seed, and then you can just go off and reproduce a crash.
Q: Is it correct that most of the time, or maybe all of the time, the result of this is a crash? Is that normally what you're looking for, or could there be other errors that don't cause a segfault but are still problems that need to be fixed? And if there are, how do you find out that they happened?

A: That's a really good question, because right now our triage system assumes that a segfault, or at least something that produces a core dump, is tied to bad behavior. You're completely correct that a program could incorrectly handle a fault and go on to do something wrong internally without ever crashing. Catching that would possibly involve instrumenting the program with LLVM or something to determine whether it misbehaved. I have a few different ideas for handling those kinds of cases, and I'm happy to talk about them later. Anybody else?

Q: What about fault coverage? Sort of walking through a program, faulting at different places?

A: Yeah, that's something I've thought about. It's not something KRF currently supports; right now it's a haphazard random choice of syscalls. As a result, you actually get pretty bad coverage, because you end up faulting things early in program execution, which then causes an unconditional abort. I've been thinking about ways to target areas of interest within a program and defer faulting until that point, but that's a matter for future investigation.

Q: I'm thinking the same type of thing, and I'm wondering if, rather than using random numbers, you could say "on the thousandth call" or "the ten-thousandth," right? Because with a random number you may not get deep into the program.

A: So I guess the demos didn't fully show this, but there are three pieces of state that determine targeting.
There's the targeting mode, which can be UID, PID, whatever; there's the RNG; and there's also a probability. I think the default probability is that one out of 100 targeted calls gets faulted. Just from fussing with this thing in the real world, that's produced some pretty acceptable coverage. But I think a better solution would be individual ratios per syscall, so fault one out of every 10,000 reads, or what have you. That's a future improvement. No more questions? Thank you.