All right. So, yes, I'll be talking about the Windows driver attack surface, with some new insights. My name is Ilya. I work for a company called IOActive, where I'm the director of penetration testing. I do pentests and code review, basically break stuff for fun and profit. And I'm here today to talk about the attack surface of Windows drivers. The talk is divided up into a little intro and then two pieces. Part one gives you some background and talks about the where. Part two is the guts of the talk and talks about the what. I've got a lot of ground to cover today, so I'm going to try to move through part one really fast. I hope you'll bear with me. If I don't, I'll run over, so I kind of have to. So, what is this talk about? Basically, I'm going to be talking about the attack surface of Windows WDM drivers, WDM being the Windows Driver Model, and specifically about implementation security. As for the audience: if you're looking for bugs in those drivers, this talk might be interesting. If you're a driver developer, this talk might be interesting. If you're just curious about this kind of security stuff, then this talk might be interesting, too. In terms of knowledge, it would be nice if you have some general background on what an operating system kernel looks like, and if you have specific knowledge about the Windows kernel and Windows drivers, that's even better, but it's not required. Before I go on, it's fair to say that I'm standing on the shoulders of giants. A number of people have done research on these things, and I would be doing them a disservice if I didn't at least put their names up there. If you think I missed someone, please let me know and I'll add their name, but I think this is a fairly good list. This talk is slightly different from most of that work, though.
So if you look at most of the previous research on the Windows kernel and Windows drivers, it's mostly been focused on exploitation, it's usually been focused on the Windows kernel specifically and not so much on drivers, and it's almost always been focused on one particular issue. My presentation is different in the sense that the focus isn't so much on exploitation. It's more on finding and identifying bugs, and then giving some guidance on how you would mitigate or fix these issues. Also, it isn't so much about the Windows kernel itself as it is about drivers specifically. Of course, the two go hand in hand. You can't really have one without the other, so most of this stuff applies to the Windows kernel as well and vice versa, but there is a slight difference. The other thing is that I'm not talking about one particular issue. What I did is I sat down, looked at all of the common bugs you find in most Windows drivers, and tried to find the common theme, a thread that runs through them, and that's basically what I'll be talking about today. Right, so as an intro to this talk: you could easily spend days talking about all sorts of intricate little details of Windows driver bugs, and I tried to get it down to one hour. Basically it comes down to all of these little obscure things that very few people seem to know about, what the consequences are if you don't know about them or if you do them wrong, and whether or not any of that is documented. It turns out most of it you'll find on MSDN, but for a lot of it you have to read between the lines. You'll get these subtle hints as to what is problematic and what isn't, but quite often it's not explicitly mentioned, and that makes it very hard for people to know these things. Unfortunately, that includes most driver developers.
So with that, let's dive into part one. This section is by no means meant to be exhaustive. It's a quick reminder, and I'm going to try to make my way through it really fast. I'll start off with a little bit of architecture. At a high level, the Windows kernel is divided up into all of these managers that each have their own tasks, and the idea is that you as a user, if you want to get to them, call system calls and end up in one of these managers. For example, if you call NtDeviceIoControlFile, you end up in the I/O manager. From a 50,000-foot view, this is kind of what it looks like. From user land you call system calls, you go through the system call table, and then you land in one of these managers. Quite often you end up in the I/O manager. Inside this thing, there's a whole bunch of frameworks working together. There are over a dozen, and all of them are worthy of their own presentation. I wish I could talk about all of them, but due to time constraints I'm going to limit myself to the Windows Driver Model, WDM, and a tiny, tiny little bit about KMDF. What is WDM? It's the Windows Driver Model. It's been around since around Windows 2000. It extends and updates the old NT driver model, and it is the standard model for how drivers are written. In recent years there's been a shift towards KMDF, but there's still a fair amount of WDM out there, and that's why I'll focus on WDM. Let me now say a few things about the I/O manager, which is this proxy that sits between the user and the drivers. Depending on what kind of request you make, it may or may not do some kind of validation.
It takes your request from user land and packages it up in this thing called an I/O request packet, an IRP, and I'll talk about that in a little bit. Then it finds your driver's dispatch routine for whatever thing you want to do and hands the IRP off to it. Let me try to illustrate that with some code. A very, very simple driver has this thing called DriverEntry, which is the main entry point. Any driver that wants to interact with user land goes to the I/O manager and says, hey, create this device. That's IoCreateDevice. Then it says, okay, now that the device is created, I want user land to be able to talk to it, so it goes to the I/O manager again and calls IoCreateSymbolicLink, which creates a symbolic link that user land can use to talk to that device. Then you register a bunch of dispatch routines with the I/O manager, and from that point on user land can do IOCTLs or FS controls or open, read, write and so forth. In code, this kind of looks like this: as I mentioned before, you create the device, you create the symbolic link, and then you set up all of these dispatch functions. There are more of these than I'm showing. All of these functions look the same way: two arguments get passed along, a device object and the actual IRP. The IRP is interesting because it contains all the data packaged up from user land that your driver gets access to. This includes things like the data being passed around, your I/O status, and what kind of request it is. This slide lays it out nicely: these are the pointers that come from user land, this is the system buffer, the user buffer, the memory descriptor list and so on. And then you have your IRP stack.
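The slide code is not part of this transcript, so here is a minimal user-mode sketch of the registration and dispatch flow just described. It is a model only: every name in it (`model_driver`, `model_irp`, `model_driver_entry`, `model_io_manager_call` and the `MODEL_*` constants) is hypothetical and stands in for the real WDK types and routines, which it does not reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for major function codes; not the real IRP_MJ_* values. */
enum model_major {
    MODEL_MJ_CREATE,
    MODEL_MJ_CLOSE,
    MODEL_MJ_READ,
    MODEL_MJ_WRITE,
    MODEL_MJ_DEVICE_CONTROL,
    MODEL_MJ_COUNT
};

enum { MODEL_STATUS_SUCCESS = 0, MODEL_STATUS_INVALID_DEVICE_REQUEST = -1 };

typedef struct model_device { const char *name; } model_device;

/* A much simplified "IRP": the request type, captured data, and completion info. */
typedef struct model_irp {
    enum model_major major;
    void  *system_buffer;
    size_t input_length;
    int    status;
    size_t information;
} model_irp;

/* Every dispatch routine takes the same two arguments: device object and IRP. */
typedef int (*model_dispatch_fn)(model_device *dev, model_irp *irp);

typedef struct model_driver {
    model_dispatch_fn major_function[MODEL_MJ_COUNT];
} model_driver;

int model_dispatch_unsupported(model_device *dev, model_irp *irp) {
    (void)dev;
    irp->status = MODEL_STATUS_INVALID_DEVICE_REQUEST;
    irp->information = 0;
    return irp->status;
}

int model_dispatch_create(model_device *dev, model_irp *irp) {
    (void)dev;
    irp->status = MODEL_STATUS_SUCCESS;
    irp->information = 0;
    return irp->status;
}

int model_dispatch_device_control(model_device *dev, model_irp *irp) {
    (void)dev;
    irp->status = MODEL_STATUS_SUCCESS;
    irp->information = 0; /* this model produces no output bytes */
    return irp->status;
}

/* Models the entry point: fill the dispatch table the "I/O manager" will use. */
void model_driver_entry(model_driver *drv) {
    assert(drv != NULL);
    for (int i = 0; i < MODEL_MJ_COUNT; i++)
        drv->major_function[i] = model_dispatch_unsupported;
    drv->major_function[MODEL_MJ_CREATE] = model_dispatch_create;
    drv->major_function[MODEL_MJ_DEVICE_CONTROL] = model_dispatch_device_control;
}

/* Models the I/O manager finding the dispatch routine and handing off the IRP. */
int model_io_manager_call(model_driver *drv, model_device *dev, model_irp *irp) {
    assert(drv != NULL && irp != NULL);
    assert((int)irp->major >= 0 && irp->major < MODEL_MJ_COUNT);
    return drv->major_function[irp->major](dev, irp);
}
```

The point of the table is that the driver only ever sees what the I/O manager routes to it, so each registered routine is a piece of attack surface in its own right.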
And so let's talk about the IRP stack. For every IRP that comes in, there's a stack location associated with it, and it contains the very specific information your driver needs for that particular operation. It will say, okay, you were called with this major and minor function number. Then, based on the major and minor function number, there's a specific parameter union, which has different sets of members depending on whether it's a read or a write and so forth. Essentially, all the operation-specific data you need, you tend to grab out of your IRP stack location. And then, of course, how do you get there from user land? Well, you call the NtDeviceIoControlFile system call. If you look at the last five arguments, those are the ones where there's potential for attack surface and where the entry into the driver happens: your control code, your input buffer, input buffer length, output buffer and output buffer length. Once that goes through the I/O manager, it goes to your driver. It arrives at your device control dispatch routine, packaged up in this IRP. Depending on your I/O control code, the way the data gets packaged up differs, so let's look at what the I/O control code looks like. Even though it may look like just a number, it's actually split up into several different bit fields. From a security perspective, we really only care about two of those: the required access and the transfer type. So I'll say a little bit about both. Required access is one of three values: FILE_ANY_ACCESS, FILE_READ_DATA and FILE_WRITE_DATA.
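As a sketch of how such a control code splits into fields, the following portable C mirrors the bit layout used by the `CTL_CODE` macro in the Windows headers (device type in bits 31..16, required access in bits 15..14, function in bits 13..2, transfer type in bits 1..0). The helper names and the `MODEL_` prefixed constants are made up for this example so that it compiles without the Windows SDK; `model_access_allowed` is only an illustration of comparing the access bits against how a handle was opened, not the I/O manager's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Transfer type, low two bits of the control code. */
enum { MODEL_METHOD_BUFFERED = 0, MODEL_METHOD_IN_DIRECT = 1,
       MODEL_METHOD_OUT_DIRECT = 2, MODEL_METHOD_NEITHER = 3 };

/* Required access, two bits: any, read data, write data. */
enum { MODEL_FILE_ANY_ACCESS = 0, MODEL_FILE_READ_DATA = 1, MODEL_FILE_WRITE_DATA = 2 };

/* Same layout as CTL_CODE(DeviceType, Function, Method, Access). */
uint32_t model_ctl_code(uint32_t device_type, uint32_t function,
                        uint32_t method, uint32_t access) {
    assert(function <= 0xFFF && method <= 3 && access <= 3);
    return (device_type << 16) | (access << 14) | (function << 2) | method;
}

uint32_t model_ctl_device_type(uint32_t code) { return code >> 16; }
uint32_t model_ctl_access(uint32_t code)      { return (code >> 14) & 0x3; }
uint32_t model_ctl_function(uint32_t code)    { return (code >> 2) & 0xFFF; }
uint32_t model_ctl_method(uint32_t code)      { return code & 0x3; }

/*
 * Hypothetical model of the access comparison: 'granted' is a bit mask of
 * MODEL_FILE_READ_DATA / MODEL_FILE_WRITE_DATA describing how the handle was
 * opened. Every access bit the control code requires must have been granted.
 */
int model_access_allowed(uint32_t granted, uint32_t code) {
    uint32_t required = model_ctl_access(code);
    return (granted & required) == required;
}
```

For instance, `model_ctl_code(0x22, 0x800, MODEL_METHOD_NEITHER, MODEL_FILE_ANY_ACCESS)` yields `0x222003`; when reversing a driver, the low two bits of each control code it handles are the quickest hint about which buffers deserve the closest look.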
And what that means is this: if you open a handle to a device read-only, and then you issue an I/O control that requires FILE_WRITE_DATA, for example, the I/O manager sees that your handle is only open for read but your control code says FILE_WRITE_DATA. That doesn't match, so the I/O manager will kick you out. So this restricts you to the required access that you need. The other one is the transfer type, and there are four different kinds: buffered, neither, in direct and out direct. Let me run through those real quickly. The first one is METHOD_BUFFERED. METHOD_BUFFERED is the preferred, safe-ish way of passing IOCTL data to drivers. What it means is that the I/O manager looks at the data that user land passes in and says, okay, your input buffer is, I don't know, let's say 10 bytes big. What I'll do is mirror it: I'll create a kernel buffer that's 10 bytes big, copy your data over, and pass that on to the driver. That means the driver doesn't have to worry about race conditions, or about making sure that the address is within user land and doesn't point into the kernel, and all of these tricks. You don't have to worry about them if you're using METHOD_BUFFERED. Additionally, when the kernel does all that copying of data, it charges quota for your memory, so if you hit your quota, the request fails. So METHOD_BUFFERED is the safe-ish way of passing data around. METHOD_NEITHER is the exact opposite. The I/O manager does nothing. It takes the data given to it as is and passes it on to your driver. So your input buffer, your input buffer length, your output buffer and your output buffer length are just raw pointers and raw lengths that have not been validated and that are passed along to your driver. This is the endless source of Windows driver bugs.
In my view, as a driver writer you should avoid it at all costs. Unfortunately, that doesn't appear to happen. There are plenty of drivers that do this stuff. I mean, there's a perf benefit, but it's a huge strain on resources for security, because you have to manually check everything yourself. Now, in order to describe in direct and out direct, I first have to say something about these things called MDLs. MDL stands for memory descriptor list, and it's a data structure in the Windows kernel that represents a buffer by its physical addresses. I'm not going to discuss the layout, because the internals are kind of opaque, but basically the Windows kernel has APIs to create, modify, delete and consume these MDLs, so that you can say, okay, I have this MDL that describes a physical buffer, now give me a kernel virtual address for it, for example. I think it's the memory manager that does that for you and hands you back a pointer. Why is this important? It's important because in this particular case, where the I/O manager passes stuff back and forth, what usually happens is you get this double mapping. The user goes to the kernel, to the I/O manager, and says, I have this user land pointer and this length. The I/O manager says, okay, I'm going to create an MDL: it looks at what this user land pointer is, figures out what the physical memory behind it is, and if it's more than a page, it finds all the physical memory pages, packages this up in an MDL and passes that off to your driver. Then, when the driver looks at it, it sees the MDL and says, okay, now get me a kernel virtual address for it. So you get this double mapping, in user and in kernel, where both map to the same physical pages.
And the reason why you would do that is for situations where you have to move large amounts of data around: because you're using MDLs, you now have zero copy and you get a huge, huge perf increase. Essentially, if you draw it in MS Paint, that's what it looks like. So this is done for perf, and you get zero copy. Additionally, you have none of the pain of probing and try/except. So there are definitely upsides to MDLs. There are also downsides, and I'll get to those later. Now that you have an idea of what MDLs are: with in direct, the input is going to be an MDL, so when your driver takes input and starts reading from it, it'll come from an MDL. Out direct is the exact opposite: once your driver generates output, you just write it to an MDL and the user basically gets it automatically. In short, that's what these MDLs are for. Okay, so this is one of my last slides for part one. I hope I wasn't too confusing. I know I ran through it really fast, but that's because the next section is really where all the cool stuff is. Most of the stuff I talked about, or will talk about, is the WDM model, which is older. KMDF is the newer Kernel-Mode Driver Framework, and it's part of what's called WDF, the Windows Driver Frameworks. This thing was designed based on lessons learned from mistakes in the WDM world. A lot of that has to do with power management and things like that, but for security too, a few things became apparent, and those were made easier to get right when using KMDF. So in general KMDF makes it easier to write drivers and less likely that you introduce bugs. But it is layered over the same infrastructure that WDM uses. That is to say, if you don't understand the old model, you're still going to have these implicit bugs in KMDF.
But in general, it does make it harder to have bugs. One great example, security-wise, is that it strongly discourages passing user land pointers directly to the kernel. In fact, there's a set of APIs to extract data out of requests. One of them is WdfRequestRetrieveInputBuffer. If you have METHOD_BUFFERED, in direct or out direct, that API will just go grab the pointer for you. But if you're using METHOD_NEITHER, that API will simply fail. If you must use METHOD_NEITHER, then you have to call a different API, WdfRequestRetrieveUnsafeUserInputBuffer, and that pretty explicitly tells you that it's unsafe. So it's one of those things where they really knew that using METHOD_NEITHER is horrible, and they made it really hard for you to use it. If you do use it, the names very explicitly tell you that things are unsafe. Generally, as a driver developer, you're encouraged to use KMDF over WDM. And the cool thing about KMDF is that it was actually open sourced earlier this year. Microsoft slapped an MIT license on it and put it on GitHub. About 90% of the code is there. There are tiny bits and pieces still missing. The redirector isn't there yet. But by and large the guts of KMDF are available under a free software license, and if you want, you can take a look at them. Okay. So now that we've got that out of the way, let's dive into the what of the Windows driver attack surface. By and large, when you look at the sort of bugs you have in a driver, there are three things that come up, right? They can be elevation of privilege bugs, like a buffer overflow or something like it. They can be denial of service, where you get a kernel panic or a deadlock or resource starvation type attacks. And the third one is where you get information leaks, right?
By and large those are the ones you would see when you look for security implementation bugs in drivers, and those are the kinds of things I'll talk about. Specifically, things that I'm not covering are exploitation and bypassing of mitigations. I'm not going to go into that for the most part. There are bits and pieces where I'll mention some of it, but by and large I'm working from the assumption that bugs are going to be exploitable. This isn't an exploitation talk, which is pretty much what I started off by saying, so I won't be focusing on exploitation. I will instead be focusing on finding and identifying these types of bugs. The other thing is that I'm assuming you have a basic understanding of native code security. So I won't discuss the standard buffer overflow or integer overflow, or what it means if you have an unvalidated length field. I'm assuming those things are understood. Same thing for C++ types of bugs and those kinds of things. Now, I said I wasn't going to cover integer overflows. I lied. I am actually going to say something about them, specifically because in the Windows kernel you have a set of APIs to help you avoid integer overflows. If you are a driver developer, you are encouraged to use them, even though I haven't seen them used that often. You should in fact use them, but when you do, there are three rules: always check the return value, always pass the exact type, and never do arithmetic when passing parameters. If you break any of those three, you're reintroducing your integer issues while trying to use the very APIs that are meant to prevent them.
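To make the third rule concrete, here is a small user-mode sketch. `checked_u32_add` and `checked_u32_mul` are hypothetical stand-ins written for this example; they only imitate the general shape of the kernel's safe integer helpers (a status return plus an out parameter) and are not those APIs. The two `total_size_*` functions compute a header plus `count * elem_size`, once with arithmetic in the argument and once without.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checked add: returns 0 on success, -1 on overflow. */
int checked_u32_add(uint32_t a, uint32_t b, uint32_t *out) {
    assert(out != NULL);
    if (a > UINT32_MAX - b) {
        *out = UINT32_MAX; /* poison value chosen for this model */
        return -1;
    }
    *out = a + b;
    return 0;
}

/* Hypothetical checked multiply: returns 0 on success, -1 on overflow. */
int checked_u32_mul(uint32_t a, uint32_t b, uint32_t *out) {
    assert(out != NULL);
    uint64_t wide = (uint64_t)a * (uint64_t)b;
    if (wide > UINT32_MAX) {
        *out = UINT32_MAX;
        return -1;
    }
    *out = (uint32_t)wide;
    return 0;
}

/*
 * WRONG: the multiplication happens in the argument expression, so it has
 * already wrapped around before the checked helper ever sees the value.
 */
int total_size_wrong(uint32_t count, uint32_t elem_size,
                     uint32_t header, uint32_t *total) {
    return checked_u32_add(count * elem_size, header, total);
}

/*
 * RIGHT: every arithmetic step goes through a checked helper, and every
 * return value is tested before the result is used.
 */
int total_size_right(uint32_t count, uint32_t elem_size,
                     uint32_t header, uint32_t *total) {
    uint32_t payload;
    if (checked_u32_mul(count, elem_size, &payload) != 0)
        return -1;
    return checked_u32_add(payload, header, total);
}
```

With `count = 0x10000001` and `elem_size = 16`, the wrong version reports success with a tiny total, which is exactly the kind of undersized allocation that later turns into memory corruption, while the right version fails cleanly.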
And I have examples of this stuff, but in the interest of time I'm just going to run through them really fast. Basically, the stuff on the left is wrong and the stuff on the right is correct. The left shows how to use the APIs wrong, and the right shows the correct use of the APIs. Okay, so now we come to what I did, which is I sat down, looked over what the most common security bugs in drivers are, and tried to figure out the common theme around all of them. I came up with this idea that there are five things that most drivers will do, if they're talking to user land, where there's potential for attack surface and where you have some kind of entry point. It comes down to: when you create a device, when you talk to the I/O manager, when you handle data, when you interact with the memory manager, and when you interact with the object manager. By and large, if a driver talks to user land, those are the five big areas where you have attack surface. There can be other ways too, but in my view it's those five. So I structured my next set of slides around these five areas, and for each one I'll discuss a few things that I've seen go wrong. I'll have a first set of slides with a cute little icon in the top left that indicates this is a bug, and then a cute little icon with some tools that says this is a fix. But by and large I go by these five, and for each one there are a couple of cases that I cover. So with that, let's start with device creation, right? As I showed earlier in the sample driver, when you create a device you have this API call, which is IoCreateDevice or IoCreateDeviceSecure, and your driver basically goes to the I/O manager and says, please create a device for me.
And when you do that, you can say, okay, these are the access controls that I want on it, and there are two ways of doing that, right? One is you call IoCreateDeviceSecure and pass it what's called an SDDL string, which is a string representation of a security descriptor that says who can and can't access the device and under what conditions. If you don't call IoCreateDeviceSecure, the other way to do it is to specify the SDDL string in the INF file that comes with your driver, and then that will be applied by the I/O manager when somebody tries to connect to your device. Commonly, though, these things are missing. You'll see a lot of drivers that just call IoCreateDevice and don't specify an SDDL string in the INF, so you fall back to either the default or an ACL that's too wide. This is just a common, simple design problem that you see with drivers that are thrown together too fast. The idea is that you sit down and try to figure out who needs access to your device and under what circumstances. It has value to sit down and really think about this, because what you're doing is trying to reduce kernel attack surface. One good, proper engineering example would be to sit down and say, okay, we have this set of IOCTLs that really only an admin should ever need, so what we'll do is create a device that is only accessible by an admin, and then create a service on top of that, reachable through UDP or RPC or whatever, and have the service reroute things to the kernel only when it needs to. In that example, the downside is that you have to write more code and you may get some performance degradation, but the upside is you get extra security and probably better reliability as well. So that's one solution, but the idea is that this is usually a design question: who really needs access to our device, and why?
Right, okay. So with that said, let's move on to some more implementation stuff for device creation. When you create this device, you call the IoCreateDevice API, and you're allowed to pass device characteristics into it. There's one bit in there called FILE_DEVICE_SECURE_OPEN, and it has very special meaning. When you create a device in the Windows kernel, by default the I/O manager will always assume that it is a file system device. What that means is that when you open the device, the I/O manager assumes that it has directories and files. So if, instead of opening the device itself, you open the device name followed by a slash and a file name, the I/O manager looks at it and says, oh, I don't have to apply the ACLs here, the file system has to do it, so I'll just hand it to the file system device and let the file system device handle it. That works really well if you're a file system driver. But if you're not a file system driver, which is most drivers, all of a sudden you have this problem where the I/O manager thinks you are one. It won't apply your ACLs, it hands the request off to you, and you're not checking them either. So if you don't set this particular flag and you're not a file system driver, you have this by-design, automatic ACL bypass problem. That's why pretty much every time you create a device and you're not a file system driver, you should really, really set this flag. It's bizarre, because it's actually documented and there are still a lot of drivers that don't do it. And what's really funny is that this bug is incredibly easy to find. There's a tool out there called DeviceTree. It enumerates all of your devices, you click on one, and, it's kind of hard to see there, but you can see what the ACLs are, and it also says whether this FILE_DEVICE_SECURE_OPEN flag is set. And so if the flag isn't
set and the ACLs say, oh, it's admin only, then you most likely have a security bug right there. It's one click away and it's easy to find. So what's the fix for this one? Always use the flag. Unless you are very specifically building a file system driver, you should always, always set this flag. That's the general rule. Now some people go, well, you don't have to set this flag, you can make your own create dispatch callback and put the check in there. And yes, you absolutely can. You probably shouldn't, because, A, you're now adding more attack surface even though this functionality is already there in the Windows kernel: just set that one bit and you get it for free. And B, if you do do it yourself, there are a few intricate little things where you want to mimic exactly what the I/O manager does, otherwise you get discrepancies. So yes, you can do that, but unless you're writing a file system driver you probably shouldn't. Okay, so that's one of five. The second of the five is working with the I/O manager. The first case I have is a simple one, but it's a pretty common one that I've seen in a bunch of drivers. You'll have some IOCTL callback that is in direct or out direct, and it'll say, okay, we'll take Irp->MdlAddress, call MmGetSystemAddressForMdlSafe on it and get our kernel pointer out of it, all without ever checking whether that pointer can be null. By the sheer fact of the way this works, that pointer can absolutely be null, and so before you ever touch Irp->MdlAddress you should check it for being a null pointer. One of the cases where this can happen is a zero-sized buffer. If you're not checking, you may very well have a bug check or worse, and I'll talk about that later. But the idea is that if you're doing in direct or out direct, check your MDL address before you use it, which is almost
a fix. Great. So the second problem in the I/O manager class is with using METHOD_BUFFERED. I discussed METHOD_BUFFERED before and I said it is the safer option, and it is, but there are still some things you need to know. A, similar to the MDL address, the pointer you are given from the driver's perspective in the IRP for METHOD_BUFFERED is the IRP's system buffer, and the problem is the same as with the MDL address: that pointer can also be null. So the first thing you should do is always check that it's not null. The other thing with METHOD_BUFFERED is that, even though it is the safer, or I guess the safest, method of all four, because it's seen that way you get this false sense of security. While the probing and capturing of the whole buffer happens for you, any and all of the content inside that buffer still needs validation, depending on what it is. So there's a set of developers that will say, oh, METHOD_BUFFERED, cool, we're good, we don't need to do any security checking. They'll have some kind of embedded length fields that are never validated and are used to copy stuff around, and all of a sudden you get memory corruption left and right. So the idea with METHOD_BUFFERED is: A, validate for null, and B, all the embedded content still has to be validated. One of my all-time pet peeve kernel bugs is this one. Let's say you get an IOCTL, you do all the work, you're successful, the IOCTL is completed correctly and you're about to return. There's a field in your IRP called IoStatus, and you say IoStatus.Status is success, and then there's this thing called IoStatus.Information. The information is basically an integer, and in the case of success what it means is: I have completed successfully, and this is the number of bytes that I give you as output. And quite often what people will do is they'll go to IoStatus.Information and they'll
say, okay, whatever output buffer length the user gave me, that's the number I give you. And sure enough, you do that and everything works. Nothing crashes, you get all the data you need, and everything works perfectly. But there's this very subtle thing where all of a sudden you have an information leak. What happens is that when you do an IOCTL with METHOD_BUFFERED, the I/O manager looks at your input length and your output length and asks which one of these two is the maximum. Let's say the output length is the bigger one. It allocates a kernel buffer of that size, then it looks at the length of the input buffer and says, okay, the input buffer is X bytes, great, we'll copy that many bytes of the input buffer into the buffer I just allocated. So if you have an input buffer that is smaller than the output buffer, then the delta between input and output is uninitialized memory. If you now have this bug where, when you're all done, what you've copied into the output buffer is smaller than the total allocated space, then that delta all the way at the end is uninitialized memory. If IoStatus.Information is the full buffer length, then that includes that piece of uninitialized data all the way at the end, and it gets copied back to user land. When user land gets the buffer, they can just read that stuff. It's kind of subtle, because stuff doesn't crash and stuff doesn't break. You have to really look for it. But quite often, when you have these kinds of bugs, you can rig it so you get really large buffers. You can say, I want a one or two gig buffer, and all of a sudden you get piles and piles of uninitialized data that comes out of the kernel and contains all sorts of interesting things. This is one I reversed out of a driver, but it's a fairly common bug to see, because you can test all you want and this thing will never crash. It's
kind of subtle, so unless you know about this bug explicitly, people don't find it, either in testing or when looking for security bugs. It's the common case where you start your dispatch routine and say, okay, this is the output buffer length the user gave me, then you do all the work, and all the way at the end when you're done you say, okay, IoStatus.Information is the output buffer length we started off with, and boof, off we go, and all of a sudden this thing just leaked information. The other thing that's really interesting about this one is that even though this is a WDM thing, and KMDF is built on top of WDM, in KMDF you can still make this particular mistake. You don't manually fill out the members of the IRP, but KMDF gives you a set of APIs where you just pass parameters: WdfRequestCompleteWithInformation and WdfRequestSetInformation. Both of them fill out this IoStatus field, and if the value is too large you have the same problem. Because KMDF is open source, you can just go look at the source and see what these functions do, and indeed, if you look it up, you'll see that internally they still look up the IRP and set IoStatus.Information to the length you give them. The fix here is simple. It's a little bit hard to figure out initially, because the bug itself is so subtle: nothing crashes and nothing ever misbehaves. But once you're aware of it, the bug is really easy to spot and incredibly easy to fix. Any time you set the information, you should just be exact. If you copy five bytes in there, then say five. Always record the number of bytes you put in there, make that your IoStatus.Information, and you never have that bug again. The thing is, though, you have to be consistent about it, and that can be pretty hard, especially if you're reviewing old code. It's really easy to miss one or two cases that are still lying around in code that was written 20
years ago.

Okay, so the last part about the I/O manager that I want to talk about, in broad strokes, is IRP cancellation. The reason I talk about it broadly and not specifically is that it's incredibly complex stuff, and I could easily fill a one-hour lecture just on cancellation. What happens is, an IRP comes to the driver and the driver says, okay, this is an IRP that I can't complete right away, so what I'm going to do is say, hey, come back later; I'll give you a signal once I'm done, and you can go do something else now. Then once the driver is done, it says this is now completed and signals userland. The time in between is undefined; depending on what the driver is supposed to do, it can be a long time, for example for something that's disk based. So at any point after the IRP is pended and userland knows the kernel is working on it, userland can go back to the kernel and say, hey, that driver that's holding my IRP, I'm done waiting, you've taken way too long, just cancel the damn IRP. Then the driver gets woken up again, finds your IRP, and cancels it. The problem with this is that there's potential for all these race conditions, where you have one thread that's still trying to work on the IRP and another thread that has been instructed to cancel it. You have one guy working on it and one guy cancelling it, and if they're not in perfect sync you get all sorts of weird race conditions. That stuff gets incredibly complicated, and there are literally dozens and dozens of examples of what it looks like. I wish I could get into it, but I can't. I will say, though, that IRP cancellation routines that have synchronization issues often lead to deadlocks, memory leaks, race conditions, double frees, and use-after-frees. So all the stuff that, you know, if you're a security guy looking for
bugs, you start drooling when you hear it, it's all in there. I really wish I could get into detail. Fix-wise, there's no easy advice to give. The stuff is just extremely complex. When you do IRP cancellation, be very careful, be extremely conservative, implement with care, double check, triple check, have it peer reviewed, check again, and try to use cancel-safe IRP queues. That's pretty much it; unfortunately there's no better advice.

Okay, so now that we've got those two down, let's move on to userland and data pointers. I guess we should first talk about the elephant in the room: whenever a driver takes direct pointers from userland, what you're supposed to do is this thing called probing, which is basically a fancy word for checking: is the pointer within the range that userland is allowed to use, including the length and the end pointer, does it not overflow, and all those things. That's what the ProbeForRead and ProbeForWrite functions do. Okay, there we go. So if your driver takes pointers from userland, you're supposed to do this kind of probing. If you don't probe, so let's say you take a pointer from userland, you forget to probe, and you read from it, that's potentially an information leak. You could be reading from anywhere, you could definitely crash, and if that information gets sent back to userland you potentially have an infoleak. So that's bad. But if you get a pointer from userland, you don't probe, and you write to it, you get this sort of write-anything-anywhere primitive in kernel, and that is extremely bad. You basically just own the kernel if you do that. To prevent these cases you do probing with the ProbeForRead and ProbeForWrite APIs. This is one example, I think from some Intel driver, where you take an input buffer and there's no probing anywhere in between and
you just sort of read and write from it. Suffice it to say, these bugs are unfortunately extremely common.

Another interesting nugget with this stuff, and I think this was noted a few years ago: the way these probe APIs work is you say, here's my userland pointer, here's the length that I want you to probe for, and here's the alignment. You can have these very subtle bugs where people do probe, but for whatever reason the length is zero, and those probe APIs basically say, okay, if the length is zero, short-circuit the logic and return success. That is perfectly valid: if I say probe zero bytes, then clearly that's zero. The thing is that there's a bit of risk here. If you probe with a length of zero and later on use that address to read or write one byte, obviously the bug is that you have a length discrepancy between zero and one. But because of the way the probe APIs work, this length discrepancy becomes incredibly exploitable. If the probe APIs had worked in a way where they said, okay, the length is zero, so we'll just make the length one and probe that particular address, it would make these bugs far less exploitable. That's why there's risk here. The common cases where you do end up with a probe of length zero are: A, you'll see code that takes an input buffer and an input buffer length and just assumes the buffer is of a certain length, never actually checking what the length really is, but still passing the original input buffer length to the probe functions. If you forget that kind of length check, you end up with these exact issues. The other one is that even if you do length checks, you may still have integer overflows where the probing length just happens to overflow right back to zero, so
that can happen too. So the fix is: perform correct, consistent probing. Always, always, always probe. It's harder than it looks; you always forget one somewhere. If it's only one simple function, that tends to be easy, but you'll have these complex drivers where a function calls a function that calls a function, and somewhere down the road you're not quite sure if it's a user pointer or a kernel pointer, and you end up missing a probe.

The second variation of this is: let's say you take a user pointer in your driver, you've probed it, and then you start using it. You can still do things that are horribly wrong. Every time you use a user pointer in kernel, even after it's probed, you must wrap the access in a try/except. The reason is that after it's been probed, a second userland thread can take your mapped address and change the page protections or unmap the pages. Then somewhere in kernel after the probing they go, oh, we'll just use these pointers, and all of a sudden they're reading or writing unmapped pages, which will throw an exception. That's why you must always do a try/except. The obvious case is to forget exception handling. In the less obvious case you do have a try/except, but the except branch is rarely exercised, because it's the exception and it almost never happens. As such, bugs in exception handlers often don't show up in testing, simply because that code has never been exercised. So even if you do a try/except around userland pointers, it's still quite common to see things like memory leaks and reference count leaks. The fix here is: A, use try/except when using userland pointers, and B, make sure your exception logic is
sane. Exception logic can be really tricky, so you want to design and implement that stuff with care.

Another bug related to userland pointers is what are called double fetches. The idea is: I'm userland, I call into a driver and say here's a userland pointer, and the driver takes the pointer, does the probing, does a try/except, does all that stuff right. But let's say the data it points to contains an embedded length field. They'll dereference the userland pointer and make sure that length field is valid; let's say it has to be bigger than 10 and smaller than 100. Sure enough, that's true. Right after that they'll dereference that pointer again, take the length field, and use it to do something else. There's a race condition here between the first check and the second fetch, because I could have a second userland thread that overwrites that memory, so that when you check it it's okay and when you start using it it's some totally different value. This is a simple example of what that could look like: you have some function in kernel that's passed a user pointer, and the user pointer contains an embedded length field. In this case it's got to be smaller than 100, and then they use that length field to copy into a local stack buffer that's 100 bytes. Between the check the first time and the use in the copy, a second userland thread can overwrite the length field, and you end up with memory corruption. And this is an example of a bug that is a double and triple fetch that I found in the FQAV program a couple of months ago; I have a link to the advisory below if you want to look at it. The fix for this kind of stuff is: capture the data, validate, and only then use it. Never fetch again. Keep the data you
captured, validate that, and use that.

Similarly for userland-type stuff, there are null dereferences; I consider those to be in the same class. It's not specific to Windows drivers. I actually spoke about this particular bug at the CCC conference here in 2006. I was talking about the Linux kernel at the time, but it applies just as well to the Windows kernel. The only difference is in how you map the null page. In Linux and Unix you do an mmap, you say MAP_FIXED, you give it an address of null, and it will go map it. In Windows it's slightly more tricky: if you try to map the address null, it goes, oh, null means you don't care which address, I'll just pick one for you. So the way you're supposed to do it is you say one, and what it does is round that down to zero, and then it goes and maps the null page. But it's essentially the same thing. This used to be a much bigger deal. As of Windows 8, mappings of the null page are disallowed. At least, that's what they'll tell you, and it isn't entirely accurate. On 64-bit Windows, mappings of the null page are disallowed. On 32-bit Windows, mappings of the null page are disallowed by default, but there's this thing called NTVDM, the virtual DOS machine. If you want to run old DOS games, for example, you have to turn this on, and all of a sudden you can have null pointer dereferences in your old DOS games, and that would then allow you to exploit null pointer dereferences in Windows drivers. But even if this little corner case doesn't work, and we take the premise that null mappings aren't allowed in Windows as of Windows 8, there are still some exceptions where null pointers in kernel might be exploitable. What if you have null plus a large offset, where the offset basically becomes your pointer? A common case of this might be a memcpy, a reverse memcpy, or where you pass null to a function where it has a special meaning. The fix is basically defensive programming.
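To make the null-plus-large-offset case concrete, here is a minimal user-mode sketch in C. It is not taken from any real driver: `struct big_object`, `TABLE_SIZE`, `element_address`, and `checked_element_address` are hypothetical names made up for illustration. It only computes the address that an access through a null base pointer would resolve to, using integer arithmetic, so nothing is actually dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 0x100000u   /* 1 MiB table, an assumption for illustration */

/* Hypothetical layout: a small header followed by a large table. */
struct big_object {
    uint32_t flags;
    uint8_t  table[TABLE_SIZE];
};

/* Address that &obj->table[index] would resolve to for a given base.
   With base == 0, the result is simply offset + index: the attacker-influenced
   index effectively becomes the pointer, well past the unmappable null page. */
static uintptr_t element_address(uintptr_t base, size_t index)
{
    return base + offsetof(struct big_object, table) + index;
}

/* Defensive variant: reject a null base and bound the index before computing
   anything. Returns 0 on success, -1 on rejection. */
static int checked_element_address(uintptr_t base, size_t index, uintptr_t *out)
{
    assert(out != NULL);
    if (base == 0)
        return -1;              /* null object: never add an offset to it */
    if (index >= TABLE_SIZE)
        return -1;              /* index outside the table */
    *out = element_address(base, index);
    return 0;
}
```

Calling `element_address(0, 0x80000)` yields an address around half a megabyte into the address space, which is the situation where a "harmless" null pointer stops being harmless; the checked variant refuses both the null base and an out-of-range index.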
We all know about null pointer dereferences; they're nothing special. I'm just going to skip this because I'm running out of time.

Memory-related bugs. This is number four out of the five I had. If you're a Windows driver, there are two sets of APIs you use to allocate memory: ExAllocatePool and ExAllocatePoolWithQuota. There's a discrepancy between the two: one API returns null when it fails, and the other throws an exception when it fails. So if you mix up one with the other, all of a sudden you're checking for a return value but you might get an exception, or you have an exception handler but what really happens is you get a null pointer back. You have to get those straight. I believe the pool-with-quota one actually has an option where you say: instead of throwing an exception, give me a null. I would recommend using the API that way, because then you have consistency. But that's one of the ways you can go wrong using these APIs. There are two other things. One is that ExAllocatePoolWithQuota exists so that allocations are charged against the quota of the userland caller. So in my view, any time you do an allocation driven by userland, you should always do the quota alloc. If you don't, I think that's a bug. And then of course the simple cases: yes, if you don't check the return value you may have null dereferences, and you have problems if you don't handle exceptions or if you have faulty exception logic. These are pretty simple. Fix-wise it's: yes, check return values; yes, charge quota when you have to; yes, cap buffers. Please don't have unbounded allocs.

The second memory-related issue, and this is again new as of Windows 8, is mostly a mitigation-type thing. In Windows, when you used to do a memory allocation, the pages you got back were executable by default. As of Windows 8 that is
no longer true. There's the paged pool and the non-paged pool. For the paged pool, by default the pages are now non-executable. For the non-paged pool they still have to be executable, so they came up with this new thing called non-paged pool NX (NonPagedPoolNx). The idea is that any time you do a non-paged pool alloc nowadays, you specify this particular pool, unless you really need executable non-paged pool memory, which is not impossible, but it's pretty rare. In that case, sure, go ahead and use the plain non-paged pool. But in any other case where you need non-paged pool memory and don't need it to be executable, you should really use this flag. It's not a bug per se; this non-executable non-paged pool is really a mitigation. The idea is that if you use this incorrectly you're kind of killing a mitigation, so use non-paged pool NX if you're going to do non-paged pool allocations.

The next memory-related bug: there's an API called MmGetSystemAddressForMdl, which is basically, if you get an MDL, how you get it mapped into the kernel's virtual address space. With this old API, there's a risk that if it can't map it, it causes a kernel panic. So there's a new API called MmGetSystemAddressForMdlSafe, and if you use the old one, that's a bug. While this seems easy, there's still a fair number of drivers that use the old API. I started thinking about it, and I think there are two reasons why you would still have that: A, you have an old codebase, and B, you're developing based on driver books and documentation that tell you to use the old API, because the book was written before the new API existed. Most of these kernel driver books are relatively old, so they advise using that
API, and I think that's why these bugs are still around.

Yeah, so this is what I was talking about earlier: when you have these MDLs that are used to create a double mapping between user and kernel, all of a sudden you end up with double fetches again. Let's say you have an IOCTL that comes in with METHOD_IN_DIRECT; you get your kernel pointer out of the MDL and you just start using things like embedded length fields, not realizing that that memory can be changed at any given time by a userland thread. So you end up with all of these double fetch bugs again. The thing is, they're not quite as obvious as the ones I showed earlier, because you're dealing with a kernel pointer. If you just look at the pointer, you would assume that userland doesn't have direct control over it, and that's why they're not as obvious. This is a very simple example of what that would look like: you have an MDL, you call MmGetSystemAddressForMdlSafe, you get your data back, you validate the embedded length, and then you go, okay, we'll use that embedded length. But because it comes from an MDL, a second thread can change your embedded length after validation but before use.

Did I... okay, I'm going to skip over this because I'm really close to running out of time. I think I have two more minutes. Sweet. Let me get to the fifth part, object handling. When userland talks to kernel and says, okay, here's a handle to some file, go do something with it, the kernel has to figure out how to translate this handle into a kernel pointer it can use. You call this function, ObReferenceObjectByHandle, and that translates one to the other. The API looks like this, and it has these things called ObjectType and AccessMode. ObjectType says: enforce that the object has this particular type. Windows has
about 40 different types of objects. If you take a handle from userland in kernel, use this API, and say the object type is null, no type is enforced, and you get all of these classic type confusion bugs. So that's one. The second bug you can have here is with the access mode: if you're supposed to specify UserMode and you specify KernelMode, then userland gets to give you handles that are really kernel handles. Handle numbers are not secrets, they're easy to brute force, and so if you say the access mode is KernelMode, the user all of a sudden gets to tinker with your kernel handles.

Okay, another thing with this kind of object stuff is that once you reference an object and you're done with it, you're supposed to dereference it. Quite often you'll have bugs where, let's say, some kind of exception gets thrown and you forget to dereference the object, and all of a sudden this reference leaks. This is a sort of ref counting bug, and the problem is when it's repeatable. Let's say your ref count is 32 bits: if you can repeat this about 4 billion times, it goes to zero. So if your ref count integer-overflows, it goes to zero, your object gets freed, but you may still have a reference to it, and all of a sudden you get these use-after-free cases. As of, I think, Windows 8, code has been added to the ObReference calls where they detect an integer overflow and cause a kernel panic, so these bugs are no longer exploitable beyond a leak, which is great. But up to Windows 7 that was still a problem. Related issues are a double ref count decrease, use after ObDereference, and passing of zero. The fix is basically: always balance your reference calls. And I'm done.

Well, unfortunately we do not have time for questions.
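The reference-count overflow from the object-handling part can be sketched in plain user-mode C. This is a simulation under stated assumptions, not Windows code: `struct object`, `ref_unchecked`, `ref_checked`, `unref`, `leaky_handler`, and `balanced_handler` are hypothetical names, the `freed` flag stands in for actually freeing the object, and the checked increment simply returns failure where the hardened kernel is described above as panicking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object with a 32-bit reference count. */
struct object {
    uint32_t refcount;
    bool     freed;      /* stand-in for the object's memory being released */
};

/* Unchecked increment: unsigned arithmetic wraps from UINT32_MAX to 0. */
static void ref_unchecked(struct object *o)
{
    o->refcount++;
}

/* Checked increment: refuses to take a reference that would wrap. */
static bool ref_checked(struct object *o)
{
    if (o->refcount == UINT32_MAX)
        return false;
    o->refcount++;
    return true;
}

/* Drop one reference; the object is "freed" when the count reaches zero. */
static void unref(struct object *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0)
        o->freed = true;
}

/* Buggy pattern: the error path returns without dropping the reference. */
static int leaky_handler(struct object *o, bool fail)
{
    ref_unchecked(o);
    if (fail)
        return -1;       /* BUG: missing unref(o), one reference leaks */
    unref(o);
    return 0;
}

/* Balanced pattern: every successful reference is dropped on every path. */
static int balanced_handler(struct object *o, bool fail)
{
    int status = 0;
    if (!ref_checked(o))
        return -1;
    if (fail)
        status = -1;
    unref(o);
    return status;
}
```

Starting the counter just below `UINT32_MAX` stands in for having already triggered the leak roughly four billion times: two more failing calls to `leaky_handler` wrap the count to zero, and the next ordinary reference and dereference pair frees an object that other holders still point to. `balanced_handler` leaves the count unchanged on both the success and the failure path.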