My name is Morten Schenk. I come from a small consulting firm in Denmark. I do a lot of blogging about exploitation, both user mode and kernel mode. You can read it on the blog and also on GitHub. This talk is primarily about how to leverage write-what-where vulnerabilities, all from a low-integrity standpoint inside a sandbox. So expect a lot of hex and C code here, and no zero-days. First we'll go through some brief history of kernel exploitation to get everyone up to date, then look at what kind of mitigations have been put in place in the latest versions of Windows 10 and see how we can actually overcome them.
So, write-what-where vulnerabilities. The base case of this vulnerability class is that you can write a controlled value at a controlled address somewhere in the kernel. With the more commonly found bugs, we most likely write a non-controlled or semi-controlled value at a controlled address in the kernel. Once that's possible, we have to leverage this write in some way to get kernel-mode code execution. And the most important part is that we have to know where to write. That's one of the problems. The techniques shown here are for write-what-where vulnerabilities, but they can be used for other vulnerabilities as well, like pool overflows or use-after-frees.
Looking at Windows 7, it was actually quite easy back then. First of all, you could just allocate the code directly in the non-paged pool, because it was executable. Then use the built-in APIs of the operating system to return the address of the allocation. Then use the write-what-where to overwrite a function address, like one in the HalDispatchTable, and then just call that and get your code executed in the kernel. Even easier, you could allocate user-mode memory and execute that from the kernel as well. So it was very easy back then.
Going forward, Microsoft has made quite a lot of improvements security-wise.
First of all, the APIs we called to get kernel addresses have been blocked from a sandbox, so it's not possible to get these allocation addresses anymore. Furthermore, most built-in APIs now use the NonPagedPoolNx pool type as the standard, which means the allocations are not made in executable pool memory. So even if we knew where they were, we couldn't execute code there. And finally, Supervisor Mode Execution Prevention, or SMEP, has been implemented as well, which means that if we try to execute code in user-mode memory from the kernel, we get blocked. So the techniques from Windows 7 don't really work anymore.
What we need is a primitive, a read/write primitive just like we know from browser exploits, only in the kernel. Some of the best-known ones are the bitmap primitive and the window primitive. These have existed for some years. The idea of the bitmap primitive is that you can get the address of a bitmap through a table called the GdiSharedHandleTable. Once you have the location of a bitmap, you can use the vulnerability to overwrite its size. Make sure you have two bitmaps after each other; once you've overwritten the size of the first one, use it to write into the second one and overwrite its pointer to the data area, then use the second one to read or write any memory.
Similarly, we can use the window primitive. It's kind of the same thing: we leak the address of the window, this time using another table called the UserHandleTable. We overwrite the size of what's called the extra bytes and then use the SetWindowLongPtr API to change these extra bytes, and since we've overwritten the size of the extra bytes, we can actually change the strName pointer of the window. This allows us, again, to read or write anywhere in memory using user-mode APIs.
Regarding execution of user-mode pages, we have to bypass SMEP or get DEP in the kernel disabled.
The most commonly used method is overwriting page table entries using the write primitive: we find the address of the page table entry for the code we want to execute and simply either flip the NX bit or change the user-mode page into a kernel-mode page. Then we can execute it again.
Sometimes a KASLR bypass is needed as well for some exploits, especially when you want to run shellcode. There are two known techniques. First of all, the HAL heap has been static for many years, and there is an NT kernel pointer there, so you know a static address where such a pointer is stored. Likewise, the SIDT instruction can be used: you get back the address of the interrupt descriptor table, and in that there is an NT kernel pointer. Then you use the read primitive to read this pointer out and find the base address by simply taking the NT kernel pointer and going backwards in the driver until you find the PE header.
This is more or less where kernel-mode exploitation was before the Anniversary Update last year. In the Anniversary Update, Microsoft made quite a few changes. Let's go through them and see how they tried to mitigate our ways of doing kernel exploitation. First of all, they randomized the page table entries, which means we can't just flip the bit, because we don't know where it is anymore. They removed all the kernel addresses of bitmaps and other objects from the GdiSharedHandleTable, which means we don't know where to write anymore, so we can't use the bitmap primitive. And the SIDT instruction has been mitigated as well: if we're running under Hyper-V, it no longer gives us this value, so we don't know where the NT pointer is. So this breaks some of the stuff.
Additionally, there are mitigations which weren't really made public. What they did is make sure that the strName pointer of a window object has to point inside the desktop heap, because that's where the actual name gets allocated.
So if we try to read or write outside the desktop heap by overwriting the pointer, we crash. So it also breaks the window primitive.
So let's see if these mitigations actually work. Let's look at the bitmap first. The bitmap is stored somewhere in memory, in what's called the large paged pool. This happens if the bitmap is at least 0x1000 bytes in size. Of course the large paged pool is randomized on reboot, so we don't know where it is beforehand. We need some kind of kernel information leak to find it. Luckily, if we look at the TEB, we have the Win32ThreadInfo field, which contains a pointer into the kernel. It doesn't point to the large paged pool, but it does point close to it. So first we stabilize the pool by allocating a few very large bitmap objects. Once this is done, we can add a static offset to the pointer we found, and we actually point inside our bitmaps. Sadly we only point inside the data of a bitmap, so we can't really use that for much. But what we can do is delete the second of these bitmap objects and allocate around 10,000 new bitmap objects of a page each. We find that the pointer now points at the new bitmaps. This is a way to find the bitmaps again. Since we know where the bitmap is, we can just overwrite the size and then again use two consecutive bitmaps to get a primitive back. So even though Microsoft removed the addresses from the table, they're not really needed. And you can see here I did a simulated write-what-where just by overwriting the length, and then, continuing the execution of the code, it's possible to read out the content of the kernel and also write to the kernel.
So, looking at the window primitive: we're not allowed to read or write outside the desktop heap.
This is due to a new function called DesktopVerifyHeapPointer. Every time we try to actually use the strName pointer, by reading or writing with it, it has to be validated through this function. It takes the base address of the desktop heap and its size, and checks that the pointer is inside these two. But what we notice is that the bounds come from an object called the tagDESKTOP object, and there's actually no validation performed on the pointer to this object. That pointer is taken from our own window object. Which means that if we can find this pointer and replace it, we control the verification addresses. So what we do is use the UserHandleTable, as before, and just leak the address of the window. Then, when we overwrite the strName pointer, we also overwrite the pointer to the tagDESKTOP object, because it's in the header as well. And then, when we try to read or write, we control where the desktop heap appears to be, so verification succeeds everywhere. Again we can see here that I simulate a write-what-where, and just like before we can read and write arbitrarily in the kernel.
So that's what was put in the Anniversary Update. But this talk was about the Creators Update. Let's see what was done there, because they added additional mitigations. What we find, especially for the window object, is that the UserHandleTable we used to disclose the addresses of window objects has been changed. Before, we saw a lot of kernel addresses there, for all the objects; these have been removed now. So we don't know where the windows are anymore. Additionally, the field called ulClientDelta, which is the offset from the user-mode mapped desktop heap to the actual desktop heap, has been removed as well. So there isn't really any way here of finding the window objects anymore.
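The pointer check described above can be modelled in a few lines. This is a minimal sketch in Python, not the Windows implementation: the `DesktopInfo` class, its field names and every address are hypothetical. The only thing taken from the talk is that the bounds come from a desktop object reached through a pointer which the check itself does not validate.

```python
from dataclasses import dataclass


@dataclass
class DesktopInfo:
    """Hypothetical stand-in for the desktop object that holds the heap bounds."""
    heap_base: int
    heap_size: int


def verify_heap_pointer(desktop: DesktopInfo, pointer: int) -> bool:
    """Model of the range check: the pointer must lie inside the desktop heap."""
    return desktop.heap_base <= pointer < desktop.heap_base + desktop.heap_size


# Made-up addresses, for illustration only.
real_desktop = DesktopInfo(heap_base=0xFFFFF90140800000, heap_size=0x1400000)
target = 0xFFFFF80312345678  # an address outside the real desktop heap

# With the genuine desktop object the check rejects the target.
rejected = verify_heap_pointer(real_desktop, target)

# If the desktop object pointer is replaced with attacker-chosen data,
# the bounds are attacker-chosen too, so the same check now passes.
forged_desktop = DesktopInfo(heap_base=target & ~0xFFF, heap_size=0x1000)
accepted = verify_heap_pointer(forged_desktop, target)

print(rejected, accepted)
```

The point of the model is that a range check is only as trustworthy as the place its bounds are read from.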
Additionally, we used the SetWindowLongPtr API to write the extra bytes in the kernel and overwrite the strName pointer. With the new update we see that when we perform this action, the content, which was at the kernel-mode address we were interested in, is no longer written to kernel mode. It's written to user mode. So even if we knew where the window was and overwrote the length, when we try to actually overwrite the pointer it doesn't work, because we write to user mode instead. This of course breaks the primitive; it doesn't work anymore.
Some additional changes: the size of the bitmap object header has increased. This doesn't break the primitive, but we need to make sure we change the size so that allocation alignment still works. And now the HAL heap is also randomized, so we don't know where the NT pointer is anymore. So they actually tried to do a lot of stuff to break the primitives. Let's see if that actually works.
As I said, the client delta is now gone, but if we inspect memory we find that it's been replaced by a user-mode pointer. And if we check what's there at the user-mode pointer, we actually find the desktop heap. The kernel maps the desktop heap to user mode to make lookups faster. But let's also make sure that we can actually get the kernel addresses. The UserHandleTable was nice because it was metadata, so it was faster to look things up. We don't have that anymore, but we have the actual data. So what we do instead is just search for it manually. We take the handle value and search through the desktop heap until we find it. Once we find it, we know where the window is. So we're still able to leak the address.
But there was a different problem. Even though we knew the address and overwrote the size of the extra bytes, we couldn't use it anymore, because we're just writing to user-mode memory. Looking at how that works, we find that the size of the extra bytes is actually defined when we register the window class.
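The manual search through the user-mode mapping of the desktop heap can be sketched as a plain linear scan. This is an illustrative Python model over an in-process buffer. The 8-byte stride, the assumption that the handle is stored as a little-endian 64-bit value, and the name `find_handle_offset` are assumptions made for the sketch. How the match is then turned into a kernel address is not covered here.

```python
import struct
from typing import Optional


def find_handle_offset(mapped_heap: bytes, handle: int) -> Optional[int]:
    """Scan an 8-byte-aligned buffer for a handle value; return its offset or None."""
    needle = struct.pack("<Q", handle)
    for offset in range(0, len(mapped_heap) - 7, 8):
        if mapped_heap[offset:offset + 8] == needle:
            return offset
    return None


# Fake "mapped desktop heap": two zeroed pages with one handle value planted in it.
fake_heap = bytearray(0x2000)
struct.pack_into("<Q", fake_heap, 0x1230, 0x000A0B2C)

found = find_handle_offset(bytes(fake_heap), 0x000A0B2C)
missing = find_handle_offset(bytes(fake_heap), 0x00DEAD00)

print(hex(found), missing)
```

A scan like this is slower than a table lookup, which matches the remark that the handle table was only faster, not necessary.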
But when we register the window class, we set other parameters as well. We also set a parameter for an object called tagCLS, which also has extra bytes. And even better, there is also an API to set these extra bytes, and it's not the same one. This one is called SetClassLongPtr. And lo and behold, even though Microsoft tried to mitigate this, they only changed one of the APIs. They didn't change the second one. So the extra bytes of a tagCLS object are still placed in the kernel. Which means we can allocate a tagCLS object before the window object and use the extra bytes of the tagCLS object to overwrite the strName pointer of the window object. And this way we have a read/write primitive back again. It's not that easy to mitigate them.
So this is clear: even though the Creators Update made a lot of changes, we still have our primitives. But it did do a lot of things right to make the kernel better. Looking in memory, almost all of kernel memory is now randomized. The only place I know of that isn't is the KUSER_SHARED_DATA structure. But it's not executable and there are no pointers there, so it's not really interesting. The HAL heap is randomized and SIDT is mitigated, so we need some new way to leak the NT kernel pointer. My idea is that perhaps we can find a leak that's related to the primitive. Which of course means we need two bypasses, since we have two primitives. And I want it to be the NT kernel pointer we leak.
To pursue this idea, I thought about ReactOS. ReactOS is an open source re-implementation of Windows XP, with reverse engineering of all the structures, which means that undocumented kernel structures can be found there. Even though they're for 32-bit Windows XP, they might give some hints. Looking at the data structure for the bitmap, or surface object as it's really called in the kernel, we find a field called hdev. And the explanation for hdev is that it's a pointer to another object called the PDEVOBJ.
Inspecting further, we find that ReactOS has documented this object as well. It actually contains a lot of function pointers. These function pointers point into some kind of kernel driver, which again gives us our KASLR bypass. But when we check our bitmap object, we find that the hdev field is empty. There's nothing there. Which means we cannot find the NT kernel that way. Luckily, the bitmap we created using the CreateBitmap API isn't the only kind of bitmap. There are several other APIs to create bitmaps with. One of them is CreateCompatibleBitmap. And trying this API shows that the hdev field actually is populated now. Once it's populated, we can verify in the debugger that it does contain a pointer to a driver. It's not the NT kernel, but it's a driver.
So how do we do this in the exploit? We know where the first bitmap was. We found that address, so we go to offset 0x3000. We free this bitmap, then we reallocate it with a compatible bitmap. We just spray a couple of hundred of these to make sure that one of them is reallocated in the same spot. Once that is done, we can just read out the function pointer into the kernel driver. The reason I took this function and not any of the others is that this function contains a call at offset 0x2B into the NT kernel. So from this it's very easy to read out an NT kernel pointer and then again use the read primitive to find the base address of the NT kernel. So in this way we have a generic bypass to find the NT kernel using the bitmap primitive.
Looking at the tagWND, it's also documented in ReactOS. The header structure for the window object is kind of convoluted. It's a lot of structures nested in each other. If we follow the chain, we find that a lot of the different header structures end up pointing at the ETHREAD of the current thread. The ETHREAD is very interesting because it contains a pointer into the NT kernel.
So again we just use the read primitive and the location of the window object to read out an NT kernel pointer, and then find the base address by looking for the PE header. So in this way we actually have a generic way to find the NT kernel base address, no matter which primitive we use.
As I said, I would be dropping some extra stuff here. While I did this research I found a couple of other bypasses. One of them is primitive independent, and it comes from the TEB. As I said, we used the Win32ThreadInfo field to disclose the address of the bitmaps. But it's more than that, because it's actually a THREADINFO pointer, and the THREADINFO points to the ETHREAD. As we saw before, the ETHREAD contains the NT pointer. So in this way we can read out the NT pointer without allocating any kind of objects, as long as we have a read primitive. So this would work with any other primitive.
If we need to know where the bitmaps are, then instead of using the TEB we could also use the desktop heap. On the Creators Update we can find a pointer there, use the same kind of static offset from it, and we get the address of the bitmaps again. Another one, which was fixed in the Creators Update after the submission of this talk but still works on the Anniversary Update, uses the thread local storage pointer: it also gives us an address into the kernel which can be used to find the address of the bitmaps. And finally, also mitigated in the Creators Update but still working on the Anniversary Update: instead of actually allocating the window object and using its header, we can use the gSharedInfo structure directly by searching through it and finding the pointer into the NT kernel, simply looking through structures until we land on the correct window object. So in this way we're sure we still have a bypass even if one of them is fixed.
Although it's always cool to bypass KASLR, what do we need it for? We need to do something with it.
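The base-address lookup used throughout this part, taking a pointer somewhere inside an image and walking backwards to the PE header, can be sketched as follows. This is a simulation in Python: `fake_read_word` stands in for a read primitive, the addresses are made up, and the sketch assumes the image is page aligned and begins with the two-byte `MZ` signature. A more careful version would also validate the rest of the PE header.

```python
from typing import Callable, Optional

PAGE_SIZE = 0x1000
MZ_MAGIC = 0x5A4D  # "MZ" read as a little-endian 16-bit value


def find_image_base(read_word: Callable[[int], int],
                    pointer_in_image: int,
                    max_pages: int = 0x2000) -> Optional[int]:
    """Walk backwards one page at a time until the MZ signature is found."""
    address = pointer_in_image & ~(PAGE_SIZE - 1)
    for _ in range(max_pages):
        if read_word(address) == MZ_MAGIC:
            return address
        address -= PAGE_SIZE
    return None


# Simulated memory: only the first word of the fake image is populated.
fake_base = 0xFFFFF80312000000
fake_memory = {fake_base: MZ_MAGIC}


def fake_read_word(address: int) -> int:
    """Stand-in for a 16-bit read through a read primitive."""
    return fake_memory.get(address, 0)


leaked_pointer = fake_base + 0x3A4567  # hypothetical pointer into the image
image_base = find_image_base(fake_read_word, leaked_pointer)

print(hex(image_base))
```

The `max_pages` bound keeps the walk from running forever if the leaked pointer does not belong to an image at all.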
What we did before is use the read/write primitive to overwrite page table entries, so that pages either became executable in the kernel, or we flipped the bit so that a user-mode page turned into a kernel-mode page. The reason we could do that was that the page table entries used to start at a static base address, and we could just calculate the address of the page table entry for any address. But now that it's randomized, we cannot do that anymore. So we need to try to de-randomize it.
My thought is that even though it's randomized, the kernel must use it a lot. So it must have APIs for this, and these APIs must work even with randomization. So I looked and found a couple of different APIs which are used. The simplest one I found is called MiGetPteAddress. It simply translates between an address and its PTE. Looking at it in IDA, doing static analysis, we find on the left side that it still has the static address. So the randomized base isn't compiled into it; it gets changed at runtime, because we can see on the right side that at runtime the address is different. But that also means that when the driver is running in memory, we have a way of actually finding the base address of the page table entries. So what we do is find this function and read the randomized base address from it.
And how do we find this function? One way is of course, since we have the base address of the NT kernel, to just add a static offset. The problem is that this doesn't work across patches. Every time there's a patch, the offset will change and we need to fix it. So a better way is to do this dynamically. Since we have a read primitive, I would like to look it up. What we do is dump the content of the NT kernel and then search through it using a hashing function. As it turns out, a hashing function that just adds up the first few QWORDs of the function is collision free.
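The hash-based lookup can be illustrated with a small Python model. This is a sketch over a local buffer, not code from the talk: the number of QWORDs summed (four), the 16-byte alignment of function starts, and the names `qword_sum` and `find_by_hash` are all assumptions chosen for the example.

```python
import struct
from typing import Optional

MASK64 = (1 << 64) - 1


def qword_sum(image: bytes, offset: int, count: int = 4) -> int:
    """Add up `count` little-endian QWORDs starting at `offset`, wrapping at 64 bits."""
    qwords = struct.unpack_from(f"<{count}Q", image, offset)
    return sum(qwords) & MASK64


def find_by_hash(image: bytes, wanted: int,
                 count: int = 4, align: int = 0x10) -> Optional[int]:
    """Return the first aligned offset whose leading QWORDs add up to `wanted`."""
    last = len(image) - 8 * count
    for offset in range(0, last + 1, align):
        if qword_sum(image, offset, count) == wanted:
            return offset
    return None


# Fake "dumped image": zero bytes with one 32-byte function prologue planted in it.
fake_function = bytes(range(1, 33))
image = bytearray(0x400)
image[0x230:0x230 + len(fake_function)] = fake_function

# The hash is computed ahead of time from a known copy of the function.
precomputed = qword_sum(fake_function, 0)
found_offset = find_by_hash(bytes(image), precomputed)

print(hex(found_offset))
```

Because only the sum of the leading bytes is stored, the lookup does not depend on where the function sits in the image, which is what lets it survive offset changes between patches as long as the function's first bytes stay the same.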
So we just calculate the hash beforehand, and then during the execution of the exploit we use the read primitive to dump the content and go through it until we find the function. Doing it this way works across a lot of versions. From that we can just read out the base address, since it's at offset 0x13, and then we can use the old formula again with the updated address. From this we simply find the page table entry, and we're back to where we were before. We can allocate shellcode directly in kernel memory, flip the NX bit using the page table entry and the write primitive, and then call the shellcode like before, by overwriting the HalDispatchTable and then invoking it. Similarly, we could also just allocate it in user memory, flip the bit so it becomes a kernel page instead, and execute it. Both will work. So this is like what we did before. Even though they've implemented a lot of changes, a lot of mitigations, we can bypass them all.
Just to recap the steps: we use the vulnerability to create a read/write primitive. From that we leak the base address of the NT kernel using either of the primitives. We locate the address of this function. From that we get the randomized base address of the page tables. We can now calculate the PTE of our shellcode address, copy the shellcode to the page, then overwrite the PTE of the shellcode and run it. So this is just like we did before. There aren't really any changes to how to do kernel exploitation now. It's the same thing.
But I liked the old days of Windows 7, where we could allocate executable kernel-mode memory and just execute it, without flipping any kind of bits in the page table entries. Is that possible in Windows 10? That would be awesome. So let's start looking into that. The first thing you come across is: how does the kernel actually allocate pool memory? Well, it uses an API called ExAllocatePoolWithTag.
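The "old formula" for going from a virtual address to the address of its page table entry is plain arithmetic and can be written out as below. This is an illustrative Python sketch: the shift, the mask, and the pre-randomization base `0xFFFFF68000000000` are the commonly documented values for x64 Windows self-mapped page tables, and the example PTE value is made up. On a randomized system the base would be the value read out of the function discussed above.

```python
PTE_INDEX_MASK = 0x7FFFFFFFF8
OLD_STATIC_PTE_BASE = 0xFFFFF68000000000  # base before it was randomized

NX_BIT = 1 << 63    # set: page is not executable
USER_BIT = 1 << 2   # set: page is accessible from user mode


def pte_address(virtual_address: int, pte_base: int) -> int:
    """Address of the PTE that maps `virtual_address`, given the PTE base."""
    return ((virtual_address >> 9) & PTE_INDEX_MASK) + pte_base


def clear_nx(pte_value: int) -> int:
    """Return the PTE value with the no-execute bit cleared."""
    return pte_value & ~NX_BIT


def make_supervisor(pte_value: int) -> int:
    """Return the PTE value with the user bit cleared (kernel-mode page)."""
    return pte_value & ~USER_BIT


# Example: the well-known shared user data page at 0xFFFFF78000000000.
shared_page_pte = pte_address(0xFFFFF78000000000, OLD_STATIC_PTE_BASE)

example_pte_value = 0x8000000012345867  # made-up value with NX and user bits set
print(hex(shared_page_pte))
print(hex(clear_nx(example_pte_value)), hex(make_supervisor(example_pte_value)))
```

Each 4 KB page has one 8-byte entry, which is why shifting right by 9 (12 bits of page offset minus 3 bits of entry size) and masking gives the byte offset of the entry from the base.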
This API takes some arguments, and one of them is the pool type. Even though the new standard is NonPagedPoolNx, the old one is still supported, and it's now also called NonPagedPoolExecute. So if we could invoke this API with the correct pool type, we'd allocate executable pool memory. And this API also returns the address of that pool memory. The only problem is of course that this is a kernel-mode API. We cannot call it directly from userland. What we could do is of course overwrite the HalDispatchTable with this API and call it that way. The problem is that we need to control the arguments. When we invoke the function through the HalDispatchTable, we can only specify two arguments, and they need specific values for this to actually work. So this system call doesn't work. We need a different one.
Looking around, I found a very easily pronounceable one, NtGdiDdDDICreateAllocation. We find that it jumps through the win32k kernel drivers until it ends up being called through a function table. So it's a very thin trampoline; nothing gets touched on the way there. Looking at this function table, we find that it calls into a different driver and, best of all, that the arguments are not modified from the time of our system call in userland to the execution of the function through the function table. Additionally, the prototype of the user-mode function says it returns a QWORD, which could be an address. So it fits all our requirements for calling ExAllocatePoolWithTag. The only thing missing is that we should be able to actually write that address there. Inspecting the function table, we find that it's writable. So we can just patch it using our write primitive and overwrite the entry with the ExAllocatePoolWithTag API. We also find that, since it's a function table, it contains pointers we don't know beforehand, so we cannot use a hashing function to find it directly.
So we need to find a function which uses this function table. One function I found which works is DrvOcclusionStateChangeNotify. It's quite a simple function, a simple helper which calls into the function table. The only problem is that it's located in the win32kfull.sys driver. So we've turned one problem into another: we need to find that kernel driver. One way to do that is using the PsLoadedModuleList. It's a linked list which contains all the kernel drivers currently loaded. Looking through the structure, we find that the name of a driver is located at offset 0x60 and the base address of that driver is located at offset 0x30. So again we can use our read primitive to read through this list, find the correct driver name and just read out the base address.
So now we have the base address, but again we've turned the problem into a different one. Now we need to find the PsLoadedModuleList, and since this is a linked list, we can't use a hashing function, because we don't know its values. So again we need to find a function which uses it. One such function is KeCapturePersistentThreadState, which is located in the NT kernel. We have the NT kernel base address. So now we can again use a hashing function to find this function. From the base address we get the function, from that we get the PsLoadedModuleList, from that the base address of win32kfull.sys, from that DrvOcclusionStateChangeNotify, and from that the function table. Luckily all of this takes less than a second when running, so it's not a problem.
Once we've got that, we simply overwrite offset 0x68 with the ExAllocatePoolWithTag API and then call it, allocating pool memory. It returns the address of our allocated pool memory. Then we use our write primitive to copy the shellcode into it and execute it by overwriting the function table entry with the address of the allocated pool memory. This is like the days of Windows 7: allocate executable pool memory and execute it.
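Reading through the module list with a read primitive can be modelled as a walk over a circular linked list. This is a simulation in Python over a fake memory buffer: the `FakeKernel` class, all addresses and the helper names are hypothetical. The offsets 0x30 (base address) and 0x60 (name) are the ones given in the talk, and treating the value at 0x60 as a pointer to a UTF-16 string is an assumption of this sketch.

```python
import struct
from typing import Optional


class FakeKernel:
    """In-process buffer that plays the role of kernel memory plus a read primitive."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.mem = bytearray(size)

    def write(self, addr: int, data: bytes) -> None:
        off = addr - self.base
        self.mem[off:off + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        off = addr - self.base
        return bytes(self.mem[off:off + length])

    def read_qword(self, addr: int) -> int:
        return struct.unpack("<Q", self.read(addr, 8))[0]


def read_wstring(k: FakeKernel, addr: int, max_chars: int = 64) -> str:
    """Read a NUL-terminated UTF-16 string."""
    raw = k.read(addr, max_chars * 2)
    return raw.decode("utf-16-le", errors="replace").split("\x00", 1)[0]


def find_module_base(k: FakeKernel, list_head: int, wanted: str,
                     max_entries: int = 512) -> Optional[int]:
    """Follow forward links from the list head until the named module is found."""
    entry = k.read_qword(list_head)
    for _ in range(max_entries):
        if entry == list_head:  # back at the head: list exhausted
            break
        name = read_wstring(k, k.read_qword(entry + 0x60))
        if name.lower() == wanted.lower():
            return k.read_qword(entry + 0x30)
        entry = k.read_qword(entry)
    return None


# Build a two-entry fake list; every address below is made up.
BASE = 0xFFFFF80000000000
k = FakeKernel(BASE, 0x1000)
head, e1, e2 = BASE + 0x100, BASE + 0x200, BASE + 0x300
modules = [(e1, e2, BASE + 0x800, "ntoskrnl.exe", 0xFFFFF80312000000),
           (e2, head, BASE + 0x900, "win32kfull.sys", 0xFFFFF961A0000000)]
k.write(head, struct.pack("<Q", e1))
for entry, flink, name_addr, name, dll_base in modules:
    k.write(entry, struct.pack("<Q", flink))
    k.write(entry + 0x30, struct.pack("<Q", dll_base))
    k.write(entry + 0x60, struct.pack("<Q", name_addr))
    k.write(name_addr, name.encode("utf-16-le") + b"\x00\x00")

win32kfull_base = find_module_base(k, head, "win32kfull.sys")
print(hex(win32kfull_base))
```

The `max_entries` cap is a safety net for the case where the list is corrupted or the head address is wrong and the walk never returns to its starting point.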
This has a few more steps, so it's not more efficient in any way, but it's a different way. So if one method gets fixed, you can try the other one.
So let's try and see it in action. We have a Windows 10 machine here, and I'm running at a low integrity level, so just like inside a sandbox. So we try and run it. It does the pool spray and wants us to simulate the write-what-where. So let's do that from the kernel debugger. We can see here that we get the address, and it contains the length of the bitmap, just as we wanted. So let's simulate the write-what-where: I increase the length here, let it execute, and we go back. We haven't had a crash, and we've got a SYSTEM shell. And as you can see, from the time I actually entered the simulated write-what-where, the execution took less than a second. So all these lookups don't really take any time in reality.
In summary, even though there have been a lot of mitigations in the latest versions of Windows 10, none of the old techniques are really broken. We can revive them; we can bring them back. The read/write primitives work, page table entry overwrites work, and we can leak the NT kernel in new ways that weren't used before. And we can now also allocate executable kernel pool memory. The code for this is already on GitHub, so you can get it there if you want to play with it. I want to say that of course I didn't find out all of this myself. There was previous research here, which I want to credit people for. That's it. Thank you for listening to me.