Hi there, my name is Bill, and thanks for coming out to my talk. I'm 19 years old, I'm a rising sophomore at the Rochester Institute of Technology, I love Windows internals, I'm mostly self-taught with some guidance, and I have a strong game hacking background, which is why I'm so interested in the Windows kernel. So what is this talk about? In this talk we'll be going over loading a rootkit, communicating with a rootkit, abusing legitimate network communications in order to communicate with our malware, an example rootkit I made and the design choices behind it, executing commands from the kernel, and tricks to cover up the file system trace of your rootkit. When you hear me say rootkit, I'm referring to kernel-level rootkits for Windows.
So why would you want to use a rootkit? Well, there are a lot of reasons. Kernel drivers have significant access to the machine; unlike in user mode, you pretty much have access to anything and everything. Kernel drivers have the same privilege level as a typical kernel antivirus. Generally speaking, you share the same access to the same resources. There are fewer mitigations and security solutions targeting kernel malware. If you can load kernel code, there's a lot you can do to cover up your tracks. Antivirus often has less visibility into the operations performed by kernel drivers. This is because a lot of antivirus products rely on user-mode hooks in order to gain visibility into potentially suspicious operations. In the kernel, however, you can't directly hook the NTOS kernel functions due to PatchGuard, which reduces the visibility antivirus has significantly. And finally, kernel drivers are treated with a significant amount of trust by antivirus. Let's take a look at an example of that. Whether you're running a consumer antivirus or a corporate EDR, chances are whatever solution you run treats kernel drivers with a significant amount of trust.
Here I have two examples from Malwarebytes and Carbon Black, specifically their pre-operation process and thread callback functions. These functions are called whenever a handle is created to a process or a thread. In the case of Malwarebytes, they check to see if the process ID is less than 8, and if the operation is for a kernel handle, then it will not continue processing that handle creation. In the case of Carbon Black, if your handle is a kernel handle, or if the caller is not from user mode, it will not process that handle being created.
Let's talk a little bit about loading a rootkit. The first option you have is to abuse legitimate drivers. There are a lot of publicly available vulnerable drivers out there, and with some reversing knowledge, finding your own zero-day in one of these drivers can be trivial. Examples of vulnerable drivers include Capcom's anti-cheat driver, Intel's NAL driver, and even drivers from Microsoft themselves. Now the reason I put vulnerable and zero-day in quotes is because oftentimes these drivers require administrative privileges to communicate with them. Following Microsoft's standards, ring 3 administrator to ring 0 is not a security boundary, so technically they're not vulnerabilities. But that doesn't mean we can't abuse them. Abusing legitimate drivers has quite a few benefits as well. You only need a few primitives to escalate privileges. Finding a vulnerable driver is relatively trivial; a good place to start is OEM drivers. And it's difficult to detect due to compatibility reasons. For example, let's say a driver had a vulnerable component that could potentially be exploited. The problem antivirus faces is that if the legitimate application also uses that vulnerable component, how do you distinguish a malicious program requesting this vulnerable component from the legitimate application requesting it? That becomes a tricky problem. There are some drawbacks to this method as well, though.
The biggest issue I've had with this method is compatibility. Oftentimes the primitives you get might not be a lot, and even then they can have stability issues such as race conditions. Now this is a significant threat if you're writing red team tooling, because you do not want your malware to blue screen a victim. That's probably the worst-case scenario. It would be a pretty big indicator of compromise and you're probably going to get caught. This is why I generally stay away from this method.
Now another method is to just buy a code signing certificate. This is completely valid for some red teams, especially for targeted attacks, primarily because you're not going to have any stability issues. The normal way of getting a driver signed is just to buy your own certificate under your company's name. But it will reveal your identity, and it can also be blacklisted. Now this blacklisting doesn't happen so much nowadays. This is something that I know is being worked on by antivirus vendors and even Microsoft themselves. It's not widespread yet, so you're probably not going to see too much blacklisting happening nowadays.
Another option is just to use someone else's certificate. You'd be surprised at the number of publicly available leaked certificates online. A good place to start if you're looking for one is game hacking forums. Now leaked certificates have almost all of the benefits of buying your own certificate, without the deanonymization. But the leaked certificate you use can be blacklisted in the future; it goes back to the same point about buying your own certificate. And if the certificate was issued after July 29th, 2015, you can't use it on Secure Boot machines running Windows 10 version 1607 or higher unless it is an EV certificate, which is probably not going to happen. In most cases, Windows doesn't even care if the certificate appears to be expired or even revoked.
This is because what you see in the digital signature section of a driver is not what the kernel checks when determining whether or not to load a driver. So this view you see here, where it says a certificate was explicitly revoked, is what WinVerifyTrust is returning, not necessarily what the kernel cares about. So if you do come across a certificate that's expired or revoked, chances are you can still use it for signing a driver. Now if you don't want to use one of the publicly available certificates out there, you can also try to find your own. One method I found interesting was to use open S3 buckets to search for private keys. I used a site called Grayhat Warfare, which searches these open S3 buckets, and I was able to find over 6,000 results for the PFX and P12 extensions, which is definitely a place to look if you're trying to get your own certificate. And the best part about this method is that it's undetected by the bulk of antivirus. Now I would understand if antivirus had trouble detecting a certificate that was released a month ago, because that's recent, but some of these certificates have been out there for years, and yet even the most next-generation, cutting-edge solutions fail to detect this basic threat.
Let's talk about communicating with a rootkit. A tried and true method of communicating with your malware is to just call home to a C2. Firewalls can block or flag these requests if they go to suspicious IPs or ports, and even for more complex methods such as DNS exfiltration, there are some solutions being developed, such as advanced network inspection, that can try to catch some of these methods that attempt to blend in with the noise. Some malware takes the route where the C2 connects to the victim directly to control it. Now this is extremely easy to set up, but at the same time it's extremely easy for a firewall to block, and it's very difficult to blend in with the noise given that you're using one port exclusively.
A more advanced method I've seen is to hook into an application's network communications directly. Now this is very hard to detect, especially if you're mimicking a legitimate protocol, but it's not very flexible, because in a lot of environments you're going to have different ports exposed and different services running, and if that one port or service isn't exposed you can't use that line of communication, which isn't very reliable. So what I wanted when choosing a communication method was limited detection vectors and flexibility for a variety of environments, while making the assumptions that some services will be exposed, which is especially true in corporate environments that have Active Directory services running, but that inbound and outbound access may be monitored. So it's going to have to be a method that offers few detection opportunities. Now application-specific hooking was perfect for my needs, except for the flexibility. Is there any way we could change application-specific hooking so that it isn't dependent on any single application? Well, it turns out there might be a way.
What if, instead of hooking one application directly, we hooked network communication, similar to tools such as Wireshark? This means that we would hook every packet that reaches any port on the system and be able to inspect it. Then on our C2 we create a custom packet that has a magic constant value, and right after that magic constant is the information we want to pass to our malware. We can send this malicious packet to any port on the victim machine, and the victim machine running our malware checks every incoming packet and sees, hey, there's a packet with a magic constant, which it can use as a reference point to extract information from our C2. So using this method, we could communicate from our C2 to our malware by abusing any legitimate port on the victim machine.
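As a rough illustration of the magic-constant idea above, the following Python sketch scans a received buffer for a marker and returns whatever follows it. The same scan is what a network sensor would run to flag such packets. The marker value and the function name are made up for this example; the talk does not state the real constant.

```python
from typing import Optional

# Hypothetical marker value, chosen only for this illustration.
MAGIC = b"\xde\xc0\xad\xde"


def extract_after_magic(buffer: bytes, magic: bytes = MAGIC) -> Optional[bytes]:
    """Return the bytes following the first occurrence of `magic`, or None."""
    index = buffer.find(magic)
    if index == -1:
        return None  # Ordinary traffic: no marker present.
    return buffer[index + len(magic):]


# Leading bytes (here, something resembling a normal request) are skipped.
packet = b"GET / HTTP/1.1\r\n" + MAGIC + b"payload"
```

Because the marker may sit anywhere in the buffer, the scan works regardless of which service's traffic it is mixed into, which is the flexibility the talk is after.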
Let's talk about how we would actually do this, specifically by hooking the user-mode network stack. A significant number of services on Windows rely on the user-mode network stack. So how can we globally intercept this traffic? Well, networking related to Winsock is handled by AFD.sys, otherwise known as the Ancillary Function Driver for Winsock. Reversing a few functions inside of mswsock.dll revealed that the bulk of the communication was performed through IOCTLs. If we could somehow intercept these IOCTLs, we could snoop in on the data being received.
So how do IRPs know where to go? How does the kernel determine which function of which driver to call? Well, first, it'll obtain the device object associated with the file object by calling IoGetRelatedDeviceObject. For our purposes, this will just be retrieving the DeviceObject member of the file object structure. If the driver associated with the device supports fast I/O, it'll dispatch the request using the fast I/O dispatch table that is part of the driver object structure. If the driver does not support fast I/O, then it'll allocate and fill out an IRP and forward that IRP to the driver by calling IoCallDriver.
Now there are a few standard methods of intercepting IRPs. The first method is to replace an entry in the major function array that is part of the driver object structure. This major function array contains pointers to dispatch functions, and the index into this array directly corresponds to the major function code that is relevant for that dispatch function. So for example, if we wanted to hook a certain major function code, we could replace that index in the array with a pointer to our own function, which would redirect IRPs for that major function to our driver instead. Another option is to perform a code hook directly on the dispatch function itself. When picking a method, there are a few questions you want to ask yourself. How many detection vectors am I exposed to?
How usable is the method, both from a compatibility and stability perspective? And how expensive would it be to detect the method? Well, for hooking a driver object in memory, you're going to be exposing yourself to memory artifacts. From a usability perspective, you're going to be quite stable, mostly because driver objects are well documented. From a detection perspective, though, it wouldn't be crazy difficult for antivirus to detect, because there's only going to be a handful of driver objects out there, since there's only a handful of drivers loaded at once. All antivirus would need to do is enumerate these driver objects and check the major function array for signs that a dispatch function is outside of the driver's bounds.
For hooking a driver's dispatch function directly with a code hook, you're also going to be exposing yourself to memory artifacts. But unless the function is exported, you're going to need to find it yourself. There are a couple of ways to do this, but it's something to especially consider if the underlying driver file might change between Windows operating system versions, which AFD.sys does. Also, not all drivers are going to be compatible with this method due to PatchGuard, and this is also HVCI incompatible. Now how expensive would it be to detect? Well, that can vary. There are quite simple ways of detecting hooking that are very inexpensive, and there are other methods that can get pretty expensive. One inexpensive method: if they know that you're hooking this one function in this one driver, they could check the bytes of that driver function for tampering. That's pretty inexpensive. But another method is for the antivirus to enumerate every loaded driver, and specifically the executable sections of each driver module, to check for differences between what's on disk and what's in memory. Now I wanted a method that was undocumented, stable, yet relatively expensive to detect.
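For reference, the inexpensive check mentioned above for major function hooks (a dispatch pointer that falls outside the owning driver's image) can be modeled roughly as follows. This is a user-mode Python model over plain data, not kernel code. The `DriverSnapshot` record, the addresses, and the allow-list parameter are all assumptions made for illustration. The allow-list exists because entries a driver does not implement typically point at a default handler inside the kernel image, so a real scanner would have to tolerate those.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DriverSnapshot:
    """Hypothetical record a scanner might build for one loaded driver."""
    name: str
    image_base: int
    image_size: int
    major_functions: List[int]  # dispatch pointers, indexed by major function code


def suspicious_entries(
    driver: DriverSnapshot,
    allowed_ranges: Sequence[Tuple[int, int]] = (),
) -> List[int]:
    """Return indexes whose pointer is outside the driver image and every allowed range."""
    def in_range(pointer: int, start: int, size: int) -> bool:
        return start <= pointer < start + size

    flagged = []
    for index, pointer in enumerate(driver.major_functions):
        if in_range(pointer, driver.image_base, driver.image_size):
            continue  # Points into the driver's own image.
        if any(in_range(pointer, start, size) for start, size in allowed_ranges):
            continue  # Points into an allowed image, e.g. the kernel's default handler.
        flagged.append(index)
    return flagged


# Made-up addresses for illustration only.
KERNEL_RANGE = (0x1000_0000, 0x0100_0000)
sample = DriverSnapshot(
    name="example.sys",
    image_base=0x2000_0000,
    image_size=0x0001_0000,
    major_functions=[0x2000_1000, 0x1000_2000, 0x3000_0000],
)
```

A check like this only covers the driver object itself, which is why the approach discussed next in the talk moves the modification somewhere this scan does not look.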
So what if, instead of hooking the original driver object, we hooked the file object structure instead? Well, if you recall one of our previous slides, the way that the kernel determines what device is associated with a file object is by calling IoGetRelatedDeviceObject. For our purposes, this is the DeviceObject member of the file object structure. Now what is stopping us from overwriting this device object pointer inside of the file object with our own device? Well, it turns out absolutely nothing. So what we can do is create our own driver and device objects, patch our copy of the driver object using a common method such as replacing the major function table, and then replace the device object pointer inside of the file object with our own device.
So let's talk about how we would do this. First we need to find file objects that belong to the AFD device object. But how can we actually find these objects? Well, the Windows NTOS kernel exposes this great function called ZwQuerySystemInformation, which allows you to query a lot of different information about the system. One of the classes you can query for is called SystemHandleInformation, which allows us to enumerate every handle opened on the system, including the process ID that the handle is associated with and the pointer to the kernel object associated with that handle. Now if we open the AFD device ourselves, we can easily determine if a file object is for the AFD device by comparing its DeviceObject member with the previously opened AFD device. So then we can see, OK, this file object is associated with the AFD device. Now before we can overwrite the DeviceObject member of the file object, we need to do some preparation. Specifically, we need to create our fake objects. Fortunately, the kernel exports the function we can use to create our own objects.
We can call ObCreateObject, passing in IoDriverObjectType and IoDeviceObjectType respectively, and then copy over the existing object data using a function like memcpy. Now with our fake objects created, we're almost ready to set the device object of the file object. But first, we need to hook our driver object. The way we can do this is by using the standard major function hook method, except remember, we're performing this on our own copy of the driver object, not the one a normal antivirus could retrieve. So it's actually relatively safe. Now to prevent race conditions between our hook function and the DeviceObject member, we need to replace it using an interlocked exchange. One thing to remember here is that once you replace the file object's device object, the Windows kernel can dispatch a request for that file object at any time. So you're going to want to use an interlocked exchange in order to make sure the device object you use in your hook is set at the same time the device object is replaced inside of the file object.
So now that we've actually hooked the file object, there's not much work left. Inside of our dispatch hook, we need to check to see if the major function code being called is hooked, and if so, we need to pass the original dispatch function, the original device object, and the IRP to our hook function. Now the trick here is that if we receive the major function code IRP_MJ_CLEANUP, we also need to replace the DeviceObject member of the file object with the original device object. This is to prevent issues during teardown. So from a detection perspective, we're going to be exposing ourselves to memory artifacts. From a usability perspective, most of the functions we're going to be using for this are semi-documented and unlikely to change significantly. And finally, how expensive is it to detect the method? It's going to be pretty expensive, because an antivirus would need to replicate our hooking procedure.
And so they would have to enumerate every file object to detect if the device object has been tampered with, which also adds some complications.
So now let's talk about the Spectre Rootkit. The rootkit I wrote abuses the user-mode network stack. Using our file object hook, we can intercept IRPs to the AFD driver. This allows us to inspect all user-mode traffic and send and receive our own data over any socket. To review our existing plan, we're going to hook user-mode communication, similar to tools such as Wireshark, and then we're going to use our C2 to place a special indicator inside of a packet we send to any port on the victim machine. This way, our C2 can communicate with our malware through any port and any service that is exposed. Now how can we actually retrieve the contents of the packets that are received when the WSARecv function is called? Well, when you call that function, it's going to send an IOCTL called IOCTL_AFD_RECV. Specifically, it'll pass the AFD_RECV_INFO structure in the input buffer. This structure contains some flags, but what we really care about is the WSA buffers, which are essentially an array of arrays. These buffers actually contain the bytes that are received from the packet, which we can then look into.
So let's talk a little bit more about how the Spectre Rootkit was designed. Starting with our packet structure, you can prepend any data you'd like at the start of your packet, as long as it doesn't contain the magic constant. After any prepended data, you place the magic constant, which will act as a reference point for the malware. After the magic constant, you add a base packet structure, which has basic information, as you can see on the right, about the packet length and the type of operation being requested. After the base packet structure, you have an optional custom structure.
Now again, this custom structure is optional, so it might not be there in the first place, and it will vary depending on the operation being requested. After this optional custom structure, you can have any data you'd like. This model allows for quite a bit of flexibility, because you can first send any packets you'd like, even ones that don't contain the magic. Then you can send a packet that has any prepended or appended data with the malicious packet, containing the magic constant, inside. And after that packet, you can send any packet you'd like. What I was trying to get at here is that the Spectre Rootkit design allows for quite a bit of flexibility in how you structure your packets.
So what happens when the Spectre Rootkit receives a packet? Well, it's pretty simple. First, it'll search the buffers for the magic constant. If the buffer contains the magic, it'll go ahead into the processing stage. If the buffer does not contain the magic, then it'll just ignore the packet. Now before dispatching the packet, there are a few steps we need to take. First, if we have enough bytes to fill out a base packet structure, we'll try to fill out a custom structure as well, using the bytes we already received. In any case, if we do not have enough bytes for either the base packet or the optional custom structure, we'll receive the rest. And finally, we'll dispatch the packet.
Before we go any further, let's talk about the concept of packet handlers inside of the Spectre Rootkit. The Spectre Rootkit contains a general packet handler class that exposes a virtual process packet function. This base packet handler class has a default constructor that receives a pointer to the current packet dispatcher instance, and the process packet function receives a pointer to the packet itself. We'll talk more about the dispatcher later. Now an example of a packet handler included with the Spectre Rootkit is the ping packet handler.
This handler is quite simple. It is just used to determine if a port or machine is infected. The packet is just a bare-bones magic followed by a base packet structure, and this base packet structure has its type, or operation, set to ping. When the Spectre Rootkit, or specifically the ping packet handler, receives this packet, all it will do is send back an empty base packet with the operation set to ping. Now once a packet is completely populated, the packet dispatcher will allocate a packet handler depending on the requested operation. Then it'll call that packet handler's process packet function. And finally, it'll free the packet handler. The reason the packet dispatcher model is really nice is that by passing a pointer to itself to every packet handler, any packet handler can recursively dispatch a brand new packet.
Before we get into how this recursive dispatching works, here is an example of how the flow of a ping packet goes. First, the Spectre Rootkit receives the packet and recognizes that there's a magic constant there. It'll then fill out the base packet and the optional custom structure for that packet. Since it's a ping packet, there's no optional custom structure, so it'll only fill out the base packet. Then during the dispatching phase, it'll allocate the ping packet handler, call the ping packet handler's process packet function, and then free it. The process packet function of the ping packet handler will send back just a simple base packet containing the ping operation, which indicates to the C2 that this port is infected with the Spectre Rootkit.
The best way I can explain the recursive nature of the packet dispatcher is through another example, called the XOR packet handler. The XOR packet handler takes in an optional custom structure called the XOR packet structure, and this XOR packet structure has the XOR key and an XOR content array.
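The ping flow walked through above (find the magic, fill out the base packet, allocate a handler, call it, free it) can be modeled in a few lines of Python. This is only a sketch of the pattern: the marker value, the 5-byte header layout (a 32-bit length followed by an 8-bit type), the type code, and all class and method names are assumptions for illustration, not the rootkit's actual definitions, and the `send` callback stands in for whatever writes the reply to the socket.

```python
import struct
from typing import Callable, Dict, Type

MAGIC = b"\xde\xc0\xad\xde"               # hypothetical marker value
BASE_FORMAT = "<IB"                       # assumed layout: uint32 length, uint8 type
BASE_SIZE = struct.calcsize(BASE_FORMAT)  # 5 bytes with this assumed layout
TYPE_PING = 0                             # assumed type code


class PacketHandler:
    """Base handler: keeps a reference to the dispatcher that created it."""

    def __init__(self, dispatcher: "PacketDispatcher") -> None:
        self.dispatcher = dispatcher

    def process_packet(self, body: bytes) -> None:
        raise NotImplementedError


class PingHandler(PacketHandler):
    def process_packet(self, body: bytes) -> None:
        # Reply with an empty base packet whose type is ping.
        self.dispatcher.send(struct.pack(BASE_FORMAT, BASE_SIZE, TYPE_PING))


class PacketDispatcher:
    handlers: Dict[int, Type[PacketHandler]] = {TYPE_PING: PingHandler}

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self.send = send

    def receive(self, buffer: bytes) -> bool:
        """Look for the marker; ignore the buffer entirely if it is absent."""
        index = buffer.find(MAGIC)
        if index == -1:
            return False
        return self.dispatch(buffer[index + len(MAGIC):])

    def dispatch(self, data: bytes) -> bool:
        """Parse the base packet and hand the body to the matching handler."""
        if len(data) < BASE_SIZE:
            return False  # A real implementation would receive the rest first.
        length, packet_type = struct.unpack_from(BASE_FORMAT, data)
        handler_class = self.handlers.get(packet_type)
        if handler_class is None:
            return False
        handler = handler_class(self)  # allocate the handler
        handler.process_packet(data[BASE_SIZE:length])
        return True                    # handler is released when it goes out of scope
```

Because each handler holds the dispatcher, a handler can call `self.dispatcher.dispatch(...)` on bytes it has produced itself, which is the hook the recursive handlers described next rely on.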
Now what happens is that if the C2 requests a certain operation, but it doesn't want to send the exact same packet as it did previously, it can take the base packet structure it wants to send over and put it into the XOR content array. Then it'll generate a random byte key, which it'll use to perform an XOR operation on every byte of the XOR content array. Then it'll send this XOR packet to the Spectre Rootkit. When the XOR packet handler receives this packet, it'll use the XOR key to deobfuscate the XOR content array, and then it'll take that XOR content and call the dispatch function recursively to dispatch that brand new packet. Essentially, this model of the Spectre Rootkit allows you to create infinite layers of encapsulation, or layers of obfuscation, allowing you to create variants even if you're requesting the same operation, by applying a layer of XOR obfuscation.
So next let's talk about executing commands, a common feature seen in a ton of Windows malware. Before we get into starting a process from a kernel driver, we need to understand how you actually do that from user mode, in order to see what functions we have to reimplement in the kernel. In user mode, the first thing we need to do is create an unnamed pipe. We'll use this pipe to obtain the output of our process. Then we'll set up the STARTUPINFO structure, and specifically set the standard output and standard error handles to our pipe. Here we can also set window flags, such as hiding the window so that the victim doesn't see the process being created. Next, after the STARTUPINFO structure is populated, we can forward it to CreateProcess to actually create the command prompt process. Then we can wait for it to exit using WaitForSingleObject. And finally, we can read the output of the command by simply calling ReadFile on the unnamed pipe we created before. Now let's talk about how we would do this from kernel mode.
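Before the kernel-mode version, the user-mode flow just described (pipe, redirected standard handles, process creation, wait, read) can be sketched with Python's standard `subprocess` module, which wraps the same pipe and process-creation calls on Windows. The helper name is made up for this example, and the child here is simply another Python interpreter so the sketch runs on any platform.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(command: List[str]) -> Tuple[int, str]:
    """Run `command` with stdout and stderr redirected into one anonymous pipe."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,    # anonymous pipe for standard output
        stderr=subprocess.STDOUT,  # route standard error into the same pipe
        text=True,
    )
    # communicate() drains the pipe while waiting for the process to exit.
    output, _ = process.communicate()
    return process.returncode, output


exit_code, output = run_and_capture([sys.executable, "-c", "print('hello')"])
```

One design note: the talk waits for exit and then reads, which works for small outputs. Draining the pipe while waiting, as `communicate()` does, avoids a stall when the child writes more than the pipe buffer can hold.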
Something important to remember is that inside of a kernel driver, you don't have access to many of the same functions you have access to in user mode, because kernel32.dll doesn't exist in kernel mode. Instead, you have to call the Nt or Zw variants of functions. kernel32.dll in user mode also calls these functions, but it acts as a simplified layer. So first we need to actually create our pipe, and the way we can do this is by replicating what kernel32.dll does itself. What CreatePipe does is first open the named pipe device if it hasn't done so before. Then it'll use this named pipe device and set the root directory of the object attributes to the handle to the named pipe device. Then it'll call NtCreateNamedPipeFile to actually create the pipe. This only creates a read handle, a handle strictly used for reading from the pipe. It'll then call NtOpenFile on that read handle in order to open a write handle as well.
So now that we've reimplemented the CreatePipe function in kernel mode, we need to create the actual process, and we'll use the same function, ZwCreateUserProcess, that KernelBase uses itself. Now we're going to need to replicate the entire process that KernelBase performs itself. First we'll need to pass in an attribute list, and the only attribute we really need to pass is PS_ATTRIBUTE_IMAGE_NAME, which specifies the image file name for the new process. Next, we have to fill out an RTL_USER_PROCESS_PARAMETERS structure for the process. In this structure, we need to set our window flags and the output handles to our pipes, similar to what we did with the user-mode STARTUPINFO structure. But we also need to specify the current directory, the command line arguments, the process image path, and the default desktop name. From there, all it takes is a call to ZwCreateUserProcess in order to start the command prompt process.
Once the process has exited, we can easily read the output of the command by calling ZwReadFile on the read handle we obtained for the pipe.
Now let's talk about what you can do to hide your rootkit. First, an introduction to minifilters. Minifilter drivers allow you to attach to volumes and intercept certain file operations. This is performed by registering your minifilter with the filter manager driver. This reference from the Microsoft documentation shows an example of a user requesting file I/O. First, the I/O manager forwards the request to the file system. While this request is being forwarded, the filter manager driver intercepts it and calls the minifilters that have registered with it, which can, for example, modify the request being performed before it is sent to the file system itself. So minifilters essentially allow you to edit file operations before they actually happen. And minifilters can be used to mask the presence of our rootkit on the file system. For example, a minifilter can redirect all access to a certain file to another file. We can use this to redirect access to our rootkit file to another, legitimate driver file.
Now, again, going back to picking a method, there are a few questions you want to ask yourself. How many detection vectors am I exposed to? How usable is the method from both a stability and compatibility perspective? And how expensive would it be to detect the method? Well, the easiest way to abuse the functionality of a minifilter is to follow the documented procedure and just become a minifilter yourself. So here are the requirements for the function FltRegisterFilter. You're going to need to create an Instances registry key under your service key. Under that Instances key, you need to create an instance name key, which can be whatever name you'd like. This is going to be the name for your filter instance.
Then under the Instances key, you need to add a DefaultInstance value, which is a string and is set to the instance name you created in step two. Then under your instance name key, you need to add the Altitude and Flags values. The altitude is pretty much the order in which minifilters get called. The higher your altitude is, the earlier in line your minifilter is to get called. So how many detection vectors are you exposed to? You're going to be exposing yourself to registry and memory artifacts. How usable is the method? Well, you don't really have concerns from a stability or usability perspective, because this is just how legitimate drivers register as a minifilter. But at the same time, it's pretty easy to detect this method, because besides the registry artifacts, an antivirus could easily enumerate registered filters and their instances through documented functions such as FltEnumerateFilters or FltEnumerateInstances.
So another option is to just hook an existing minifilter, and there are a couple of routes you can take. First, you can just do a basic code hook on an existing minifilter's function. You can overwrite the FLT_REGISTRATION structure, which is passed into FltRegisterFilter, before the victim driver calls it. Or you can edit an existing minifilter instance through DKOM to replace the function that gets called with your own. Now, one of the easiest ways to intercept callbacks to an existing minifilter is, again, a code hook. This can be as simple as a jump hook, but it comes with quite a few drawbacks, similar to what we saw when we were talking about intercepting IRPs. You're going to be exposing yourself to memory artifacts, and unless the function is exported, you're going to need to find it yourself, which can be difficult if the underlying image changes between operating system versions. Not all drivers are compatible with this method due to PatchGuard. And finally, this is also HVCI incompatible.
Now, similar to what we saw with hooking IRPs, it's going to be potentially inexpensive to detect, because if an antivirus knows you're hooking a specific function in a driver, it can easily check the bytes of that driver function to determine if any patching was performed.
Another way of obtaining access to existing minifilters is through DKOM. What you can do is enumerate filters and instances, as I mentioned before, by calling the functions FltEnumerateFilters and FltEnumerateInstances. The function that gets called for a certain filtered operation of a minifilter is specified in an array called CallbackNodes, which is part of the filter instance structure, which you can find by using the previously mentioned functions. The CallbackNodes array index is directly associated with the operation that is being filtered. If you find a CallbackNodes array entry for the operation you want to target, you can replace the pre-operation and post-operation pointers with your own functions. One note here is that you will also want to replace the FLT_REGISTRATION structure that is part of the FLT_FILTER structure. This is because the FLT_REGISTRATION structure will contain the original function pointer, and what an antivirus could then do is check to see if there are any discrepancies between the CallbackNodes array and the FLT_REGISTRATION structure that is part of the filter itself. Now, from a detection vectors perspective, you're going to be exposing yourself to primarily memory artifacts. From a stability perspective, the only concern you should have is that the FLT_INSTANCE structure itself is undocumented. Finding an FLT_INSTANCE is easy through documented functions, but the FLT_INSTANCE structure itself may change across operating system versions. Now, how expensive would it be to detect the method?
Well, it would be inexpensive, because all an antivirus would need to do is occasionally enumerate registered filters and their instances and check the CallbackNodes array.
So, let's say you want to protect a certain file, such as our rootkit file. What's an example of redirecting access to it? Well, first, I'm assuming that you hooked a minifilter's pre-create operation callback. Inside of that pre-create operation callback, you can use FltGetFileNameInformation to get a normalized path to the file being accessed. Then, if the path contains a protected file, such as the path to our rootkit, you can replace the file path being accessed by calling IoReplaceFileObjectName on that file object. Then, you need to set the status being returned to STATUS_REPARSE and return FLT_PREOP_COMPLETE so that the file system will redirect access to the legitimate driver file. So, for example, you could use this to change file access to your rootkit file to another legitimate driver that might be in the same directory, so that when a user program inspects it, it'll actually be looking at the contents of a completely different file.
Okay, wrapping up, I'd like to give thanks to Alex Ionescu, a longtime mentor who's very experienced with Windows internals. ReactOS is an amazing reference for Windows internals, especially for undocumented functions and structures. And thanks to Nemanja Mulasmajic and Vlad Ionescu for helping me review this presentation. Thanks for sticking around. You'll be able to ask questions in my DEF CON Q&A session, which will be later today. Make sure to follow me on Twitter and look at my blog. And you can check out the Spectre Rootkit, assuming everything went well, at the URL below. Thanks for coming out to my talk again. Have a great one.