Thank you, everybody, for coming to "Game-Changing Advances in Windows Shellcode Analysis." Shellcode is always out there lurking somewhere, causing problems, and the problem is that, until very recently, there simply has not been adequate tooling to address it. We have aimed to change that, and succeeded, and we're thrilled to show you how.
A little bit about ourselves. My name is Dr. Bramwell Brizendine. I am a professor and the former director of the VERONA Lab, and I've created quite a number of tools: JOP ROCKET, ShellWasp, SHAREM (as co-creator), and also ROP ROCKET, which I will be presenting on Sunday. I have a PhD in cyber operations from Dakota State University, which is a highly technical degree. Now we will let one of my former students, Jake, introduce himself.
Hey everyone, I'm Jake. I'm a recent college grad, now working as a reverse engineer at the Johns Hopkins University Applied Physics Laboratory. I worked on the SHAREM project as a graduate researcher in Dr. Brizendine's VERONA Lab, and I'm still an active contributor to the project. Here's Max.
Hello, everybody. My name is Max Kersten, and I go by the nickname Libra. I work for Trellix in the Advanced Research Center, and I like to write blogs about the research I do. I'm a malware analyst and reverse engineer by trade, both as a personal hobby and for work. Here I'm representing Trellix and presenting the Ghidra script that relates to this project.
So first, let's talk about why shellcode is a topic of interest to us and why we decided to develop the SHAREM project in the first place. We see shellcode quite frequently in the world of malware and exploitation. A common objective of malware is to remain hidden and obfuscated, so shellcode makes a great tool for these objectives.
Its inherently obscure nature makes it difficult to analyze statically. One example of such a challenge is Windows API identification in shellcode. When examining a normal piece of malware, like a PE executable, you'd expect to see labeled Windows API calls, but that isn't a luxury we have when dealing with shellcode: shellcode has to resolve API addresses manually. So what you wind up with is something like call EDI instead of a nice, clean call WinExec. A call EDI isn't much to go on, so we have to perform a bunch of static or dynamic analysis, and this doesn't scale well either: if the shellcode calls many APIs, it will take a ton of time to work them all out. Issues like this, and some of the other headaches of dealing with shellcode, are why we decided there was a need for more tooling, which is where SHAREM comes in: our game-changing solution to shellcode analysis.
So what is SHAREM? SHAREM aims to be the complete, all-in-one, comprehensive solution to Windows shellcode analysis, and it's loaded with features that we'll cover in this talk. We have a shellcode emulator that's built on Unicorn.
This helps us deal with the Windows API identification problems I mentioned earlier, and we have a few other unique features, such as complete code coverage, which helps us unravel the unexplored paths of shellcode. We also include a highly accurate disassembler that is labeled and enhanced by our emulation results, plus a few other features that help with unraveling deceptive shellcode encoding and obfuscation. The tool is for Windows shellcode only, but we support both 32-bit and 64-bit shellcode. And, revealed for the first time at this presentation, there's a brand-new Ghidra plugin developed by Max at Trellix.
All right, so let's talk about SHAREM's emulator. It is built on the Unicorn CPU emulation framework, and it is able to hook and log over 20,000 Windows APIs across more than 60 DLLs. It wouldn't be a shellcode emulator unless we supported syscalls as well, and we built in support for about 99% of those. SHAREM analyzes raw .bin files as well as the hex-byte representation of shellcode. One highlight of the tool is its use for high-level indicator-of-compromise analysis: running shellcode in emulation provides a detailed summary of what it is doing, so you can see things like which Windows APIs were used, which file paths were accessed, or which URLs were contacted. Also, all of the knowledge we gather from emulation is leveraged elsewhere in SHAREM.
With the high-level details out of the way, I want to dig a step deeper into some of the implementation details of our emulator. Unicorn is a bare-bones framework that gives us just the essentials needed to emulate code; shellcode wouldn't get very far if you tried to run it with vanilla Unicorn. There's a lot we had to set up to make sure our emulation looked and behaved like an actual Windows process. One example of this is PEB walking.
The PEB (Process Environment Block) is an internal Windows structure that shellcode uses to find the base addresses of loaded DLLs. As the name implies, this technique relies heavily on the PEB structure, so structures like the PEB had to be implemented with high fidelity for our emulated code to get anywhere. Once the shellcode is able to walk the PEB and find a DLL's address, the next step is to parse the DLL and loop through its export tables until it finds the API it's looking for. So in our emulation, what is the shellcode actually parsing as a DLL? The answer is actual DLLs. We load the DLLs into memory straight from disk and expand them to their in-memory, executable layout as we load them. We also record references to them in a dictionary, so we can look them up at runtime once an API is called.
That gets us into API emulation. Now that our emulation can locate Windows APIs, what happens once an API is actually called? Our emulator logs the invoked API along with all of the parameters passed to it, and once emulation finishes, this information is printed to the screen for the user. This is very useful, as it can give you a really good idea of what the shellcode is doing. In the example on the right, we have a Metasploit shellcode, and it's really obvious that this is bind shellcode listening on port 8721. So what goes on under the hood during an emulated API call?
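The lookup-and-log step described above can be sketched in a few lines. This is a minimal illustration, not SHAREM's actual code: it assumes a flat byte array stands in for emulated memory, and the addresses in `API_TABLE` as well as the names `ApiInfo` and `log_api_call` are hypothetical. In a real emulator, the table would be filled from each loaded DLL's base address and export table.

```python
from dataclasses import dataclass, field


@dataclass
class ApiInfo:
    dll: str
    name: str
    params: list = field(default_factory=list)  # (C type, parameter name) pairs


# Hypothetical address -> API table. The addresses are made up; in an emulator
# they would come from each loaded DLL's base address and export table.
API_TABLE = {
    0x10001000: ApiInfo("kernel32.dll", "LoadLibraryA", [("LPCSTR", "lpLibFileName")]),
    0x10002000: ApiInfo("kernel32.dll", "WinExec", [("LPCSTR", "lpCmdLine"), ("UINT", "uCmdShow")]),
}

STRING_TYPES = {"LPCSTR", "LPSTR"}


def read_cstring(memory: bytes, addr: int) -> str:
    """Read a NUL-terminated ANSI string from emulated memory."""
    end = memory.index(b"\x00", addr)
    return memory[addr:end].decode("latin-1")


def log_api_call(addr: int, args: list, memory: bytes, log: list) -> bool:
    """If addr is a known API entry point, append a readable call record to log."""
    info = API_TABLE.get(addr)
    if info is None:
        return False  # not an API; keep emulating normally
    rendered = []
    for (ptype, pname), value in zip(info.params, args):
        if ptype in STRING_TYPES:
            # Show the string itself instead of the pointer to it
            rendered.append(f'{pname}="{read_cstring(memory, value)}"')
        else:
            rendered.append(f"{pname}={value:#x}")
    log.append(f"{info.name}({', '.join(rendered)}) ; {info.dll}")
    return True
```

For example, with `calc.exe\0` stored at offset 16 of the fake memory, `log_api_call(0x10002000, [16, 1], mem, calls)` records `WinExec(lpCmdLine="calc.exe", uCmdShow=0x1) ; kernel32.dll`.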
For some APIs, it's not enough to simply log the call and return an arbitrary value. Take VirtualAlloc, for example: it creates a usable region of memory, and the shellcode is probably expecting to do something with that memory, most likely read, write, and/or execute it. If the emulator doesn't actually create that usable region of memory, the shellcode will probably fail to emulate properly. So for essential functions like this, we wrote hooks by hand that perform whatever steps are necessary to keep the emulation a high-fidelity representation of the shellcode's behavior. For our VirtualAlloc example, the API hook uses Unicorn's mem_map to actually create more memory space for us. We currently have over 600 of these hand-written API hooks. Writing hooks by hand for over 20,000 APIs would be a huge undertaking, so we've implemented as many as we could for the security-relevant APIs that shellcode is likely to use. We also built dictionaries that look up the required parameters for each API, as well as their types. This lookup dictionary also determines an appropriate default return value based on the API's return type, and default return values can also be specified through config options.
Now, turning it over to Dr. Brizendine for more information on how we emulate syscalls.
All right. Thank you very much, Jake.
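Jake's description of hand-written hooks versus type-based default return values could look roughly like the sketch below. It is an assumption-laden illustration: `FakeMemory` only imitates the role Unicorn's `mem_map` plays, the base address, `RETURN_TYPES`, and `DEFAULT_BY_TYPE` values are invented, and checking hooks before config overrides is a design choice of this sketch rather than something the talk specifies.

```python
PAGE_SIZE = 0x1000


class FakeMemory:
    """Tiny stand-in for an emulator's memory map (the role Unicorn's mem_map plays)."""

    def __init__(self, next_free: int = 0x20000000):
        self.regions = {}  # base address -> mapped size
        self.next_free = next_free

    def map(self, size: int) -> int:
        size = (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)  # round up to whole pages
        base = self.next_free
        self.regions[base] = size
        self.next_free += size
        return base


def hook_virtual_alloc(mem: FakeMemory, args: list) -> int:
    # VirtualAlloc(lpAddress, dwSize, flAllocationType, flProtect).
    # Simplification: ignore lpAddress and always hand out fresh memory.
    _lp_address, dw_size, _alloc_type, _protect = args
    return mem.map(dw_size)


# Hand-written hooks for important APIs (just one here)
HOOKS = {"VirtualAlloc": hook_virtual_alloc}

# Hypothetical lookup tables: return type per API, then a default per type.
# WinExec returns a value greater than 31 on success, so 32 reads as success.
RETURN_TYPES = {"VirtualAlloc": "LPVOID", "WinExec": "UINT", "Sleep": "void", "CreateFileA": "HANDLE"}
DEFAULT_BY_TYPE = {"void": 0, "BOOL": 1, "UINT": 32, "HANDLE": 0x88, "LPVOID": 0x30000000}


def emulate_api(name: str, args: list, mem: FakeMemory, overrides=None) -> int:
    """Return the value the emulated API hands back to the shellcode."""
    if name in HOOKS:
        return HOOKS[name](mem, args)  # real side effects, e.g. mapping memory
    if overrides and name in overrides:
        return overrides[name]  # user-configured return value
    ret_type = RETURN_TYPES.get(name, "BOOL")
    return DEFAULT_BY_TYPE.get(ret_type, 1)
```

The point of the split is that only a few hundred APIs need real side effects; everything else just needs a plausible return value so execution continues.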
Windows syscalls are something that, until very recently, had not been supported whatsoever by this type of tooling. Windows syscalls have become very popular, very trendy, in the last few years, but up until about 2018 that was not the case. Previously, virtually the only use of Windows syscalls in shellcode had been egg hunters, where you search the process memory for a tag such as w00tw00t; once you find that tag, you redirect control flow to that location. We had naturally assumed there would be many other Windows syscalls out there being used in shellcode, but the reality was that there simply were not; it was just egg hunters. As part of this research, we received a $300,000 NSA research grant to develop this, and we wanted to make sure we could successfully emulate Windows syscalls. Since that really hadn't been done, we developed our own techniques and tooling to facilitate it. That was a previous DEF CON talk, which we also expanded on considerably at Hack in the Box 2023.
So how exactly do we emulate these Windows syscalls? With Windows APIs, the emulator has to do the stack cleanup internally, but with Windows syscalls the opposite is true: the shellcode author is responsible for that cleanup.
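To make the stack-cleanup difference concrete, here is a minimal sketch assuming 32-bit stdcall APIs. It also names a syscall from the number in EAX (the SSN) for a chosen OS build, which the talk turns to next. Only the 0x18 to NtAllocateVirtualMemory pair comes from the talk's example; the build label and the helper names are placeholders, and a real table would be generated per Windows build.

```python
# Hypothetical per-build syscall tables mapping SSN -> native API name.
# Only 0x18 -> NtAllocateVirtualMemory comes from the talk's example; the
# build label is a placeholder, and a real table would cover every SSN.
SYSCALL_TABLES = {
    "win10_example_build": {0x18: "NtAllocateVirtualMemory"},
}


def resolve_syscall(build: str, eax: int) -> str:
    """Name the syscall requested via EAX for the chosen Windows build."""
    table = SYSCALL_TABLES.get(build)
    if table is None:
        raise KeyError(f"no syscall table for build {build!r}")
    return table.get(eax, f"unknown_ssn_{eax:#x}")


def esp_after_api_hook(esp: int, n_args: int) -> int:
    """32-bit stdcall API: the callee pops the return address and its arguments."""
    return esp + 4 + 4 * n_args


def esp_after_syscall(esp: int) -> int:
    """Syscall: the emulator leaves the stack alone; the shellcode cleans up."""
    return esp
```

Because SSNs shift between builds, the build has to be an explicit input; guessing it would silently mislabel every syscall.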
So we do not do that cleanup ourselves for syscalls. For those of you familiar with Windows syscalls, the way we invoke one is to provide a particular value, the system service number (SSN), in the EAX register, such as 0x18 for NtAllocateVirtualMemory. These SSNs change from one Windows OS build to another, so to emulate syscalls we need to specify the particular Windows OS build we want, to make sure we get accurate results. In this example, we see NtAllocateVirtualMemory, and the particular OS build is identified. With that, we will turn it over to Jake.
In SHAREM's output, we try to make the various parameters and values as easy as possible for the user to understand. For example, it may not be immediately obvious what a value of 0x40 means for an flProtect parameter, but PAGE_EXECUTE_READWRITE is a lot more useful. We also parse structures and strings whenever necessary. Here in the bottom screenshot, we have a call to WinExec with calc.exe: you'll see calc.exe printed instead of the address that points to it. Another thing we do to make the output easier to parse is include artifacts extracted with regular expressions. Under the artifacts section of the output, you can find IOC information such as commands run, file and path references, and URLs. Here we have another shellcode example, in which you can see that a bunch of commands were run, one to stop Windows Defender plus a couple of other malicious actions, and we can also see a Chrome updates .bat artifact that was interacted with.
One optional feature we include is the ability to pull live artifacts from the web. Suppose the malicious shellcode reaches out over HTTP to grab some sort of second stage or other external dependency; you may want to grab that file as well. This option supports that, and if the download succeeds, an MD5 hash of it is automatically computed as well. Here's an example of that in action: we have shellcode that pulls down an executable payload file. SHAREM will actually issue a GET request to pull down that file, then load it into memory if the shellcode interacts with it later. You can also see that an MD5 hash of the file is included in the output.
Another nice part of SHAREM's emulation output is its ability to apply structure to API parameters. For parameters that pass entire structures instead of standalone values, SHAREM will enumerate and label all members of the structure. Here's an example of that: instead of just a pointer value for these two structures, you'll see an expanded list of all of their members. Not only does it enumerate and label structures, it also labels nested structures, since it's recursive. So for this lpTimeZoneInformation structure, you'll see the nested StandardDate and DaylightDate structures as well. There are also instances where APIs use unions, where several members share the same memory space, and SHAREM is verbose enough to delineate this level of information in its output.
The Windows registry is another area of interest for shellcode, which is why we also developed a registry manager for our emulation. The SHAREM registry manager helps us track and stub out actions on our emulated registry. It maps hex handle values to actual paths, so when registry handles are printed in the output, you'll see the full registry path instead of just some random address. We have a bunch of custom hooks for the registry-related APIs that try to simulate success whenever possible, which includes having the registry manager update its faked-out registry so the shellcode can interact with it. For registry-related output, we also have special parsing mechanisms based on the MITRE ATT&CK framework to pick out any registry actions that are security-relevant to us. Here we have an example of one that sets up registry persistence for an executable file. Like the rest of our output, it's formatted so that it's easy for the end user to parse: for all references to an HKEY, instead of just hex values, you'll see the full registry keys, paths, and data. All of this registry support doesn't just apply to APIs; it's included for syscalls as well. This is an example of the NtCreateKey syscall on CurrentVersion\Run to set up persistence, and again, this is immediately obvious due to the way the output is labeled.
One final part of SHAREM's output is timeless debugging information. This is essentially a log of every CPU instruction we emulated along with the register values, which is useful if you want to go back, review any of the instructions that were executed, and evaluate the registers at that state. There's also an option to view a selected number of bytes from the stack. This is what that timeless debugging log looks like: you can see all of the instructions and all of the register values before and after each one executes.
When developing this tool, we conducted a lot of research using public sandboxes and shellcode repositories, and we found that 64-bit shellcode is a lot less common than 32-bit. But rest assured, we also support 64-bit shellcode emulation, with different but similar DLLs and internal Windows structures for it. So anything you can do in 32-bit, you can do in 64-bit with our tool.
All right, now turning it back over to Dr. Brizendine for complete code coverage.
Thank you for that. This is one of the most important game-changing aspects of all of this. Some of you may be familiar with the idea of code coverage, but this is a very different twist on that concept.
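Looking back at the output enrichment Jake described (PAGE_* names instead of raw flProtect numbers, full registry paths instead of handles, and flagging Run-key persistence), a minimal sketch could look like this. The PAGE_* and predefined HKEY values are the standard Windows constants; the `FakeRegistry` class, its handle numbering, and the persistence check (Run and RunOnce keys only, roughly ATT&CK T1547.001) are illustrative assumptions, not SHAREM's registry manager.

```python
PAGE_PROTECT = {
    0x01: "PAGE_NOACCESS", 0x02: "PAGE_READONLY", 0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY", 0x10: "PAGE_EXECUTE", 0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE", 0x80: "PAGE_EXECUTE_WRITECOPY",
}
PAGE_MODIFIERS = {0x100: "PAGE_GUARD", 0x200: "PAGE_NOCACHE", 0x400: "PAGE_WRITECOMBINE"}


def render_fl_protect(value: int) -> str:
    """Turn a numeric flProtect value into its PAGE_* constant names."""
    base = value & 0xFF
    names = [PAGE_PROTECT.get(base, f"{base:#x}")]
    names += [name for bit, name in PAGE_MODIFIERS.items() if value & bit]
    return " | ".join(names)


# Predefined registry root handles (standard winreg.h values)
HKEY_ROOTS = {
    0x80000000: "HKEY_CLASSES_ROOT", 0x80000001: "HKEY_CURRENT_USER",
    0x80000002: "HKEY_LOCAL_MACHINE", 0x80000003: "HKEY_USERS",
}
RUN_KEYS = (r"\software\microsoft\windows\currentversion\run",
            r"\software\microsoft\windows\currentversion\runonce")


class FakeRegistry:
    """Hypothetical registry manager: handle -> path, plus a fake value store."""

    def __init__(self):
        self.paths = dict(HKEY_ROOTS)
        self.values = {}
        self.next_handle = 0x1000

    def open_key(self, parent: int, subkey: str) -> int:
        # Pretend the open/create succeeds and hand out a new fake handle
        handle = self.next_handle
        self.next_handle += 4
        self.paths[handle] = f"{self.paths.get(parent, hex(parent))}\\{subkey}"
        return handle

    def set_value(self, handle: int, name: str, data: str) -> str:
        path = self.paths[handle]
        self.values[(path, name)] = data
        if path.lower().endswith(RUN_KEYS):
            # Roughly MITRE ATT&CK T1547.001 (Registry Run Keys)
            return f"PERSISTENCE: {path}\\{name} = {data}"
        return f"{path}\\{name} = {data}"
```

Keeping the handle-to-path map inside the fake registry is what lets later calls that only pass a handle still be printed with a full, human-readable key path.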
With complete code coverage, we can all but guarantee that virtually every single code path is executed. How do we achieve this? We implement it at the assembly level, and it works very well. Whenever we encounter a point where execution could go left or right, or a jump table, or whatever the case may be, we capture metadata at that location, including a snapshot of the stack, and we maintain a list of all of these locations. When the shellcode is about to terminate, if there are any unvisited code paths, the shellcode simply restarts there; because we saved the CPU register state alongside a snapshot of the stack, we can restore that information. By doing that, we are able to capture and log APIs that we otherwise would not have been able to.
To give an example of why that is important: this could be considered a form of dynamic analysis. We could do dynamic analysis with a debugger and perhaps tweak things a little to push execution in a direction it wouldn't otherwise take, or we could detonate the shellcode in a C-style harness in a sandbox and just see what happens. But with complete code coverage, we can see all of the possible code paths, not simply the ones taken in that particular sandbox run. On the left, you can see shellcode written by a student of mine in which only two APIs are identified. One of them is GetComputerNameA: the shellcode is looking for a very specific computer name, and only if it gets that computer name does it call RegSetKeyValue to configure Terminal Server. So unless we had that particular computer name, we should never be able to see that, but with complete code coverage, we can.
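The branch-snapshot-and-restart idea can be illustrated on a toy instruction model. This is not x86 and not SHAREM's implementation, which works on real assembly and Unicorn register state; the op tuples and `cover_all_paths` are hypothetical. What it does show faithfully is the mechanism described above: save registers and stack at each conditional branch, finish the current run, then resume every unvisited side from its snapshot.

```python
def cover_all_paths(program: dict, entry: int, regs: dict):
    """Run every reachable path once, restarting unvisited branches from snapshots.

    program maps offset -> tuple; supported (toy) ops:
      ("set", reg, value, next)   ("push", reg, next)
      ("api", name, next)         ("jz", reg, target, fallthrough)
      ("exit",)
    """
    visited, api_log = set(), []
    # Each pending entry is (pc, register snapshot, stack snapshot)
    pending = [(entry, dict(regs), [])]
    while pending:
        pc, regs, stack = pending.pop()
        while pc is not None and pc not in visited:
            visited.add(pc)
            op = program[pc]
            if op[0] == "set":
                _, reg, value, pc = op
                regs[reg] = value
            elif op[0] == "push":
                _, reg, pc = op
                stack.append(regs[reg])
            elif op[0] == "api":
                _, name, pc = op
                api_log.append((name, list(stack)))  # API plus stack-passed args
            elif op[0] == "jz":
                _, reg, target, fallthrough = op
                taken = target if regs[reg] == 0 else fallthrough
                other = fallthrough if taken == target else target
                # Save CPU and stack state so the other side can be explored later
                pending.append((other, dict(regs), list(stack)))
                pc = taken
            else:  # "exit": this run ends; resume from a saved snapshot, if any
                pc = None
    return api_log, visited
```

Modeling the computer-name check from the talk: the emulated name does not match, so the first run exits early, but the saved snapshot lets the second run reach the registry call with its argument still on the stack.

```python
prog = {
    0: ("set", "ebx", 0x1234, 1),
    1: ("push", "ebx", 2),
    2: ("api", "GetComputerNameA", 3),
    3: ("set", "eax", 1, 4),      # emulated name does not match
    4: ("jz", "eax", 5, 7),
    5: ("api", "RegSetKeyValueA", 6),
    6: ("exit",),
    7: ("exit",),
}
log, visited = cover_all_paths(prog, 0, {"eax": 0, "ebx": 0})
```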
We capture not only the API but also all of the correct parameters that would have been there otherwise.
Self-modifying code is also an important aspect of shellcode. A lot of the time, shellcode is encoded and decodes itself in memory, with a decoder stub to facilitate that process. We can identify this through fuzzy hashing, in particular ssdeep. If we identify it, we take the decoded form and perform our analysis on that, and we also send it to the disassembler. At the bottom, you can see that SHAREM has successfully identified that this is self-modifying code. This is an example of shellcode that is actually encoded, but lo and behold, we are seeing the actual APIs being called and their parameters. So this is very much a game-changer if you're dealing with encoded shellcode. Maybe somebody gives you a piece of shellcode and you're not quite sure what it is. Your options could be to put it in a C-style harness and debug it, but some parts of shellcode may be very repetitive, with loops that run hundreds or even thousands of times, and combined with the fact that it decodes itself in memory, that can be very, very tedious. With SHAREM, we can instantly see what is going on inside that shellcode. So maybe a friend gave you a piece of shellcode to use for some particular type of exploitation, but they snuck a little something extra in there.
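A minimal sketch of the self-modifying-code check described above. ssdeep is not in the Python standard library, so `difflib.SequenceMatcher` stands in for the fuzzy-hash comparison here; that substitution, the 0.9 threshold, and the function names are assumptions. The single-byte XOR decoder only models what a simple decoder stub does in memory.

```python
from difflib import SequenceMatcher


def xor_decode(data: bytes, key: int) -> bytes:
    """What a simple single-byte XOR decoder stub does in memory, byte by byte."""
    return bytes(b ^ key for b in data)


def similarity(a: bytes, b: bytes) -> float:
    # Stand-in for an ssdeep comparison; difflib returns a ratio in [0.0, 1.0]
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def analysis_form(before: bytes, after: bytes, threshold: float = 0.9):
    """Pick which bytes to analyze: the decoded form if the code rewrote itself.

    before: shellcode bytes as loaded; after: the same region after emulation.
    """
    self_modifying = before != after and similarity(before, after) < threshold
    return (after if self_modifying else before), self_modifying
```

For instance, XOR-encoding `b"\x90" * 8 + b"calc.exe\x00"` with key 0x41 changes every byte, so comparing the encoded input with the post-emulation bytes flags self-modifying code and returns the decoded form for analysis.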
Well, you could easily identify that. This is the exact same shellcode in IDA Pro, and it's absolutely correct, but what we're seeing is a series of encrypted bytes. It's pretty much meaningless to a human analyst trying to figure out what's going on: accurate, but unfortunately not very helpful. The decoder stub is part of the encoded shellcode; it decodes the shellcode in memory and could perform one or many operations to decode it byte by byte. On the left, you can see IDA Pro, and on the right, SHAREM, which provides more information.
At this time, we're going to do a brief demo. We're in SHAREM right now, so let's go ahead and emulate this. It'll take a small amount of time to emulate, as it's breaking out of some very long loops. We got some output: we have identified a number of APIs as well as their parameters, and various other artifacts have been identified. SHAREM identifies it as self-modifying code, so that's very useful. Now we've generated disassembly, and we can see it printed to the screen, with our APIs identified and our data down at the bottom.
Another important aspect of SHAREM is its disassembler. When I started this research, I was severely disappointed by the quality of the disassembly provided by tools like IDA Pro and Ghidra. To put it simply, they were very much inadequate: sometimes as little as 60 or 70 percent of the disassembly was correct.
So as much as 30 or 40 percent was wrong. The root cause of much of this was simply the misclassification of data as instructions or, as a cascading effect of that, instructions being disassembled starting at an incorrect offset. Simple things like strings would just be disassembled as instructions. To address this, I came up with some static analysis methods, which I call the disassembly analysis engine. In x86, instructions and data can be freely intermixed, and with shellcode you can play fast and loose with certain conventions. To help deal with this, SHAREM uses multiple analysis phases to try to achieve more accurate disassembly of shellcode: if we can accurately distinguish between instructions and data, then we should get vastly superior disassembly, hypothetically even perfect disassembly. So SHAREM maintains complex metadata about each and every byte of shellcode. SHAREM works exclusively with shellcode, so our approach here has been very empirical, very much based on experimentation with actual shellcode. I have a large collection of shellcode for which I possess the source code, and the process was to scrutinize it very closely. If I noticed that something was incorrect in one particular location, I would try to identify the root cause; if I could identify it, I could then remediate it, not just for that one instance but for all similar cases. The end result was very much improved disassembly. It could never be perfect, but it was markedly better than what IDA Pro or Ghidra would produce.
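The per-byte metadata idea, combined with several of the heuristics listed next in the talk (runs of repeated filler bytes, ASCII and UTF-16 strings, and rel32 call/jmp destinations that must land inside the shellcode), can be sketched as below. The labels, thresholds (8-byte runs, 5-character strings), and function name are assumptions for illustration. Note that a raw byte scan like this also fires on 0xE8/0xE9 bytes that are really operands, so a real engine would cross-check every candidate against the disassembly.

```python
import re

# Heuristic patterns (assumed thresholds; tune as needed)
ASCII_STRING = re.compile(rb"[\x20-\x7e]{5,}\x00")
UTF16_STRING = re.compile(rb"(?:[\x20-\x7e]\x00){5,}")
FILLER_RUN = re.compile(rb"(\x00|\xff)\1{7,}")  # 8+ identical 0x00 or 0xFF bytes


def classify_bytes(code: bytes):
    """Return per-byte labels plus valid/invalid rel32 call and jmp targets."""
    labels = ["unknown"] * len(code)

    def mark(start, end, label):
        for i in range(start, end):
            if labels[i] == "unknown":  # earlier, more specific labels win
                labels[i] = label

    for pattern in (ASCII_STRING, UTF16_STRING):
        for m in pattern.finditer(code):
            mark(m.start(), m.end(), "string")
    for m in FILLER_RUN.finditer(code):
        mark(m.start(), m.end(), "data")

    valid, invalid = [], []
    for off in range(len(code) - 4):
        # E8 = call rel32, E9 = jmp rel32; skip bytes already known to be data
        if code[off] in (0xE8, 0xE9) and labels[off] == "unknown":
            rel = int.from_bytes(code[off + 1:off + 5], "little", signed=True)
            dest = off + 5 + rel
            (valid if 0 <= dest < len(code) else invalid).append((off, dest))
    return labels, valid, invalid
```

In the spirit of the talk's "offset 3000 in 200 bytes" example, a call whose target lies past the end of the buffer lands in `invalid`, which is a strong hint that those bytes were not meant to be decoded as that instruction.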
Some of these can be a bit much to discuss here; there is a white paper that covers them in more detail if you wish. Very briefly, though, here are a few of the things we do. If we find repeating data bytes, such as long runs of zeros or 0xFFs, we can label those as data. We also check for valid jump destinations: if shellcode is trying to jump to offset 3000 and it's only 200 bytes long, well, guess what, offset 3000 doesn't exist, so that disassembly is incorrect, and we then try to address that. We can locate hidden calls and jumps: here we look for the particular opcodes, or bytes, that produce them, and then we make sure they form the correct disassembly if a valid branch destination exists. Strings, whether ASCII or Unicode, we can identify easily. One of the most important aspects of this is that we are able to identify functions in our shellcode, and this is really the only way you can do that: if you were to open this shellcode in IDA Pro or Ghidra, you would not see any of these functions identified, you would simply see call eax, call edx, or some variation to that effect. With SHAREM, we are able to identify more than 20,000 Windows APIs as well as virtually all Windows syscalls. We can also identify disassembly annotations, such as GetPC (to self-locate in memory), push-ret, and Heaven's Gate, and those can be labeled for us. Then there's PEB identification, that is, walking the Process Environment Block.
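A few of the annotations just mentioned (GetPC, PEB access, push-ret, Heaven's Gate) have well-known byte encodings, so a first pass can be a simple signature scan. The encodings below are standard x86/x64 forms; the signature list, labels, and `annotate` function are assumptions for this sketch, and each hit is only a candidate until it lines up with the actual instruction boundaries.

```python
import re

# Hypothetical byte signatures for a few annotations; each hit is only a
# candidate and would still need to line up with the real disassembly.
SIGNATURES = [
    ("GetPC: call $+5", rb"\xe8\x00\x00\x00\x00"),
    ("GetPC: fnstenv [esp-0xc]", rb"\xd9\x74\x24\xf4"),
    ("PEB: mov eax, fs:[0x30]", rb"\x64\xa1\x30\x00\x00\x00"),
    ("PEB: mov r32, fs:[0x30]", rb"\x64\x8b[\x05\x0d\x15\x1d\x2d\x35\x3d]\x30\x00\x00\x00"),
    ("PEB (x64): mov rax, gs:[0x60]", rb"\x65\x48\x8b\x04\x25\x60\x00\x00\x00"),
    ("push-ret: push imm32; ret", rb"\x68.{4}\xc3"),
    ("Heaven's Gate: jmp far 0x33:imm32", rb"\xea.{4}\x33\x00"),
]


def annotate(code: bytes):
    """Return sorted (offset, label) pairs for every signature match."""
    hits = []
    for label, pattern in SIGNATURES:
        # DOTALL so that "." also matches a 0x0A byte inside an immediate
        for m in re.finditer(pattern, code, re.DOTALL):
            hits.append((m.start(), label))
    return sorted(hits)
```

A typical 32-bit prologue of `call $+5; pop eax; mov eax, fs:[0x30]; mov esi, fs:[0x30]; push 0x12345678; ret` produces one GetPC hit, two PEB hits, and one push-ret hit.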
Walking the PEB is something each and every shellcode needs to do; it's one of the first steps of dynamically resolving API addresses at runtime, and we call out all of those particular PEB features. API tables: one common thing shellcode does is designate a particular location as the place where a Windows API pointer will be written. For instance, DeleteFileA may always be at a particular offset, so you might have call [edx+offset] to access it.
So let's look at IDA Pro versus SHAREM in terms of disassembly. IDA Pro cannot determine the APIs. How tragic: you pay thousands of dollars, and it just doesn't have that information. But look inside SHAREM: at no cost, you have the APIs identified. Thank you.
Strings are pretty obvious, ASCII and Unicode, and we identify those, as well as things like push stack strings, which are labeled very nicely with our comments. One thing we do is utilize emulation data to enhance our disassembly, and the way we do this is very unique. If emulation actually executes an instruction at a particular offset, then we know definitively that at this location we have this instruction with a size of, say, two bytes. We preserve that information, and when we produce the disassembly, it overrides what would be determined statically. Data is likewise identified through memory reads and writes, so it can be clearly labeled for us. Now, self-modifying code: you might say, well, gee golly, with self-modifying code each byte is going to be both data and instructions, so how can we cope with that?
Well, what you can do is recognize that every byte is going to be data at least once, so in this case we do not classify a byte as data unless it is read from or written to more than once.
Distinguishing between data and instructions: at the top is the actual disassembly of instructions. We have call eax, calling VirtualAlloc to create a region of memory, and PAGE_EXECUTE_READWRITE is labeled for us. Below that, everything is data. In Ghidra or IDA Pro, this would simply be misclassified as instructions, but instead we have our API pointers labeled, our strings labeled, and other DWORDs, which we can surmise may be things like checksums that help resolve those APIs.
As for how we integrate our emulation data, we take three different forms. We have the starting form: if something is encoded, we take a snapshot of that. Then, on a byte-by-byte basis, every time a particular set of bytes is executed as instructions, we take a snapshot of that. Finally, we take a snapshot of the final form of the shellcode after it has decoded itself in memory. Then we merge these together.
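A minimal sketch of the three-form merge and the "more than one access" rule, using the priority order the talk gives next (executed form first, then final form, then starting form). Representing the executed form as an offset-to-byte dictionary, allowing the final form to be missing, and the function names are assumptions of this sketch, not SHAREM's data model.

```python
from typing import Dict, Optional


def merge_forms(start: bytes, final: Optional[bytes], executed: Dict[int, int]) -> bytes:
    """Per-byte merge: executed form first, then final form, then starting form."""
    merged = bytearray(start)
    for i in range(len(merged)):
        if i in executed:  # bytes seen executing as instructions win
            merged[i] = executed[i]
        elif final is not None and i < len(final):
            merged[i] = final[i]
        # otherwise keep the starting byte
    return bytes(merged)


def is_data(offset: int, access_counts: Dict[int, int], executed: Dict[int, int]) -> bool:
    """Self-modifying code touches every decoded byte at least once, so one
    read/write is not enough evidence; require more than one access."""
    return offset not in executed and access_counts.get(offset, 0) > 1
```

The executed-first priority is what defeats the re-encoding trick: if byte 1 ran as 0xC3 but was overwritten again before the final snapshot, the merged view still shows 0xC3.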
It's a very novel way of merging them. If I were a very clever shellcode author trying to conceal what I was doing, I might have the shellcode re-encode itself after it decoded itself. That doesn't typically happen, but it could, for instance if somebody were trying to protect the intellectual property of their shellcode. We would still be able to see what it was, because when we merge these together we prioritize the executed form of the shellcode, next the final form, and finally the starting form.
So here is the result of all that madness: we can see decoded shellcode with the APIs very clearly labeled, and the parameters are immediately easy to see. It's not just hex values but human-readable equivalents; in this particular case, we can see that it's downloading an evil.hta and then calling WinExec on it.
In terms of reporting, SHAREM has a lot of verbose reporting of all kinds of very useful information, and it provides many different types of output: an ASCII representation of bytes, raw binary, text format, and JSON. So if you want to run SHAREM headless and maybe integrate it into some kind of web server or web service, you absolutely can. It also generates a C-style tester, so if you want to compile that and run it in a debugger, that's very easy to do.
All right, so at this time we will hand it off. Let me adjust this.
Since I'm a major user of Ghidra in my day-to-day malware analysis work, and I also encounter shellcode, and I like the SHAREM framework, I figured I would combine the best of both worlds. It's kind of an open secret:
I unironically like writing Java, while the rest of the world kind of hates it, so I consider it a win-win: I wrote the script in Java for everybody to use, so you don't have to write Java yourself but can still use it. For those unaware, Ghidra is a reverse engineering framework published by the NSA; it's open source, and you can also analyze shellcode with it. You can extend this framework: after you've loaded and analyzed your shellcode, click Display Script Manager, the green play button in the top bar, which opens the Script Manager. From the hamburger menu there, you will see the folders where your scripts are located, and you can select any location. Alternatively, your home directory will contain a folder called ghidra_scripts, and you can put the script in there. Afterwards, you can simply double-click the script to run it. Once you do this, it calls SHAREM in the background, and you can follow the script's output in the console, which is located at the bottom of Ghidra by default. You'll see where comments are placed, with the hexadecimal address after each one. Since Ghidra's disassembly view and decompiler view are linked, double-clicking one of these addresses jumps to it and highlights it in both views. The comments give you a quick and easy overview of what the script added and let you navigate to the interesting parts of the shellcode. Now here's a flowchart of how this works.
You run the script, which runs SHAREM in the background. SHAREM can run headless, and the Ghidra script itself is written so that it requires no graphical user interface elements, so you can also run the script headless in Ghidra, which in turn runs SHAREM headless. Once SHAREM finishes executing, the script loads its output, SHAREM's default JSON output file, and converts that JSON into plain Java objects. It then iterates over all of them, adding comments wherever possible based on the output, and updates the disassembly. We just saw the explanation of why SHAREM is able to catch more than, for example, Ghidra, so the script will also assign data types, or reassign them if one was already set, preferring SHAREM's output over Ghidra's own analysis. Throughout the script's execution, it logs its changes in the script console at the bottom, and then it finishes.
So what does it look like? I have here an overview of a piece of shellcode that I received from Bramwell, and this is the default way Ghidra shows it to you. If you run the script, you get information based on the emulation: you suddenly see what specific offsets or memory locations refer to. In this case, for example, on lines 25, 26, and 27 you see LoadLibraryA. Not only do you know this function and, based on MSDN, its parameters and use case, you also know the value of the argument: not only the type of argument, but the actual value. In this case, we know urlmon is loaded by the shellcode, and we can see that GetProcAddress is used to get URLDownloadToFileA. As for data types, Ghidra will try to guesstimate whatever you have in front of you; in this case, you have unknown bytes.
They're, well, not meaningful on their own. During emulation, however, you will see that the API pointers, checksums, and strings reside here. The strings themselves might be found by Ghidra initially, but the terminating null byte may or may not be included in or used by the shellcode, whereas Ghidra generally assumes it is, and that byte might be used for the next part; maybe it's alignment or padding, depending on what you're looking at. So this gives you an easy overview of what you're looking at. If you see any checksums or magic values, you might try to find those online to get more information, and if you see references in the code to that data, you also know the size of the data rather than it just being one byte.
Now for the disassembly. It wasn't really clear in screenshots whether I changed more than just one instruction, so I manually patched the first line, at offset 5, to pop a different register. Once you run the script, this changes to pop edx, which is what it originally was. This shows that SHAREM's emulation results are worked into whatever you see in Ghidra.
Now for a demo. We can see here that it starts at offset 5, and you can see the patched pop instruction. Going down, we can see that function calls, for example, are to a memory address, and we don't necessarily know what we're looking at. If we go all the way down to the bottom, we can see the bytes that are not classified, and we can see the strings found by Ghidra. Opening up the Script Manager and running the script, we'll see the log at the bottom showing us what has been added. So this allows us to see what we need in one single overview.
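Max mentions looking up checksums found in the data section online. For shellcode that uses the classic ROR13 API-name hash (rotate right by 13, then add each byte, as in skape's well-known technique), you can also compute candidate hashes locally and match them. This is a minimal sketch under that assumption: other shellcode uses different hash functions, and Metasploit's block_api uses a related variant that also hashes the module name, so a match is only a lead, not proof.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """Classic ROR13 API-name hash: rotate right by 13, then add each byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def build_hash_table(api_names):
    """Map hash -> API name so checksums found in shellcode can be looked up."""
    return {ror13_hash(name): name for name in api_names}


table = build_hash_table(["LoadLibraryA", "GetProcAddress", "URLDownloadToFileA", "WinExec"])
print(hex(ror13_hash("LoadLibraryA")))  # 0xec0e4e8e
print(table.get(0xEC0E4E8E))           # LoadLibraryA
```

In practice, the candidate list would be the export names of the DLLs the shellcode loads, which turns an opaque DWORD into an API name.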
We can see that the patched pop has changed back into pop edx, and once we go down to the function call, we again see the annotation in both the disassembly listing and the decompiler view, with both the API and the values of its arguments. Moving down, we see the checksums located at the bottom.
This also marks the end of the presentation. You can download and try SHAREM at the address at the top, and you can find the Ghidra script on the Trellix GitHub at the address at the bottom. The NSA grant that funded the research for SHAREM is also listed at the bottom. Thank you very much for your attention.